How To Program a Cybersecurity Loss Exceedance Curve in Python

Tim Layton
13 min read · Nov 27, 2023

A Loss Exceedance Curve (LEC) is a valuable cybersecurity risk analysis tool, especially for estimating and predicting future loss events, and Python makes it straightforward to build one.

An LEC helps business leaders understand the likelihood that the financial impact of cyber incidents will exceed a given threshold.

In this article, I will walk through the process of programming an LEC, considering a scenario where the probability of a cybersecurity event is known and we have industry benchmark data providing a 90% confidence interval for potential losses.

Jupyter Notebook

I highly recommend using Anaconda and Jupyter Notebook to create and manage your Python code for cybersecurity risk analysis projects. Once you install and launch Anaconda, Jupyter Notebook is one of the development environments you will have available.

Jupyter Notebook is a popular tool for data science and cybersecurity risk analysis projects, especially when working with Python. Its benefits include:

Interactive Environment: Jupyter Notebooks offer an interactive coding environment, enabling you to execute code in segments (cells) and see the results immediately. This is ideal for exploratory data analysis and iterative testing in cybersecurity projects.

Documentation and Visualization: It allows for the integration of live code, visualizations, and explanatory text in a single document. This makes it easier to understand the analysis process and results, particularly useful for complex cybersecurity risk analyses.

Ease of Sharing and Collaboration: Notebooks can be easily shared among team members or stakeholders, facilitating collaboration. They can be viewed and edited by others, aiding in collaborative development and review of analysis.

Support for Multiple Languages: Although primarily used for Python, Jupyter supports various programming languages, making it versatile for different kinds of data science projects.

Integration with Data Science Tools: Jupyter integrates well with popular data science libraries like Pandas, NumPy, Matplotlib, and TensorFlow, which are essential in data manipulation, statistical analysis, and visualization in cybersecurity risk analysis.

Learning and Teaching Tool: Its format is conducive to teaching and learning Python programming and data science concepts, making it a great tool for educational purposes.

Overall, Jupyter Notebook is a powerful and user-friendly tool that enhances the efficiency, collaboration, and communication of data science and cybersecurity risk analysis projects.

I am committed to equipping cybersecurity professionals with the robust capabilities of quantitative Bayesian statistical methods. By leveraging these mathematical and statistical tools, we can enhance our current risk assessment techniques and present risks in terms that business leaders can understand. Bayesian methods allow us to prioritize cybersecurity risks and communicate them with their potential economic impact, ensuring clarity for business professionals.

You can connect with me on LinkedIn and follow my articles here on Medium. Get notified via email every time I publish a new article.

In other articles, I discuss using the Beta Distribution and Laplace’s Rule to calculate the probability of an event occurring based on industry breach data from sources like the annual Verizon DBIR report.

You can read those articles here on Medium or my website at https://bayescybercoder.com/

Background Information

In this example, I am creating this code for the financial industry, where there are 4237 commercial banks in the reference class.

Two hundred thirteen of those organizations realized a cyber breach during the previous year via web application attacks. Their losses ranged from $360,000 to $16,800,000. This range of losses becomes the lower and upper bounds of our 90% confidence interval.

This is real data that I had to research to be able to use it for this project. You can do the same thing for your industry and specific scenario.

In my case, since the organization had not experienced a data breach from this type of attack method, I needed to create a reference model and define a starting point.
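As a quick illustration (this snippet is not part of the article's program), applying Laplace's Rule of Succession to the reference-class figures above produces the breach probability used later in the simulation:

```python
# Laplace's Rule of Succession: (successes + 1) / (trials + 2).
# Here a "success" is a bank breached via web application attacks.
breaches = 213          # banks breached last year via web app attacks
reference_class = 4237  # commercial banks in the reference class

probability = (breaches + 1) / (reference_class + 2)
print(f"Estimated annual breach probability: {probability:.4f}")  # ~0.0505
```

This is consistent with the 0.05 probability hardcoded in the events list later in the article.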

LEC and Monte Carlo Simulations

In Bayesian statistics, the loss exceedance curve relates closely to predictive distributions, where you estimate the probability of future outcomes based on current data. The Monte Carlo simulations help to integrate the uncertainty in these predictions.

This loss exceedance curve (LEC) is a powerful tool in cybersecurity risk management. It allows you to visualize and quantify the risk of financial losses due to cyber incidents. Organizations can use this curve to make informed decisions about risk mitigation strategies, control investments, and cyber insurance coverage.

Monte Carlo simulations are a statistical technique used to understand the impact of risk and uncertainty in prediction and forecasting models.

Here’s how they work and why they are instrumental in cybersecurity risk analysis:

Random Sampling: Monte Carlo simulations use random sampling to model scenarios for various types of risks. In cybersecurity, this could involve simulating different types of cyber attacks or security breaches.

Complex Systems Modeling: They are ideal for complex systems where the interactions between different elements are unpredictable or have a wide range of potential outcomes. Cybersecurity systems often have this level of complexity.

Loss Exceedance Curves: In cybersecurity, loss exceedance curves represent the probability of exceeding different loss levels due to cyber incidents. Monte Carlo simulations can be used to estimate these probabilities by repeatedly simulating different scenarios and tracking the resulting losses.

Risk Analysis: These simulations help quantify the risk by providing a probabilistic distribution of possible outcomes. They can model the financial impact of cyber attacks, considering the frequency, severity, and interdependencies of different risk events.

Decision Making: By providing a more comprehensive view of potential risks and their impacts, Monte Carlo simulations support better-informed decision-making in cybersecurity strategies and insurance policy design, such as determining appropriate levels of cyber insurance coverage and other related business actions such as resource prioritization and return on control investment.

Monte Carlo simulations are a powerful tool in cybersecurity risk analysis for their ability to model complex, uncertain systems and to provide a detailed understanding of potential losses, aiding in effective risk management and strategic planning.
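To make the idea concrete, here is a minimal, self-contained sketch of estimating an exceedance probability by random sampling. The distribution parameters are illustrative only, not the article's model:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate 10,000 annual loss outcomes from a lognormal distribution
# (illustrative parameters), then estimate the chance that annual
# losses exceed $1M by counting how often the simulated losses do.
losses = rng.lognormal(mean=13.0, sigma=1.0, size=10_000)
p_exceed_1m = np.mean(losses > 1_000_000)
print(f"P(loss > $1M) ~ {p_exceed_1m:.1%}")
```

This single threshold check, repeated across a range of thresholds, is exactly what produces the full exceedance curve later in the article.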

Baseline LEC Curve

Before we dive into the steps to program the LEC curve, I will show you the basic LEC curve the Python program generates and then a second, more advanced version, which is possible when you fine-tune the base program for specific use cases.

The diagram below is the stock baseline LEC curve the program generates by default. I will walk you through basic modifications in the programming sections below to make this curve more viewer-friendly.

Basic LEC Curve — © Tim Layton, All Rights Reserved, 2023 — bayescybercoder.com

Advanced LEC Curve For a Specific Use Case

The LEC curve below is an example of an advanced version of the baseline curve shown above, but with multiple additions and modifications to meet my specific use case.

In this case, you will notice the organization has cybersecurity insurance coverage that begins once a loss of $5.5M or greater is realized, and this is visually reflected in the LEC curve to help stakeholders quickly understand their financial exposure (the white area left of the red line).

I also made several other small tweaks to make the LEC curve easier to view and understand for stakeholders, such as adding data labels and tilting the loss values on the x-axis to help with readability.

I won’t dive into the code to do this in the current article because those modifications are more code than the original program. If your organization is interested in creating customized loss exceedance curves, you can connect with me on LinkedIn, and I am happy to assist you with a project like this.

Custom LEC Curve — © Tim Layton, All Rights Reserved, 2023 — bayescybercoder.com

Steps to Program a Loss Exceedance Curve in Python

A loss exceedance curve plots the probability that a loss due to a cybersecurity event will exceed a certain value. It’s crucial in risk management to estimate the likelihood and impact of potential losses.

A loss exceedance curve is a graphical tool used in risk management, particularly cybersecurity, insurance, and finance. It visualizes the probability that losses will exceed various financial thresholds due to specific events or risks.

Here’s a brief overview of its purpose and significance:

Visual Representation of Risk: The curve plots potential loss amounts on one axis (typically the x-axis) against the probability of exceeding those amounts on the other axis (y-axis).

Risk Assessment: It helps businesses understand the likelihood and severity of potential losses. For example, cybersecurity professionals can illustrate the potential financial impact of data breaches or cyber-attacks.

Decision Making: By providing a clear picture of potential losses and their probabilities, the curve aids stakeholders in making informed decisions about risk management strategies, such as investing in security measures or purchasing insurance.

Financial Planning: It assists in budgeting and financial planning by highlighting potential high-impact risks, allowing businesses to allocate resources effectively for risk mitigation.

Insurance and Investment: For insurance companies and investors, the curve is crucial in pricing policies and evaluating the risk-return profile of investments.

The loss exceedance curve is a key tool for business stakeholders to quantify and visualize risks, enabling more effective risk management and strategic decision-making.
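The y values of such a curve are simply exceedance probabilities. A minimal sketch with made-up loss figures shows the underlying computation:

```python
import numpy as np

# Toy example: turn a sample of simulated losses into exceedance
# probabilities (the y axis of an LEC). Figures are invented.
losses = np.array([100, 250, 400, 800, 1500, 5000])
thresholds = np.array([200, 1000, 4000])

# For each threshold, the fraction of outcomes exceeding it, as a percent.
exceed_prob = [(losses > t).mean() * 100 for t in thresholds]
# losses > 200: 5 of 6; > 1000: 2 of 6; > 4000: 1 of 6
```

Plotting thresholds on the x axis against these percentages on the y axis yields the characteristic downward-sloping curve.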

Gather Data

Probability of the event: This is your estimated likelihood of the cybersecurity event occurring. When I don’t have empirical data from existing losses, I use industry breach data from sources like the annual Verizon DBIR report. Then, I use Laplace’s Rule to compute the probability of whatever breach event scenario I am working on. In this case, I am computing the probability of a cyber breach occurring for an organization in the financial industry via a web application attack.

Confidence interval for losses: Use industry benchmarks to gather data on potential losses, focusing on the 90% confidence interval. This interval provides an upper and lower bound on potential losses.
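As a sketch of how a 90% confidence interval maps to lognormal parameters (using the loss bounds that appear in the article's code), the calculation looks like this:

```python
import numpy as np

# A 90% confidence interval spans 1.645 standard deviations on each side
# of the mean of a normal distribution, i.e. 3.29 sigma in total. For
# lognormal losses, apply this on the log scale to recover the parameters.
lower, upper = 360_800, 16_800_000

mean = (np.log(lower) + np.log(upper)) / 2   # log-scale mean
stdv = (np.log(upper) - np.log(lower)) / 3.29  # log-scale std deviation
```

Note that `exp(mean)` is the geometric mean of the two bounds, which becomes the median simulated loss.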

Write the Python Code

With this baseline information, we can create the new loss exceedance curve in Python using a Monte Carlo simulation and the relevant Python libraries.

LEC vs. Risk Matrix

The Loss Exceedance Curve (LEC) and the Risk Matrix are tools used in risk management, but they serve different purposes and offer different insights.

I have written a detailed article on the specific dangers and issues with using the common risk matrix; however, I will provide a simplified list of common issues.

Typical Risk Matrix

Loss Exceedance Curve (LEC) Overview

Quantitative Analysis: The LEC provides a probabilistic, quantitative analysis of potential losses. It shows the probability of losses exceeding various financial thresholds.

Financial Focus: It focuses on financial implications, offering a detailed view of potential loss magnitudes and their likelihoods.

Dynamic Range: LEC can handle a wide range of loss values and probabilities, offering a continuous spectrum of risk assessment.

Complexity: Generally, it requires more data and is more complex to construct, often involving statistical models and simulations.

Risk Matrix Overview

Qualitative or Semi-Quantitative: This matrix is a simpler tool, often qualitative, categorizing risks based on their severity and likelihood.

Broad Overview: It provides a quick, high-level overview of various risks, making it easy to understand and communicate.

Limited Range: The matrix typically has limited categories (like ‘High’, ‘Medium’, and ‘Low’) for impact and probability, which can oversimplify risk assessment.

User Bias: It’s more prone to subjective judgments, leading to inconsistent or biased risk assessments.

Downsides of the Risk Matrix

  1. Oversimplification: Categorizing risks into broad bands can oversimplify complex risks.
  2. Subjectivity: The classification of risks can be highly subjective, leading to inconsistencies.
  3. Limited Nuance: It lacks the nuance to effectively differentiate between risks with similar probabilities and impacts but different natures.
  4. False Precision: It can give a false sense of precision and certainty in risk assessments.

While the risk matrix offers a simple, user-friendly approach, it lacks the detailed, quantitative analysis provided by the Loss Exceedance Curve, which is crucial for in-depth financial risk assessment.

Python Code For The LEC

#############################################################################
# © Tim Layton, All Rights Reserved, 2023
# Developer: Tim Layton
# Purpose: Run Monte Carlo simulations on a set of loss events described
# by their lower and upper limits in the events data frame.
# It generates lognormal values for each event that occurs, sums them up,
# and runs this process for the prescribed amount of trials to plot the
# distribution of the results.
#############################################################################

#############################################################################
# Import required libraries
# matplotlib.pyplot: Used for creating visualizations.
# numpy: Provides support for arrays and mathematical functions.
# pandas: Used for data manipulation and analysis.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
#############################################################################

#############################################################################
# This function decides whether an event happens based on its probability.
# It uses numpy's rand() to generate a random number between 0 and 1.
# If this number is less than the event's probability, the function returns
# True, indicating the event occurs.
#
def event_happens(probability):
    return np.random.rand() < probability
#############################################################################

#############################################################################
# This function generates a lognormal distribution result for an event,
# given its lower and upper bounds. It calculates the mean and standard
# deviation of the logarithms of these bounds, then returns a random value
# from the lognormal distribution.

def lognormal_event_result(lower, upper):

    # Calculate the mean of the logarithms of the bounds
    mean = (np.log(upper) + np.log(lower)) / 2.0

    # Calculate the standard deviation of the logarithms of the bounds.
    # A 90% confidence interval spans 1.645 standard deviations on each
    # side of the mean, hence the 3.29 (2 x 1.645) divisor.
    stdv = (np.log(upper) - np.log(lower)) / 3.29
    return np.random.lognormal(mean, stdv)

#############################################################################
# This function sums up the results of all events (if they occur) in a given
# scenario. It uses the lognormal_event_result and event_happens functions.

def simulate_scenario(events):
    return sum(lognormal_event_result(event['Lower'], event['Upper'])
               for _, event in events.iterrows()
               if event_happens(event['Probability']))
#############################################################################

##############################################################################
# This function runs the Monte Carlo simulation. It repeatedly calls
# simulate_scenario for the specified number of rounds, accumulating the
# results in a list.

def monte_carlo_simulation(events, rounds=10000):  # Number of MC trials
    return [simulate_scenario(events) for _ in range(rounds)]
#############################################################################

##############################################################################
# This is a helper function for formatting numbers as currency in plots.
# It's not used directly in the main function but can be useful for enhancing
# visualizations.

def format_currency(x, pos):
    return '${:,.0f}'.format(x)
#############################################################################
# Control your curve
# Read my comments to the right of each line of code to gain insights on
# how to control and modify your results.

def plot_results(bin_centers, cumrev):
    # x_min, x_max = min(bin_centers), max(bin_centers)  # This version shows the far right end of the tail with outliers.
    x_min, x_max = min(bin_centers), 10e6  # This version caps the max loss shown on the curve's tail.
    x_values = np.linspace(x_min, x_max, 10)  # Controls the number of labeled ticks on the chart. Typically not <10 or >20.
    probabilities = [cumrev[np.argmin(np.abs(bin_centers - x))] for x in x_values]

    #########################################################################
    # Plotting the curve and setting the titles, etc.
    plt.figure(figsize=(10, 6))  # Size of the plot
    plt.plot(bin_centers, cumrev, label='Probability of Loss')
    max_prob = max(cumrev)  # Highest exceedance probability
    plt.xlabel("Loss Estimates", labelpad=15)
    plt.ylabel("Chance of Loss or Greater (%)")
    plt.title("Loss Exceedance Curve - System X - XYZ Attack Scenario", y=1.05, fontweight='bold')
    plt.grid(True, which='both', linestyle='--', color='gray', linewidth=0.5)  # Grid lines
    plt.tight_layout()
    plt.legend()
    plt.show()
#############################################################################
def main():

    # The main function of the program. It sets up the data via the hardcoded
    # list in events, runs the Monte Carlo simulation, and then plots the results.

    # Data for loss events (probability, lower bound, upper bound)
    events = [
        {"Probability": 0.05, "Lower": 360800, "Upper": 16800000},
        # Loss Event #1 for the web app attack method
        # Add more events as needed
        # My advanced version reads the values from a .csv file
    ]

    # Convert events to a DataFrame
    events_df = pd.DataFrame(events)
    results = monte_carlo_simulation(events_df)

    hist, edges = np.histogram(results, bins=150)  # I normally use 150, but you can adjust as desired.
    cumrev = 100 * np.cumsum(hist[::-1])[::-1] / len(results)
    cumrev = cumrev[1:]
    bin_centers = 0.5 * (edges[:-1] + edges[1:])[1:]
    plot_results(bin_centers, cumrev)

# This is a Python idiom that ensures the main() function is called only
# when the script is executed as a standalone file, not when it's imported
# as a module.

if __name__ == "__main__":
    main()

When you run this code, you get the basic LEC curve I showed at the top of this article.

Basic LEC Curve — © Tim Layton, All Rights Reserved, 2023 — bayescybercoder.com

We can make some quick and easy adjustments to the code to produce a plot that is far more user-friendly for stakeholders. At a bare minimum, I suggest the following changes.

Add the following improvements to the plotting code inside plot_results, before your main() function:

#############################################################################
# Existing plotting code section (these lines are already inside plot_results)
    plt.xlabel("Loss Estimates", labelpad=15)
    plt.ylabel("Chance of Loss or Greater (%)")
    plt.title("Loss Exceedance Curve - System X - XYZ Attack Scenario", y=1.05, fontweight='bold')
    plt.grid(True, which='both', linestyle='--', color='gray', linewidth=0.5)  # Grid lines
#############################################################################

# Add these new improvements to make the chart more user-friendly

    plt.xlim(x_min, x_max)  # Eliminate empty space on the x axis
    plt.ylim(0, max_prob + 1)  # Run the y axis from 0 to 1% above the highest probability

    # Format the x axis as currency for each labeled tick in the plot.
    ax = plt.gca()
    ax.xaxis.set_major_formatter(format_currency)
    ax.set_xticks(x_values)
    ax.set_xticklabels([format_currency(val, None) for val in x_values], rotation=45)

    # Annotate each labeled tick with its exceedance probability.
    for x, y in zip(x_values, probabilities):
        plt.annotate(f'{y:.2f}%', (x, y), textcoords="offset points", xytext=(10, 10),
                     ha='left', arrowprops=dict(arrowstyle='-', color='gray', linewidth=1))
    plt.tight_layout()
    plt.legend()
    plt.show()
#############################################################################

# Your main() function code...
def main():

Run the updated program, and you get a much nicer-looking LEC curve that helps your stakeholders consume the information in a friendlier, more intuitive way.

Summary

I hope you have found this information helpful in quantifying cybersecurity risks using a loss exceedance curve (LEC).

You may also want to read my article about how to compute probability distributions that you can use to create the probability input into the events function in this program. The article's name is “How To Calculate The Probability of Cyber Breach Attack Methods Using Python and Bayesian Statistics.”

I am committed to equipping cybersecurity professionals with the robust capabilities of quantitative Bayesian statistical methods. By leveraging these mathematical and statistical tools, we can enhance our current risk assessment techniques and present risks in terms that business leaders can understand. Bayesian methods allow us to prioritize cybersecurity risks and communicate them with their potential economic impact, ensuring clarity for business professionals.

You can connect with me on LinkedIn and follow my articles here on Medium. Get notified via email every time I publish a new article.


Tim Layton

Cybersecurity Risk Analysis Using Python and Bayesian Statistics.