Estimating Phishing Email Frequency with the Poisson Distribution in Python: A Practical Guide for Cybersecurity Risk Assessment
In my last article, I demonstrated how to use Python and the Beta distribution to estimate the probability of users clicking on internal phishing campaign tests. We also explored how this data could be combined with industry benchmark data to estimate the probability of a cyber breach, especially if your organization has not yet experienced a breach via phishing.
You can connect with me on LinkedIn and join my professional network.
You can view the distributions directly below from that Python program.
In this new article, I will show you how to use the Poisson distribution to estimate the number of phishing emails your organization receives per day. Understanding the frequency of these phishing attempts can help you adjust your incident response planning measures accordingly.
Cybersecurity threats are a significant concern for businesses of all sizes. Phishing attacks, in which malicious actors attempt to deceive employees into revealing sensitive information or clicking on harmful links, are particularly prevalent. The latest Verizon Data Breach Investigations Report (DBIR) highlights the effectiveness of phishing across all industries and underscores that it remains one of the most prevalent and successful methods for compromising organizational security, regardless of a company's size or sector.
Despite advancements in security technology, the human element remains a critical vulnerability. Securing the human aspect of cybersecurity is an ongoing challenge, as no amount of technology alone can fully address this issue.
The Poisson distribution models the number of events occurring within a fixed interval of time or space. It is suitable for scenarios where events occur independently and at a constant average rate.
Use Case for Phishing Attacks: Modeling the number of phishing emails received per day helps in understanding the frequency of these attacks and planning for appropriate response measures. This proactive approach can enhance your organization’s preparedness and resilience against phishing threats.
Poisson Distribution
The Poisson distribution is a probability distribution that describes the number of events occurring within a fixed interval of time or space. These events must occur independently, and the average rate (mean number of events) must be constant. The Poisson distribution is particularly useful for modeling rare events or occurrences over a specific period or area.
Key Characteristics:
- Discrete Distribution: The Poisson distribution is discrete, meaning it counts occurrences of events.
- Parameter: The distribution is characterized by a single parameter, λ (lambda), which represents the average rate of occurrence of the events.
- Formula: The probability of observing k events in a given interval is given by:
P(X = k) = (λ^k * e^(-λ)) / k!
where:
- P(X = k) is the probability of observing k events.
- λ (lambda) is the average rate of occurrence of the events.
- e is the base of the natural logarithm (approximately equal to 2.71828).
- k is the number of occurrences.
- k! (k factorial) is the product of all positive integers up to k, with 0! defined as 1.
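To make the formula concrete, here is a minimal sketch that evaluates it directly with Python's standard library. The function name poisson_pmf and the rate of 5 emails per day are assumptions chosen for illustration, not part of the program shown later in this article.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Return P(X = k) for a Poisson distribution with average rate lam."""
    if k < 0:
        return 0.0
    return (lam ** k) * math.exp(-lam) / math.factorial(k)


lam = 5.0  # assumed average of 5 phishing emails per day (illustrative)

# Probability of seeing exactly k phishing emails in one day, for k = 0..10
for k in range(0, 11):
    print(f"P(X = {k:2d}) = {poisson_pmf(k, lam):.4f}")
```

With a rate of 5, the probabilities are largest around 4 and 5 emails per day and fall off on either side, which is the shape you should expect from the daily counts if the model is a reasonable fit.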
Applications:
The Poisson distribution is widely used in various fields, including:
- Cybersecurity: To model the number of phishing emails received in a day.
- Telecommunications: To predict the number of phone calls received at a call center.
- Healthcare: To estimate the number of patient arrivals in an emergency room.
By leveraging the Poisson distribution, you can better understand and predict the frequency of certain events, allowing for more effective planning and resource allocation.
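Before relying on a Poisson model, it can be worth checking whether your own counts look roughly Poisson. One simple check uses the fact that a Poisson distribution has a variance equal to its mean, so the ratio of sample variance to sample mean should be close to 1. The sketch below assumes a hypothetical list of daily counts; the data and the function name dispersion_index are illustrative only.

```python
import statistics


def dispersion_index(counts):
    """Return sample variance divided by sample mean (close to 1 for Poisson-like data)."""
    mean = statistics.mean(counts)
    if mean == 0:
        raise ValueError("mean is zero; the dispersion index is undefined")
    return statistics.variance(counts) / mean


# Hypothetical daily phishing email counts for two weeks (illustrative only)
daily_counts = [3, 7, 5, 2, 8, 5, 4, 9, 1, 6, 5, 4, 7, 4]

ratio = dispersion_index(daily_counts)
print(f"mean = {statistics.mean(daily_counts):.2f}, dispersion index = {ratio:.2f}")
```

A ratio well above 1 suggests the counts are burstier than a constant-rate Poisson model allows, for example because phishing campaigns arrive in waves. In that case the estimates in this article should be treated as a rough baseline rather than a precise forecast.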
Example Program in Python
#############################################
# Poisson Distribution Example for Phishing Frequency
# © Tim Layton, 2024, All Rights Reserved
# timlayton.cloud
#############################################
import numpy as np
import matplotlib.pyplot as plt


def get_user_input():
    """Prompt the user for the average number of phishing emails received per day.

    Returns:
        lambda_rate (float): The average number of phishing emails received per day.
    """
    lambda_rate = float(input("Enter the average number of phishing emails received per day: "))
    return lambda_rate


def main():
    # Get user input for the average number of phishing emails per day
    lambda_rate = get_user_input()

    # Generate Poisson distribution data
    phishing_emails = np.random.poisson(lambda_rate, 30)  # simulate for 30 days, but any value could be used.

    # Plot the data
    plt.figure(figsize=(10, 5))
    bars = plt.bar(range(1, 31), phishing_emails, color='blue', alpha=0.7)
    plt.axhline(y=lambda_rate, color='r', linestyle='--', label=f'Average (λ={lambda_rate})')

    # Add the number of emails above each bar
    for bar in bars:
        height = bar.get_height()
        plt.annotate('{}'.format(height),
                     xy=(bar.get_x() + bar.get_width() / 2, height),
                     xytext=(0, 3),  # 3 points vertical offset
                     textcoords="offset points",
                     ha='center', va='bottom')

    plt.xlabel('Day')
    plt.ylabel('Number of Phishing Emails')
    plt.title('Number of Phishing Emails Received Each Day')
    plt.legend()
    plt.show()


if __name__ == "__main__":
    main()
This example simulates the number of phishing emails received per day over a period of 30 days for the average rate λ (lambda) that the user enters, for instance 5 phishing emails per day.
In this example:
- lambda_rate is set to 5 via the user input, meaning that, on average, 5 phishing emails are received per day.
- The np.random.poisson function is used to generate the number of phishing emails received each day over a period of 30 days.
- The generated data is visualized using a bar chart, with each day's count shown above its bar.
This simulation helps cybersecurity leaders understand the variability and distribution of phishing emails received over time, which can be useful for planning and resource allocation in cybersecurity measures.
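The program above asks for λ directly. In practice, that value would usually come from your own mail gateway or reporting data. A minimal sketch of one way to estimate it follows: the sample mean of the observed daily counts is the standard estimate of a Poisson rate, and the interval shown is a rough normal approximation. The observed list and the function name estimate_lambda are hypothetical.

```python
import math


def estimate_lambda(counts, z=1.96):
    """Estimate the Poisson rate as the sample mean, with an approximate 95% interval.

    The interval uses a normal approximation (mean +/- z * sqrt(mean / n)),
    which is only a rough guide when there are few observations.
    """
    n = len(counts)
    if n == 0:
        raise ValueError("need at least one observation")
    lam_hat = sum(counts) / n
    half_width = z * math.sqrt(lam_hat / n)
    return lam_hat, max(0.0, lam_hat - half_width), lam_hat + half_width


# Hypothetical observed phishing emails per day over two weeks (illustrative only)
observed = [3, 7, 5, 2, 8, 5, 4, 9, 1, 6, 5, 4, 7, 4]

lam_hat, low, high = estimate_lambda(observed)
print(f"Estimated rate: {lam_hat:.2f} per day (approx. 95% interval {low:.2f} to {high:.2f})")
```

The estimated rate can then be entered at the program's prompt in place of a guess, and the width of the interval gives a sense of how much more data you might want before trusting it.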
The simulated data shows how much the daily count can vary by chance alone when the average rate stays constant. By comparing observed counts against this baseline, cybersecurity professionals can judge whether a peak day reflects ordinary variation or a genuine increase in phishing activity. This helps in understanding the overall threat landscape and preparing for potential surges in phishing attacks.
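To put a number on what a surge looks like, the sketch below computes two planning figures from the Poisson model: the probability of a day with at least a given number of phishing emails, and the daily count that is only exceeded on about 5% of days. The rate of 5, the threshold of 10, and the helper names are assumptions for illustration.

```python
import math


def poisson_cdf(k, lam):
    """Return P(X <= k) by summing the Poisson probabilities for 0..k."""
    return sum((lam ** i) * math.exp(-lam) / math.factorial(i) for i in range(k + 1))


def poisson_quantile(p, lam):
    """Return the smallest k such that P(X <= k) >= p."""
    k = 0
    while poisson_cdf(k, lam) < p:
        k += 1
    return k


lam = 5.0       # assumed average phishing emails per day (illustrative)
threshold = 10  # hypothetical count at which the response team would be stretched

# Probability of a day with at least `threshold` phishing emails
p_surge = 1.0 - poisson_cdf(threshold - 1, lam)

# Daily count that covers roughly 95% of days under this model
busy_day = poisson_quantile(0.95, lam)

print(f"P(at least {threshold} emails in a day) = {p_surge:.4f}")
print(f"95% of days are expected to have {busy_day} or fewer emails")
```

Figures like these can feed directly into staffing or triage capacity decisions, with the caveat that they hold only as long as the constant-rate assumption is reasonable for your environment.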
Additional Use Cases for the Poisson Distribution
In addition to modeling phishing email frequency, there are several other use cases for the Poisson distribution in cybersecurity risk analysis. I will provide a few examples below to get you thinking about other possibilities.
Intrusion Detection Alerts:
- Scenario: An IT team monitors the number of alerts generated by an intrusion detection system (IDS) each hour.
- Application: The Poisson distribution can model the number of alerts per hour, helping the team identify abnormal activity levels and potential security breaches.
Firewall Log Entries:
- Scenario: A network administrator wants to track the number of blocked access attempts logged by the firewall daily.
- Application: Using the Poisson distribution, the administrator can analyze the log data to detect unusual spikes in access attempts, which might indicate a targeted attack.
Security Patch Requests:
- Scenario: An organization tracks the number of security patch requests submitted by different departments weekly.
- Application: Modeling the number of patch requests with a Poisson distribution helps in forecasting the demand for IT support and ensuring timely application of security patches.
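Taking the intrusion detection scenario as an example, the sketch below flags hours whose alert count would be very unlikely under a baseline Poisson rate. The baseline rate, the hourly counts, the 1% cutoff, and the function names are all hypothetical values chosen for illustration.

```python
import math


def upper_tail(k, lam):
    """Return P(X >= k) for a Poisson distribution with rate lam."""
    below = sum((lam ** i) * math.exp(-lam) / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - below)


def flag_unusual(counts, lam, alpha=0.01):
    """Return (index, count) pairs whose upper-tail probability is below alpha."""
    return [(i, c) for i, c in enumerate(counts) if upper_tail(c, lam) < alpha]


baseline_rate = 3.0                        # hypothetical average IDS alerts per hour
hourly_alerts = [2, 4, 3, 1, 11, 3, 5, 2]  # hypothetical observed alerts per hour

flagged = flag_unusual(hourly_alerts, baseline_rate)
for hour, count in flagged:
    print(f"Hour {hour}: {count} alerts is unusual for a baseline of {baseline_rate} per hour")
```

The same approach could be applied to daily firewall blocks or weekly patch requests by changing the interval and the baseline rate. Because many intervals are checked, some will be flagged by chance, so a flagged hour is a prompt for investigation rather than proof of an attack.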
By leveraging the Poisson distribution, cybersecurity professionals can gain valuable insights into the frequency and patterns of various security-related events, enabling them to enhance their threat detection, response strategies, and overall security posture.
You can also connect with me on my website at http://timlayton.cloud