A Financial Model for Application Security Debt

Last week I described the concept of application security debt and application interest rates. I promised that I would follow up with a financial model that could translate these concepts into real money.

Recap

Here’s a quick recap of the initial concept. Security debt is similar to technical debt. Both are design and implementation choices whose negative effects accumulate over time, and in both cases the code must be reworked to get out of debt. Security debt is based on the latent vulnerabilities within an application. Application interest rates are the real-world factors outside the control of the software development team that give those vulnerabilities a real cost. These factors include the cost of a security breach and attacker motivation to discover and exploit the latent vulnerabilities.

Basic Financial Model

The basic financial model for security debt is monetary risk which can be expressed as expected loss. The formula for expected loss is event likelihood X impact in dollars. Event likelihood is based on the makeup of vulnerabilities in the application and the likelihood that the vulnerabilities will be discovered and exploited. The impact is the cost of a security breach based on an exploit of one of those vulnerabilities.
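As a minimal sketch, the expected loss formula above can be written as a one-line function. The function name and the sample inputs (a 10% likelihood and a $1,000,000 impact) are illustrative assumptions, not figures from any report.

```python
def expected_loss(event_likelihood: float, impact_dollars: float) -> float:
    """Expected loss = event likelihood X impact in dollars."""
    if not 0.0 <= event_likelihood <= 1.0:
        raise ValueError("event_likelihood must be a probability between 0 and 1")
    return event_likelihood * impact_dollars


# Hypothetical inputs: a 10% chance of a breach that would cost $1,000,000.
example_loss = expected_loss(0.10, 1_000_000)
print(f"${example_loss:,.0f}")
```

The rest of the model is about finding defensible values for the two inputs.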

Security Debt is a Liability to Operators

Now you may be saying to yourself, “I’m a software vendor. I don’t have breach costs associated with the software I am developing. How am I going to calculate my security debt using this formula?” You are right. Because you don’t have breach costs or liability, you have transferred most of your security debt to your customers. We are going to have to use a somewhat different model for vendors. For now I am going to concern myself with operators of software, who bear the risk of the applications they run and therefore have real breach costs.

Getting the Data

There are three sets of information we need to gather:

  1. Information about the vulnerabilities in the application
  2. Information about the vulnerabilities that are being exploited
  3. The cost of an application security breach

Data Precision

Each of these three sets of information is going to have a different level of precision. The possibilities range from relatively precise test results or accounting data to rough estimates from available industry reports based on surveys. The more precise our data, the more precise our security debt calculation will be. My hope is that the rough data sources I am using will improve over time and become precise enough for security debt calculations.

Security debt in monetary terms will have business value once we have the precision to compare it meaningfully to the development costs of reducing security debt. If I can calculate that an application has $5M in security debt and it will cost $100,000 to greatly reduce that debt, it becomes a prudent decision to do so.

Data Set 1: Application Vulnerability Data

There are multiple ways of determining an application’s vulnerability data set. In an ideal world we would have highly accurate and low-cost methods of determining the location and properties of each vulnerability in an application. This would give us a precise count and categorization (CWE ID) of the vulnerabilities. Unfortunately the state of the art is less accurate and more expensive. Today, security design flaws can only be found by expensive humans performing manual threat modeling or architectural risk analysis. Less expensive automated testing in the form of static and dynamic analysis can find non-design-related vulnerabilities, albeit with some precision error due to false negatives and false positives. Cheaper still may be a simple estimate of the vulnerability data set based on properties of the development process, the language and platforms used for development, and code properties such as KLOCs, complexity, and attack surface measurement.

Let’s keep the vulnerability data set we require for our model simple to start. For each important vulnerability category, which I will define as the CWE/SANS Top 25, we will assign a prevalence qualifier of none (0), low (1-9), medium (10-99), and high (100+). We should be able to get this data from application security testing and manual analysis. Further research will need to be done to make estimates of the vulnerability data set from the development process used and code properties. For now we are going to have to do actual testing.
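The prevalence qualifier described above can be sketched as a small bucketing function. The function name and the sample findings are hypothetical; only the four ranges come from the text.

```python
def prevalence_qualifier(count: int) -> str:
    """Map a vulnerability count to none (0), low (1-9), medium (10-99) or high (100+)."""
    if count < 0:
        raise ValueError("count cannot be negative")
    if count == 0:
        return "none"
    if count <= 9:
        return "low"
    if count <= 99:
        return "medium"
    return "high"


# Hypothetical test results: number of findings per category for one application.
findings = {
    "SQL Injection": 12,
    "XSS": 140,
    "Command Injection": 0,
    "Insufficient Authorization": 3,
}
profile = {category: prevalence_qualifier(n) for category, n in findings.items()}
print(profile)
```

The resulting profile is the kind of per-category summary that data set 1 calls for.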

Data Set 2: Threat Space Data

The best threat space data I have found that details which application vulnerabilities cause data breaches is the Verizon Data Breach Incident Report (DBIR). The data in the report is collected from data breaches that Verizon investigates. The report includes the root causes that led to the data breaches, categorized in fine enough detail that we can map them to our application vulnerability data set.

The Verizon DBIR first breaks down the root causes into Attack Type. These are the different types of attacks that were used by the attacker that led to the data breach. You can see that they sum up to more than 100% because a typical attack is made up of a few different attack stages. For instance an attacker may use social engineering to plant malware on an internal workstation and then use the subsequent internal access to “hack” an internal application to access valuable data. This scenario would be counted as social, malware, and hacking.

For our application security debt model we are concerned with the “Hacking” attack type, which is highlighted in the chart above. This is the percentage of data breaches Verizon investigated in which application vulnerabilities were exploited. Verizon determined this to be 40% of breaches, but that number isn’t very helpful to us. We need to know the likelihood that a particular vulnerability category is the root cause when an application exploit leads to a data breach.

Thankfully the Verizon DBIR does have the category information. The report breaks the “Hacking” attack type data down further into “Hacking Root Cause”. We can extract the hacking root causes that are application vulnerabilities. The following chart depicts this data, which will map nicely to our application vulnerability data set using the CWE/SANS Top 25.

This data gives us the likelihood that a particular vulnerability category is the root cause if an application vulnerability is exploited as part of the attack. We still need to multiply this category number by the likelihood that an application will get breached at all.

In 2009 Forrester conducted the Application Risk Management and Business Survey and determined that 62% of organizations surveyed experienced breaches in critical applications in a 12-month period. So if we multiply an application breach root cause likelihood by the likelihood that an organization will have a critical application breached in a 12-month period, we get the likelihood that that root cause will be the cause of a breach at the average company. We end up with this table of root cause likelihoods:

Vulnerability Category         Likelihood of application breach by root cause
Backdoor/Control Channel       18%
SQL Injection                  15.5%
Command Injection              7%
XSS                            6%
Insufficient Authentication    4%
Insufficient Authorization     4%
Remote File Include            1%
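
The multiplication behind each row of the table can be sketched as follows. The 62% figure is the Forrester number quoted above. The 25% input share is an assumption, chosen only because it reproduces the 15.5% SQL Injection row; it is not quoted from the Verizon report, and the function name is hypothetical.

```python
ORG_BREACH_RATE = 0.62  # Forrester 2009: share of organizations with a critical application breach in 12 months


def root_cause_likelihood(share_of_app_breaches: float,
                          org_breach_rate: float = ORG_BREACH_RATE) -> float:
    """Likelihood that a given root cause leads to a breach at the average company."""
    return share_of_app_breaches * org_breach_rate


# Assumed input: a root cause behind 25% of application breaches (illustrative, not from the DBIR).
sql_injection = root_cause_likelihood(0.25)
print(f"{sql_injection:.1%}")
```

Swapping in a breach rate for your own industry or company size, if such a figure were available, only means passing a different org_breach_rate.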

Now you are probably thinking that this is getting a little tenuous, and it is. We need better data on the likelihood of each breach type and on the likelihood of an application breach, broken down by industry and other factors like company size. To just use 62% is like saying the average couple has 2.4 children. It tells us something but isn’t a good predictor of the number of children a couple from a certain country, ethnic background or economic group is likely to have. We need much better data on what types of apps, at what types of organizations, are getting breached and how. It would be great if Verizon and Forrester sliced their numbers by factors relating to the organization. That way you could map your company to breach data from companies like yours.

Data Set 3: Breach Cost

In an ideal world you have breach costs diligently recorded from previous breaches that affected your organization. Since every organization will have different costs and no two organizations are alike, this will give the most accurate data to feed into our security debt calculation.

Most organizations don’t have this information, so we are going to need to rely on survey data. The best survey data I have found is the April 2010 Ponemon Institute Report. There have been plenty of criticisms of the accuracy of the survey methodology and data, but it is the best we have to work with. Thankfully it is broken down by industry, which helps with precision. You can select what the survey says a breach will cost the average company in your industry.

The Ponemon survey collected data on these individual costs:

  • Detection & Escalation
  • Notification
  • Ex-Post Response
  • Lost business

They then divided the total cost by the number of records breached to arrive at a cost per record for each industry vertical.
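
The cost-per-record calculation described above can be sketched like this. The four fields mirror the cost components listed above; the class name and every dollar figure are hypothetical and are not taken from the Ponemon report.

```python
from dataclasses import dataclass


@dataclass
class BreachCosts:
    """The four cost components collected by the survey, in dollars."""
    detection_and_escalation: float
    notification: float
    ex_post_response: float
    lost_business: float

    def total(self) -> float:
        return (self.detection_and_escalation + self.notification
                + self.ex_post_response + self.lost_business)

    def per_record(self, records_breached: int) -> float:
        """Total cost divided by the number of records breached."""
        if records_breached <= 0:
            raise ValueError("records_breached must be positive")
        return self.total() / records_breached


# Hypothetical breach: $5,000,000 in total costs across 25,000 records.
costs = BreachCosts(500_000, 300_000, 1_200_000, 3_000_000)
print(costs.per_record(25_000))
```

Keeping the components separate also makes it possible to recombine them differently, for example by leaving out lost business.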

Debt Formula

Now we can come up with a per-record expected loss from a vulnerability category for the average organization in an industry vertical. We take the root cause likelihood and the cost per record from above and multiply them by the number of records to get the average expected loss for an individual vulnerability category.

For instance, let’s take a financial organization with 100,000 records in a critical app. What is their expected loss from SQL Injection this year, given the 15.5% likelihood from the table and a cost of $248 per record?

15.5% X $248 X 100,000 = $3,844,000

It may seem like a high number. It would be good to sanity check this against losses at financial institutions due to SQL injection from another study. More data would be very helpful.

If we run through the other categories in the same way, we end up with a table I call the baseline expected loss.
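
As a sketch, the same calculation can be repeated for every category in the root cause likelihood table. This recomputes the numbers under the assumptions of the worked example ($248 per record, 100,000 records); it is not a reproduction of a published table, and the names are illustrative.

```python
# Root cause likelihoods from the table earlier in this post.
ROOT_CAUSE_LIKELIHOOD = {
    "Backdoor/Control Channel": 0.18,
    "SQL Injection": 0.155,
    "Command Injection": 0.07,
    "XSS": 0.06,
    "Insufficient Authentication": 0.04,
    "Insufficient Authorization": 0.04,
    "Remote File Include": 0.01,
}

COST_PER_RECORD = 248  # dollars per record, as used in the worked example
RECORDS = 100_000      # records in the critical app, as in the worked example


def baseline_expected_loss(likelihood: float, cost_per_record: float, records: int) -> float:
    """Expected loss for one vulnerability category: likelihood X cost per record X records."""
    return likelihood * cost_per_record * records


for category, likelihood in ROOT_CAUSE_LIKELIHOOD.items():
    loss = baseline_expected_loss(likelihood, COST_PER_RECORD, RECORDS)
    print(f"{category:<28} ${loss:>12,.0f}")
```

Changing COST_PER_RECORD and RECORDS to match another industry vertical and application gives that organization's baseline.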

Tie it all together

To tie it all together we need a way of relating the vulnerabilities in your application, which was data set 1, to the vulnerabilities in the average application that ended up getting breached. That way we would know if your expected loss was higher or lower than the baseline expected loss. This would enable us to finally calculate the security debt for a company like yours with your application.

This is still a work in progress. I am looking at mining the Veracode data that we have from testing over 3000 applications across different industry verticals to solve this final piece of the puzzle. Some of the data we have published to date can be seen in the Veracode State of Software Security Report. We do have data on the prevalence of particular vulnerability categories within applications. Look for a future post to add this data.

I would also like to work with the collectors and publishers of very useful data such as Verizon and Ponemon to come up with the right data slices to obtain a precise application security debt calculation. I hope this way of looking at application security risk stirs some discussion and hope that we can calculate this security debt to enough precision to make business decisions.


Adam | March 5, 2011 11:37 am

Hi Chris,

This is really exciting work, thanks for doing it and sharing it!

If you want to depend on the Ponemon numbers, I would urge you to take advantage of the way they break out the costs, and re-combine them differently.

In particular, I’d suggest a focus on (Detection & Escalation, Ex-Post Response) and moving away from cost per record to what we might call “hard security costs per incident.” This leaves notification as a hard cost for privacy breaches, but not all security incidents have privacy implications.

Adam

Mark Linton | March 6, 2011 10:34 am

Chris, first I commend you for even trying to correlate this data to this degree. A few comments that I hope will help improve the model even further.

1) The quality and accuracy of the statistics you are using are going to wildly swing the end results. Any time you are using a multiplier or factor in the end calculation it’s going to be sketchy. The more you can do to validate the accuracy using independent data, the better.

2) I would recommend that you try to tap into the data that’s being collected at the VERIS project by Verizon. This will help with the sources of data, as this is real-world data.

3) We need to lobby our administrations to provide more access to the intelligence data on threats. They have way better data than we do, we just need to convince them that we are more protected by them sharing the data with the private sector than we are with them keeping it and protecting us in secret.

Regards
