14 June 2024 | Blog

Exploring the Exploit Prediction Scoring System

Author: Wicus Ross - Senior Security Researcher

In the Orange Cyberdefense (OCD) Security Navigator 2024 (SN24) we published a short piece on the Exploit Prediction Scoring System (EPSS) [1]. This high-level overview aimed to introduce the audience to the potential of using EPSS as an additional metric to prioritize vulnerabilities for remediation.

What sets EPSS apart from CVSS is that EPSS is an algorithm that takes exploitation data into account. This results in a dynamic score that may change from time to time, acting as a barometer for exploitation activity of the respective Common Vulnerabilities and Exposures (CVE) entries. A CVSS score is much more static and likely to remain unchanged: the CVSS base calculation does not consider active exploitation activity.

A CVSS score is typically assigned by a CVE Numbering Authority and later re-evaluated and enriched. The CVSS score provides an indication of how easy a vulnerability is to exploit, its impact, and its scope. A CVSS score can be adjusted using temporal features, which are intended to make CVSS more responsive to what attackers are doing. In other words, increased exploitation of a vulnerability or freely available proof-of-concept exploit code can be used to ramp up the CVSS score, while the availability of remediation can lower it. Other CVSS features also exist that can further adjust the score. The temporal CVSS features are mostly used ad hoc and are normally considered specific to a company or industry.

By design, EPSS is more like a weather forecast in that it predicts the “cyber weather” for the immediate short term. An EPSS score communicates how likely a vulnerability is to be exploited globally within the next 30 days and does not make any statement about the impact or how easy a vulnerability is to exploit. Also, EPSS cannot predict the volume of exploitation attempts.

In this blog post we revisit the findings from the SN24 but introduce new line-chart visualizations that surface further insights from applying EPSS in vulnerability management and open the door to modeling the effect of different EPSS-based patching strategies. With this in mind, we are also releasing a simple web-based tool that allows vulnerability managers to experiment with the effect of selecting different EPSS levels as a patching strategy within their own environments.

The tool consists of a single HTML file with JavaScript that processes uploaded CSV files, plots a chart, and calculates supporting metrics for Effort, Efficiency, and Coverage. Everything is performed in the browser.
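The tool itself is written in JavaScript; purely to illustrate the data-loading step, here is a minimal Python sketch. It assumes a CSV with `cve` and `epss` columns and optional `#` comment lines. That layout is our assumption (it resembles FIRST's daily EPSS export as we understand it) and not necessarily what the tool expects; the function name `load_epss_scores`, the CVE IDs and the scores are hypothetical placeholders.

```python
import csv

# Hypothetical input: optional '#' comment line(s), then a header with at least
# 'cve' and 'epss' columns. IDs and scores are placeholders, not real data.
SAMPLE_CSV = """#model_version:example,score_date:example
cve,epss,percentile
CVE-0000-0001,0.97565,0.99996
CVE-0000-0002,0.00045,0.12000
"""


def load_epss_scores(text: str) -> dict[str, float]:
    """Map each CVE ID to its EPSS probability (0..1), skipping '#' comment lines."""
    rows = [line for line in text.splitlines() if line and not line.startswith("#")]
    reader = csv.DictReader(rows)
    return {row["cve"].strip(): float(row["epss"]) for row in reader}


scores = load_epss_scores(SAMPLE_CSV)
print(len(scores), scores["CVE-0000-0001"])  # 2 0.97565
```

Keeping scores as probabilities on a 0..1 scale means an EPSS threshold of 8.5% is expressed as 0.085.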

Here is a link to a version of the tool hosted on the web that does not require you to download the file:

https://wicusross.github.io/

You can find the GitHub repository here, which also includes CSV example files:

https://github.com/Orange-Cyberdefense/epss_evaluations

To recap

In our Security Navigator '24 piece, we attempted to gauge whether the findings reported in the FIRST paper (which considered all known CVEs) would hold when applied to a subset of vulnerabilities recorded on our customers’ estates. We also used our own database of known exploited vulnerabilities, as the set used in the FIRST research was not available to us. The EPSS examples described in the SN24 were thus based on datasets from our Vulnerability Operations Center (VOC) scanning data, Penetration Testing (PT) data, and our Computer Emergency Response Team’s (CERT) VulnWatch Exploit Database (EDB).

Using these datasets, we endeavored to reproduce part of a paper [2] assessing the capabilities of EPSSv3. This enabled us to demonstrate the metrics that the EPSS paper highlighted, namely Coverage (Recall), Efficiency (Precision), and Effort.

These three features can be derived by selecting all the CVEs with an EPSS score equal to or greater than the chosen EPSS threshold for a given population of vulnerabilities. We refer to these CVEs as our “remediation dataset”. To calculate the three features, we measure how our remediation dataset intersects with a set of known exploited vulnerabilities.
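Selecting the remediation dataset is a simple filter over the EPSS scores. A minimal sketch, assuming the scores are held in a dict keyed by CVE ID (a hypothetical structure, with placeholder IDs and made-up scores):

```python
def remediation_set(scores: dict[str, float], threshold: float) -> set[str]:
    """Select every CVE whose EPSS score is equal to or greater than the threshold.

    Both the scores and the threshold are probabilities on a 0..1 scale,
    so an 8.5% threshold is passed as 0.085.
    """
    return {cve for cve, epss in scores.items() if epss >= threshold}


# Placeholder CVE IDs with made-up scores.
scores = {"CVE-A": 0.90, "CVE-B": 0.085, "CVE-C": 0.01}
print(sorted(remediation_set(scores, 0.085)))  # ['CVE-A', 'CVE-B']
```

Using `>=` keeps a CVE that scores exactly at the threshold inside the remediation dataset, matching the “equal to or greater than” rule.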

The vulnerabilities present in both the remediation dataset and the exploitation dataset are the True Positives (TP), i.e., the EPSS score correctly identified them as exploited, and thus they must be patched. Vulnerabilities selected for remediation but not present in the pool of exploited vulnerabilities are False Positives (FP). Vulnerabilities that are being exploited but were not included in the remediation set are labelled False Negatives (FN), i.e., we would expect the EPSS score to highlight them, but it did not. Vulnerabilities that are not exploited and were excluded from the remediation set are counted as True Negatives (TN). With these four counts we can calculate Efficiency, Coverage, and Effort.

“Efficiency” is the ratio of how many vulnerabilities were patched that were exploited (TP) versus the total number of patched vulnerabilities (TP + FP). Efficiency, also called Precision, is calculated as TP / (TP + FP).

“Coverage” is determined by how many vulnerabilities were patched that were exploited (TP) versus the total number of exploited vulnerabilities (TP + FN). Coverage, more formally Recall, is calculated as TP / (TP + FN).

“Effort” is the ratio of vulnerabilities selected at a specific threshold to the total number of vulnerabilities that could be patched, calculated as (TP + FP) / (TP + FP + TN + FN). Put differently, the subset of vulnerabilities that will be remediated is expressed as a fraction or percentage of the total number of vulnerabilities that could be patched.
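The definitions above translate directly into set operations. A minimal Python sketch, assuming CVE IDs are held as strings in Python sets; the `Metrics` and `evaluate` names are our own, and restricting the exploited set to the vulnerability population is our assumption (it is consistent with the tables below, where Exploit DB Total always equals TP + FN).

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    tp: int
    fp: int
    fn: int
    tn: int
    coverage: float    # Recall:    TP / (TP + FN)
    efficiency: float  # Precision: TP / (TP + FP)
    effort: float      # share of the population selected: (TP + FP) / total


def evaluate(population: set[str], remediation: set[str], exploited: set[str]) -> Metrics:
    """Confusion-matrix counts plus Coverage, Efficiency and Effort for one remediation set."""
    # Assumption: only CVEs that actually occur in the population are counted.
    exploited = exploited & population
    remediation = remediation & population
    tp = len(remediation & exploited)    # selected and exploited
    fp = len(remediation - exploited)    # selected but not exploited
    fn = len(exploited - remediation)    # exploited but not selected
    tn = len(population) - tp - fp - fn  # neither selected nor exploited
    coverage = tp / (tp + fn) if tp + fn else 0.0
    efficiency = tp / (tp + fp) if tp + fp else 0.0
    effort = (tp + fp) / len(population) if population else 0.0
    return Metrics(tp, fp, fn, tn, coverage, efficiency, effort)


# Toy example with ten placeholder CVE IDs.
population = {f"CVE-{i}" for i in range(10)}
exploited = {"CVE-0", "CVE-1", "CVE-2"}
remediation = {"CVE-0", "CVE-1", "CVE-5", "CVE-6"}
print(evaluate(population, remediation, exploited))
```

In the toy example two of the four selected CVEs are exploited (Efficiency 50%), two of the three exploited CVEs are caught (Coverage about 67%), and four of ten CVEs are selected (Effort 40%).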

Expanding the analysis

Selecting an EPSS score as a threshold results in a set of CVEs to remediate, i.e., all CVEs with a score at or above the threshold are considered at risk of exploitation and therefore a priority for patching. For this set of vulnerabilities, we can calculate the Coverage, Effort, and Efficiency features discussed earlier. In the SN24 section where we introduced EPSS, we used a Venn-like diagram to visualize the remediation set relative to the total vulnerability set and the known exploited subset. The Venn format restricted us to presenting either a prescribed EPSS score and the resulting Coverage, Effort, and Efficiency, or a specific Coverage, Effort, or Efficiency target and the other correlated variables. For example, 15% Effort might be chosen based on a paper asserting that organizations manage to patch 10-15% of vulnerabilities on average [3]. This visualization format is thus less than optimal.

By changing the visualization so that we plot the Coverage, Effort, and Efficiency features as a series, we see how the EPSS scores affect these features. That way we can easily identify the EPSS threshold that produces the optimal balance between Effort, Efficiency and Coverage.
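Plotting the features as a series amounts to re-evaluating the remediation set at many thresholds. A minimal Python sketch of that sweep, assuming the scores and exploited CVEs are already loaded into memory (function and variable names are hypothetical; this is not the code of the released JavaScript tool):

```python
def metric_series(
    scores: dict[str, float], exploited: set[str], thresholds: list[float]
) -> list[tuple[float, float, float, float]]:
    """Return (threshold, coverage, efficiency, effort) for each EPSS threshold.

    scores: CVE ID -> EPSS probability; its keys form the vulnerability population.
    exploited: CVE IDs treated as known exploited.
    """
    population = set(scores)
    exploited = exploited & population  # assumption: only exploited CVEs we actually have count
    series = []
    for t in sorted(thresholds):
        selected = {cve for cve, epss in scores.items() if epss >= t}
        tp = len(selected & exploited)
        coverage = tp / len(exploited) if exploited else 0.0
        efficiency = tp / len(selected) if selected else 0.0  # 0 by convention for an empty set
        effort = len(selected) / len(population) if population else 0.0
        series.append((t, coverage, efficiency, effort))
    return series


# Placeholder CVE IDs with made-up scores; two of the four are treated as exploited.
scores = {"CVE-A": 0.9, "CVE-B": 0.5, "CVE-C": 0.1, "CVE-D": 0.01}
exploited = {"CVE-A", "CVE-C"}
for row in metric_series(scores, exploited, [0.01, 0.5, 0.95]):
    print(row)
```

As the threshold rises, Effort and Coverage can only fall, while Efficiency typically rises; the crossing points of these curves are what the charts below make visible.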

Depicted below are some of the original Venn-like diagrams from the SN24 along with the respective line plots and an accompanying table to present the values at the selected EPSS value.

Vulnerability population (n = 24,165) is the collection of all vulnerabilities that require consideration. Jacobs et al. used the entire CVE dataset. For our purposes we use all the CVEs present in the dataset of unpatched client vulnerability findings we reported on in SN24. We refer to these as the VOC results; this dataset is a subset of the total CVE pool of vulnerabilities.

Target exploit group is the collection of vulnerabilities believed to be exploited that must therefore be patched. We derive these subsets by matching our clients’ vulnerabilities with either:

Our own internal ‘VulnWatch’ Exploit Database (EDB) (n = 439), or
Our Pentest EDB, a list of CVEs reported by our Ethical Hackers on clients' estates (n = 482)

The VulnWatch EDB is a collection of CVEs that the Orange Cyberdefense CERT has identified as being exploited in the wild. The CERT tracks specific products based on criteria, for example whether the product is part of services delivered by Orange Cyberdefense. The CERT scrutinizes these products by reviewing release documentation of software updates, performing code review where possible, using public sources to correlate their observations, and using their own threat intelligence to assess the nature of the exploitation.

The vulnerabilities in the Pentest EDB were identified by our Ethical Hackers as part of their assessment of a client’s environment or application. We treat these vulnerabilities as a special class of “exploited in the wild” since the “hypothetical” attacker, being the pentester, exploited these vulnerabilities to demonstrate risk.

Note: The VOC dataset in the SN24 referenced 24,177 vulnerabilities while the new line charts reference 24,165. This adjustment was made to filter out rejected CVEs. For our purposes, the 12 fewer CVEs are insignificant (<0.05%) given the total number.

Note: The line charts used to represent the Coverage, Efficiency, and Effort plots have a specific layout. The EPSS Threshold (x-axis) uses a log scale, since EPSS values are probabilities between 0 and 1 and many of them lie below 0.01 (1%). A log scale spreads these values out more evenly, at the cost of possibly distorting the true “shape” of the respective plots. The Value of Metric (y-axis) uses a linear scale and shows each metric as a percentage at the given EPSS Threshold.
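A log-scaled x-axis amounts to sampling thresholds that are evenly spaced in log10(EPSS) rather than in EPSS itself. A minimal sketch of generating such a grid; the lower bound of 1e-5 and the number of steps are illustrative assumptions, not the values used for our charts:

```python
import math


def log_thresholds(lo: float = 1e-5, hi: float = 1.0, steps: int = 6) -> list[float]:
    """Return `steps` thresholds from lo to hi, evenly spaced on a log10 axis."""
    if steps < 2 or not 0 < lo < hi:
        raise ValueError("need steps >= 2 and 0 < lo < hi")
    a, b = math.log10(lo), math.log10(hi)
    return [10 ** (a + (b - a) * i / (steps - 1)) for i in range(steps)]


grid = log_thresholds()
# Most of these grid points sit at or below about 0.01, where many EPSS scores lie.
print([f"{t:.0e}" for t in grid])
```

Each step multiplies the threshold by the same factor (here 10), which is what gives the low-score region as much horizontal room as the high-score region.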

The first example uses CVEs from our Pentest findings as the database of known exploited vulnerabilities.

Figure 1- OCD Pentest Vulns

Metric	Value	Description
EPSS	8.5%	EPSS threshold value
Remediation Set Total	3,631	Number of CVEs in scope for patching (TP + FP)
Exploit DB Total	482	Number of CVEs in our set of known exploited vulnerabilities (TP + FN)
Total Possible Vulnerabilities	24,165	Total CVEs in our vulnerability population
True Positive	131	Correctly identified as exploited
False Positive	3,500	Identified but not actually in the exploited set
True Negative	20,183	Not identified and not exploited
False Negative	351	Not identified, but in the exploited set
Coverage	27.18%	TP / (TP + FN)
Efficiency	3.61%	TP / (TP + FP)
Effort	15.03%	(TP + FP) / Total Possible Vulnerabilities
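Only TP, FP and the two set sizes are independent in these tables; the remaining rows follow from the definitions. The hypothetical helper below recomputes them, which is a quick way to sanity-check a table like the one above. With the Figure 1 counts it reproduces FN 351, TN 20,183, Coverage 27.18%, Efficiency 3.61%, and Effort 15.03%.

```python
def derive_rows(tp: int, fp: int, exploit_total: int, population: int) -> dict:
    """Recompute FN, TN and the three metrics (in %) from TP, FP and the set sizes."""
    fn = exploit_total - tp           # exploited but not selected
    tn = population - (tp + fp) - fn  # neither selected nor exploited
    return {
        "FN": fn,
        "TN": tn,
        "Coverage %": round(100 * tp / (tp + fn), 2),
        "Efficiency %": round(100 * tp / (tp + fp), 2),
        "Effort %": round(100 * (tp + fp) / population, 2),
    }


# Figure 1 counts (Pentest EDB, EPSS threshold 8.5%).
print(derive_rows(tp=131, fp=3500, exploit_total=482, population=24165))
```

The same call with the Figure 2 counts (TP 253, FP 3,378, Exploit DB 439) gives Coverage 57.63% and Efficiency 6.97%.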

The next example uses CVEs from our “VulnWatch” vulnerability research service as the database of known exploited vulnerabilities.

Figure 2 - OCD VulnWatch Exploit DB

Metric	Value	Description
EPSS	8.5%	EPSS threshold value
Remediation Set Total	3,631	Number of CVEs in scope for patching (TP + FP)
Exploit DB Total	439	Number of CVEs in our set of known exploited vulnerabilities (TP + FN)
Total Possible Vulnerabilities	24,165	Total CVEs in our vulnerability population
True Positive	253	Correctly identified as exploited
False Positive	3,378	Identified but not actually in the exploited set
True Negative	20,348	Not identified and not exploited
False Negative	186	Not identified, but in the exploited set
Coverage	57.63%	TP / (TP + FN)
Efficiency	6.97%	TP / (TP + FP)
Effort	15.03%	(TP + FP) / Total Possible Vulnerabilities

For example, this kind of visualization enables us to see at a glance how EPSS scores behave across distinct CVE sets. In this case, EPSS is more accurate at predicting vulnerabilities that will be publicly recorded as exploited in the wild (e.g. in our VulnWatch database) than those that might be used by penetration testers operating inside a target network (e.g. in our pentest findings database): at the same 8.5% threshold, Coverage is 57.63% with the VulnWatch EDB versus 27.18% with the Pentest EDB.

Extracting value from charts

What follows is an example of how an EPSS line chart of this kind could be used, combined with the appropriately selected EDB, to compare and select an EPSS value that produces an optimized patching strategy.

The previous section introduced the line charts and highlighted interesting data points such as the intersections of Efficiency and Effort, as well as the intersection point of Efficiency and Coverage.

Using our internal exploit database as control

The charts below once again depict these values for each possible EPSS threshold score, using our own internal penetration testing EDB to represent the set of known exploited vulnerabilities.

We suggest that the EPSS score at the visual intersection of the Efficiency and Effort (EE) values is a suitable candidate for an “optimal” patching strategy. This is represented by the crossing of the purple and orange lines below. Any EPSS threshold below the EE intersection promises diminishing returns: the additional Effort is wasted with no gain in Efficiency, although Coverage will increase.

Any EPSS threshold greater than or equal to the EPSS at the Efficiency/Coverage (EC) intersection achieves only limited Coverage, albeit at a “high” level of Efficiency. This is represented by the crossing of the grey and orange lines below. We consider this the practical upper bound for EPSS threshold selection; thresholds beyond it should generally not be considered, as they sacrifice Coverage.

Any EPSS between the EE and EC intersections can be considered within the range of “sensible” options.
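Locating the EE and EC intersections programmatically is a matter of scanning the series for the first threshold at which Efficiency catches up with Effort or with Coverage. A minimal sketch over a hypothetical series (the values below are made up for illustration, not read from our charts; the names are our own):

```python
from typing import Optional

# Each point: (epss_threshold, coverage, efficiency, effort), metrics as fractions.
Point = tuple[float, float, float, float]
COVERAGE, EFFICIENCY, EFFORT = 1, 2, 3


def first_crossing(series: list[Point], other: int) -> Optional[float]:
    """Lowest threshold at which Efficiency meets or exceeds the metric at index `other`.

    A simple scan over a discrete grid: real curves are step functions and may
    cross more than once, so this only reports the first crossing.
    """
    for point in sorted(series):
        if point[EFFICIENCY] >= point[other]:
            return point[0]
    return None


# Made-up series, ordered by threshold.
series = [
    (0.01, 0.80, 0.04, 0.40),
    (0.10, 0.55, 0.08, 0.12),
    (0.40, 0.45, 0.10, 0.09),
    (0.90, 0.30, 0.25, 0.03),
    (0.96, 0.20, 0.22, 0.02),
]
ee = first_crossing(series, EFFORT)
ec = first_crossing(series, COVERAGE)
print(f"sensible EPSS range: {ee} to {ec}")
```

For this made-up series the “sensible” range would run from an EPSS of 0.40 (EE) to 0.96 (EC).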

Figure 3 - Pentest Exploit Dataset Efficiency/Effort Intersection

Metric	Value	Description
EPSS	81.9%	EPSS threshold value
Remediation Set Total	1,347	Number of CVEs in scope for patching (TP + FP)
Exploit DB Total	482	Number of CVEs in our set of known exploited vulnerabilities (TP + FN)
Total Possible Vulnerabilities	24,165	Total CVEs in our vulnerability population
True Positive	75	Correctly identified as exploited
False Positive	1,272	Identified but not actually in the exploited set
True Negative	22,411	Not identified and not exploited
False Negative	407	Not identified, but in the exploited set
Coverage	15.56%	TP / (TP + FN)
Efficiency	5.57%	TP / (TP + FP)
Effort	5.57%	(TP + FP) / Total Possible Vulnerabilities

An EPSS threshold of 81.9% in the example above captures 75 (approximately 16%) of all vulnerabilities present in the exploit dataset. The number of CVEs selected for patching is 1,347 (TP + FP), or 5.57% of all the CVEs in our population. Because the Effort and Efficiency plots intersect at an EPSS of 81.9%, Efficiency is also 5.57%.

NOTE: Figure 3 is a partial chart as we zoomed in to pinpoint EPSS 81.9%, compared to Figure 1 which is the same chart at 100% zoom.

Now let us consider the EC intersection, which is at the upper bound of the “sensible” range we identified earlier.

Figure 4 - Pentest Exploit Dataset Efficiency/Coverage Intersection

Metric	Value	Description
EPSS	95.8%	EPSS threshold value
Remediation Set Total	486	Number of CVEs in scope for patching (TP + FP)
Exploit DB Total	482	Number of CVEs in our set of known exploited vulnerabilities (TP + FN)
Total Possible Vulnerabilities	24,165	Total CVEs in our vulnerability population
True Positive	54	Correctly identified as exploited
False Positive	432	Identified but not actually in the exploited set
True Negative	23,251	Not identified and not exploited
False Negative	428	Not identified, but in the exploited set
Coverage	11.20%	TP / (TP + FN)
Efficiency	11.11%	TP / (TP + FP)
Effort	2.01%	(TP + FP) / Total Possible Vulnerabilities

For an EPSS of 95.8%, a total of 486 (TP + FP) CVEs will be included in the remediation set, just four more than the total number of vulnerabilities in the entire penetration testing exploit dataset. Compared with the EE intersection, Effort is reduced by more than half (from 5.57% to 2.01%) and Efficiency doubles (from 5.57% to 11.11%), but Coverage drops by more than a quarter (from 15.56% to 11.20%). The EPSS score of 95.8% represents the bare minimum of effort that could be considered “effective”. A lower EPSS value may be worth evaluating given the low corresponding Coverage, Efficiency, and Effort ratios.
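To make the comparison between the two Pentest EDB operating points explicit, here is a small sketch that computes the relative change of each metric from the percentages in the Figure 3 and Figure 4 tables (the function name is our own):

```python
def relative_change(before: float, after: float) -> float:
    """Fractional change from `before` to `after`; negative values are decreases."""
    return (after - before) / before


# Pentest EDB operating points, metric values in % from the Figure 3 and Figure 4 tables.
ee_point = {"Coverage": 15.56, "Efficiency": 5.57, "Effort": 5.57}   # EPSS 81.9% (EE)
ec_point = {"Coverage": 11.20, "Efficiency": 11.11, "Effort": 2.01}  # EPSS 95.8% (EC)

for metric, before in ee_point.items():
    print(f"{metric}: {relative_change(before, ec_point[metric]):+.0%}")
```

This prints Coverage: -28%, Efficiency: +99%, and Effort: -64%, i.e. moving from the EE to the EC point roughly doubles Efficiency and cuts Effort by almost two thirds, at the cost of losing just over a quarter of the Coverage.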

NOTE: Figure 4 is a partial chart, as we zoomed in to pinpoint EPSS 95.8%, compared to Figure 1, which is the same chart at 100% zoom.

Using our VulnWatch exploit database as control

We can redraw the aforementioned charts, but this time using the VulnWatch EDB in our dataset:

Figure 5 - VulnWatch Exploit Dataset Efficiency/Effort Intersection

Metric	Value	Description
EPSS	37.8%	EPSS threshold value
Remediation Set Total	2,316	Number of CVEs in scope for patching (TP + FP)
Exploit DB Total	439	Number of CVEs in our set of known exploited vulnerabilities (TP + FN)
Total Possible Vulnerabilities	24,165	Total CVEs in our vulnerability population
True Positive	221	Correctly identified as exploited
False Positive	2,095	Identified but not actually in the exploited set
True Negative	21,631	Not identified and not exploited
False Negative	218	Not identified, but in the exploited set
Coverage	50.34%	TP / (TP + FN)
Efficiency	9.54%	TP / (TP + FP)
Effort	9.58%	(TP + FP) / Total Possible Vulnerabilities

An EPSS of 37.8% will achieve a Coverage of 50.34% or 221 of the 439 VulnWatch EDB total. Put differently, to achieve an “optimum” Efficiency and Effort balance for this dataset, a total of 2,316 vulnerabilities must be patched. This leaves 21,631 vulnerabilities unpatched (though not considered exploited) whereas 218 vulnerabilities would be considered exploited but left unpatched.

Figure 6 - VulnWatch Exploit Dataset Efficiency/Coverage Intersection

Metric	Value	Description
EPSS	96.1%	EPSS threshold value
Remediation Set Total	447	Number of CVEs in scope for patching (TP + FP)
Exploit DB Total	439	Number of CVEs in our set of known exploited vulnerabilities (TP + FN)
Total Possible Vulnerabilities	24,165	Total CVEs in our vulnerability population
True Positive	148	Correctly identified as exploited
False Positive	299	Identified but not actually in the exploited set
True Negative	23,427	Not identified and not exploited
False Negative	291	Not identified, but in the exploited set
Coverage	33.71%	TP / (TP + FN)
Efficiency	33.11%	TP / (TP + FP)
Effort	1.85%	(TP + FP) / Total Possible Vulnerabilities

A bare minimum approach would require that only 447 vulnerabilities be patched if an EPSS of 96.1% is selected, giving an Efficiency and Coverage of approximately 33%. The Effort required will be less than 2% of the total 24,165 vulnerabilities.

Using all our exploit databases as control

If we expand the exploitation dataset to include all the penetration testing vulnerabilities, the OCD VulnWatch EDB, and the Cybersecurity and Infrastructure Security Agency (CISA) Known Exploited Vulnerabilities (KEV) catalog, the resulting dataset contains 2,945 unique vulnerabilities.
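Combining the sources amounts to a set union of CVE IDs, so a CVE listed in several sources is counted only once. A minimal sketch, assuming each source has already been reduced to a set of CVE ID strings; the example IDs are placeholders, and the whitespace and case normalization is our own addition:

```python
def combined_exploit_set(*sources: set[str]) -> set[str]:
    """Union of several exploited-CVE lists, counting each CVE ID only once."""
    combined: set[str] = set()
    for source in sources:
        # Normalize whitespace and case so 'cve-...' and 'CVE-... ' are treated as the same ID.
        combined |= {cve.strip().upper() for cve in source}
    return combined


# Placeholder IDs standing in for the Pentest EDB, the VulnWatch EDB and CISA KEV.
pentest = {"CVE-0000-0001", "CVE-0000-0002"}
vulnwatch = {"cve-0000-0002", "CVE-0000-0003"}
kev = {"CVE-0000-0003", "CVE-0000-0004 "}
print(len(combined_exploit_set(pentest, vulnwatch, kev)))  # 4
```

Before counting True Positives and False Negatives, the combined set would still need to be matched against the vulnerability population.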

Figure 7 - Efficiency/Effort Intersection for the combined exploit datasets.

Metric	Value	Description
EPSS	2.2%	EPSS threshold value
Remediation Set Total	5,770	Number of CVEs in scope for patching (TP + FP)
Exploit DB Total	2,945	Number of CVEs in our set of known exploited vulnerabilities (TP + FN)
Total Possible Vulnerabilities	24,165	Total CVEs in our vulnerability population
True Positive	1,392	Correctly identified as exploited
False Positive	4,378	Identified but not actually in the exploited set
True Negative	16,842	Not identified and not exploited
False Negative	1,553	Not identified, but in the exploited set
Coverage	47.27%	TP / (TP + FN)
Efficiency	24.12%	TP / (TP + FP)
Effort	23.88%	(TP + FP) / Total Possible Vulnerabilities

At the (lower) EE convergence threshold of 2.2%, almost double the number of vulnerabilities (5,770) present in the Exploit DB (2,945) will be remediated, resulting in 1,392 exploited vulnerabilities to be patched. This still results in patching 4,378 vulnerabilities that were not exploited and missing 1,553 vulnerabilities that were exploited.

Figure 8 - Efficiency/Coverage Intersection for the combined exploit datasets.

Metric	Value	Description
EPSS	19.9%	EPSS threshold value
Remediation Set Total	2,945	Number of CVEs in scope for patching (TP + FP)
Exploit DB Total	2,945	Number of CVEs in our set of known exploited vulnerabilities (TP + FN)
Total Possible Vulnerabilities	24,165	Total CVEs in our vulnerability population
True Positive	890	Correctly identified as exploited
False Positive	2,055	Identified but not actually in the exploited set
True Negative	19,165	Not identified and not exploited
False Negative	2,055	Not identified, but in the exploited set
Coverage	30.22%	TP / (TP + FN)
Efficiency	30.22%	TP / (TP + FP)
Effort	12.19%	(TP + FP) / Total Possible Vulnerabilities

At the (higher) EC threshold, the Remediation set and the Exploit DB contain the same number of elements (2,945). This is not a coincidence: Efficiency can only equal Coverage when the remediation set and the exploit set are the same size. Even so, only 890 vulnerabilities (about 30%) overlap. The EE and EC intersections for this combined dataset occur at low EPSS thresholds compared with our other datasets. The combined exploit dataset is nearly 7 times larger than the VulnWatch EDB we used in the SN24 example, so the Effort is also much higher, since there are many more exploited vulnerabilities to target.
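The equal set sizes at the EC intersection follow directly from the definitions of Efficiency and Coverage given earlier (assuming TP > 0):

```latex
\begin{aligned}
\text{Efficiency} = \text{Coverage}
  &\iff \frac{TP}{TP+FP} = \frac{TP}{TP+FN} \\
  &\iff TP+FP = TP+FN \qquad (TP > 0) \\
  &\iff FP = FN \\
  &\iff \lvert \text{Remediation Set} \rvert = \lvert \text{Exploit DB} \rvert
\end{aligned}
```

The tables are consistent with this: FP and FN are both 2,055 in Figure 8 and nearly equal at the other EC points (432 vs 428 in Figure 4, 299 vs 291 in Figure 6), presumably because the intersection is read off a discrete set of EPSS thresholds where exact equality is not always reachable.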

Conclusion

We do not really know which vulnerabilities will be exploited by attackers in any environment. However, tools like EPSS use metrics that can predict possible exploitation candidates.

EPSS can be used to prioritize CVEs in the absence of general threat intelligence, but that means selecting an EPSS value based on available resources and risk appetite. EPSS can be incorporated into existing vulnerability management efforts to enrich the process that prioritizes patching workloads.

The examples we have used in this piece are based on actual vulnerabilities discovered as part of our Vulnerability Operations Center’s service provided to clients. The exploited vulnerabilities we aim to prioritize are a combination of vulnerabilities used by malicious attackers and by ethical hackers. Combining these vulnerabilities to evaluate the performance of EPSS is a theoretical exercise, but in our experiments we are trying to assess the relative efficiency and effectiveness of patching strategies resulting from various EPSS levels on the actual CVEs we discover on our enterprise clients’ networks.

Of course, in most environments there are multiple occurrences of the same vulnerability, meaning that one vulnerability must be patched several times on different hosts. Only unique vulnerabilities were handled in the examples discussed, but a vulnerability that occurs multiple times in an environment amplifies the effort and further strains the process. We hope to consider this factor in future work on this topic.
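As a hypothetical illustration of the amplification described above, the sketch below contrasts Effort counted over unique CVEs with Effort counted over individual (host, CVE) findings. The input format and function name are assumptions for illustration, not part of our analysis:

```python
from collections import Counter


def effort_unique_vs_hosts(findings, remediation):
    """Compare Effort per unique CVE with Effort per (host, CVE) finding.

    findings: iterable of (host, cve) pairs (hypothetical scan export)
    remediation: set of CVE IDs selected for patching
    """
    per_cve = Counter(cve for _host, cve in set(findings))  # number of hosts affected per CVE
    if not per_cve:
        return 0.0, 0.0
    unique_effort = len(remediation & set(per_cve)) / len(per_cve)
    host_effort = sum(per_cve[cve] for cve in remediation) / sum(per_cve.values())
    return unique_effort, host_effort


# Placeholder hosts and CVE IDs; the duplicate row is ignored by the set() above.
findings = [
    ("h1", "CVE-A"), ("h1", "CVE-A"), ("h2", "CVE-A"), ("h3", "CVE-A"),
    ("h1", "CVE-B"), ("h2", "CVE-C"),
]
print(effort_unique_vs_hosts(findings, {"CVE-A"}))
```

Here selecting one of three unique CVEs looks like a third of the work, but because that CVE sits on three of the five affected host/CVE pairs, it accounts for 60% of the individual patching tasks.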

Links

[1] https://www.orangecyberdefense.com/global/security-navigator

[2] https://arxiv.org/pdf/2302.14172.pdf

[3] J. Jacobs, S. Romanosky, I. Adjerid and W. Baker, "Improving vulnerability remediation through better exploit prediction," Journal of Cybersecurity, vol. 6, no. 1, 2020.
