Author: Wicus Ross - Senior Security Researcher
In the Orange Cyberdefense (OCD) Security Navigator 2024 (SN24) we published a short piece on the Exploit Prediction Scoring System (EPSS) [1]. This high-level overview aimed to introduce the audience to the potential of using EPSS as an additional metric to prioritize vulnerabilities for remediation.
What sets EPSS apart from CVSS is that it takes exploitation observations and activity into account. The result is a dynamic score that may change from time to time, acting as a barometer for exploitation activity against a given Common Vulnerabilities and Exposures (CVE) entry. A CVSS score is much more static and likely to remain unchanged.
A CVSS score is typically assigned by a CVE Numbering Authority, and then later re-evaluated and enriched. The CVSS score provides an indication of how easy a vulnerability is to exploit, its impact, and scope. A CVSS score can be augmented based on temporal features, which are intended to make CVSS more responsive to what attackers are doing. In other words, increased exploitation of a vulnerability or freely available proof-of-concept exploit code can be used to ramp up the CVSS score. A CVSS score can also be lowered if remediation is available. Other CVSS features also exist that can further augment the CVSS score. The temporal CVSS features are mostly used ad hoc and are normally considered specific to a company or industry.
By design, EPSS is more like a weather forecast in that it predicts the “cyber weather” for the immediate short term. An EPSS score communicates how likely a vulnerability is to be exploited globally within the next 30 days and does not make any statement about the impact or how easy a vulnerability is to exploit. Also, EPSS cannot predict the volume of exploitation attempts.
In this blog post we revisit the findings from the SN24, but introduce new, analogue visualizations that allow us to surface further insights from the application of EPSS in vulnerability management and open the door to potentially modeling the effect of different patching strategies that are based on EPSS. With this possibility in mind, we are also releasing a simple web-based tool that will allow vulnerability managers to experiment with the effect of selecting different EPSS levels as a patching strategy within their own environments.
Details of the tool, where to get it and how to use it are described at the end of this post.
In our Security Navigator '24 piece, we attempted to gauge whether the findings reported in the FIRST paper (which considered all known CVEs) would hold when applied to a subset of vulnerabilities recorded on our customers’ estates. We also used our own database of known, exploited vulnerabilities, as the set used in the FIRST research was not available to us. The EPSS examples described in the SN24 were thus based on the datasets from our Vulnerability Operation Center (VOC) scanning data, Penetration Testing (PT) data, and our Computer Emergency Response Team’s (CERT) VulnWatch Exploit Database (EDB).
Using these datasets, we endeavored to reproduce part of a paper[2] assessing the capabilities of EPSSv3. This enabled us to demonstrate metrics that the EPSS paper highlighted, namely Coverage (Recall), Efficiency (Precision), and Effort.
These three features can be derived by selecting, for a given population of vulnerabilities, all the CVEs with an EPSS score equal to or greater than the chosen EPSS threshold value. These CVEs are referred to as our “remediation dataset”. To calculate the three features, we need to measure how the remediation dataset intersects with a set of exploited vulnerabilities.
The number of vulnerabilities intersecting between the remediation dataset and the exploitation dataset is considered the True Positives (TP), i.e. the EPSS score correctly identified these vulnerabilities as exploited and they must therefore be patched. Those that were selected for remediation but are not present in the pool of exploited vulnerabilities are False Positives (FP). The vulnerabilities that are being exploited but were not included in the remediation set are labelled False Negatives (FN), i.e. we would expect the EPSS score to highlight them, but it did not. Those vulnerabilities that are not exploited and were excluded from the remediation set are counted as True Negatives (TN). We are now able to calculate Efficiency, Coverage, and Effort.
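The four categories above map directly onto set operations. The following is a minimal sketch, not the code used for the SN24 analysis: the function name, the toy CVE identifiers and scores, and the choice to treat a CVE without an EPSS score as 0.0 are all assumptions made for illustration.

```python
def confusion_counts(population, exploited, epss_scores, threshold):
    """Return (TP, FP, FN, TN) for one EPSS threshold.

    population  -- set of CVE IDs under consideration
    exploited   -- set of CVE IDs known to be exploited
    epss_scores -- dict mapping CVE ID to EPSS score (0.0 to 1.0)
    """
    # Only exploited CVEs that occur in our population are relevant.
    exploited = exploited & population
    # Remediation dataset: every CVE scoring at or above the threshold.
    # Assumption: a CVE with no EPSS score is treated as scoring 0.0.
    remediation = {cve for cve in population
                   if epss_scores.get(cve, 0.0) >= threshold}
    tp = len(remediation & exploited)               # patched and exploited
    fp = len(remediation - exploited)               # patched, not exploited
    fn = len(exploited - remediation)               # exploited, not patched
    tn = len(population - remediation - exploited)  # neither
    return tp, fp, fn, tn


# Hypothetical toy data, not taken from the datasets discussed in this post.
scores = {"CVE-A": 0.95, "CVE-B": 0.40, "CVE-C": 0.02,
          "CVE-D": 0.10, "CVE-E": 0.001}
counts = confusion_counts(set(scores), {"CVE-A", "CVE-C"}, scores, threshold=0.085)
print(counts)  # (1, 2, 1, 1)
```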
“Efficiency” is the ratio of how many vulnerabilities were patched that were exploited (TP) versus the total number of patched vulnerabilities (TP + FP). Efficiency, also called Precision, is calculated as TP / (TP + FP).
“Coverage” is determined by how many vulnerabilities were patched that were exploited (TP) versus the total number of exploited vulnerabilities (TP + FN). Coverage, more formally Recall, is calculated as TP / (TP + FN).
“Effort” is the ratio of vulnerabilities selected given a specific threshold in terms of the total number of vulnerabilities that can be patched. Put differently, the subset of vulnerabilities that will be remediated is expressed as a fraction or percentage of the total number of vulnerabilities that could be patched.
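The three definitions can be written as small functions over the TP, FP, FN and TN counts. This is an illustrative sketch: the function names are our own, and returning 0.0 when a denominator is zero is a convention we assume here, not something prescribed by EPSS. The example counts are those reported in this post for the Pentest EDB at an EPSS threshold of 8.5%.

```python
def efficiency(tp, fp):
    """Precision: share of patched CVEs that were actually exploited."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def coverage(tp, fn):
    """Recall: share of exploited CVEs that were patched."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def effort(tp, fp, fn, tn):
    """Share of the whole vulnerability population selected for patching."""
    total = tp + fp + fn + tn
    return (tp + fp) / total if total else 0.0


# Counts reported for the Pentest EDB at an EPSS threshold of 8.5%.
tp, fp, fn, tn = 131, 3500, 351, 20183
print(f"Efficiency: {efficiency(tp, fp):.2%}")    # 3.61%
print(f"Coverage:   {coverage(tp, fn):.2%}")      # 27.18%
print(f"Effort:     {effort(tp, fp, fn, tn):.2%}")  # 15.03%
```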
Selecting an EPSS score as a threshold results in a set of CVEs to remediate, i.e. all CVEs with a score at or above the threshold are considered at risk of exploitation and therefore a priority for patching. For this set of vulnerabilities, we can calculate the Coverage, Effort, and Efficiency features discussed earlier. In the SN24 section where we introduced EPSS we used a Venn-like diagram to visualize the remediation set relative to the total vulnerability set and known exploited sub-set. The Venn format restricted us to presenting either a prescribed EPSS score and the resultant Coverage, Effort, and Efficiency, or a specific Coverage, Effort or Efficiency target and the other correlated variables. For example, 15% Effort might be chosen based on a paper asserting that organizations manage to patch 10-15% of vulnerabilities on average[3]. This visualization format is thus less than optimal.
By changing the visualization so that we plot the Coverage, Effort, and Efficiency features as a series, we see how the EPSS scores affect these features. That way we can easily identify the EPSS threshold that produces the optimal balance between Effort, Efficiency and Coverage.
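Producing such a series amounts to repeating the same calculation for every candidate threshold. The sketch below shows one way to do that. The function name, the toy scores and the three thresholds are hypothetical, and empty sets are assumed to yield a metric of 0.0.

```python
def sweep(population, exploited, epss_scores, thresholds):
    """Return a list of (threshold, coverage, efficiency, effort) tuples."""
    exploited = exploited & population
    rows = []
    for t in thresholds:
        # Remediation set for this threshold (missing scores count as 0.0).
        selected = {c for c in population if epss_scores.get(c, 0.0) >= t}
        tp = len(selected & exploited)
        cov = tp / len(exploited) if exploited else 0.0
        eff = tp / len(selected) if selected else 0.0
        effort = len(selected) / len(population) if population else 0.0
        rows.append((t, cov, eff, effort))
    return rows


# Hypothetical toy data, not taken from the datasets discussed in this post.
scores = {"CVE-A": 0.95, "CVE-B": 0.40, "CVE-C": 0.02,
          "CVE-D": 0.10, "CVE-E": 0.001}
series = sweep(set(scores), {"CVE-A", "CVE-C"}, scores, [0.01, 0.1, 0.5])
for t, cov, eff, effort in series:
    print(f"EPSS>={t:<5} coverage={cov:.2f} efficiency={eff:.2f} effort={effort:.2f}")
```

Each tuple is one point on each of the three lines. Even on this toy data, raising the threshold lowers Effort and Coverage while Efficiency rises.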
Depicted below are some of the original Venn-like diagrams from the SN24 along with the respective line plots and an accompanying table to present the values at the selected EPSS value.
Vulnerability population (n = 24,165) is the collection of all vulnerabilities that require consideration. Jacobs et al. used the entire CVE dataset. For our purposes we use all the CVEs present in the dataset of unpatched client vulnerability findings we reported on in SN24. We will refer to these as the VOC results and this dataset is a subset of the total CVE pool of vulnerabilities.
Target exploit group is the collection of vulnerabilities believed to be exploited and that must therefore be patched. We derive these subsets by matching our clients’ vulnerabilities with either the VulnWatch EDB or the Pentest EDB:
The VulnWatch EDB is a collection of CVEs that the Orange Cyberdefense CERT have identified as being exploited in the wild. The CERT tracks specific products based on criteria, for example the product is part of services delivered by Orange Cyberdefense. The CERT will scrutinize the products by reviewing release documentation of software updates, performing code review where possible, using public sources to correlate their observations, using their own threat intelligence to make an assessment about the nature of the exploitation, etc.
The vulnerabilities in the Pentest EDB were identified by our Ethical Hackers as part of their assessment of a client’s environment or application. We treat these vulnerabilities as a special class of “exploited in the wild” since the “hypothetical” attacker, being the pentester, exploited these vulnerabilities to demonstrate risk.
Note: The VOC dataset in the SN24 referenced 24,177 vulnerabilities while the new line charts reference 24,165. This adjustment was made to filter out rejected CVEs. For our purposes, the 12 fewer CVEs are insignificant (<0.05%) given the total number.
Note: The line charts used to represent the Coverage, Efficiency, and Effort plots have a specific layout. The EPSS Threshold (x-axis) is plotted on a log scale, because EPSS values span the range between 0 and 1 and many of them sit below 0.01 (1%). The log scale spreads these values out evenly, at the cost of possibly distorting the true “shape” of the respective plots. The Value of Metric (y-axis) represents the percentage of CVEs at a given EPSS Threshold on the x-axis. This scale is linear.
The first example uses CVEs from our Pentest findings as the database of known exploited vulnerabilities.
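To make the effect of the log-scale x-axis concrete, the sketch below generates thresholds that are evenly spaced on a log scale. The lower bound of 0.0001 and the ten steps per decade are arbitrary assumptions for illustration, not the values used to draw our charts. A log scale cannot represent zero, so the lower bound has to be strictly positive.

```python
import math


def log_spaced_thresholds(low=0.0001, high=1.0, steps_per_decade=10):
    """Return thresholds from low to high, evenly spaced on a log10 axis."""
    if not 0 < low < high:
        raise ValueError("need 0 < low < high")
    decades = math.log10(high / low)
    n = round(decades * steps_per_decade)
    # min() guards against floating-point overshoot at the upper end.
    return [min(high, low * 10 ** (i / steps_per_decade)) for i in range(n + 1)]


thresholds = log_spaced_thresholds()
print(len(thresholds), thresholds[0], thresholds[-1])
```

With these assumed defaults, two of the four decades on the axis cover EPSS values below 1%. On a linear axis the same values would be squeezed into the first hundredth of the chart.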
Figure 1 - OCD Pentest Vulns
EPSS | 8.5% | EPSS Value |
Remediation Set Total | 3,631 | No. of CVEs in scope for patching |
Exploit DB Total | 482 | No. of CVEs in our set of known exploited vulnerabilities |
Total Possible Vulnerabilities | 24,165 | Total CVEs in our vulnerability population |
True Positive | 131 | Correctly identified as exploited |
False Positive | 3,500 | Identified but not actually in the exploited set |
True Negative | 20,183 | Not identified and not exploited |
False Negative | 351 | Not identified, but in the exploited set |
Coverage | 27.18% | TP / (TP + FN) |
Efficiency | 3.61% | TP / (TP + FP) |
Effort | 15.03% | (TP + FP) / Total Possible Vulnerabilities |
The next example uses CVEs from our “VulnWatch” vulnerability research service as the database of known exploited vulnerabilities.
Figure 2 - OCD VulnWatch Exploit DB
EPSS | 8.5% | EPSS Value |
Remediation Set Total | 3,631 | No. of CVEs in scope for patching |
Exploit DB Total | 439 | No. of CVEs in our set of known exploited vulnerabilities |
Total Possible Vulnerabilities | 24,165 | Total CVEs in our vulnerability population |
True Positive | 253 | Correctly identified as exploited |
False Positive | 3,378 | Identified but not actually in the exploited set |
True Negative | 20,348 | Not identified and not exploited |
False Negative | 186 | Not identified, but in the exploited set |
Coverage | 57.63% | TP / (TP + FN) |
Efficiency | 6.97% | TP / (TP + FP) |
Effort | 15.03% | (TP + FP) / Total Possible Vulnerabilities |
This kind of visualization enables us to see at a glance how EPSS scores behave across distinct CVE sets. In this case EPSS is more accurate at predicting vulnerabilities that will be publicly recorded as exploited in the wild (e.g. in our VulnWatch database) than those that might be used by penetration testers operating inside a target network (e.g. in our pentest findings database).
What follows is an example of how an EPSS line chart of this kind could be used, combined with the appropriately selected EDB, to compare and select an EPSS value that produces an optimized patching strategy.
The previous section introduced the line charts and highlighted interesting data points such as the intersections of Efficiency and Effort, as well as the intersection point of Efficiency and Coverage.
The charts below once again depict these values for each possible EPSS threshold score, using our own internal penetration testing EDB to represent the set of known exploited vulnerabilities.
We suggest that the EPSS score at the visual intersection of the Efficiency and Effort values, the Efficiency/Effort (EE) intersection, is a suitable candidate value for an “optimal” patching strategy. This is represented by the crossing of the purple and orange lines below. Any EPSS threshold below the EE intersection promises diminishing returns: Coverage will increase, but the additional Effort is wasted with no gain in Efficiency.
Any EPSS threshold greater than or equal to the EPSS at the Efficiency/Coverage (EC) intersection has negligible impact with respect to Coverage, albeit at a “high” level of Efficiency. This is represented by the crossing of the grey and orange lines below. We consider this the practical upper bound for EPSS threshold selection, but it should generally not be chosen, as it sacrifices Coverage.
Any EPSS between the EE and EC intersections can be considered within the range of “sensible” options.
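Reading an intersection off a chart can be approximated in code by looking for the threshold at which two series are closest to each other. The sketch below is a simplification under stated assumptions: it uses only the three Pentest EDB data points reported in this post rather than a full sweep, the function name is our own, and it picks the nearest point instead of interpolating between points.

```python
def closest_approach(rows, metric_a, metric_b):
    """Return the row where two metric series are nearest to each other."""
    # Skip thresholds that select nothing: every metric is 0 there, which
    # would otherwise look like a (meaningless) perfect intersection.
    candidates = [r for r in rows if r["effort"] > 0]
    return min(candidates, key=lambda r: abs(r[metric_a] - r[metric_b]))


# Percentages reported in this post for the Pentest EDB at three thresholds.
pentest_rows = [
    {"epss": 0.085, "coverage": 27.18, "efficiency": 3.61, "effort": 15.03},
    {"epss": 0.819, "coverage": 15.56, "efficiency": 5.57, "effort": 5.57},
    {"epss": 0.958, "coverage": 11.20, "efficiency": 11.11, "effort": 2.01},
]

ee = closest_approach(pentest_rows, "efficiency", "effort")
ec = closest_approach(pentest_rows, "efficiency", "coverage")
print("EE at EPSS", ee["epss"], "| EC at EPSS", ec["epss"])
```

Any threshold between the two returned values falls in the “sensible” range described above.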
Figure 3 - Pentest Exploit Dataset Efficiency/Effort Intersection
EPSS | 81.9% | EPSS Value |
Remediation Set Total | 1,347 | No. of CVEs in scope for patching |
Exploit DB Total | 482 | No. of CVEs in our set of known exploited vulnerabilities |
Total Possible Vulnerabilities | 24,165 | Total CVEs in our vulnerability population |
True Positive | 75 | Correctly identified as exploited |
False Positive | 1,272 | Identified but not actually in the exploited set |
True Negative | 22,411 | Not identified and not exploited |
False Negative | 407 | Not identified, but in the exploited set |
Coverage | 15.56% | TP / (TP + FN) |
Efficiency | 5.57% | TP / (TP + FP) |
Effort | 5.57% | (TP + FP) / Total Possible Vulnerabilities |
An EPSS of 81.9% in the example above encompasses 75 (or approximately 16%) of all vulnerabilities present in the exploit dataset. The number of CVEs that will be selected for patching is 1,347 (TP + FP), or 5.57% of all the CVEs we have in our population set. Because the Effort and Efficiency plots intersect at an EPSS of 81.9%, the Efficiency is also 5.57%.
NOTE: Figure 3 is a partial chart as we zoomed in to pinpoint EPSS 81.9%, compared to Figure 1 which is the same chart at 100% zoom.
Now let us consider the EC intersection, which is at the upper bound of the “sensible” range we identified earlier.
Figure 4 - Pentest Exploit Dataset Efficiency/Coverage Intersection
EPSS | 95.8% | EPSS Value |
Remediation Set Total | 486 | No. of CVEs in scope for patching |
Exploit DB Total | 482 | No. of CVEs in our set of known exploited vulnerabilities |
Total Possible Vulnerabilities | 24,165 | Total CVEs in our vulnerability population |
True Positive | 54 | Correctly identified as exploited |
False Positive | 432 | Identified but not actually in the exploited set |
True Negative | 23,251 | Not identified and not exploited |
False Negative | 428 | Not identified, but in the exploited set |
Coverage | 11.20% | TP / (TP + FN) |
Efficiency | 11.11% | TP / (TP + FP) |
Effort | 2.01% | (TP + FP) / Total Possible Vulnerabilities |
For an EPSS of 95.8%, a total of 486 (TP + FP) CVEs will be included in the remediation set, which is just four more than the total number of vulnerabilities in the entire penetration testing exploit dataset. Compared with the EE intersection, the total Effort was reduced by more than half while Efficiency doubled, but Coverage drops by roughly 28%. The EPSS score of 95.8% represents the bare minimum of effort that could be considered “effective”. A lower EPSS value may be worth evaluating given the low corresponding Coverage, Efficiency, and Effort ratios.
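The trade-off between the two Pentest EDB intersection points can be checked with a little arithmetic on the reported percentages. The sketch below computes the relative change in each metric when moving from the EE threshold (81.9%) to the EC threshold (95.8%). The helper name is hypothetical and the inputs are the rounded values from the tables, so the results are approximate.

```python
def relative_change(before, after):
    """Fractional change from `before` to `after` (negative means a drop)."""
    return (after - before) / before


# Rounded percentages reported for the Pentest EDB.
ee_point = {"coverage": 15.56, "efficiency": 5.57, "effort": 5.57}   # EPSS 81.9%
ec_point = {"coverage": 11.20, "efficiency": 11.11, "effort": 2.01}  # EPSS 95.8%

changes = {name: relative_change(ee_point[name], ec_point[name])
           for name in ee_point}
for name, change in changes.items():
    print(f"{name:<10} {change:+.0%}")
```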
NOTE: Figure 4 is a partial chart as we zoomed in to pinpoint EPSS 95.8%, compared to Figure 1 which is the same chart at 100% zoom.
We can redraw the aforementioned charts, but this time using the VulnWatch EDB in our dataset:
Figure 5 - VulnWatch Exploit Dataset Efficiency/Effort Intersection
EPSS | 37.8% | EPSS Value |
Remediation Set Total | 2,316 | No. of CVEs in scope for patching |
Exploit DB Total | 439 | No. of CVEs in our set of known exploited vulnerabilities |
Total Possible Vulnerabilities | 24,165 | Total CVEs in our vulnerability population |
True Positive | 221 | Correctly identified as exploited |
False Positive | 2,095 | Identified but not actually in the exploited set |
True Negative | 21,631 | Not identified and not exploited |
False Negative | 218 | Not identified, but in the exploited set |
Coverage | 50.34% | TP / (TP + FN) |
Efficiency | 9.54% | TP / (TP + FP) |
Effort | 9.58% | (TP + FP) / Total Possible Vulnerabilities |
An EPSS of 37.8% will achieve a Coverage of 50.34%, or 221 of the 439 CVEs in the VulnWatch EDB. Put differently, to achieve an “optimum” Efficiency and Effort balance for this dataset, a total of 2,316 vulnerabilities must be patched. This leaves 21,631 vulnerabilities that are unpatched but not considered exploited, and 218 vulnerabilities that are considered exploited but left unpatched.
Figure 6 - VulnWatch Exploit Dataset Efficiency/Coverage Intersection
EPSS | 96.1% | EPSS Value |
Remediation Set Total | 447 | No. of CVEs in scope for patching |
Exploit DB Total | 439 | No. of CVEs in our set of known exploited vulnerabilities |
Total Possible Vulnerabilities | 24,165 | Total CVEs in our vulnerability population |
True Positive | 148 | Correctly identified as exploited |
False Positive | 299 | Identified but not actually in the exploited set |
True Negative | 23,427 | Not identified and not exploited |
False Negative | 291 | Not identified, but in the exploited set |
Coverage | 33.71% | TP / (TP + FN) |
Efficiency | 33.11% | TP / (TP + FP) |
Effort | 1.85% | (TP + FP) / Total Possible Vulnerabilities |
A bare minimum approach would require that only 447 vulnerabilities be patched if an EPSS of 96.1% is selected, giving an Efficiency and Coverage of approximately 33%. The Effort required will be less than 2% of the total 24,165 vulnerabilities.
If we expand the size of the exploitation dataset to include all the penetration testing vulnerabilities, the OCD VulnWatch EDB, and the Cybersecurity & Infrastructure Security Agency (CISA) Known Exploited Vulnerabilities (KEV), then the resulting dataset contains 2,945 unique vulnerabilities.
Figure 7 - Efficiency/Effort Intersection for the combined exploit datasets.
EPSS | 2.2% | EPSS Value |
Remediation Set Total | 5,770 | No. of CVEs in scope for patching |
Exploit DB Total | 2,945 | No. of CVEs in our set of known exploited vulnerabilities |
Total Possible Vulnerabilities | 24,165 | Total CVEs in our vulnerability population |
True Positive | 1,392 | Correctly identified as exploited |
False Positive | 4,378 | Identified but not actually in the exploited set |
True Negative | 16,842 | Not identified and not exploited |
False Negative | 1,553 | Not identified, but in the exploited set |
Coverage | 47.27% | TP / (TP + FN) |
Efficiency | 24.12% | TP / (TP + FP) |
Effort | 23.88% | (TP + FP) / Total Possible Vulnerabilities |
At the (lower) EE convergence threshold of 2.2%, the remediation set (5,770 CVEs) is almost double the size of the Exploit DB (2,945), and 1,392 exploited vulnerabilities will be patched. This still means patching 4,378 vulnerabilities that were not exploited and missing 1,553 vulnerabilities that were exploited.
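The counts in a table like the one for Figure 7 are tied together by simple identities: TP + FP equals the remediation set, TP + FN equals the exploit dataset, and all four counts sum to the vulnerability population. The sketch below, whose function name is our own, checks those identities for the combined dataset and computes the ratio behind the “almost double” observation.

```python
def check_counts(remediation_total, exploit_total, population_total,
                 tp, fp, tn, fn):
    """Return a list of inconsistencies found between the reported counts."""
    problems = []
    if tp + fp != remediation_total:
        problems.append("TP + FP does not equal the remediation set total")
    if tp + fn != exploit_total:
        problems.append("TP + FN does not equal the exploit DB total")
    if tp + fp + tn + fn != population_total:
        problems.append("TP + FP + TN + FN does not equal the population")
    return problems


# Values reported for the combined exploit datasets at an EPSS of 2.2%.
figure7 = dict(remediation_total=5770, exploit_total=2945,
               population_total=24165, tp=1392, fp=4378, tn=16842, fn=1553)

problems = check_counts(**figure7)
ratio = figure7["remediation_total"] / figure7["exploit_total"]
print(problems or "counts are consistent", f"| remediation/exploit = {ratio:.2f}")
```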