Black Hat - OpenSSF CVE Benchmark

What I learned at Black Hat Europe 2020!

Every December, many of our customers start winding down operations. A few members of the CDA used this downtime to attend training courses. We attended Black Hat Europe, which was 100% virtual in 2020, looking for inspiration for the upcoming year. Black Hat conferences are known for technical presentations by top security experts, who use the forum to present their research and community contributions. That research often results in the public release of Common Vulnerabilities and Exposures (CVE) entries that were discovered and responsibly reported throughout the year.

During the first two days of Black Hat, I attended the DevSecOps Master Class created by we45. This course focused on different tools and tactics used to implement security in your CI/CD pipeline. I enjoyed the course and learned a lot in just a few days. The class inspired me to dig a little deeper and left me with lots of homework. I’m still working through much of it in my lab, with topics such as Terraform and GitHub Actions. The lab exercises demonstrated several application security tools used throughout the industry. I wrote this blog to draw attention to the Open Source Security Foundation (OpenSSF) CVE Benchmark project. This tool rates other security tools on how accurately they detect real vulnerabilities, which helps you cut down on “false positives.”

Static Application Security Testing (SAST)

Static Application Security Testing (SAST) can be a bit overwhelming. The first challenge is sorting through the many tools available. All the tools seem to take a similar approach but offer different functions, and learning to interpret the output of each can be daunting. The goal is to determine if weaknesses exist in your code while limiting both false positives and false negatives. When considering a tool, you could test it against known vulnerable codebases such as OWASP Juice Shop, DVWA, WebGoat, or the NIST SAMATE project. These vulnerable code datasets usually do not include the patched version of the code, so you cannot tell whether a tool keeps raising false positives once a flaw is fixed.

CodeQL, ESLint, and NodeJSscan are commonly used for SAST. The real question is, what is the next step? How do you choose which tools you want to run? A few questions come to mind:

  • How can you evaluate tools against software that is not intentionally vulnerable?

  • How can you tell if tools can accurately detect a vulnerability based on CVE?

  • Is patched code still creating noise with false positives after someone fixed it?

  • How can you evaluate the results from the SAST tools to establish a benchmark?

Figure 1: Screenshot of an OpenSSF CVE Benchmark report showing detection of both the CVE and the patch in code. This sample report tested ESLint’s ability to detect CVE-2018-16492 and CVE-2020-4066.

OpenSSF CVE Benchmark

Bas van Schaik and Kevin Backhouse answered these questions at the Black Hat Europe 2020 briefing entitled “FPs are Cheap. Show me the CVEs!” The Open Source Security Foundation (OpenSSF) unveiled the CVE Benchmark tool during the presentation. The creators found it challenging to evaluate tools based on intentionally vulnerable code. Training web apps and datasets are generally unpatched, which makes it difficult to see how SAST tools would evaluate the same code after a fix. Thanks to the many open-source projects with publicly fixed CVEs, better samples are available for the next phase of tool evaluation.

Although released in December 2020, the OpenSSF CVE Benchmark tool already contains over 200 CVEs for JavaScript and TypeScript testing. The OpenSSF CVE Benchmark tool comes with drivers for ESLint, NodeJSscan, and CodeQL testing.

How does it work?

OpenSSF CVE Benchmark is a framework to evaluate SAST tools using actual code. Known CVEs are tested by cloning 200+ repos from GitHub. These repos contain both the vulnerable and the patched codebases. The benchmark's metadata records the GitHub fix commit and the location of the vulnerability, which allows the CVE Benchmark tool to test a specific location to see if a CVE is detected. Not detecting a vulnerability at that location results in a false negative. When the vulnerability is detected, the benchmark checks which rule was responsible for generating the alert and tests that location in the patched code. When the patched code is still flagged as a risk, it is a false positive. Scores are calculated based on vulnerability detection in unpatched and patched code.
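
The per-CVE check described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the benchmark's own code: the `Alert` and `Verdict` types, the `evaluate_cve` function, and the sample file, line, and rule names are all hypothetical, and the sketch assumes the vulnerable location keeps the same file and line number in the patched code.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Alert:
    """One finding reported by a SAST tool (hypothetical shape)."""
    rule_id: str
    file: str
    line: int


@dataclass(frozen=True)
class Verdict:
    detected: bool        # tool flagged the known vulnerable location
    false_negative: bool  # tool missed the known vulnerable location
    false_positive: bool  # same rule still fires there after the patch


def evaluate_cve(vuln_file: str, vuln_line: int,
                 unpatched_alerts: Iterable[Alert],
                 patched_alerts: Iterable[Alert]) -> Verdict:
    def at_location(alert: Alert) -> bool:
        return alert.file == vuln_file and alert.line == vuln_line

    # Rules that fired at the known vulnerable location in the unpatched code.
    rules = {a.rule_id for a in unpatched_alerts if at_location(a)}
    detected = bool(rules)

    # A false positive here means one of those same rules still fires
    # at that location in the patched code.
    still_flagged = any(at_location(a) and a.rule_id in rules
                        for a in patched_alerts)

    return Verdict(detected=detected,
                   false_negative=not detected,
                   false_positive=still_flagged)


# Hypothetical example: the tool finds the issue and goes quiet after the fix.
before = [Alert("example-rule", "lib/merge.js", 42)]
after = []
print(evaluate_cve("lib/merge.js", 42, before, after))
```

A tool scores best on a CVE when `detected` is true and `false_positive` is false, which matches the idea of finding the vulnerability and recognizing the patch.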


The installation process was simple. I started by reviewing the documentation in the project's GitHub repository. I then built an Ubuntu 20.10 virtual machine and installed the dependency requirements (npm ≥ 7 and Node.js ≥ 12). Next, I followed the online guides to configure the tool testing drivers. Once that is complete, you can choose to test a single tool or run the benchmark against multiple tools for comparison. I started my preliminary testing with ESLint to understand how the tool operates and to get familiar with the output. There are options to run against a specific CVE, a range of CVEs, or all the CVEs. Running against all CVEs, which I did, takes several hours to clone the GitHub repos (18GB) and run the tests. “Inquiring minds want to know,” so it was worth the wait.


After running the CVE Benchmark tool, you can view the results in text format or use the built-in web reporting. The web reporting displays an interactive scorecard: green indicates the tool found the CVE and recognized the patch, orange indicates it found the vulnerability but did not recognize the patch, and red indicates that neither was detected. The report provides a testing summary grouped by CWE, then subdivided by CVE. Direct links within the report let you drill down into additional information about each CVE, the unpatched source code, and the GitHub commit for the patch. Overall, I am impressed with the output and quality of the tool. It can compare and evaluate different security tools… pretty cool! Using CVE Benchmark gives us the ability to evaluate single or multiple tools and provides insight into their effectiveness with “real world” code and patches.
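
To make the scorecard colors concrete, here is a minimal Python sketch of how per-CVE results could be mapped to a color and grouped by CWE. It is an assumption-laden illustration rather than the report's actual implementation: the `scorecard_color` and `group_by_cwe` helpers and the placeholder CWE and CVE identifiers are made up for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def scorecard_color(found_vulnerability: bool, recognized_patch: bool) -> str:
    """Map one CVE result to the color scheme described in the report."""
    if found_vulnerability and recognized_patch:
        return "green"   # found the CVE and recognized the patch
    if found_vulnerability:
        return "orange"  # found the CVE but still flags the patched code
    return "red"         # did not find the CVE at all


def group_by_cwe(
    results: Iterable[Tuple[str, str, bool, bool]]
) -> Dict[str, Dict[str, str]]:
    """Group (cwe, cve, found, recognized_patch) tuples as CWE -> CVE -> color."""
    report: Dict[str, Dict[str, str]] = defaultdict(dict)
    for cwe, cve, found, recognized in results:
        report[cwe][cve] = scorecard_color(found, recognized)
    return dict(report)


# Placeholder identifiers, not real benchmark data.
sample = [
    ("CWE-A", "CVE-EXAMPLE-1", True, True),
    ("CWE-A", "CVE-EXAMPLE-2", True, False),
    ("CWE-B", "CVE-EXAMPLE-3", False, False),
]
for cwe, cves in group_by_cwe(sample).items():
    print(cwe, cves)
```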

Figure 2: Overview of web-based reporting, where green indicates the tool has found the CVE and recognized the patch, orange indicates a discovered vulnerability but not the patch, and red indicates that neither was detected.

Next Steps

The OpenSSF Security Tooling Working Group is actively looking for community support to help build more integrations with other security tools. You can create an integration driver with fewer than 200 lines of code! Their roadmap includes expanding the CVE dataset and adding support for additional programming languages.


Blog: Introducing the OpenSSF CVE Benchmark - Open Source Security Foundation


Website: OpenSSF Security Tooling Working Group

Send your suggestions on Twitter: @bas_van_schaik