

How LLMs Are Enabling Automated Vulnerability Discovery


Vulnerability discovery and disclosure is a critical cybersecurity activity that takes place at the national and global level, with institutions leading the way. Offerings such as MITRE CVE and NIST NVD ensure that organizations can compare their IT operations against the most current information and remediate known weaknesses. This upstream process of vulnerability discovery and disclosure is notably different from the downstream process of vulnerability management, in which users discover publicly available information about vulnerabilities, identify affected assets in their infrastructure, and apply mitigation strategies.

The traditional means of vulnerability discovery (also known as bug hunting) is left to software and hardware vendors themselves via internal security auditing and public bug-bounty programs, while independent researchers also operate in this space to provide accountability and third-party oversight. The most popular methods of discovering bugs are Static Application Security Testing (SAST) and Dynamic Application Security Testing (DAST). The latter, DAST, executes software and applies techniques such as fuzzing and monkey fuzzing to identify vulnerabilities and bugs.

Out of both convenience and necessity, automated tools are increasingly being deployed to fuzz software for vulnerabilities, giving researchers an upper hand in identifying questionable code among an increasingly deep ocean of open-source software packages.

In today's article, we define fuzz testing, identify distinct types of fuzzing, and describe the benefits that fuzzing bestows on software supply chain security.  Finally, we will examine some next-gen Large Language Model (LLM) enabled fuzzing techniques that leverage the power of LLMs for more efficient automated vulnerability discovery and analysis.

What is Fuzz Testing?

Fuzz testing (also known as "fuzzing") is an automated software testing technique used to discover bugs, vulnerabilities, and weaknesses in software applications. It involves feeding invalid, unexpected, or random data, called "fuzz," into a program and monitoring its behavior for abnormalities, crashes, or security issues. Fuzzing aims to uncover hidden software defects that could potentially be exploited by attackers or cause system failures. The applied scope of fuzz testing is wide, with many sub-categories and techniques.
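The core idea can be shown in a few lines of Python: feed random byte strings to a target and flag any failure that is not an expected input-validation error. The toy parser below is an invented example (it does not come from any tool discussed later), seeded with a deliberate length-check bug for the fuzzer to find:

```python
import random

def parse_record(data: bytes) -> str:
    """Toy parser with a hidden defect: it trusts the length byte."""
    if len(data) < 2:
        raise ValueError("too short")
    length = data[0]
    payload = data[1:1 + length]
    # Bug: the buffer may not actually hold `length` payload bytes.
    if len(payload) != length:
        raise IndexError("length field exceeds buffer")  # the "crash"
    return payload.decode("latin-1")

def fuzz(target, iterations=10_000, seed=0):
    """Feed random byte strings to `target`, collecting unexpected failures."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(iterations):
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(16)))
        try:
            target(data)
        except ValueError:
            pass  # expected rejection of malformed input
        except Exception as exc:  # anything else is a finding
            crashes.append((data, type(exc).__name__))
    return crashes

findings = fuzz(parse_record)
```

Even this naive random fuzzer surfaces the defect quickly, because any input whose length byte exceeds the remaining buffer triggers the unchecked path.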

The most common categories of fuzzing include:

  • Mutation-based Fuzzing: This approach involves modifying existing inputs or generating new inputs by randomly mutating valid input data (known as monkey fuzzing). It typically focuses on exploring different code paths and edge cases within the program.

  • Generation-based Fuzzing: In contrast to mutation-based fuzzing, generation-based fuzzing creates inputs from scratch based on defined input specifications or models. It can generate inputs that adhere to specific formats or structures, enabling more targeted testing.

  • Coverage-guided Fuzzing: Coverage-guided fuzzing, also known as feedback-driven fuzzing, uses feedback from the target program's execution to guide the generation of new inputs. It aims to maximize code coverage by prioritizing inputs that lead to unexplored code paths or branches.

  • Protocol Fuzzing: Protocol fuzzing focuses on testing network protocols, such as TCP/IP, HTTP, or proprietary protocols, by sending malformed or unexpected data packets to network services or applications. It aims to identify vulnerabilities in protocol implementations that could be exploited over a network.

  • Black-box vs White-box Fuzzing: Black-box fuzzing is done without any prior knowledge of an application's internal workings, while white-box fuzzing involves deep knowledge of the software's internal code and structure, allowing for more targeted and efficient testing.

  • Taint Analysis Fuzzing: Taint analysis fuzzing is a technique that combines the principles of taint analysis and fuzz testing to identify vulnerabilities in software applications. Taint analysis tracks the flow of potentially untrusted or malicious data (tainted data) through the program to see how it influences the application.

  • Execution-based Fuzzing (Dynamic Fuzzing): A method of fuzzing done while executing the program (as opposed to isolated unit tests), and supplying a wide range of inputs, including malformed or unexpected data, to find bugs and vulnerabilities that only manifest during runtime.
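To make the mutation-based category concrete, here is a minimal sketch of "monkey fuzzing" style mutations in Python. The mutation operators (bit flip, byte replacement, truncation) are classic ones used by mutational fuzzers, but this simplified version is purely illustrative and not taken from any specific tool:

```python
import random

def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Apply one random mutation: flip a bit, replace a byte, or truncate."""
    data = bytearray(seed)
    choice = rng.randrange(3)
    if choice == 0 and data:            # bit flip
        i = rng.randrange(len(data))
        data[i] ^= 1 << rng.randrange(8)
    elif choice == 1 and data:          # byte replacement
        data[rng.randrange(len(data))] = rng.randrange(256)
    else:                               # truncation
        data = data[: rng.randrange(len(data) + 1)]
    return bytes(data)

rng = random.Random(42)
seed = b'{"user": "alice", "id": 7}'   # a valid input to start from
corpus = [mutate(seed, rng) for _ in range(5)]
```

A coverage-guided fuzzer would extend this loop by keeping only mutants that exercise previously unseen code paths, feeding them back in as new seeds.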

How Does Fuzz Testing Help Identify Software Vulnerabilities?

Fuzzing is a powerful and effective technique for improving software quality, enhancing security, and mitigating the risk of potential attacks by identifying bugs and vulnerabilities in software applications.

Fuzzing offers several benefits for software development and security testing:

  • Security Assurance: Fuzzing plays a crucial role in identifying security vulnerabilities and weaknesses in software, helping organizations proactively address potential threats and protect against exploitation by malicious actors. Fuzzing can uncover a wide range of bugs, including memory corruption issues, input validation errors, buffer overflows, and other security vulnerabilities, that may go unnoticed by other testing methods.

  • More Efficient Use of Human Resources: Fuzzing automates the process of testing software applications, reducing the need for manual effort and enabling continuous testing throughout the development lifecycle.

Examples Of LLM Enabled Fuzzing to Identify Software Vulnerabilities

Vulnerability discovery is an important part of keeping software supply chains secure, and researchers are constantly working to develop novel, more effective processes. Since the release of ChatGPT, Large Language Models (LLMs) have been at the forefront of efforts to improve the scalability of automated fuzzing beyond current state-of-the-art capabilities.

One of the earliest examples of automated fuzzing, Google's OSS-Fuzz (running since 2016), allows developers to automate bug discovery at scale for free. However, new LLM-based techniques are emerging. LLMs can help satisfy the initial setup requirements by identifying fuzz targets automatically and dynamically. Simple prompts such as "fuzz this software package for me" are proving effective at automating the last remaining manual steps of software fuzzing.
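As an illustration of what prompting an LLM for a fuzz target might look like, the sketch below assembles a prompt for a single API and hands it to a model. Every function and parameter name here is hypothetical, and the LLM call is stubbed out so the prompt-building logic runs on its own:

```python
# Hypothetical sketch: these names do not come from any real tool, and the
# model endpoint is replaced by a stub so the code is self-contained.

def build_fuzz_target_prompt(package: str, function_signature: str) -> str:
    """Assemble a prompt asking an LLM to write a fuzz harness for one API."""
    return (
        "You are a security engineer. Write a libFuzzer-style fuzz target\n"
        f"for the function `{function_signature}` in the `{package}` package.\n"
        "The harness must accept arbitrary bytes, exercise the function,\n"
        "and must compile without extra dependencies."
    )

def request_fuzz_target(package: str, signature: str, llm_call) -> str:
    """`llm_call` is a placeholder for any chat-completion client."""
    return llm_call(build_fuzz_target_prompt(package, signature))

# Stub standing in for a real model endpoint.
fake_llm = lambda prompt: "// generated harness for: " + prompt.splitlines()[1]

harness = request_fuzz_target(
    "libpng", "png_read_info(png_structp, png_infop)", fake_llm
)
```

In a real pipeline, the returned harness text would then be compiled and executed, with failures fed back to the model, as several of the projects below describe.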

Let's examine some examples of how LLMs are enabling more efficient Fuzzing. 

  • Google Tests LLM with OSS-Fuzz: Google's research (oss-fuzz-gen) begins by leveraging an LLM in tandem with OSS-Fuzz's Fuzz Introspector tool. One limitation of OSS-Fuzz is the need for maintainers to make an initial time investment to integrate their projects into the infrastructure and then add fuzz targets. To solve this issue, the Fuzz Introspector pinpoints portions of a project's code that are under-fuzzed yet hold high potential for uncovering vulnerabilities. This selected code is then fed into an evaluation framework, which constructs a detailed LLM prompt along with project-specific information. The LLM then crafts new fuzz targets and feeds them into the OSS-Fuzz evaluation framework. If OSS-Fuzz encounters errors, the evaluation framework re-engages the LLM, asking it to modify the target, and runs the test again. Google expects the productivity benefits of this new technique to shave days off the time required to manually generate the same information for a single software package.

  • KARTAL - Web App Fuzzing Using LLM: KARTAL applies LLMs to dynamic web-application fuzzing. KARTAL has 3 core components: Fuzzer, Prompter, and Detector. The Fuzzer collects application behavior from its source code and passes it to the Prompter, which processes the data and generates an LLM prompt. Finally, the Detector passes the prompt to an LLM for a dynamic fuzzing analysis. The KARTAL model was reported to have attained an accuracy rate of 87.19%.

  • Llm4Fuzz - LLM Guided Fuzzing of Smart Contracts: Llm4Fuzz can reduce human workload by using LLMs to direct the ItyFuzz smart contract fuzzer towards high-value code regions and generate input sequences using power scheduling to prioritize likely exploitation vectors. Evaluations showed substantial gains in efficiency, coverage, and vulnerability detection, and uncovered five critical vulnerabilities with an expected exploitation cost of $247k.

  • ChatAFL - LLM Protocol Guided Fuzzing: Many protocols are described in natural language documents, making automatic test generation difficult. The researchers and creators of ChatAFL proposed a novel approach that uses large language models (LLMs) to interpret natural language specifications and predict message sequences during protocol fuzzing, thereby enhancing the generation of test inputs. In comprehensive experiments involving real-world protocols, ChatAFL identified nine new vulnerabilities in widely-used protocols and outperformed leading fuzzers, achieving higher coverage of state transitions, states, and code.

  • ParaFuzz - Detecting Poisoned NLP Inputs: ParaFuzz is a model for distinguishing between normal and poisoned text samples by checking whether predictions stay stable under paraphrasing. The model works as a "trigger-removal" tool by comparing the semantic meaning of text against slight alterations that don't dramatically change the meaning, yet cause a noticeable shift in the target AI model's classification.

  • Fuzz4All - Language Agnostic LLM-based Fuzzing: Fuzz4All is a universal LLM-based fuzzing application that can target many different input languages and many different features across supported languages. The researchers developed an "autoprompting" technique to generate iterative fuzzing LLM prompts that create new fuzzing inputs for the target function. So far, the Fuzz4All model has found 98 bugs across target languages.

  • Python Fuzz Forest: fuzz-forest uses OpenAI's ChatGPT to automate the creation, adjustment, and triaging of fuzz tests for short Python code segments across an entire software repository, and integrates with the coverage-guided Atheris Python fuzzer.
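Several of the projects above (oss-fuzz-gen most explicitly) share the same control loop: generate a harness, try to build and run it, and feed any error back to the model for another attempt. The sketch below captures that loop in Python with the model and build step replaced by stubs; none of these names are a real API:

```python
# Hedged sketch of the generate/evaluate/repair loop described above.
# `ask_llm` and `try_build` are stand-ins, not calls to a real service.

def ask_llm(prompt: str, attempt: int) -> str:
    """Stub model: 'fixes' its output after being shown an error once."""
    return "good harness" if attempt > 0 else "broken harness"

def try_build(harness: str):
    """Stub evaluator: returns an error message, or None on success."""
    return None if harness == "good harness" else "compile error: ..."

def generate_fuzz_target(code_snippet: str, max_attempts: int = 3):
    """Ask for a harness; on failure, re-prompt with the error attached."""
    prompt = f"Write a fuzz target for:\n{code_snippet}"
    for attempt in range(max_attempts):
        harness = ask_llm(prompt, attempt)
        error = try_build(harness)
        if error is None:
            return harness, attempt + 1  # success and attempts used
        prompt += f"\nPrevious attempt failed with: {error}\nPlease fix it."
    return None, max_attempts

harness, attempts = generate_fuzz_target("int parse(const char *buf);")
```

The key design point is that the error message itself becomes part of the next prompt, turning compile and runtime feedback into training signal for the retry.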


This article delved into the concept of fuzz testing, highlighting its various forms and their significance in uncovering software vulnerabilities. By automating the testing process, fuzzing not only aids in the early detection of bugs but also improves the overall quality and security of software applications. Automated fuzz testing represents a significant leap forward. For instance, Google's OSS-Fuzz project has been pivotal in automated bug discovery since 2016. 

However, the introduction of LLMs has further streamlined automated fuzzing by replacing steps that previously required human analysis and input. Examples like Google's oss-fuzz-gen, a collaboration with OSS-Fuzz, KARTAL's web application fuzzing, and the Llm4Fuzz initiative for smart contracts highlight the effectiveness of LLMs in enhancing fuzz testing's scope and precision. Notably, the ChatAFL project demonstrated the potential of LLMs in interpreting natural language specifications to improve test input generation, uncovering nine new vulnerabilities in widely-used protocols, while other novel approaches to LLM-enhanced fuzzing offer unique efficiency benefits over traditional fuzzing efforts.

Looking for more cybersecurity updates and news? Sign up for our informational zero-spam newsletter.
