As Large Language Models (LLMs) such as ChatGPT and its rivals grow in popularity, it's important to understand the unique cybersecurity challenges they present. The fast pace of LLM development promises organizations big benefits in the form of cost savings and higher productivity. This is driven primarily by LLM API integration and LLM build tools that support the lateral growth and adoption of LLMs, allowing many organizations to integrate LLM-based services into both internal and customer-facing architecture.
Robust cybersecurity measures are imperative to maintain user trust, prevent the exploitation of LLMs for malicious purposes, and protect against financial losses. Securing LLM environments relies on the same core defensive cybersecurity principles as traditional infrastructure: protecting sensitive data and user privacy, maintaining network access controls and system integrity, monitoring, detecting, responding, and so on. However, LLM technology also presents unique cybersecurity challenges that must be thoroughly understood and mitigated by companies seeking to benefit from it.
OWASP Top Ten Large Language Model Vulnerabilities
OWASP's new Top Ten Vulnerabilities For Large Language Model Applications aims to educate stakeholders, including developers and organizations, on the security risks and mitigation strategies for LLM security. Some of OWASP's new LLM-specific vulnerabilities mirror OWASP's other advisories such as OWASP Web Application Top Ten, but the list also includes some novel attack tactics and techniques to consider.
The Complete List
LLM01:2023 - Prompt Injections: Supplying malicious input to bypass filters or perform unintended actions.
LLM02:2023 - Data Leakage: Revealing sensitive or proprietary information unintentionally.
LLM03:2023 - Inadequate Sandboxing: Allowing unauthorized access to the underlying IT environment.
LLM04:2023 - Unauthorized Code Execution: Using malformed natural language inputs to execute malicious code or commands.
LLM05:2023 - SSRF Vulnerabilities: Causing the target server to make HTTP requests to an arbitrary domain in order to access restricted resources, internal services, APIs, or data.
LLM06:2023 - Overreliance on LLM-generated Content: Using LLM-generated content without proper verification and oversight.
LLM07:2023 - Inadequate AI Alignment: Inducing behaviour that does not align with the LLM's intended use case.
LLM08:2023 - Insufficient Access Controls: Circumventing access controls so that unauthorized users can interact with the LLM.
LLM09:2023 - Improper Error Handling: Revealing sensitive information in exposed error or debugging output.
LLM10:2023 - Training Data Poisoning: Manipulating the training data or training configuration of an LLM to introduce backdoors or other bugs.
Let's review the most novel items in more detail to understand the risks most specifically associated with LLM adoption:
LLM01:2023 - Prompt Injections
In a Prompt Injection attack, an attacker constructs an input that manipulates the LLM into bypassing its filters or instructions and performing unintended or unauthorized actions. For instance, an attacker might try to "convince" the LLM to divulge sensitive data using specially crafted, misleading natural language, or submit malformed language that compromises the LLM's intended functions or limitations.
Defending against Prompt Injection requires three measures: strict input sanitization to prevent malicious incoming data from being processed, context-aware input filtering to prevent the LLM subsystem from processing requests that violate the system's intended use, and proper output encoding to prevent the LLM from returning a response that could initiate a client-side attack. Overall, LLM01:2023 is similar to SQL Injection and XSS attacks, but it applies to the internal protocols and functions of LLM software architecture.
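The three defensive layers described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a complete defense: the pattern list, the length limit, and the names sanitize_input, violates_intended_use, encode_output, and handle_request are all hypothetical, and call_llm stands in for whatever client function an application actually uses. Simple pattern matching is easy to evade, so a deny-list like this should be treated as one layer among several.

```python
import html
import re

# Hypothetical deny-list; real deployments need much broader, context-aware checks.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.IGNORECASE),
    re.compile(r"reveal (the |your )?(system prompt|hidden instructions)", re.IGNORECASE),
]

MAX_INPUT_CHARS = 2000  # assumed limit, chosen only for illustration


def sanitize_input(user_text: str) -> str:
    """Drop non-printable characters (except newline and tab) and cap the length."""
    cleaned = "".join(ch for ch in user_text if ch.isprintable() or ch in "\n\t")
    return cleaned[:MAX_INPUT_CHARS].strip()


def violates_intended_use(user_text: str) -> bool:
    """Return True when the text matches any known injection phrasing."""
    return any(pattern.search(user_text) for pattern in SUSPICIOUS_PATTERNS)


def encode_output(llm_response: str) -> str:
    """HTML-escape the response so it cannot inject markup into a web page."""
    return html.escape(llm_response)


def handle_request(user_text: str, call_llm) -> str:
    """Apply sanitization, filtering, and output encoding around one LLM call."""
    cleaned = sanitize_input(user_text)
    if violates_intended_use(cleaned):
        return "Request rejected by input filter."
    return encode_output(call_llm(cleaned))
```

In this sketch the filter runs before the model is ever called, and the encoding runs on everything the model returns, so neither step depends on the model behaving well.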
LLM06:2023 - Overreliance on LLM-generated Content
Anyone who has spent enough time submitting seriously complex questions to an LLM will have first-hand experience of its imperfections. Information provided by ChatGPT can range anywhere from surprisingly insightful to wildly inaccurate, with serious security consequences. To be fair, LLMs are not the only unreliable source of information online. The Internet is inherently an untrustworthy place where users navigate everything from spuriously placed malicious content and social engineering attacks to misinformation and amateur speculation. Like untrustworthy Internet content, LLM-generated content has the potential to introduce cybersecurity vulnerabilities into an organization, especially when the generated content is software code.
Mitigating overreliance on LLM-generated content requires policies and oversight to ensure that LLM-generated content is verified through professional human review prior to implementation, and that alternative sources of information are consulted before making decisions or accepting LLM responses as fact. These policies should clearly communicate the limitations and risks of LLM-generated content and provide standard operating procedures that ensure appropriate skepticism and review.
LLM07:2023 - Inadequate AI Alignment
An LLM's objectives and behaviour must align with the developer's intended use case. Failure to properly scope or implement an LLM's capabilities could lead to undesired consequences or vulnerabilities, and organizations may even incur penalties for breaching national or regional laws. It's important to consider the regulations that apply in every location where an LLM is made available and to ensure that the LLM conforms to legal requirements. One example of misalignment is an AI chatbot that has been programmed to increase user engagement and does so by suggesting controversial topics, making inflammatory statements, or using discriminatory language.
To mitigate Inadequate AI Alignment, it's important to ensure that training data and reward mechanisms are appropriate for the LLM's intended use cases and applicable restrictions. Monitoring and feedback can also be used to detect non-conforming behaviour, and automated testing can help uncover cases where a trained LLM responds outside of its intended scope.
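As a rough illustration of the automated testing mentioned above, the sketch below sends a set of probe prompts to a model and records any response containing text the deployment should never produce. Everything here is an assumption made for the example: find_out_of_scope is a hypothetical helper, call_llm is a placeholder for the application's own client function, and plain substring matching is a stand-in for whatever policy checks an organization actually adopts.

```python
from typing import Callable, Dict, List


def find_out_of_scope(
    call_llm: Callable[[str], str],
    probe_prompts: List[str],
    disallowed_markers: List[str],
) -> List[Dict[str, object]]:
    """Run each probe prompt and collect responses containing disallowed text."""
    findings: List[Dict[str, object]] = []
    for prompt in probe_prompts:
        response = call_llm(prompt)
        lowered = response.lower()
        # Case-insensitive substring match; deliberately simple for illustration.
        hits = [marker for marker in disallowed_markers if marker.lower() in lowered]
        if hits:
            findings.append({"prompt": prompt, "response": response, "markers": hits})
    return findings
```

A harness like this could run on a schedule or in a CI pipeline, with a non-empty result raising an alert for human review.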
LLM10:2023 - Training Data Poisoning
If an attacker is able to supply malicious training data or tamper with the configuration used to train an LLM, they can potentially introduce vulnerabilities. The resulting compromised LLM may have backdoors or biases known to the attacker that could impact its security in a number of ways. For example, a Training Data Poisoning attack could be perpetrated by an insider at the organization, or via trojanized, publicly available training data intended to induce vulnerabilities when used.
Protecting against this vulnerability requires scrutinizing the source of training data and verifying its integrity before it is used. Data sanitization techniques should be applied to remove potential vulnerabilities or biases from data before it is used to train LLMs. In addition, regular monitoring, automated testing, and alerting can be employed to detect unusual behaviour or performance issues in a trained LLM, allowing defenders to respond.
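One simple way to verify integrity before training is to compare each dataset file against a checksum published through a trusted channel. The Python sketch below assumes such a SHA-256 value is available; the function names sha256_of_file and verify_dataset are hypothetical. A matching checksum only shows that the file was not altered after the checksum was produced. It says nothing about whether the original data was trustworthy, so it complements source vetting and sanitization rather than replacing them.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_dataset(path: str, expected_sha256: str) -> bool:
    """Return True only if the file's digest matches the trusted value."""
    actual = sha256_of_file(path)
    # compare_digest performs a constant-time comparison of the two hex strings.
    return hmac.compare_digest(actual, expected_sha256.strip().lower())
```

A training pipeline could call verify_dataset on every input file and refuse to start if any check fails.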
As LLMs continue to gain traction in various industries, understanding and mitigating the security challenges they pose is critical. The OWASP Top Ten for Large Language Model Applications is a good starting point for developers and organizations that use LLMs like ChatGPT or that develop and implement their own LLM applications. While some of the vulnerabilities on the list mirror well-known vulnerabilities that impact traditional IT infrastructure and web applications, there are also novel security concerns to consider.
The top novel vulnerabilities highlighted by OWASP include Prompt Injections (LLM01:2023), where attackers manipulate LLMs to bypass filters or perform unauthorized actions; Overreliance on LLM-generated Content (LLM06:2023), which warns against trusting LLM output without verification due to its potential inaccuracies; Inadequate AI Alignment (LLM07:2023), emphasizing the importance of aligning LLMs with the developer's intended use case and legal requirements; and Training Data Poisoning (LLM10:2023), where the integrity of the training data is compromised, potentially leading to backdoors or biases.
Mitigation strategies involve stringent input sanitization, human verification of LLM-generated content, ensuring alignment with intended use cases and legal compliance, and carefully scrutinizing and sanitizing training data. By understanding and addressing these vulnerabilities, organizations can leverage the benefits of LLMs while minimizing security risks.