Using LLMs Like ChatGPT To Support OSINT Campaigns

OSINT stands for Open Source Intelligence and refers to the collection and analysis of information gathered from public, freely available sources for use in an intelligence context. The term "open" refers to the fact that the data is harvested from publicly accessible sources (as opposed to covert or clandestine sources). OSINT is used for a range of activities, from national security to private sector risk management, individual security, and penetration testing. Key sources of OSINT include search engines, social media platforms, websites, online forums, job listings, public records, and databases. Attackers can use the same data for anything from identity and intellectual property theft to cyberattacks, particularly when sensitive data such as cryptographic keys or usernames and passwords is exposed publicly.

Penetration testers also collect and scrutinize publicly available technical data, such as IP addresses and domains gathered with network tools, APIs, and source code from repositories like GitHub, which may inadvertently expose secrets or active authentication credentials or otherwise allow pentesters to identify and exploit vulnerabilities. Additionally, regulatory filings, patent details, and white papers provide insights into a company’s structure and culture, which can be used to craft more effective social engineering attacks.

Collecting OSINT is an important security activity on many levels. Read on for some tricks on how Large Language Model (LLM) technologies such as ChatGPT and open source LLMs can be used to support and enhance OSINT collection operations.

Using ChatGPT For OSINT

Generative AI, particularly Natural Language Processing (NLP) technologies like Generative Pre-trained Transformers (GPT), is well suited to enhancing OSINT collection. On the most fundamental level, LLMs can ingest data in many forms, analyze it contextually for meaning, and automatically produce complex reports in human-readable formats. This enables rapid synthesis and interpretation of the vast amounts of data collected during an OSINT process. As a result, LLMs can not only speed up the OSINT process but also increase the depth and breadth of intelligence gathering.

Here are some key strategies for using LLMs to conduct OSINT data collection:

Automating Simple OSINT Tasks 

LLMs can be integrated into existing systems to automate various manual OSINT tasks, improving efficiency and allowing analysts to focus on more complex analysis and on interpreting the findings. Organizations can use LLMs to process large volumes of collected data quickly, extract data, classify findings, and flag high value content such as sensitive information. This drastically reduces the time and effort spent on routine data collection.

Another common task is continuously monitoring specific keywords or phrases across social media platforms and news outlets to track brand mentions or emerging threats. These tasks, once performed manually through regular searches and constant monitoring, can now be handled by LLM-based pipelines set up to continuously scan, detect, and report findings in near real time, thus streamlining the initial stages of data collection in OSINT operations.
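
The keyword monitoring described above can be sketched in a few lines of Python. This is only a minimal illustration of the filtering step that would run before any text is handed to an LLM for summarization. The `Hit` class, the `scan_for_keywords` function, and the sample data are hypothetical names invented for this example, and the sketch assumes the posts and headlines have already been collected by some other tool.

```python
import re
from dataclasses import dataclass


@dataclass
class Hit:
    source: str   # where the text came from (hypothetical label, e.g. "social")
    keyword: str  # the watchlist term that matched
    snippet: str  # short excerpt around the match


def scan_for_keywords(items, keywords, context=40):
    """Return a Hit for each watchlist keyword found in each (source, text) item."""
    hits = []
    for source, text in items:
        for keyword in keywords:
            # Whole-word, case-insensitive match; re.escape keeps the keyword literal.
            pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
            for match in pattern.finditer(text):
                start = max(0, match.start() - context)
                end = min(len(text), match.end() + context)
                hits.append(Hit(source, keyword, text[start:end].strip()))
    return hits


# Hypothetical sample data standing in for collected posts and headlines.
collected = [
    ("social", "Anyone else seeing ExampleCorp credentials posted on a paste site?"),
    ("news", "Quarterly results announced for several retailers."),
]
hits = scan_for_keywords(collected, ["ExampleCorp", "credentials"])
for hit in hits:
    print(f"[{hit.source}] {hit.keyword}: {hit.snippet}")
```

Only the matching snippets would then need to be passed to an LLM for classification or reporting, which keeps the volume of text the model has to process small.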

Using LLMs To Generate Google Dorks And OSINT Tool Commands

LLMs can be highly effective in assisting with the generation of complex Google queries (known as Google dorks; the practice is called Google dorking) and commands for various open-source OSINT tools. Google dorks are specialized search queries that use advanced search operators to find specific information about a target that has been indexed by search engines. LLMs can help construct these queries by incorporating the appropriate syntax and operators needed for detailed searches.

Here are some examples of LLM prompts and their results to support learning Google dorking:

Prompt: "Write a Google dork to find public documents that might unintentionally contain sensitive information on the example.com domain"

Result: site:example.com intitle:"index of" (doc | xls | pdf)

Prompt: "Write a Google dork to find email addresses associated with the domain example.com"

Result: site:example.com intext:"@example.com"

Prompt: "Write a Google dork to find subdomains associated with the main example.com site"

Result: site:*.example.com -www
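
Because LLM output is not always syntactically correct, it can help to understand how these queries are assembled. The following is a minimal sketch of a helper that combines common Google search operators (`site:`, `filetype:`, `intitle:`, `intext:`, `OR`, and the `-` exclusion prefix) into a single query string. The function name `build_dork` and its parameters are hypothetical and exist only for this illustration.

```python
def build_dork(site=None, filetypes=(), intitle=None, intext=None, exclude_sites=()):
    """Assemble a Google dork string from common search operators."""
    parts = []
    if site:
        parts.append(f"site:{site}")
    if filetypes:
        # OR together several filetype: operators, grouped in parentheses.
        joined = " OR ".join(f"filetype:{ext}" for ext in filetypes)
        parts.append(f"({joined})" if len(filetypes) > 1 else joined)
    if intitle:
        parts.append(f'intitle:"{intitle}"')
    if intext:
        parts.append(f'intext:"{intext}"')
    for excluded in exclude_sites:
        # A leading minus sign excludes results that match the operator.
        parts.append(f"-site:{excluded}")
    return " ".join(parts)


print(build_dork(site="example.com", filetypes=("pdf", "xls", "doc")))
print(build_dork(site="example.com", intext="@example.com"))
print(build_dork(site="*.example.com", exclude_sites=("www.example.com",)))
```

Comparing strings built this way with the dorks an LLM suggests is a quick sanity check before running them. Only run such queries against domains you are authorized to assess.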

LLMs can also generate commands and usage examples for various open-source OSINT tools. Here are some tools and how LLMs might assist:

  • TheHarvester: TheHarvester is used for gathering email accounts, subdomain names, hosts, and employee names from different public sources (like search engines). LLMs can help craft specific command-line arguments based on the target and the information sought.

  • Maltego: Maltego is an interactive data mining tool that graphs relationships. LLMs can suggest transforms to apply based on the type of investigation (e.g., finding relationships between found entities).

  • Shodan: Shodan is a search engine for finding specific types of computers connected to the internet. LLMs can assist in formulating search queries to identify vulnerable systems within a target's infrastructure.

  • Recon-ng: Recon-ng is a full-featured OSINT collection and pentesting reconnaissance framework. LLMs could help by generating scripts or commands to automate data collection from various modules.
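
As an illustration of the kind of output to expect from such prompts, the sketch below builds a TheHarvester command line and a Shodan search query as plain strings. It assumes the commonly documented `-d` (domain), `-b` (data source), and `-l` (result limit) options of TheHarvester and the `hostname:`, `port:`, and `org:` Shodan search filters. The available data sources differ between TheHarvester versions, so `crtsh` is only an example value. The function names are hypothetical, and nothing here actually runs a tool or contacts a service.

```python
import shlex


def harvester_command(domain, source="crtsh", limit=200):
    """Build a TheHarvester command line as a shell-safe string."""
    argv = ["theHarvester", "-d", domain, "-b", source, "-l", str(limit)]
    # shlex.join quotes any argument that contains shell-sensitive characters.
    return shlex.join(argv)


def shodan_query(hostname=None, port=None, org=None):
    """Combine Shodan search filters into one query string."""
    filters = []
    if hostname:
        filters.append(f"hostname:{hostname}")
    if port is not None:
        filters.append(f"port:{port}")
    if org:
        # Values containing spaces are wrapped in double quotes.
        filters.append(f'org:"{org}"')
    return " ".join(filters)


print(harvester_command("example.com"))
print(shodan_query(hostname="example.com", port=22))
```

Always check generated commands against the tool's own help output before running them, since an LLM may suggest options that do not exist in the installed version.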

Local LLM Applications Keep OSINT Collection Private

Using a locally installed open source model such as GPT-J 6B, GPT-NeoX, or BLOOM, or models distributed through the Fairseq toolkit (such as BART and RoBERTa), offers significant privacy advantages for OSINT collection compared to using web-based applications provided by LLM services such as OpenAI's ChatGPT. When OSINT tasks require handling sensitive information or operating under strict confidentiality, local LLM installations become particularly beneficial.

For OSINT, where the sensitivity of information gathering is paramount, keeping the search process confined to a private network enhances security. By utilizing a local LLM, organizations retain full control over their data storage and management. There is no need to rely on external providers' assurances regarding data security practices. This control is crucial in environments where data governance and compliance with regulatory standards are required. A local LLM installation also ensures that no external entity has access to logs or the details of intelligence campaigns, thus protecting operational secrets and strategies.
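
A minimal sketch of what a local setup might look like is shown below, assuming the Hugging Face `transformers` package is installed and that the machine has enough memory for the chosen model. The model identifier `EleutherAI/gpt-j-6b` is an assumption (any locally available text-generation model could be substituted), and the function names are hypothetical. The model is only downloaded and loaded when `load_local_generator` is called, and after that point prompts and findings stay on the local machine.

```python
def build_summary_prompt(findings):
    """Turn a list of finding strings into a single summarization prompt."""
    bullet_list = "\n".join(f"- {item}" for item in findings)
    return (
        "Summarize the following OSINT findings in three sentences.\n"
        f"{bullet_list}\n"
        "Summary:"
    )


def load_local_generator(model_name="EleutherAI/gpt-j-6b"):
    """Load a text-generation model locally (large download on first use)."""
    # Imported lazily so the rest of the sketch works without transformers installed.
    from transformers import pipeline

    pipe = pipeline("text-generation", model=model_name)

    def generate(prompt, max_new_tokens=200):
        # return_full_text=False asks the pipeline for the continuation only.
        output = pipe(prompt, max_new_tokens=max_new_tokens, return_full_text=False)
        return output[0]["generated_text"].strip()

    return generate


def summarize_findings(findings, generate):
    """Summarize findings with any callable that maps a prompt string to text."""
    return generate(build_summary_prompt(findings))


print(build_summary_prompt(["Exposed API key in public repo", "Staff list on forum"]))
```

Passing the generator in as a parameter keeps the summarization logic independent of the model, so `summarize_findings(findings, load_local_generator())` could later be pointed at a different local model without other changes.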

OpenAI's ChatGPT With GPT-4 Can Browse The Internet In Real Time

One of the significant enhancements in ChatGPT running GPT-4, compared to the earlier GPT-3.5 version, is its ability to browse the internet. Instead of relying solely on pre-trained knowledge, it can fetch and incorporate recent data from the web. Combined with automation built around the OpenAI API, GPT-4 can become a powerful tool for near real-time intelligence monitoring and collection for organizations.

Continuous OSINT monitoring provides benefits above and beyond a single point-in-time test. It gives an up-to-date assessment of the information currently available online, which is crucial for OSINT tasks where timeliness can be as critical as the information itself. This is particularly beneficial in scenarios like monitoring emerging threats, tracking ongoing incidents, or implementing a continuous monitoring program to prevent leakage of sensitive proprietary information. GPT-4 can even be used to help analyze content collected from dark web chat forums where cybercriminals trade stolen data and plan attacks.
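
One practical detail of continuous monitoring is reporting only what has changed since the last check. The sketch below shows one simple way to do that by fingerprinting each finding and remembering which fingerprints have already been seen. The function names and sample findings are hypothetical, and in a real program the `seen` set would need to be stored between runs.

```python
import hashlib


def fingerprint(text):
    """Hash a finding after normalizing case and whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def new_findings(current_items, seen):
    """Return items not seen in earlier polling cycles and record them in `seen`."""
    fresh = []
    for item in current_items:
        digest = fingerprint(item)
        if digest not in seen:
            seen.add(digest)
            fresh.append(item)
    return fresh


# Hypothetical results from two consecutive polling cycles.
seen = set()
first_cycle = new_findings(["Leaked key on paste site", "New subdomain found"], seen)
second_cycle = new_findings(["New  subdomain FOUND", "Employee list posted"], seen)
print(first_cycle)
print(second_cycle)
```

In the second cycle only the employee list is reported, because the subdomain finding differs from the earlier one only in case and spacing. New items could then be passed to an LLM for summarization and prioritization.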

Conclusion

ChatGPT and similar LLMs represent transformative tools for many OSINT operations. These models facilitate a deeper, more efficient collection and analysis of data available from public sources. They can be employed from the initial data-gathering phase to deep analysis, enhancing both the speed and quality of intelligence.

For penetration testing, LLMs can aid in crafting precise Google dorks to uncover leaked data and potential vulnerabilities within a domain. LLMs can also support OSINT efforts by generating commands or filters for tools like TheHarvester and Shodan. With capabilities like internet browsing in ChatGPT with GPT-4, OSINT practitioners can review publicly accessible information in near real time. Finally, local installations of LLMs like GPT-J and BLOOM can further secure OSINT activities by keeping sensitive search operations confined to private networks, thus ensuring data privacy and control.

Overall, leveraging LLMs across all stages of an OSINT campaign streamlines processes and can significantly increase the effectiveness of intelligence gathering and threat analysis, helping organizations keep their security measures robust and up to date.
