
Audio-Based Keylogging Attacks

Authored By Packetlabs

Last Updated June 10, 2026 | Published September 13, 2023

Did you know? Side Channel attacks are a class of threats that don't target software or algorithmic weaknesses directly. Recent research has demonstrated a novel and highly effective audio-based keylogging attack that infers keystrokes from recorded keyboard sounds.

What does this mean for your organization's cybersecurity efforts? Let's explore:

An Overview of Cybersecurity Attack Surfaces (and Side Channel Attacks)

Attack surface refers to the sum of all potential vulnerability points within a system, application, or network that could be exploited by malicious actors. The most common subcategories of attack surface considered by IT defenders are network attack surface and system attack surface.

Network attack surface refers to the services running on endpoints that can be reached from other devices on the network. In contrast, system attack surface refers to applications installed on a device that can typically be attacked only through local or user-level access to that device. Traditional vulnerability management holds that scanning the network attack surface for vulnerabilities can prevent attackers from gaining initial access, because it closes security gaps that could be exploited from outside. Scanning the system attack surface can also help prevent initial access, but it is more closely associated with "defence in depth": a layered approach to security that stops attackers who have already gained initial access from extending the reach of their compromise.

However, there is another type of attack surface that traditional vulnerability management rarely considers and that arguably deserves its own classification: the Side Channel attack surface. It merits attention because a substantial body of published research describes Side Channel attacks that can compromise even heavily defended devices and circumvent stringent security controls.

This article will discuss Side Channel attacks and how they represent a different type of attack surface that traditional IT security does not often directly address. Furthermore, this article will explain one of several recent Side Channel attacks disclosed by security researchers: the audio-based keylogging attack.

What Are Side Channel Attacks?

For a simple analogy, imagine if you could determine the content of a letter not by opening it, but by observing how long it took someone to write it, or by the sounds of the pen strokes. In a similar way, Side Channel attacks gather indirect information to deduce sensitive data.

More technically, Side Channel attacks are a class of threats that don't directly target software or algorithmic weaknesses, as attacks on encryption algorithms or network protocols do. Instead, they exploit the physical side effects of computation, such as power draw, emissions, timing, or sound, to deduce secrets. In other words, Side Channel attacks rely on information leaked by the physical implementation of a system rather than on weaknesses in the implemented algorithm itself.

Side Channel attacks may passively monitor an unaltered system for information leaks, or they may actively manipulate a system to create new, non-standard channels for outputting information. Such non-standard channels may let an attacker exfiltrate information from a compromised system whose security controls are designed to prevent data loss. For example, researchers have shown that an attacker can exfiltrate data from an air-gapped computer without using standard outputs such as USB ports. The attacker induces a screen flicker, which a smartphone camera records and which is then converted back into binary data.
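The research summarized here does not say how the flicker is encoded. As a hypothetical illustration only, the sketch below uses simple on-off keying. Each bit is shown as a slightly brighter or dimmer screen for a fixed number of frames, and the receiver averages the brightness it records over each bit period. `FRAMES_PER_BIT`, the brightness levels, and the function names are assumptions, not parameters from the research.

```python
import random

# Hypothetical on-off keying over screen brightness (not the researchers' scheme).
FRAMES_PER_BIT = 3    # assumed number of frames each bit is held on screen
HIGH, LOW = 1.0, 0.9  # assumed relative brightness for bit 1 and bit 0 (a subtle flicker)


def encode(data: bytes) -> list[float]:
    """Turn bytes into a per-frame brightness sequence, most significant bit first."""
    frames: list[float] = []
    for byte in data:
        for shift in range(7, -1, -1):
            bit = (byte >> shift) & 1
            frames.extend([HIGH if bit else LOW] * FRAMES_PER_BIT)
    return frames


def decode(frames: list[float]) -> bytes:
    """Recover bytes from recorded brightness samples by averaging each bit period."""
    threshold = (HIGH + LOW) / 2
    bits = []
    for start in range(0, len(frames) - FRAMES_PER_BIT + 1, FRAMES_PER_BIT):
        window = frames[start:start + FRAMES_PER_BIT]
        bits.append(1 if sum(window) / len(window) > threshold else 0)
    out = bytearray()
    for start in range(0, len(bits) - 7, 8):
        value = 0
        for bit in bits[start:start + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


# Simulate a camera that records each frame with small measurement noise.
rng = random.Random(0)
recorded = [f + rng.uniform(-0.02, 0.02) for f in encode(b"key")]
print(decode(recorded))  # b'key'
```

Averaging over several frames per bit trades bandwidth for robustness against camera noise. A real channel would also need synchronization, which this sketch omits.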

Side Channel attacks may analyze patterns of power consumption, electromagnetic emissions, video flicker, computation time, and acoustic sounds, among other techniques. These emissions are often considered extraneous or overlooked. Even so, they can sometimes be exploited to extract sensitive information such as cryptographic keys, passwords, or other private data.

Many Side Channel attacks require specialized equipment and expertise, and countermeasures have been developed against many of them. However, the ongoing evolution of technology and the creativity of attackers mean that the landscape of Side Channel attacks is continually shifting. As a result, understanding and guarding against these threats is crucial for maintaining robust cybersecurity in high-security contexts.

Broad Classifications of Side Channel Attacks

Simple Power Analysis (SPA): This involves directly interpreting power consumption patterns to understand what operations a device is performing. For instance, a sudden spike in power might indicate a CPU is beginning a cryptographic operation.
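Differential Power Analysis (DPA): Rather than interpreting individual power traces directly, DPA statistically analyzes measurements across many operations. It correlates small variations in consumption with hypothesized intermediate values, such as bits derived from a secret key, in order to recover the secret.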
Electromagnetic Attacks (EMA): These are similar to power analysis attacks but focus on picking up and interpreting electromagnetic emissions from a device.
Timing Attacks: By measuring the amount of time a system takes to perform certain operations, attackers can sometimes deduce valuable information. For instance, if a system checks passwords one character at a time and stops checking upon finding a wrong character, an attacker might be able to deduce the correct password one character at a time by observing how long the system takes to reject each guess.
Acoustic Attacks: In some cases, the sounds produced by a device can provide clues about its operations. For example, different CPU operations might produce subtly different noises.
Optical Attacks: These attacks rely on visual information, such as the blinking of LEDs on a device or even, in some sophisticated cases, the minute vibrations of components under certain operations.
Cache Attacks: This type of attack targets the cache of a CPU, exploiting patterns in how data is fetched and stored to glean information about the operations being performed.
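The timing attack described above can be simulated without a stopwatch. In the minimal sketch below, the number of characters the checker compares stands in for response time; real measurements are noisy and need many repetitions. `SECRET`, `naive_check`, and `recover` are hypothetical, and the attacker is assumed to know the password length. `hmac.compare_digest` is Python's standard-library constant-time comparison.

```python
import hmac
import string

SECRET = "s3cr3t"  # hypothetical password stored by the checker


def naive_check(guess: str) -> tuple[bool, int]:
    """Early-exit comparison that also reports how many characters it compared.

    The comparison count models response time: each additional correct
    leading character means one more loop iteration before rejection.
    """
    steps = 0
    for a, b in zip(guess, SECRET):
        steps += 1
        if a != b:
            return False, steps
    return len(guess) == len(SECRET), steps


def recover(length: int, alphabet: str) -> str:
    """Recover the secret one character at a time using only the step count."""
    known = ""
    for i in range(length):
        best_char, best_steps = "", -1
        for c in alphabet:
            # Pad with a character outside the alphabet so a correct guess at
            # position i always leads to one more comparison than a wrong one.
            guess = known + c + "\0" * (length - i - 1)
            ok, steps = naive_check(guess)
            if ok:
                return guess
            if steps > best_steps:
                best_char, best_steps = c, steps
        known += best_char
    return known


def safe_check(guess: str) -> bool:
    """Constant-time comparison: timing does not depend on where the first mismatch occurs."""
    return hmac.compare_digest(guess.encode(), SECRET.encode())


print(recover(len(SECRET), string.ascii_lowercase + string.digits))  # s3cr3t
```

Replacing `naive_check` with `safe_check` removes the per-character signal. `hmac.compare_digest` may still reveal the lengths of its inputs, but not their contents.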

A Novel Audio-Based Attack For Stealing Keystrokes

Researchers from British universities developed a deep learning model that can infer keyboard keystrokes from audio recordings with 95% accuracy. Accuracy dips only slightly, to 93%, when the sounds are recorded through Zoom, which is still alarmingly effective.

The researchers converted keystroke recordings into waveforms and spectrograms, which reveal distinct differences between keys. After processing these signals to better isolate individual keystrokes, they used the spectrogram images to train CoAtNet, an image classifier. They then tuned parameters such as the number of epochs, the learning rate, and how the data was split for training and testing. This approach achieved 95% accuracy on smartphone recordings, 93% on Zoom recordings, and a slightly lower 91.7% on Skype recordings.
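The preprocessing described above can be sketched in two steps. First, isolate each keystroke by finding bursts of energy in the waveform. Second, turn each isolated burst into a magnitude spectrogram that an image classifier could consume. This is a simplified, standard-library-only illustration built on stated assumptions. A synthetic 8 kHz signal with two decaying "clicks" stands in for a real recording, and the frame length, energy threshold, and DFT size are arbitrary. It does not reproduce the researchers' actual preprocessing or the CoAtNet model.

```python
import cmath
import math

SAMPLE_RATE = 8000  # assumed sample rate for the synthetic signal


def frame_energy(signal: list[float], frame_len: int) -> list[float]:
    """Sum of squared samples for each non-overlapping frame."""
    return [
        sum(x * x for x in signal[i:i + frame_len])
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def isolate_keystrokes(signal, frame_len=64, threshold=1.0, min_gap=4):
    """Return the start sample of each burst whose frame energy exceeds the threshold.

    A new keystroke is counted only after at least min_gap quiet frames,
    so the decaying tail of one click is not counted as a second click.
    """
    onsets = []
    last_loud = -(min_gap + 1)
    for idx, energy in enumerate(frame_energy(signal, frame_len)):
        if energy > threshold:
            if idx - last_loud > min_gap:
                onsets.append(idx * frame_len)
            last_loud = idx
    return onsets


def spectrogram(segment, n_fft=64, hop=32):
    """Magnitude spectrogram (rows = time frames, columns = frequency bins).

    Uses a periodic Hann window and a naive DFT for clarity; real code would use an FFT library.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    rows = []
    for start in range(0, len(segment) - n_fft + 1, hop):
        frame = [segment[start + n] * window[n] for n in range(n_fft)]
        rows.append([
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft)))
            for k in range(n_fft // 2 + 1)
        ])
    return rows


def synth_click(freq, length=256):
    """Hypothetical keystroke stand-in: an exponentially decaying sine burst."""
    return [math.exp(-n / 40) * math.sin(2 * math.pi * freq * n / SAMPLE_RATE) for n in range(length)]


# Half a second of silence with two "keystrokes" of different timbre.
signal = [0.0] * 4000
for start, freq in [(500, 1000), (2500, 2000)]:
    for n, x in enumerate(synth_click(freq)):
        signal[start + n] += x

onsets = isolate_keystrokes(signal)
print(onsets)  # [448, 2496] (frame-aligned starts)

# Bin spacing is SAMPLE_RATE / n_fft = 125 Hz, so a 1000 Hz click peaks in bin 8.
first = spectrogram(signal[500:756])[0]
print(first.index(max(first)))  # 8
```

The two clicks differ only in pitch here. Real keys differ in subtler ways across the whole spectrum, which is why the study used a learned image classifier rather than a single peak frequency.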

Key technical details:

Research Objective: Extract sensitive information like passwords, messages, or other confidential data from the sounds of keyboard keystrokes.
Deep Learning Model: The heart of the attack is a trained classifier that distinguishes keys by their distinct sounds, represented as spectrogram images.
Data Collection: For training, sounds of 36 keys from a MacBook Pro were recorded, with each key pressed 25 times.
Data Source: The researchers evaluated two main capture methods. The first was direct recording with a nearby smartphone microphone; in a real attack, this could be an infected phone's microphone. The second was recording during a Zoom call, where a malicious participant could correlate the sounds with the content being typed.
Threat Amplifiers: The ubiquity of high-quality microphones and rapid advances in machine learning techniques increase the potency and feasibility of such sound-based Side Channel attacks.
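The data-collection figures above imply a small, balanced dataset: 36 keys × 25 presses = 900 labelled keystrokes. With 36 equally likely classes, random guessing would be right only about 2.8% of the time (1/36), far below the reported 95%. The sketch below builds such a label set and performs a stratified train/test split so that every key appears in both sets. Three choices here are illustrative and are not the study's documented configuration: the assumption that the 36 keys are the digits 0-9 and letters a-z, the 20% test fraction, and the helper name.

```python
import random
import string
from collections import Counter

KEYS = list(string.digits + string.ascii_lowercase)  # assumed: 36 keys = 0-9 and a-z
PRESSES_PER_KEY = 25


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices so each label keeps the same share in train and test."""
    rng = random.Random(seed)
    by_label: dict[str, list[int]] = {}
    for idx, label in enumerate(labels):
        by_label.setdefault(label, []).append(idx)
    train, test = [], []
    for idxs in by_label.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return train, test


labels = [key for key in KEYS for _ in range(PRESSES_PER_KEY)]
train, test = stratified_split(labels)

print(len(labels), len(train), len(test))  # 900 720 180
print(Counter(labels[i] for i in test)["a"])  # 5 test samples per key
print(f"chance accuracy: {1 / len(KEYS):.1%}")  # chance accuracy: 2.8%
```

Stratifying matters with only 25 samples per key. A purely random split could leave some keys with very few test samples, making per-key accuracy figures unreliable.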

How Can Audio-Based Keylogging Attacks be Exploited?

Since this attack has proven effective at stealing keystrokes in research settings, it is worth asking: how could an attacker obtain suitable audio for analysis?

Covert recording devices on physical premises: A device planted in a physical environment could record sensitive keystrokes. This is especially a risk for secrets that must be typed manually, even if they are entered infrequently, such as master passwords for password managers, screen-lock passwords, or other passphrases.
Recording the audio of an online meeting: Watch out when typing passwords to sensitive accounts during livestreams or online meetings. Recorded and published meetings or livestreams remain available to attackers indefinitely and could be analyzed retroactively. Changing any password that may have been typed during a recorded session limits how long a retroactively decoded password remains useful.
MFA one-time codes: Because MFA codes are time-sensitive, they are generally not exploitable from recorded material once they expire. They are, however, potentially vulnerable if an attacker decodes them in real time.
Exploits that give an attacker access to a device's microphone: Malware that gains access to a system may be able to covertly monitor audio for keystrokes and exfiltrate it to attackers for analysis. Organizations should therefore review the permissions granted to microphones and other peripherals, including IoT devices, which may themselves have exploitable vulnerabilities.
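The point about MFA codes follows from how time-based one-time passwords work. TOTP (RFC 6238), a common MFA scheme, derives each code from a shared secret and the current time step, which is typically 30 seconds. A code decoded from an old recording therefore no longer matches once its step has passed, while a code captured and replayed within the same step still works. The sketch below implements RFC 6238 with HMAC-SHA1. The helper names and the `window` drift tolerance are illustrative, and real deployments vary in step length and allowed drift.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP using HMAC-SHA1 and RFC 4226 dynamic truncation."""
    counter = unix_time // step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def verify(secret: bytes, code: str, unix_time: int, step: int = 30,
           digits: int = 6, window: int = 1) -> bool:
    """Accept the code if it matches the current step or up to `window` steps of clock drift."""
    return any(
        hmac.compare_digest(totp(secret, unix_time + d * step, step, digits), code)
        for d in range(-window, window + 1)
    )


# RFC 6238 test secret (SHA-1 variant) and its published 8-digit vector at T=59.
SECRET = b"12345678901234567890"
code = totp(SECRET, 59, digits=8)
print(code)  # 94287082
print(verify(SECRET, code, 89, digits=8))  # True: one step later, within drift tolerance
print(verify(SECRET, code, 1111111109, digits=8, window=0))  # False: replayed years later
```

The drift window is the reason real-time decoding remains a risk. An attacker who decodes a code within roughly a step or two of it being typed may still be able to use it.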

Potential Mitigation Techniques

Sound dampeners on keyboards or switching to membrane keyboards may not be effective: Physical measures such as sound dampeners or membrane keyboards might not be sufficient to counter this specific acoustic Side Channel attack, as the model can still discern subtle audio cues.
Alter typing styles or use randomized passwords: By changing how you type or using complex, randomized passwords, you can make it more challenging for the deep learning model to accurately decipher keystrokes from audio recordings.
Use software to play fake keystroke sounds: Software that generates artificial keystroke sounds can introduce noise into audio recordings, making it harder for attackers to extract meaningful information.
Implement white noise or software-based keystroke audio filters: White noise or software-based audio filters can obscure the distinctive acoustic patterns of keystrokes, rendering them less susceptible to deciphering by the deep learning model.
Employ biometric authentication where possible: Biometric authentication methods like fingerprint or facial recognition can add an additional layer of security, reducing the need to manually type passwords.
Utilize password managers: Password managers can fill in credentials automatically. This reduces exposure to acoustic Side Channel attacks by removing the need to type sensitive information manually.
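To see why injected sounds can help, consider the simplest first stage of an acoustic attack: detecting where keystrokes occur. The sketch below uses a basic energy-threshold detector on a synthetic signal, assumed to be 8 kHz with decaying sine bursts standing in for key presses. It then mixes in decoy clicks and white noise. The signal model, thresholds, and function names are hypothetical. The sketch only illustrates two effects: decoys inflate the number of candidate keystrokes, and noise lowers the signal-to-noise ratio. It does not show that a trained classifier like the one in the study would be defeated.

```python
import math
import random

SAMPLE_RATE = 8000  # assumed sample rate


def click(freq: float, length: int = 256) -> list[float]:
    """Hypothetical keystroke sound: an exponentially decaying sine burst."""
    return [math.exp(-n / 40) * math.sin(2 * math.pi * freq * n / SAMPLE_RATE) for n in range(length)]


def mix_in(signal: list[float], burst: list[float], start: int) -> None:
    """Add a burst into the signal in place, starting at sample `start`."""
    for n, x in enumerate(burst):
        if start + n < len(signal):
            signal[start + n] += x


def count_bursts(signal, frame_len=64, threshold=1.0, min_gap=4) -> int:
    """Count energy bursts separated by at least `min_gap` quiet frames."""
    count, last_loud = 0, -(min_gap + 1)
    for idx in range(len(signal) // frame_len):
        frame = signal[idx * frame_len:(idx + 1) * frame_len]
        if sum(x * x for x in frame) > threshold:
            if idx - last_loud > min_gap:
                count += 1
            last_loud = idx
    return count


def snr_db(clean: list[float], noise: list[float]) -> float:
    """Signal-to-noise ratio in decibels from mean signal and noise power."""
    p_signal = sum(x * x for x in clean) / len(clean)
    p_noise = sum(x * x for x in noise) / len(noise)
    return 10 * math.log10(p_signal / p_noise)


# One second of audio containing three real keystrokes.
typed = [0.0] * SAMPLE_RATE
for start in (500, 2500, 4500):
    mix_in(typed, click(1000), start)

# Decoy software plays look-alike clicks between the real ones
# (fixed times here for reproducibility; a real tool would randomize them).
with_decoys = list(typed)
for start in (1500, 3500, 5500, 6500):
    mix_in(with_decoys, click(1000), start)

print(count_bursts(typed), count_bursts(with_decoys))  # 3 7

# White noise at two levels: more noise means a lower SNR for each keystroke.
rng = random.Random(0)
burst = click(1000)
for sigma in (0.05, 0.5):
    noise = [rng.gauss(0.0, sigma) for _ in burst]
    print(f"sigma={sigma}: SNR {snr_db(burst, noise):.1f} dB")
```

Decoys target the attacker's segmentation and labelling step, while white noise degrades signal quality. A well-trained model might still separate decoys that differ acoustically from real keys, so these measures work best alongside options that avoid typing secrets at all.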

Conclusion

Side Channel attacks exploit physical phenomena and unintended information leaks from computational processes to steal sensitive data, such as cryptographic keys and passwords. In a novel acoustic attack, researchers developed a deep learning model that can infer keyboard keystrokes from audio recordings with remarkable accuracy, reaching up to 95%.

Although many attackers may not currently have this capability, the attack is concerning. If a working proof of concept is publicly released, audio recordings made in the past could be exploited retroactively.

Mitigating this acoustic Side Channel threat is challenging, since physical measures and simple adjustments may not suffice. Altering typing styles, using randomized passwords, playing artificial keystroke sounds, and applying audio filters can make keystrokes harder to decode. Biometric authentication and password managers go further: they reduce the chance of exposing passwords through keystrokes by removing the need for manual keyboard input.
