This new hacking method is mind-blowing – Akamai DNS data exfiltration
An in-depth look at data exfiltration
Every great heist has two stages: getting in and getting out. Whether we're talking about pilfering diamonds from a vault, stealing stacks of cash from a casino, or exfiltrating thousands of credit card numbers from a corporate data server, criminals spend just as much time and effort thinking about how to get away with the goods as they do about getting them in the first place.
When we think about cybersecurity, we often solely focus on preventing attackers getting in. This is important – but it's not the be-all and end-all of a comprehensive security strategy. Human errors happen, bad configurations pass without scrutiny, and zero-day exploits blow open defenses previously considered impregnable.
Your security strategy needs to account for what happens when a breach occurs. The methods of transmitting stolen data back out of a "hostile" network are frequently overlooked, but understanding and detecting these covert exits are just as vital as thwarting initial breaches. While it's impossible to predict every conceivable threat, it's critical to equip your organization with the means to swiftly detect and respond to suspicious activity.
Keep reading, and I'll go over the intricacies of data exfiltration, exploring both conventional methods and innovative approaches from the perspective of a hacker looking to steal data from a corporate network. Then, I'll cover a new method of DNS exfiltration – one that relies entirely on bouncing data off of publicly available web servers the attacker does not own.
Why exfiltration?
Black-hat hackers are mainly economically driven. Although the thrill of owning a website gives a black hat some cred in the underworld, most of the activities you see carried out (ransomware, building botnets, developing zero-day exploits) are done to make money.
Stealing data isn't the easiest way for a hacker to cash in, but it can be hugely profitable. While estimates on the total value of the dark web are hard to come by, data gathered by PrivacyAffairs suggests that hacked social media accounts sell for between $20 and $60 each, while fresh sets of personal data command a much higher price.
Why? Because data is valuable. Just how valuable depends on the quality and quantity you have access to. There are dark web back-channels where hackers can sell everything from email and password pairs to credit card details, ranging in scale from thousands of records to millions. It's a huge economy, but how well you get paid depends on how much you're able to put on the table.
Let's stay with the heist metaphor. The most basic form of data exfiltration conjures images of Hollywood films like "Mission Impossible" and "Ocean's Eleven". It's the classic quiet approach: gaining physical access to the target computer through a combination of stealth, persuasion, and a touch of luck. Once a hacker is inside the corporate office, it's just a matter of downloading the data onto a USB stick and making a swift exit, maybe even with a playful nod to the security camera while strolling out the front door.
It's a great image for a film, but it's not reflective of actual data exfiltration.
In reality, hackers are averse to unnecessary risks. Penetration testers can afford to be brazen during a physical penetration test, as the worst that can happen is spending a few hours in a security office while their Red Team lead tries to bail them out. For a hacker, the prospect of being apprehended while walking out of the door with millions of dollars worth of pilfered data is a harrowing thought, seeing as it can lead to a lengthy prison sentence. Therefore, the majority of data exfiltration takes place over the Internet, where anonymity and distance provide a layer of insulation from physical detection.
So, let's imagine that you're a hacker on the prowl for data. You might be on a military payroll, attempting to steal info for a nation-state or conducting espionage for a rival corporation – it doesn’t matter. You've spent days running scanners on public web servers, cramming injection attacks through web forms, and brute-forcing admin login pages. Then (eventually), you strike it lucky and discover a vulnerability in an enterprise's web-facing servers. This gives you the ability to spawn a shell on a machine inside their network. You get to work, using further exploits to escalate your privileges and establish a foothold deeper into their network.
After a few sleepless nights pivoting from machine to machine through the network, the next lateral maneuver you pull off lands you with paydirt. You stumble upon a database brimming with heaps of personally identifiable information – usernames, passwords, addresses, credit card details – essentially, a black hat hacker's jackpot.
In fact, there's way too much. The database is massive. It's several hundred gigabytes. You're not going to be able to note it all down with a pen and paper. Up until this point, you've skillfully evaded intrusion detection systems and vigilant system admins, but if you start uploading the database outside of the network, you’re going to start tripping alarms on the behavior monitoring services running on the corporate firewall.
This is where the real challenge lies for a hacker.
An Intrusion Detection System, or "IDS", detects and prevents data exfiltration attempts within a network. It monitors network traffic and system activities for suspicious patterns that may indicate unauthorized attempts to extract sensitive data. To detect data exfiltration, an IDS employs various methods, including signature-based detection and anomaly detection.
Signature-based detection compares network traffic against a database of known attack signatures or patterns associated with data exfiltration techniques. Anomaly detection, on the other hand, identifies deviations from normal network behavior, like unusual data transfer volumes or unexpected connections to external servers.
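To make the two approaches concrete, here's a minimal Python sketch of each. It isn't how any particular IDS product works: the card-number pattern, the Luhn check, and the three-standard-deviation threshold are all assumptions picked for illustration, and the function names are hypothetical.

```python
import re
from statistics import mean, stdev

# Hypothetical signature: 13 to 16 digits, optionally separated by spaces or dashes.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits pass the Luhn checksum used by payment cards."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def signature_match(payload: str) -> bool:
    """Signature-based check: does the payload contain a plausible card number?"""
    return any(luhn_valid(m.group()) for m in CARD_PATTERN.finditer(payload))


def is_anomalous(baseline: list[float], observed: float, threshold: float = 3.0) -> bool:
    """Anomaly check: is the observed value far outside the historical baseline?"""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        return observed != mu
    return abs(observed - mu) / sigma > threshold
```

Feeding `signature_match` a payload containing the well-known test number "4111 1111 1111 1111" returns True, while `is_anomalous` flags a host that normally uploads around 100 MB a day and suddenly uploads 5,000 MB. The first catches known patterns, the second catches unusual behavior, and neither catches everything.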
When an IDS detects suspicious activity indicative of data exfiltration, it triggers alerts or alarms to notify security personnel, prompting them to investigate and respond to the potential breach. Depending on the configuration, an IDS may also take automated actions, such as blocking suspicious traffic or quarantining compromised systems to prevent further data loss.
Back to the hacking. You've done your homework. From reading some job listings for this corporation, you know that the network most likely has a fully-featured Cisco IDS in place. You know you've got your work cut out for you if you’re going to make off with a significant amount of the data you have access to without a network admin detecting your intrusion. It's time to use the bag of tricks you have up your sleeve to exfiltrate the data.
Data exfiltration and encryption
You don't know for certain, but you're pretty sure there are going to be signature-based rules in place on the database host. If you try to send packets that contain Personally Identifiable Information (PII) you've accessed from the database, such as credit card numbers, or even the raw SQL file itself, you're going to trip a silent alarm and set the clock ticking. Remember, your goal is to get away with as much data as possible, not to perform the exfiltration as fast as possible. Real hackers are patient. They're happy to sit on a host for months slowly drawing data out.
The first thing you can do to exfiltrate data more silently is to disguise it by sending it outside of the network in a different format. Base64 is often used to do this, but it has a bunch of other advantages for network exfiltration I'll get into later.
Base64 encoding is a method used to represent binary data in an ASCII format, making it suitable for transmission over text-based protocols, such as email or HTTP. It works by converting binary data into a string of printable ASCII characters. The process begins by dividing the binary data into chunks of 6 bits each. Each 6-bit chunk is then mapped to a corresponding printable ASCII character from the Base64 alphabet, which consists of 64 characters.
These characters typically include uppercase and lowercase letters, digits, and two additional characters ('+' and '/'). The mapped characters are concatenated to form a string of ASCII characters. If the length of the original binary data is not a multiple of 3 bytes, padding characters ('=') are added at the end to ensure that the length of the Base64-encoded string is a multiple of 4 characters. To decode a Base64-encoded string back to its original binary form, the process is simply reversed, mapping each character in the Base64 string back to its corresponding 6-bit binary value and concatenating the binary values to form the original binary data.
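The process described above can be sketched in a few lines of Python. This is a hand-rolled illustration of the 6-bit chunking and padding rules, checked against the standard library's `base64` module; the function name `b64_encode_manual` is my own.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64_encode_manual(data: bytes) -> str:
    """Encode bytes as Base64 by splitting each 3-byte group into four 6-bit values."""
    out = []
    for i in range(0, len(data), 3):
        group = data[i:i + 3]
        # Pack up to 3 bytes into a 24-bit integer, zero-filling a short final group.
        n = int.from_bytes(group.ljust(3, b"\x00"), "big")
        # Read the 24 bits as four 6-bit indices into the alphabet.
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # A group of k bytes carries k + 1 meaningful characters; the rest is '='.
        keep = len(group) + 1
        out.append("".join(chars[:keep]) + "=" * (4 - keep))
    return "".join(out)


print(b64_encode_manual(b"Man"))  # TWFu
print(b64_encode_manual(b"Ma"))   # TWE=
print(b64_encode_manual(b"M"))    # TQ==
```

Every 3 bytes of input become 4 characters of output, which is why the encoded form is always larger than the original.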
Converting the data you're trying to exfiltrate into an innocuous-looking string of ASCII characters lets it evade some of the simplistic pattern matching set up on an endpoint firewall, such as rules designed to match strings that look like credit card numbers. Note that Base64 doesn't shrink anything: the encoded output is roughly a third larger than the original.
However, this method is not foolproof. Although it takes extra processing power to do so, an entropy-based IDS can identify Base64 traffic and, once it's identified, decoding filters can easily reverse the process. Base64 makes detection more difficult, but it's not encryption.
Exfiltration and filtering
Alright, you figure that it's a lot of effort to start breaking down the data you want to exfiltrate into another format just to send it out of the network. Maybe you've been working on this attack for weeks and you're getting a little bit lazy. Why not build an SSH tunnel out of the network? Encrypted communication protocols are designed for exactly that problem, after all.
There are two reasons why this isn't going to work.
A sufficiently well-tuned corporate firewall will block outbound encrypted protocols. Perhaps it's frustrating for employees, but no matter which encrypted protocol you try, they'll all be dropped at the firewall level.
Large enterprises usually have the majority of their unused ports locked down, aside from the select few they need for daily business operations. A Just-in-time access policy could be in place to occasionally allow employees to send outbound traffic through the firewall, but only for particular accounts, only for a small amount of time, and only after being authorized by a network admin.
This means no SSH tunnel for you! Or SFTP, or RDP, or VPN, or HTTPS. Wait – HTTPS? Everyone has to be able to browse the web, and maybe that's the answer. You could build a tunnel over SSL. It'd provide all the encryption you need, and you can run any protocol you want over it, albeit with a slight hit to bandwidth and connectivity.
With that in mind, it's time to boot up Stunnel and start exfiltrating that data – and deal with the second problem.
Transport Layer Security, better known as "TLS" and mistakenly referred to as "SSL", is the cryptographic mechanism designed to ensure two parties can communicate without eavesdropping. Unfortunately, it's not foolproof.
Advanced IDS can still raise flags about the amount of traffic being sent through an encrypted tunnel, particularly when connecting to unfamiliar IP addresses with self-signed SSL certificates. IDS often uses reputation-based metrics that analyze DNS records, IP addresses, and domain names to evaluate if a network connection is suspicious.
Worse still, an IDS may employ man-in-the-middle attacks to intercept and inspect TLS-encrypted packets, decrypting them inside the network for Deep Packet Inspection before re-encrypting them and sending them to the intended recipient (or in our case, reading the encapsulated packet headers, dropping the packets and sending up an alert). This renders protocol-based encryption alone insufficient for evading detection.
Still, even if using TLS encryption isn't going to cut it, the idea of using HTTPS to get through the firewall is a solid one – after all, it's pretty much the only way you'll be able to send data out of the network.
At this point, I should mention that a lot of hackers use cloud services to upload data. While a suspicious domain registered a week ago might flag up on an IDS, it's very unlikely an organization is treating uploads to Mega, Google Drive, or OneDrive as suspicious.
Even if an IDS can commit an MITM attack to read the contents of the upload, it's useless if the file has been encrypted locally. Some hackers even use Discord (which, yes, does run over port 443) to upload scrambled text which can be decrypted afterward into the original file. I've even seen instances of hackers doing this over Facebook, Twitter, and Instagram. Any service that allows you to upload data is a potential target.
Still, let's say that you're dealing with a paranoid nightmare of a company. All known file upload sites are blocked and you can't send encrypted files through the firewall. If it can't be read, it's not getting sent out. Encryption isn't good enough anymore: you decide you need to obfuscate the data if you want to sneak it under the network admin’s nose.
Data exfiltration and obfuscation
Obfuscation refers to the deliberate act of disguising network traffic to evade detection by security mechanisms. Encryption is a type of obfuscation, but there are techniques that can identify when a data stream is encrypted by looking at file headers.
You might be familiar with obfuscation if you've ever used a VPN that offers the ability to bypass the Great Firewall of China or other Deep Packet Inspection-based network filtering techniques. Many secure VPNs work over SSL, stopping the ISP you're using from looking inside the data stream and identifying the headers that mark it as VPN traffic.
Let's think about this in terms of the Hollywood heist once again. You've taken a painting from the back room of an art gallery and you're about to walk out of the front door with it. As you're preparing to leave, one of your associates tells you an alarm has gone off and there's a security guard on the door. Everyone's luggage is being checked, so you need to hide the painting as you walk it out. You could put it inside a lockable suitcase, but that's going to tip off the guard, and if they can't look inside it, they're just going to stop you from leaving until they can check inside.
Instead, you decide to strip the painting out of the frame and put it into a secret compartment in your specially prepared heist-case. You help the guard open your suitcase and search for it but, when they can't find anything odd, they wave you through the exit. You continue on your merry way with your prize.
Yes, you really can hide things in other things! It might seem obvious, but it's far more devious when you hide data in other data. In this case, the lockable suitcase is encryption and the hidden compartment suitcase is obfuscation. In the examples I've talked about so far, TLS is encryption, whereas Base64 is obfuscation.
The best example of obfuscation is the age-old art of image steganography – a method of concealing data within innocuous images. It involves embedding data within existing images, creating a slightly altered version that appears unchanged to the naked eye but contains hidden information within its binary code.
While there are various methods for implementing image steganography, the specifics are not crucial for our discussion. What's important to grasp is that this method serves as a means of obfuscation rather than encryption. By replacing the least important data in an image with the data we want to exfiltrate, it becomes significantly more challenging for an IDS to accurately identify it as a data exfiltration attempt.
However, image steganography does have its limitations, primarily stemming from its relatively low bandwidth. While it may suffice for hiding small pieces of information like private keys or passwords within an image, transmitting a several-gigabyte database in this manner would necessitate an impractical amount of dummy images.
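To show why the bandwidth is so low, here's a minimal Python sketch of one common approach, least-significant-bit (LSB) embedding. It works on a raw sequence of pixel channel values rather than a real image file, and it assumes a lossless format, since lossy compression such as JPEG would destroy the hidden bits. The function names are hypothetical.

```python
def embed_lsb(pixels: bytes, secret: bytes) -> bytes:
    """Hide each bit of the secret in the lowest bit of one pixel channel value."""
    bits = [(byte >> shift) & 1 for byte in secret for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("cover image too small for payload")
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit  # clear the lowest bit, then set it
    return bytes(out)


def extract_lsb(pixels: bytes, length: int) -> bytes:
    """Read back `length` bytes by collecting the lowest bit of each channel value."""
    out = bytearray()
    for i in range(length):
        byte = 0
        for value in pixels[i * 8:(i + 1) * 8]:
            byte = (byte << 1) | (value & 1)
        out.append(byte)
    return bytes(out)


def capacity_bytes(width: int, height: int, channels: int = 3) -> int:
    """One hidden bit per channel value, so capacity is an eighth of the raw size."""
    return width * height * channels // 8


print(capacity_bytes(1920, 1080))  # 777600
```

Under these assumptions, a full 1920x1080 RGB image carries about 0.78 MB of hidden data, and each channel value changes by at most 1, which is invisible to the eye. That's plenty for a password, and nowhere near enough for a database.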
Eventually, the sheer volume of data being transmitted would raise suspicion. You would question why someone on the network is sending countless JPEGs of sunflowers outside of a corporate firewall. On the other hand, sending a few family photos over Skype probably wouldn't raise any suspicion, even if an administrator's password was embedded in the data.
The true advantage of image steganography lies in its complexity and variability – there's no singular method for embedding data into an image, making it incredibly challenging to detect without access to the original file for comparison. Detection often boils down to statistical analysis, essentially informed guesswork on the part of security systems.
While image steganography may be suitable for advanced persistent threat groups seeking to quietly exfiltrate select pieces of data over time, it's less suited to rapid, large-scale data breaches. You need to figure out another method that obfuscates data over HTTP without tipping off the monitoring systems if you want to get all this data out.
Data exfiltration and DNS
This is where it all comes together. DNS forms the backbone of the Internet, and hosts inside the network need to be able to make DNS requests to external servers to resolve domain names to IP addresses.
There’s no way around this – it’s just how the Internet works. When a host inside the network looks up a domain, the internal DNS resolver passes the query along until it reaches the DNS server that holds the authoritative records for that zone.
You figure that this is the right way out. You can’t make your own DNS requests directly from any of the hosts you’ve compromised to a server you own because the firewall drops them. However, you can send them to the network’s local DNS server which processes them and resolves the request for you. It’s a little slower, because the attacker’s exfiltration domain takes up more of the DNS request, but it’s better than nothing.
You register a couple of new domains and set up some DNS servers that can be monitored for incoming DNS requests. You write a quick script that encrypts the data you want to exfiltrate using a symmetric key, converts it into Base64 (or a hostname-safe variant, since '+', '/' and '=' aren't valid in a domain name) so the characters conform to what a DNS request expects, and chunks it into fragments that fit inside the maximum size of a DNS label, alongside some ordering metadata so the exfiltrated data can be reconstructed from the DNS log files. Time to kick back and wait.
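The chunking and reassembly step is just string manipulation, and a small Python sketch shows the idea. Several things here are assumptions on my part: the data is taken to be already encrypted, Base32 stands in for Base64 because hostnames are case-insensitive and can't contain '+', '/' or '=', the sequence number goes in its own label, and "example.test" is a placeholder domain. Nothing in this sketch sends a query.

```python
import base64

MAX_LABEL = 63  # a single DNS label can be at most 63 characters long


def to_query_names(data: bytes, domain: str, chunk_size: int = 30) -> list[str]:
    """Split data into hostname-safe labels, each prefixed with a sequence number."""
    names = []
    for seq, start in enumerate(range(0, len(data), chunk_size)):
        chunk = data[start:start + chunk_size]
        label = base64.b32encode(chunk).decode("ascii").rstrip("=").lower()
        if len(label) > MAX_LABEL:
            raise ValueError("chunk_size too large for a single DNS label")
        names.append(f"{seq}.{label}.{domain}")
    return names


def from_query_names(names: list[str], domain: str) -> bytes:
    """Rebuild the original bytes from logged query names, in any order."""
    suffix = "." + domain
    parts = {}
    for name in names:
        if not name.endswith(suffix):
            continue  # ignore unrelated log entries
        seq, label = name[:-len(suffix)].split(".")
        padded = label.upper() + "=" * (-len(label) % 8)  # restore stripped padding
        parts[int(seq)] = base64.b32decode(padded)
    return b"".join(parts[i] for i in sorted(parts))
```

Because every query name carries its own sequence number, it doesn't matter if the log entries arrive out of order. It's also a handy illustration of what defenders should look for: long, random-looking labels under a single parent domain.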
Unfortunately for you, the DNS requests aren't being processed. They're filtered out because they're asking for the resolution of a new domain that doesn’t pass the reputation-based filters the firewall relies upon. Time to give up and go home? Not quite. Enter Data Bouncing.
Data bouncing
Let’s look back on the criteria we need for successful data exfiltration:
- We need a route that allows us to send encrypted and obfuscated data, over a protocol that most organizations have open 24/7, and don't pay much attention to because of the volume of traffic that would normally pass through it.
- We also need our exfiltration target to be a host that the organization already trusts.
Data Bouncing meets both of these criteria. It's a technique for data exfiltration that uses external, trusted web hosts to carry out DNS resolution for you. Credit for the initial research on this technique goes to John Carroll and Dave Mound, as well as Proof of Concept codebases by Nick Dunn and @IAmJakoby.
Without getting too much into how HTTP works, with every HTTP request a browser asks a web server for a resource and provides some metadata in the request's headers to facilitate it. This includes stuff like whether you're requesting the desktop or mobile version of a site, which language you expect the response to be in, and other more technical aspects of HTTP.
One of these header fields is the "Host" field, which specifies which domain is being requested if a single IP address hosts multiple websites. Put simply, if "example.com" resolved to an IP that also hosted "example2.com", sending a request to "example.com" with "example2.com" in the Host field would return the response for "example2.com". You can forge an HTTP header to contain whatever data you want, but if you try to request a domain the IP doesn’t host, it'll return an error message.
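Here's a small Python sketch of that routing behavior. It builds a raw HTTP/1.1 request as bytes and runs it through a toy virtual-host lookup; the `VHOSTS` table and the function names are made up for the example, and no network connection is involved.

```python
from typing import Optional

# Hypothetical table of sites served from the same IP address.
VHOSTS = {"example.com": "site one", "example2.com": "site two"}


def build_request(path: str, host_header: str) -> bytes:
    """Build a minimal HTTP/1.1 GET request with the given Host header."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host_header}", "Connection: close", "", ""]
    return "\r\n".join(lines).encode("ascii")


def parse_host(raw: bytes) -> Optional[str]:
    """Pull the Host header value out of a raw request, if there is one."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("ascii")
    for line in head.split("\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            return value.strip()
    return None


def route(raw: bytes) -> str:
    """Pick a site based only on the Host header, as a shared server would."""
    return VHOSTS.get(parse_host(raw) or "", "400 unknown host")


print(route(build_request("/", "example2.com")))   # site two
print(route(build_request("/", "not-hosted.test")))  # 400 unknown host
```

The point to notice is that the Host value is just text supplied by the client. Where the TCP connection goes and what the header says are two independent things.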
But here's the mind-blowing part. Akamai Ghost HTTP servers are configured to send a DNS request to resolve the domain you've requested, even if it's outside of their network. Let that sink in for a second. You can send an HTTP request to a completely trusted domain, such as "bbc.co.uk", with a Host header for "encryptedfilechunk.attackerdomain.com". The trusted domain carries out the DNS resolution for you.
It gets better (or worse, depending on your perspective). It's not just the Host header, there's a bunch of different HTTP headers that can be used to force DNS resolution by a third party, speeding up the data exfiltration as you can send multiple DNS requests per single HTTP request sent out. It varies by provider, but needless to say, it's a problem.
Once you consider how this technique works, several other avenues open up. Some social media sites and collaborative tools do this thing where they resolve a URL typed into a submission field to give you a link preview – it's another DNS resolution being tunneled through the network on behalf of a third-party tool.
You can also do this with URL lookup tools like "isitdownrightnow.com", as well as email signup forms that look up a domain for validation.
Akamai may not be the only CDN vulnerable to this attack but, for now, it’s one of the biggest CDNs out there. Thousands of high-profile domains are vulnerable to this attack. Defending against it is quite difficult, as it requires extra profiling of every HTTP request sent for reputation-based scanning of domains inside HTTP headers, pulling a bunch of extra CPU power out of already overworked firewalls.
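To give a feel for what that extra profiling involves, here's a deliberately naive Python sketch that scans every header value for hostname-like strings and flags any that aren't related to the site the request is actually going to. The hostname pattern and the suffix-based notion of "related" are simplifying assumptions; a real deployment would need a proper public suffix list and a reputation feed, neither of which is modeled here.

```python
import re

# Rough pattern for things that look like domain names inside a header value.
HOSTNAME_RE = re.compile(r"\b(?:[a-z0-9-]{1,63}\.)+[a-z]{2,}\b", re.IGNORECASE)


def related(candidate: str, destination: str) -> bool:
    """Naive check: same name, or one is a parent domain of the other."""
    c = candidate.lower().rstrip(".")
    d = destination.lower().rstrip(".")
    return c == d or c.endswith("." + d) or d.endswith("." + c)


def suspicious_headers(destination: str, headers: dict[str, str]) -> list[tuple[str, str]]:
    """Return (header name, domain) pairs that point somewhere other than the destination."""
    findings = []
    for name, value in headers.items():
        for domain in HOSTNAME_RE.findall(value):
            if not related(domain, destination):
                findings.append((name, domain))
    return findings
```

A request to "www.bbc.co.uk" carrying `Host: chunk1.attacker.example` would be flagged, while one whose Host and Referer both mention "www.bbc.co.uk" would pass. Even this toy version has to run a regular expression over every header of every outbound request, which hints at where the CPU cost comes from.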
Utilizing this new exfiltration method, you chop up all the data as you intended to previously and begin sending it out stuffed into HTTP requests sent out to a myriad of popular sites, blending in with other users' traffic on the network.
Even better, because of the upgrade in speed from this new technique, what would have taken you several years to fully exfiltrate now only takes you a matter of months. You've gotten away with your data heist, and a few months later an incredibly overworked system admin wakes up to an angry phone call from their line manager, asking them to turn on the TV.
That data has been sold, sold again, and disseminated amongst various shady hacker sites, and eventually, the public has been made aware that they’re the victim of another identity theft campaign enabled by poor corporate security. Credit scores have been ruined, bank accounts emptied, jobs lost, embarrassing photos sent out, and reputations ruined. They stick on the kettle. Figuring out how you pulled this off is going to take a while.
The bottom line
Tackling data exfiltration in detail is tricky, because a well-thought-out security strategy makes it incredibly difficult for a hacker to actually achieve their goal.
Getting into a system is only half the mission – it's making off with the goods that really seals the deal.
Using a multi-layered network defense strategy forces a hacker to slow down and rely on increasingly difficult techniques to carry out their objective. This, in turn, gives your network security team extra time to catch them in the act and prevent a worse incident. It's better not to have an intrusion in the first place, obviously, but catching an intruder two weeks into a 1KB/s exfiltration attempt is better than waking up and finding out your entire customer database has been stolen overnight, right?
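The arithmetic behind that comparison is worth spelling out. This short Python sketch assumes 1KB means 1,000 bytes and uses 300 GB as a stand-in for the "several hundred gigabytes" database from the scenario; both figures are my assumptions rather than measurements.

```python
SECONDS_PER_DAY = 86_400


def bytes_moved(rate_bytes_per_s: float, days: float) -> float:
    """Total bytes transferred at a steady rate over a number of days."""
    return rate_bytes_per_s * days * SECONDS_PER_DAY


def days_to_move(total_bytes: float, rate_bytes_per_s: float) -> float:
    """Days needed to transfer a given amount of data at a steady rate."""
    return total_bytes / rate_bytes_per_s / SECONDS_PER_DAY


print(bytes_moved(1_000, 14))                      # 1209600000.0 bytes, about 1.2 GB
print(round(days_to_move(300e9, 1_000) / 365, 1))  # 9.5 years
```

Under these assumptions, an intruder caught after two weeks has moved roughly 1.2 GB, well under 1% of the database, and would have needed over nine years to take the rest at that rate.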
Sam Dawson is a cybersecurity expert who has over four years of experience reviewing security-related software products. He focuses his writing on VPNs and security, previously writing for ProPrivacy before freelancing for Future PLC's brands, including TechRadar. Between running a penetration testing company and finishing a PhD focusing on speculative execution attacks at the University of Kent, he still somehow finds the time to keep an eye on how technology is impacting current affairs.