Exploring the Effectiveness of Modern Defensive Measures Against Stack-Based Buffer Overflows

In the last post, we explored the exploitation of buffer overflows on the stack. We saw how these attacks are performed and where issues in our code can make it vulnerable. Today we will focus on defensive measures against these attacks, such as ASLR and DEP, which were developed to stop them. Today’s questions are:
“How effective are current defensive measures, like ASLR and DEP, in preventing memory manipulation exploits?” and “How do different software architectures and operating systems influence the success rate of memory manipulation exploits?”

In the last post I demonstrated how someone would perform a buffer overflow attack. We saw which vulnerabilities in an application are exploited to achieve the desired outcome… Opening a calculator app! Joking aside, we know that an attacker could go further than this and just as easily run a reverse shell: code that connects back to the attacker and lets them do whatever they desire with the permissions the program runs with. That can range from gathering information all the way to wiping the entire OS, if they wanted to.
I don’t have to mention that we don’t want this to happen, of course. Understanding why these attacks are possible is perhaps the most important remediation one can do, but that will not always suffice. We can write our code as securely as we want, but where there is a will, there is a way. An attacker will do as they please, regardless.

However, some smart minds figured that they did not want to spend all day making their code absolutely secure, and so defensive measures were created. Various solutions exist, and each has its own reason for existing and its own way of functioning.
Let’s take a look into these different defensive and preventative measures and see how they function, what they can do, and how well they hold up.

Defensive Measures

There are quite a few different tools we can use to secure our application against buffer overflow attacks. Let’s look at a few of these, starting with the most common ones: ASLR and DEP.

ASLR:

ASLR is short for Address Space Layout Randomization. As the name suggests, it introduces randomization to the formerly static addresses used in a program. We know that an attacker looks for addresses that offer a JMP ESP instruction we can exploit to run our own code. But this attack depends on the attacker knowing which address is exploitable and aiming their exploit at that specific address.

Now, if the vulnerable address they found was different every time the program ran, it would be nigh impossible for them to run their exploit, because they cannot feasibly predict what the address will be next time.
And since this information realistically only becomes visible once an overflow has already been caused and the application crashes, they cannot undo that action and adjust their exploit: by the time the program runs again, the addresses have changed. Thus, their entire attack falls flat, and it essentially becomes a guessing game for the attacker.

One can increase the security granted by ASLR by either increasing the amount of memory dedicated to randomization or decreasing the time between re-randomizations. We can think of these factors as making up the entropy of ASLR. Entropy is usually expressed in bits: n bits of entropy means a region can be placed at one of 2^n possible positions. The key is balancing these factors to create as much entropy as possible without compromising functionality.
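To make the idea of entropy a bit more concrete, here is a toy model in Python. It assumes 4 KiB pages and a hypothetical fixed_base address, and the 8, 16 and 28 bits in the loop are just sample values, not figures for any particular OS. It shows how quickly the number of possible positions, and with it the attacker's expected number of guesses, grows with each added bit.

```python
import secrets

PAGE_SIZE = 4096  # assumption: 4 KiB pages, typical for x86-64


def randomized_base(fixed_base: int, entropy_bits: int) -> int:
    """Toy ASLR: shift a base address by one of 2**entropy_bits page-sized slots."""
    slot = secrets.randbelow(2 ** entropy_bits)
    return fixed_base + slot * PAGE_SIZE


def expected_guesses(entropy_bits: int, rerandomized: bool = False) -> float:
    """Average number of attempts an attacker needs to hit the right slot.

    If the layout stays fixed between attempts, each wrong guess rules a slot
    out: (N + 1) / 2 attempts on average. If every crash restarts the program
    with a fresh layout, wrong guesses rule nothing out: N attempts on average.
    """
    slots = 2 ** entropy_bits
    return float(slots) if rerandomized else (slots + 1) / 2


for bits in (8, 16, 28):
    print(f"{bits:2d} bits -> {2 ** bits:,} slots, "
          f"~{expected_guesses(bits):,.1f} guesses (fixed layout), "
          f"~{expected_guesses(bits, rerandomized=True):,.0f} guesses (re-randomized)")
```

The rerandomized flag reflects the scenario described above: if each crash brings the program back with a fresh layout, the attacker learns nothing from a wrong guess and needs about 2^n attempts on average instead of about half that.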

Over the years, ASLR has been implemented in every mainstream operating system. It was first enabled by default in OpenBSD 3.4 back in 2003, followed by Linux in 2005, and many more operating systems adopted it after that. Android (and possibly other OSes) deviates somewhat from this, in that on top of ASLR it also employs Library Load Order Randomization, which randomizes the order in which libraries are loaded. This offers a negligible amount of added entropy, although this can vary on a case-by-case basis.

DEP:

Another defensive measure is DEP (Data Execution Prevention). DEP is an implementation of the executable-space protection concept. It is most commonly known as DEP because that is the name Microsoft coined for it in its Windows operating systems; other operating systems have their own implementations under different names, most commonly just referred to as executable-space protection.

DEP and similar implementations protect the system by making it possible to mark specific memory regions, such as the stack or heap, as data-only (non-executable). Data can still be written to these regions, but nothing in them can be executed. If one were to attempt to execute code in these regions, an exception would be raised, forcing the application to crash in most instances.
Because no code can be run from these memory regions, the number of addresses an attacker can exploit becomes significantly lower.
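The idea behind DEP can be sketched with a toy model of memory permissions. This is not how an OS or CPU implements it (that happens in page tables and hardware, via the NX-bit); the class and region names below are hypothetical and only illustrate the rule that data regions can be written but not executed.

```python
class DEPViolation(Exception):
    """Raised when 'executing' from a region without execute permission."""


class ToyMemory:
    """Toy model of memory regions with read/write/execute permissions."""

    def __init__(self) -> None:
        self.regions = {}  # name -> (bytearray contents, permissions such as "rw" or "rx")

    def map(self, name: str, size: int, perms: str, data: bytes = b"") -> None:
        if len(data) > size:
            raise ValueError("initial data larger than the region")
        contents = bytearray(size)
        contents[:len(data)] = data  # initial contents, e.g. program code for an "rx" region
        self.regions[name] = (contents, perms)

    def write(self, name: str, offset: int, data: bytes) -> None:
        contents, perms = self.regions[name]
        if "w" not in perms:
            raise PermissionError(f"{name} is not writable")
        if offset + len(data) > len(contents):
            raise IndexError("write past the end of the region")
        contents[offset:offset + len(data)] = data

    def execute(self, name: str, offset: int) -> bytes:
        contents, perms = self.regions[name]
        if "x" not in perms:
            # With DEP/NX, the CPU faults here instead of running the bytes.
            raise DEPViolation(f"execution blocked in non-executable region {name!r}")
        return bytes(contents[offset:offset + 4])  # stand-in for fetching an instruction


mem = ToyMemory()
mem.map("text", 16, "rx", b"\x55\x48\x89\xe5")  # placeholder code bytes: executable, not writable
mem.map("stack", 64, "rw")                      # data: writable, not executable
mem.write("stack", 0, b"\x90\x90\xcc\xc3")      # an overflow can still place bytes here...
try:
    mem.execute("stack", 0)                     # ...but jumping to them is refused
except DEPViolation as err:
    print(err)
```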

This is not a complete solution, of course, and it has its downsides as well. Microsoft themselves recommend combining this functionality with ASLR to provide a more secure application and environment. The best way to look at DEP and similar methods, then, is as an added layer on top of the existing security.
Nowadays, these features are enabled by default in the OS and operate at the system level.
Even hardware whose CPU lacks support for the NX-bit (No-eXecute) can receive the functionality through emulation, although this is largely only seen, and only necessary, on older hardware.

Noteworthy mentions

Stack canaries are integrity checks, usually inserted by the compiler, that detect overwritten data in memory. A value known to the application (the canary) is placed between a buffer and the control data on the stack, such as the saved return address, and is used to monitor for overflows.
Before the function returns, the canary is checked. If it deviates from what it should be, an overflow has occurred, and the software handles it, typically by terminating the program instead of returning to a corrupted address.
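Here is a simplified simulation of a stack frame protected by a canary, assuming a hypothetical 16-byte buffer followed by an 8-byte canary and an 8-byte saved return address. Real compilers (for example GCC and Clang with -fstack-protector) insert the canary and the check automatically; this Python version only mimics the layout and the check.

```python
import secrets
import struct

BUF_SIZE = 16  # hypothetical size of the local buffer


class StackSmashingDetected(Exception):
    """Stand-in for the abort a real canary check triggers."""


def call_with_canary(user_input: bytes, return_address: int = 0x401136) -> int:
    """Simulate one stack frame laid out as [buffer | canary | saved return address].

    The copy below is deliberately unchecked, like strcpy: input longer than
    the buffer spills into the canary and then the return address.
    """
    canary = secrets.token_bytes(8)  # fresh random value, unknown to the attacker
    frame = bytearray(BUF_SIZE) + canary + struct.pack("<Q", return_address)

    spill = user_input[:len(frame)]  # the toy frame ends here; a real overflow keeps going
    frame[:len(spill)] = spill       # unchecked copy

    # Function epilogue: verify the canary before trusting the return address.
    if frame[BUF_SIZE:BUF_SIZE + 8] != canary:
        raise StackSmashingDetected("*** stack smashing detected ***: canary was overwritten")
    return struct.unpack_from("<Q", frame, BUF_SIZE + 8)[0]


print(hex(call_with_canary(b"hello")))  # canary intact, normal return
try:
    call_with_canary(b"A" * 32)         # reaches the return address, tramples the canary
except StackSmashingDetected as err:
    print(err)
```

A short input returns normally, while an input long enough to reach the return address has to trample the canary first, so the check catches it before the corrupted address is used.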

Bounds checking is a method that actively checks accesses against the size of an array. The size the array should be is stored alongside it, and if a check determines that an index falls outside these bounds, an error is thrown or the issue is otherwise handled, preventing an exploit.
Languages such as Java and Python have built-in implementations of this, where an error is thrown when an access is out of bounds. An example of this in Python can be seen below:

my_list = [1, 2, 3]
print(my_list[5])  # Raises IndexError: list index out of range
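Python performs this check on every index. In C, where no such check exists, the programmer has to add it by hand before copying. The sketch below shows that check as a hypothetical bounded_copy helper operating on a fixed-size bytearray standing in for a C buffer.

```python
def bounded_copy(dest: bytearray, src: bytes) -> None:
    """Copy src into the start of dest, but only if it fits.

    This is the check a C programmer has to write by hand before strcpy/memcpy;
    bounds-checked languages perform an equivalent check on every access.
    """
    if len(src) > len(dest):
        raise IndexError(f"{len(src)} bytes do not fit in a {len(dest)}-byte buffer")
    dest[:len(src)] = src


buf = bytearray(8)
bounded_copy(buf, b"hi")
print(buf)  # bytearray(b'hi\x00\x00\x00\x00\x00\x00')
try:
    bounded_copy(buf, b"A" * 64)
except IndexError as err:
    print("rejected:", err)
```

Without the explicit length check, Python's slice assignment would silently grow the bytearray instead of overflowing it, which is exactly the kind of safety net C does not provide.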

Effectiveness of Defenses

We have taken a look at how these security measures work in broad terms. These protections have been on our devices for a good number of years now. The important question, however, is whether or not they work well. Each of them has its own strengths, but none is completely secure on its own; they have their weaknesses as well. Let’s take a look at these strengths and weaknesses.

Strengths and weaknesses

ASLR is one of the earlier solutions to the problem of buffer overflows and other memory corruption attacks, but its design still holds strong. Randomizing addresses greatly decreases the chance of an attack succeeding.
So much so, in fact, that if we were to systematically brute-force the application’s address space on a 64-bit system, the number of attempts necessary would be… Nigh impossible.
The number of attempts would be so huge that my trusty TI-84 calculator threw an overflow error (ironically) because it could not handle the size of the result.
However, this all assumes that the implementation of ASLR actually achieves that much entropy.
A 2024 empirical analysis of ASLR on major operating systems, including Windows, macOS and Linux, showed that robustness differs vastly between systems. While Linux has a very solid implementation with robust randomness, Windows and macOS fall short in properly randomizing important areas such as executable code and libraries, among other things.

The most important assumption here is that the implementation of ASLR is actually as robust as possible. After all, various methods of defeating ASLR exist. The effective entropy can be reduced through techniques such as heap spraying and information leaks.

Heap spraying is the process of filling the heap (the portion of memory used for dynamic memory allocation) with many copies of a malicious payload in a predictable pattern. This increases the chance that execution lands on one of these copies during an attack. An attacker can prefix each copy with a NOP slide (a run of no-operation instructions), so that landing anywhere in the slide carries execution forward into their own code. Similarly, if the attacker wants the program to run /bin/sh, they can prefix the path with a multitude of /’s to fill the data, as //////////bin/sh resolves to the same thing as /bin/sh.
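A rough back-of-the-envelope sketch of why a NOP slide helps: if most of each sprayed block is sled, a jump to a random spot inside a block almost always lands in the sled and slides into the payload. The block size and the placeholder payload bytes are assumptions for illustration; 0x90 is the x86 NOP opcode. The last line uses Python's posixpath.normpath to show that the padded path names the same file.

```python
import posixpath

NOP = b"\x90"  # the x86 NOP opcode


def build_spray_block(payload: bytes, block_size: int) -> bytes:
    """One sprayed chunk: a NOP sled followed by the payload at the very end."""
    if len(payload) >= block_size:
        raise ValueError("payload must be smaller than the block")
    return NOP * (block_size - len(payload)) + payload


def sled_hit_probability(block: bytes, payload_len: int) -> float:
    """Chance that a jump to a random offset in the block lands in the sled."""
    return (len(block) - payload_len) / len(block)


payload = b"\xcc" * 32                    # placeholder bytes standing in for shellcode
block = build_spray_block(payload, 4096)  # assumed 4 KiB block size
print(f"sled hit chance per block: {sled_hit_probability(block, len(payload)):.3f}")

# The path-padding trick: repeated slashes still name the same file.
print(posixpath.normpath("//////////bin/sh"))  # /bin/sh
```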

Similarly, one can also exploit vulnerabilities in format string functions such as printf. These functions depend on format specifiers to decide how data should be interpreted and displayed. Examples of these specifiers are %d, %s and %x. If a user includes such specifiers in their input, and that input is passed directly as the format string, the function will display data it was never meant to show.
An example of this can be seen below, where input is passed to the vulnerable function and the specifiers it contains leak addresses and other information from the stack. This happens because the input is used as the format string itself instead of being treated as plain data.

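#include <stdio.h>

void vulnerable_function(const char *input) {
    char buffer[100];
    /* Unsafe usage: the input itself is the format string, so every %x in it
       reads a value from the stack or registers that was never passed in. */
    snprintf(buffer, sizeof(buffer), input);
    /* Print the result as plain data; printf(buffer) would be a second
       format string vulnerability. */
    printf("%s\n", buffer);
}

int main(void) {
    vulnerable_function("AAAA %x %x %x");
    return 0;
}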

-------OUTPUT------

AAAA 403de0 30a1bf10 30a97040
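To see why %x prints values nobody passed in, here is a toy model of how printf consumes arguments. It is a simplification: real printf reads from registers and the stack according to the platform's calling convention. In this hypothetical toy_printf, once the real arguments run out, each specifier keeps pulling values from a fake stack list. The values are the ones from the output above.

```python
import re


def toy_printf(fmt: str, stack: list, args: tuple = ()) -> str:
    """Toy model of how printf consumes arguments (supports %d, %x and %%).

    printf trusts the format string: every specifier takes the "next" argument,
    with no idea how many arguments were really passed. Here, once `args` runs
    out, values are taken from `stack`, which stands in for whatever happens to
    be sitting in registers or on the stack.
    """
    values = list(args) + list(stack)
    out, pos = [], 0
    for match in re.finditer(r"%[dx%]", fmt):
        out.append(fmt[pos:match.start()])  # literal text before the specifier
        spec = match.group()
        if spec == "%%":
            out.append("%")
        else:
            value = values.pop(0)           # no check that this was a real argument
            out.append(format(value, "x") if spec == "%x" else str(value))
        pos = match.end()
    out.append(fmt[pos:])
    return "".join(out)


leftover_stack = [0x403DE0, 0x30A1BF10, 0x30A97040]  # values from the output above
print(toy_printf("AAAA %x %x %x", leftover_stack))     # AAAA 403de0 30a1bf10 30a97040
print(toy_printf("%d items", leftover_stack, (3,)))    # 3 items
```

The fix in C is to never let input act as the format string: pass it as an argument instead, as in printf("%s", input), so any % in it is printed as plain text.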

An attacker could automate this process so that the exploit runs on its own: it reads the leaked addresses and adjusts the addresses in the script as necessary, with no manual research required. This defeats the randomization of memory, since the data is retrieved dynamically on each individual run of the application.
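A minimal sketch of that automation step. ASLR shifts a whole library by one random offset, so once a single pointer into the library has leaked, every other address in it can be recomputed. All offsets below are hypothetical placeholders; a real exploit would take them from the exact library build the target runs.

```python
# Hypothetical offsets within one specific build of a library (made up here).
LEAKED_OFFSET = 0x1F3E0  # where the leaked pointer points, relative to the library base
TARGET_OFFSETS = {"pop_rdi_ret": 0x2A3E5, "system": 0x50D70}
PAGE_SIZE = 0x1000


def rebase(leaked_address: int) -> dict:
    """Recover the library base from one leaked pointer and recompute targets.

    ASLR moves the whole library by a single random offset, so the distances
    between things inside it stay the same from run to run.
    """
    base = leaked_address - LEAKED_OFFSET
    if base % PAGE_SIZE:
        raise ValueError("leak does not match the assumed offset (base not page-aligned)")
    addresses = {"base": base}
    addresses.update({name: base + off for name, off in TARGET_OFFSETS.items()})
    return addresses


# Two runs with different (made-up) random bases: the script adapts automatically.
for run_base in (0x7F3A12000000, 0x7F9C45600000):
    leaked = run_base + LEAKED_OFFSET  # what the format string leak would print
    print({name: hex(addr) for name, addr in rebase(leaked).items()})
```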

Of course, DEP has its flaws as well. While the implementation works well, it does not suffice on its own. DEP marks memory as data-only, but it cannot do so everywhere: after all, a program still needs executable memory for its actual instructions. DEP can nowadays be defeated by ROP (Return-Oriented Programming). ROP is a method of exploitation where, instead of injecting malicious code directly, the attacker reuses small pieces of existing code, called gadgets, found in the program or its libraries. It can be seen as an advanced version of stack smashing.
Each of these gadgets performs a small operation, such as moving data in memory or calling a function, and ends with a RET (return) instruction. By chaining multiple gadgets together, attackers create a ROP chain. This allows them to perform detailed actions, like spawning a remote shell or bypassing checks performed by security software, such as an antivirus or XDR/EDR solution.

Source: Xiaofeng Wang on ResearchGate.net
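To illustrate how a chain of gadgets behaves, here is a toy simulation in Python. The gadget addresses, the register set and the fake_system stand-in are all hypothetical; real gadgets are machine instructions, but the control flow is the same idea: each RET pops the next address off the attacker-controlled stack and jumps there.

```python
from typing import Callable, Dict, List


class ToyCPU:
    """Minimal machine state: a few registers and an attacker-controlled stack."""

    def __init__(self, stack: List[int]) -> None:
        self.stack = list(stack)  # index 0 is the top of the stack
        self.regs = {"rdi": 0, "rax": 0}
        self.log: List[str] = []

    def pop(self) -> int:
        return self.stack.pop(0)


# Hypothetical gadgets; the comments show the instructions each one stands for.
def pop_rdi_ret(cpu: ToyCPU) -> None:      # pop rdi ; ret
    cpu.regs["rdi"] = cpu.pop()


def mov_rax_rdi_ret(cpu: ToyCPU) -> None:  # mov rax, rdi ; ret
    cpu.regs["rax"] = cpu.regs["rdi"]


def fake_system(cpu: ToyCPU) -> None:      # stands in for calling system(rdi)
    cpu.log.append(f"system() called with argument {cpu.regs['rdi']:#x}")


GADGETS: Dict[int, Callable[[ToyCPU], None]] = {
    0x401000: pop_rdi_ret,
    0x401010: mov_rax_rdi_ret,
    0x401020: fake_system,
}


def run_chain(cpu: ToyCPU) -> None:
    """Each RET pops the next address off the stack and continues there."""
    while cpu.stack:
        target = cpu.pop()
        GADGETS[target](cpu)


# The overflow replaced the return address with the first gadget, followed by
# its operand (a hypothetical address of a "/bin/sh" string) and the next gadgets.
cpu = ToyCPU([0x401000, 0x404040, 0x401010, 0x401020])
run_chain(cpu)
print(cpu.log)
```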

One can think of this process as taking a bunch of newspapers and making a custom message by cutting out letters and rearranging them.
ROP works by exploiting memory vulnerabilities, such as buffer overflows, to overwrite the stack and control the return address. By pointing it to the first gadget, the attacker makes the program execute their manually crafted sequence of instructions. Tools such as ROPgadget make the process a whole lot easier by automatically finding usable gadgets for a ROP chain. In this way, attackers can exploit the program and run their own logic despite the limited amount of memory that actually allows execution.
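Finding gadgets starts with finding RET instructions. The sketch below is a heavily simplified version of what tools like ROPgadget automate: it scans raw bytes for 0xC3 (the x86 near RET opcode) and looks backwards for a few one-byte x86-64 instructions such as pop rdi (0x5F). Real tools decode full instructions of varying length; the short lookup table here is an assumption to keep the example small.

```python
RET = 0xC3  # x86 near return opcode

# A few one-byte x86-64 instructions that make handy gadgets before a RET.
KNOWN_BYTES = {
    0x58: "pop rax", 0x5B: "pop rbx", 0x5D: "pop rbp",
    0x5E: "pop rsi", 0x5F: "pop rdi", 0x90: "nop",
}


def find_gadgets(code: bytes, base: int = 0, max_len: int = 3) -> list:
    """Return (address, text) for each RET preceded by up to max_len known instructions."""
    gadgets = []
    for i, byte in enumerate(code):
        if byte != RET:
            continue
        gadgets.append((base + i, "ret"))
        start = i
        # Walk backwards while the previous byte is a known one-byte instruction.
        while start > 0 and i - start < max_len and code[start - 1] in KNOWN_BYTES:
            start -= 1
            text = " ; ".join(KNOWN_BYTES[b] for b in code[start:i]) + " ; ret"
            gadgets.append((base + start, text))
    return gadgets


sample = bytes([0x48, 0x89, 0xE5, 0x5F, 0xC3, 0x00, 0x5E, 0x5F, 0xC3])  # made-up code bytes
for address, text in find_gadgets(sample, base=0x401000):
    print(hex(address), text)
```

Because the scan works on raw bytes, it also finds gadgets hiding inside other instructions or data, which is one reason programs tend to contain far more usable gadgets than their source code suggests.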
Additionally, ROP has advanced so far that variants exist which do not need return instructions at all. Of course, these are a step more advanced still than standard return-oriented programming.

Combined efforts

As we can see, each method on its own can be bypassed with some effort. Both attacks can be considered advanced, but where there is a will, there is a way. An attacker will not call off an attack just because it is hard to perform when there are benefits to be gained.
Because these methods have flaws on their own, it falls to us to combine them. Where the randomness of ASLR can be lacking, DEP makes up for this by limiting the attack surface. And similarly, where DEP can be bypassed through ROP, ASLR randomizes where the program’s code is loaded, so the gadget addresses an attacker found no longer work: they change with every boot or launch of the application, or after a given amount of time.

When building with Microsoft’s C/C++ toolchain (the cl.exe compiler and its linker), applications are linked with /DYNAMICBASE and /NXCOMPAT enabled by default, which sets the corresponding flags in the executable’s header. DYNAMICBASE tells Windows that the application is compatible with ASLR. A developer can turn this off, which would remove compatibility with this feature. Additionally, with DYNAMICBASE enabled, the header can specify HIGHENTROPYVA, which dictates whether or not ASLR can use the full 64-bit address space.
NXCOMPAT tells Windows whether or not the application is compatible with the NX-bit functionality. Turning it off would opt the program out of DEP.
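These settings end up as bit flags in the DllCharacteristics field of the executable’s PE optional header. The Python sketch below decodes that field using the flag values from the PE format specification. The parser assumes a well-formed PE file and only reads what it needs; it is meant for inspection, not as a full PE parser.

```python
import struct
import sys

# Flag values of the DllCharacteristics field, from the PE format specification.
IMAGE_DLLCHARACTERISTICS_HIGH_ENTROPY_VA = 0x0020
IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE = 0x0040
IMAGE_DLLCHARACTERISTICS_NX_COMPAT = 0x0100


def decode_dll_characteristics(value: int) -> dict:
    """Map the raw DllCharacteristics value to the linker option names."""
    return {
        "HIGHENTROPYVA": bool(value & IMAGE_DLLCHARACTERISTICS_HIGH_ENTROPY_VA),
        "DYNAMICBASE": bool(value & IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE),
        "NXCOMPAT": bool(value & IMAGE_DLLCHARACTERISTICS_NX_COMPAT),
    }


def read_dll_characteristics(image: bytes) -> int:
    """Read DllCharacteristics from the raw bytes of a PE file (.exe/.dll)."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ header")
    pe_offset = struct.unpack_from("<I", image, 0x3C)[0]  # e_lfanew: where the PE header starts
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    optional_header = pe_offset + 4 + 20  # skip the signature and the 20-byte COFF header
    # DllCharacteristics is at offset 70 in both PE32 and PE32+ optional headers.
    return struct.unpack_from("<H", image, optional_header + 70)[0]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        print(decode_dll_characteristics(read_dll_characteristics(f.read())))
```

Run with a path to an .exe or .dll, it prints which of the three flags are set.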

Additionally, Windows Security (Windows Defender) has exploit protection settings that can force ASLR and DEP, among other mitigations, onto applications, including ones that were not compiled with them. This means that Windows can use these features by default and combine them to create a secure environment by force.

Similarly, most of the major Linux distributions enable these features by default. On Linux, the ASLR and DEP implementations run at the kernel level, meaning they apply across the entire system. We can confirm that the default setting is full randomization, although the user can change this if they desire. Interestingly, Linux also offers what is called “conservative randomization”, which randomizes the shared libraries, stack, mmap() base and VDSO, while full randomization additionally randomizes the heap.
For DEP, the process is similar: the noexec=off and noexec32=off kernel boot parameters turn off the functionality. There is little reason to do so system-wide, though, as the functionality can be turned off on a per-process basis.
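On Linux, the system-wide ASLR mode is exposed as /proc/sys/kernel/randomize_va_space, where 0 turns randomization off, 1 is the conservative mode and 2 is full randomization. The small sketch below just reads and describes that value; changing it requires root (for example with sysctl -w kernel.randomize_va_space=2).

```python
from pathlib import Path

SYSCTL = Path("/proc/sys/kernel/randomize_va_space")

MODES = {
    0: "off: no address space randomization",
    1: "conservative: stack, mmap() base (and so shared libraries) and VDSO randomized",
    2: "full: conservative randomization plus the heap",
}


def describe_aslr_mode(value: int) -> str:
    """Translate a randomize_va_space value into a short description."""
    return MODES.get(value, f"unknown mode {value}")


def current_aslr_mode() -> str:
    """Read the system-wide setting; only available on Linux."""
    try:
        return describe_aslr_mode(int(SYSCTL.read_text().strip()))
    except OSError:
        return "randomize_va_space is not available on this system"


print(current_aslr_mode())
```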

The short of it is that relying on only one defensive measure is not enough in practice; combining multiple features is key to ensuring a safe program. Administrators can also enforce these features on applications. And it is important that programmers understand these security flaws, even when ASLR, DEP or other measures are enabled.
After all, it wouldn’t be the first time that a buffer overflow was exploited in an application formerly thought to be safe. Various CVEs exist that exploit and bypass these measures. For example, take this CVE from 2024 that details exploitation of FreeRTOS, a real-time OS for microcontrollers. Or another CVE from 2024 that exploits the SonicWall VPN through heap spraying, made possible because the unsafe strcpy function was used.

Does the OS and architecture matter?

We have seen that most major operating systems have some form of ASLR and DEP, alongside other measures that can be applied at the code level. But we can ask whether these implementations work just as well on every platform. After all, not every platform uses the same kernel or the same programming language, and platforms are often built very differently. Think of the significant lack of cross-support between Windows and Linux. Each structures things in its own way, and the biggest efforts to integrate one into the other usually end up as something close to emulation or simulation.

So, in a way, the OS does matter when considering the effectiveness of these defenses. Realistically, though, this is less about the OS itself than about how the defenses are implemented on it. Windows handles ASLR differently from Linux, and macOS is different from both.

Nowadays, the biggest deciding factor in the effectiveness of these defensive methods is the architecture of a device, and how well its capabilities are used by these implementations. Because 32-bit machines have a far smaller address space (at most 4 GiB) than modern 64-bit devices, much less of it could be set aside as randomizable space for ASLR.
I have spoken of the entropy created by ASLR before. A research paper released in 2014 examined the effectiveness of full ASLR implementations on 64-bit Linux systems. Its authors performed tests and found that ASLR on 32-bit systems is as good as useless. On 32-bit systems, the entropy we can commonly expect is roughly 8 bits. Since these systems cannot feasibly get more space to randomize data, that leaves only 2^8 = 256 possible positions.
If tests conclude that 16 bits of entropy can be cracked within minutes, then 8 bits, which means 256 times fewer possibilities, can be cracked even faster.

Another security whitepaper, released by Symantec in 2007, described similar research. Windows Vista shipped with ASLR enabled only for programs and DLLs specifically linked to use it; to retain compatibility, it was not enabled by default for other applications, which were usually older software. The whitepaper stated that ASLR was not as robust as expected, and Microsoft acknowledged these findings, reinforcing that the implementation was lacking.
It should be noted that ASLR can be enabled fully on Vista by editing the registry, or by using the Enhanced Mitigation Experience Toolkit created by Microsoft. For Windows versions before Vista, such as XP, third parties offered both open-source and proprietary tools that provided solid ASLR implementations.

Testing also showed that low-memory situations made ASLR significantly less effective on devices running Windows 7 or earlier. The same testing produced similar results on Linux, and caused Mac OS X 10.7.3 systems to crash with a kernel panic.

64-bit systems, in turn, have far more address space available for ASLR. Whereas the average entropy achievable on 32-bit systems was 8 bits, on modern 64-bit systems the number of possible positions reaches into the millions at minimum. The difference is so significant that there is no question that ASLR on 64-bit systems is vastly superior to ASLR on 32-bit systems.

DEP is not flawless either, and early implementations were lacking. The first implementation on Windows systems came via the SecureStack software created by SecureWave, which based its software on the work of PaX.
From Windows XP Service Pack 2 onward, official support was added for x86 systems, protecting critical Windows services by default. However, this remained disabled and gave no protection if the CPU did not support these features. Early Windows versions with DEP also lacked ASLR, leaving them vulnerable to the bypass attacks that were later discovered.
When Vista released, 32-bit devices received proper support through the PAE kernel, while 64-bit devices received native support through the 64-bit kernels.
Linux faced similar issues: earlier 32-bit systems had no support for DEP, and support was initially only available, and enabled by default, on 64-bit devices. A kernel patch later arrived that added support for 32-bit devices with an NX-bit.
Ubuntu 9.10, released in 2009, brought DEP support to 32-bit devices without an NX-bit through emulation of this functionality.

Recently, RISC architectures such as RISC-V have seen significant improvements. These architectures have been built with security in mind from the start, so we can expect support for defensive methods to be prominent in their development.
The ARM architecture has also seen a lot of growth in the last few years. ARM provides the XN-bit (Execute Never), which serves the same purpose as the NX-bit on x86, to provide DEP.

It is clear that both technologies were bottlenecked by the limited address space of 32-bit architectures and systems, whether through the support that was given or the lack of it. The arrival of 64-bit CPU architectures was a good step forward, and more improvements have come since then.
ARM processors, which are used primarily in mobile devices, are also built with capabilities for security features and have been actively developed these last few years, which is a very good development. And RISC architectures, while different, are built with features similar to DEP and ASLR from the ground up.

Specialized usage and prevalence

Electronics are all around us, nowadays. We use small household IoT devices, we have our phones with us wherever we go and we have self-checkout in the grocery store for convenience and swift checkout.

All of these devices run some form of OS with their own architecture. Security is not just a spectrum of can and cannot. An attacker can hack your internet-connected fridge, but why would they?
There is also the question of should and shouldn’t. Should I attack this device? Will it provide me with something I can use? Devices are as secure as hackers make them. If a device is hacked often, it is paramount that more patches and updates are applied to fix vulnerabilities. This is not limited to memory attacks, but applies to the digital world in general.

We can argue that a device is safe because it has ASLR, DEP and the safest software ever, but if this device holds sensitive information, there is a reason to hack it, and so it will face more attacks and be less secure in practice.
Prevalence matters a lot when it comes to attacks. Windows is, by far, used on the majority of desktop PCs and laptops. Because its user base is so extraordinarily big, it makes sense that hackers want to exploit it. Windows is not a perfect system and likely never will be, so there is always an exploit to be found.

The same question applies to Apple and Android. Android is the more open ecosystem of the two, since it is open-source by nature. However, being open-source also allows hackers to look into the code and analyse it, making vulnerabilities that much easier to find. In contrast, Apple’s iOS ecosystem is very tight and closed. This is likely part of the reason that we hear about Android vulnerabilities more often than Apple ones.

A lot of the IoT systems we use in our lives are embedded systems by nature. These devices often run on microcontrollers and are designed to be energy-efficient, memory-efficient and cost-efficient. This brings its own caveats, however. Since memory efficiency is a priority, memory is small by default, leaving little space for security implementations, so alternative, less secure methods of security have to be used. An example of this is this research on Foscam IP cameras, performed by Josh Terrill. They proved that reverse-engineering was possible by reading the data from the script and reverse-engineering the encryption used on the OS, which appeared to be a lightweight version of Linux made for IoT devices.

Likewise, research done on devices can make or break their security. We know that Foscam devices and their firmware are vulnerable to reverse-engineering only because someone researched it; before that point, they were assumed to be secure. Platforms such as Linux or Android are open-source. Hobbyists and enthusiasts, as well as big companies, develop code for these projects because they enjoy doing so or benefit from doing so. In the same way, students research vulnerabilities, and universities test, develop and publish exploits, so that our everyday systems can become more secure in turn.

In the end, it is a cycle of who researches these platforms, with what goal in mind and to what end. Security cannot exist without research, but vulnerabilities cannot be found without research, either. There is always more than one reason why a system is or is not vulnerable.

What does the future hold?

Research doesn’t end; it always moves on and repeats its cycle. We will likely not be done researching defenses and exploits against memory attacks in the near future. ASLR or DEP, for example, may be expanded upon, or perhaps phased out for something different entirely. I want to put the spotlight on some recent and future developments that we can expect to see.

Current developments

Enhanced ASLR implementations are in development. Right now, we are seeing plans for FGKASLR (Finer Grained Kernel ASLR). Where KASLR (Kernel ASLR) randomizes the base address of the kernel as a whole, FGKASLR aims to bring this randomization down to the level of individual functions.
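The difference between the two can be shown with a toy layout in Python. The function names and sizes are hypothetical and the layout logic is greatly simplified. With plain KASLR, everything moves by the same offset, so the distance between two functions is constant and one leaked address reveals the rest; with function-level shuffling, that shortcut disappears.

```python
import random

# Hypothetical kernel functions and sizes in bytes, made up for illustration.
FUNCTIONS = {"do_sys_open": 0x180, "vfs_read": 0x240, "tcp_sendmsg": 0x3A0, "schedule": 0x120}


def kaslr_layout(base: int, functions: dict) -> dict:
    """Plain KASLR (toy): one random base; functions keep their order and spacing."""
    layout, offset = {}, 0
    for name, size in functions.items():
        layout[name] = base + offset
        offset += size
    return layout


def fgkaslr_layout(base: int, functions: dict, rng: random.Random) -> dict:
    """Function-level randomization (toy): also shuffle the order of functions."""
    names = list(functions)
    rng.shuffle(names)
    return kaslr_layout(base, {name: functions[name] for name in names})


rng = random.Random()
for base in (0xFFFFFFFF81000000, 0xFFFFFFFF9A200000):  # two hypothetical randomized bases
    plain = kaslr_layout(base, FUNCTIONS)
    fine = fgkaslr_layout(base, FUNCTIONS, rng)
    print("KASLR   vfs_read - do_sys_open =", hex(plain["vfs_read"] - plain["do_sys_open"]))
    print("FGKASLR vfs_read - do_sys_open =", hex(fine["vfs_read"] - fine["do_sys_open"]))
```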

Intel CET (Control-Flow Enforcement Technology) is a technology developed by Intel to counter exploits such as ROP and JOP (Jump-Oriented Programming). CET implements shadow stacks for secure return address storage and IBT (Indirect Branch Tracking) to restrict indirect jumps and calls to valid targets. More information on this can be found here. This technology is supported in newer CPUs such as Intel Tiger Lake and AMD Zen 3.
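The shadow stack idea can be modelled in a few lines. This toy version is only an illustration: in real CET, the shadow stack lives in memory that ordinary writes cannot modify, and the check is done by the CPU on every RET. The class and exception names below are hypothetical.

```python
class ControlProtectionFault(Exception):
    """Stand-in for the fault raised when the two return addresses disagree."""


class ShadowStackCPU:
    def __init__(self) -> None:
        self.stack = []   # normal stack: writable, so an overflow can change it
        self.shadow = []  # shadow stack: in real CET, protected from ordinary writes

    def call(self, return_address: int) -> None:
        # A CALL pushes the return address onto both stacks.
        self.stack.append(return_address)
        self.shadow.append(return_address)

    def ret(self) -> int:
        # A RET pops both copies and refuses to continue if they differ.
        addr = self.stack.pop()
        expected = self.shadow.pop()
        if addr != expected:
            raise ControlProtectionFault(f"return to {addr:#x}, expected {expected:#x}")
        return addr


cpu = ShadowStackCPU()
cpu.call(0x401200)
cpu.stack[-1] = 0x401000  # an overflow overwrites the return address with a gadget address
try:
    cpu.ret()
except ControlProtectionFault as err:
    print("blocked:", err)
```

This is why a ROP chain, which depends on overwritten return addresses, fails at its very first RET once shadow stacks are enforced.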

(Generative) AI has been on the rise. We have seen rapid growth in the capabilities of these technologies, and this will only continue in the near future. Development is rapid and the possibilities are great. We have seen, however, that AI can be used for harm as well: voice cloning and deepfakes have been exploited in phishing campaigns and to spread misinformation on the internet. But it can also be used for good. AI can be trained to find patterns in attacks being performed, and you should not be surprised if your antivirus, or even Windows, starts to use AI to counteract attacks.

Conclusion

We have taken a look at the effectiveness of defensive measures taken to prevent memory exploitation on various forms of devices.
Where in the past these devices were limited in capabilities and defenses were lacking, these implementations have since kept improving to provide further security.
However, research keeps progressing, attackers keep finding new ways, and developments will not cease any time soon. The best we can do is stay curious, keep following developments and keep on doing research and filling in the gaps in our code that systems might not fully fill.
After all, our program is only as safe as we make it!

Thank you for reading!

This also marks the end of this series into memory exploitation and attacks. I hope you have found this research interesting. If you would like to leave feedback or a comment, feel free to do so on any of my relevant posts!
Once again, thank you for taking the time to read my research.

Thank you for coming on this ride with me! I appreciate it!