KASLR in the arm64 Linux kernel

Kernel Address Space Layout Randomization (KASLR) is a hardening feature that aims to make it more difficult to exploit known vulnerabilities in the kernel, by placing kernel data structures at a random address at each boot. The Linux kernel currently implements this feature for 32-bit and 64-bit x86, and an implementation for the 64-bit ARM architecture (arm64) is queued for the v4.6 release, which is due in a couple of weeks.

For the arm64 implementation, the kernel address space layout is randomized in the following ways:

  • loading the core kernel at a random physical address
  • mapping the core kernel at a random virtual address in the vmalloc area
  • loading kernel modules at a random virtual address in the vmalloc area
  • mapping system memory at a random virtual address in the linear area

Physical address randomization

Since the physical address at which the kernel executes is decided strictly by the bootloader (or on UEFI systems, by the UEFI stub), and not by the kernel itself, implementing physical address randomization consists primarily of removing assumptions in the core kernel that it has been loaded at the base of physical memory. Since the kernel text mapping, the virtual mapping of RAM and the physical mapping of RAM are all tightly coupled, the first step is to decouple those, and move the kernel into the vmalloc region. Once that is done, the bootloader is free to choose any area in physical RAM to place the kernel at boot.

Note that this move of the kernel VA space into the vmalloc region is by far the most intrusive change in the KASLR patch set, and some other patch sets that were under review at the same time required non-trivial rework to remain compatible with the new VA layout configuration.

For v4.7, some enhancement work has been queued to relax the alignment requirement of the core kernel from ‘2 MB aligned base + 512 KB’ to any 64 KB aligned physical offset. The actual number of random bits in the physical address of the kernel depends on the size of system memory, but for a system with 4 GB, it adds up to around 15 bits.
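
As a rough illustration of how the alignment requirement affects the available entropy, the sketch below counts the aligned positions at which an image fits into free memory. The memory map (a single free 4 GB region), the 16 MB image size and the helper name count_slots are all assumptions made for this example, and the 512 KB offset is ignored for simplicity.

```python
import math

KB = 1024
MB = 1024 * KB
GB = 1024 * MB

def count_slots(free_regions, image_size, align):
    """Count aligned start addresses at which an image of image_size fits.

    free_regions is a list of (start, end) physical ranges, end exclusive.
    """
    slots = 0
    for start, end in free_regions:
        first = -(-start // align) * align   # round start up to the alignment
        if first + image_size <= end:
            slots += (end - image_size - first) // align + 1
    return slots

# Hypothetical system: one entirely free 4 GB region and a 16 MB kernel image.
regions = [(0, 4 * GB)]
image = 16 * MB

for align in (2 * MB, 64 * KB):
    n = count_slots(regions, image, align)
    print(f"align={align // KB} KB: {n} slots, {math.log2(n):.1f} bits")
```

For this idealised map, the 64 KB case comes out just under 16 bits, which is an upper bound: real memory maps contain reserved and occupied regions, so the usable figure is lower.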

Virtual randomization of the core kernel

The virtual address the core kernel executes at is typically fixed, and thus the kernel binary is a non-relocatable binary where all memory addresses are calculated and emitted into the executable image at build time. With virtual randomization, these memory addresses need to be recalculated at runtime, and updated inside the running image. This means the kernel binary needs to be converted into a relocatable binary, and one that is able to relocate itself (in the absence of a loader such as the one used under the OS to load shared libraries). When the random virtual mapping is created at early boot time, the self relocation routines can take this random virtual offset into account when applying the relocation fixups, after which the kernel will be able to execute from this random virtual address.

The above is supported by the standard binutils toolchain. By linking ordinary (non-PIC) small model code (i.e., relative symbol references with a +/- 4 GB range) in PIE mode, we end up with a binary that has a .rela section consisting of standard ELF64 RELA entries, which are processed by the early startup code.
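
To make the fixup step concrete, here is a minimal model of a loop that walks ELF64 RELA entries and applies R_AARCH64_RELATIVE relocations. It is a sketch, not the kernel's startup code: the image is modelled as a bytearray, the function name relocate and its parameters are invented for this example, and only the relative relocation type is handled.

```python
import struct

R_AARCH64_RELATIVE = 1027            # relocation type from the AArch64 ELF ABI
RELA_ENTRY = struct.Struct("<QQq")   # Elf64_Rela: r_offset, r_info, r_addend

def relocate(image, rela, link_base, offset):
    """Apply R_AARCH64_RELATIVE fixups to a bytearray holding the image.

    link_base is the link-time virtual address of image[0]; offset is the
    displacement between the link-time and the runtime virtual address.
    """
    for r_offset, r_info, r_addend in RELA_ENTRY.iter_unpack(rela):
        r_type = r_info & 0xFFFFFFFF              # ELF64_R_TYPE(r_info)
        if r_type != R_AARCH64_RELATIVE:
            continue                              # other types: not modelled here
        # The addend holds the link-time value; add the runtime displacement.
        value = (r_addend + offset) & 0xFFFFFFFFFFFFFFFF
        struct.pack_into("<Q", image, r_offset - link_base, value)
    return image
```

Each entry is 24 bytes, and the target location in the image is overwritten with the addend plus the runtime displacement.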

The RELA relocation format keeps the addend in the relocation entry rather than in the memory location that the relocation targets, and for R_AARCH64_ABS64 relocations, this memory location itself is filled with zeroes until the relocation code executes. This has a few downsides:

  • The executable image needs to be relocated to its runtime address even if this address is equal to the link time address.
  • The EXTABLE entries cannot be sorted at build time. This was addressed by switching to relative EXTABLE entries, which, as a bonus, reduces the size of the exception table by 50%.
  • Static Image header fields can no longer rely on 64-bit relocations to be populated by the linker at build time. Instead, we need to split them into 32-bit halves.
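
The relative EXTABLE entries mentioned in the list above can be illustrated with a small model. Each entry stores two signed 32-bit offsets, each relative to the address of the field that holds it, instead of two absolute 64-bit addresses. The helper names and the sample addresses are assumptions made for this sketch.

```python
import struct

ENTRY = struct.Struct("<ii")   # two signed 32-bit offsets: insn, fixup

def encode_entry(entry_addr, insn_addr, fixup_addr):
    """Store each target as an offset from the field that holds it."""
    return ENTRY.pack(insn_addr - entry_addr, fixup_addr - (entry_addr + 4))

def decode_entry(entry_addr, raw):
    """Recover both absolute addresses from the entry's own address."""
    insn_off, fixup_off = ENTRY.unpack(raw)
    return entry_addr + insn_off, entry_addr + 4 + fixup_off

# Hypothetical link-time addresses of one table entry and its two targets.
entry, insn, fixup = 0xFFFF000008900000, 0xFFFF000008100000, 0xFFFF000008100040
raw = encode_entry(entry, insn, fixup)

# Moving the whole image by any displacement leaves the stored bytes valid.
shift = 0x12340000
print(decode_entry(entry + shift, raw) == (insn + shift, fixup + shift))
```

Because the stored values do not depend on where the image ends up, such entries need no relocation at all, and at 8 bytes each they are half the size of a pair of 64-bit pointers.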

Since the .rela section can grow fairly big, an additional optimization has been implemented that turns the kallsyms symbol table into a relative table as well. This saves a 24 byte RELA entry per kernel symbol, which adds up to around 1.5 MB for an arm64 defconfig kernel. Due to the obvious benefit, this optimization was enabled by default for all architectures except IA-64 and Tile in 64-bit mode (whose address space is too sparse to support this feature).
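
The idea behind a relative symbol table can be sketched as follows: a single base address plus one unsigned 32-bit offset per symbol, so that only the base depends on where the kernel is mapped. The function names and sample addresses are hypothetical, and the real kallsyms encoding may differ in detail.

```python
import struct

def build_relative_table(addresses):
    """Return (base, packed 32-bit offsets) for a list of symbol addresses."""
    base = min(addresses)
    offsets = [addr - base for addr in addresses]
    if max(offsets) >= 1 << 32:
        # This is why a very sparse address space cannot use the scheme.
        raise ValueError("symbols span more than 4 GB")
    return base, struct.pack(f"<{len(offsets)}I", *offsets)

def symbol_address(base, table, index, kaslr_offset=0):
    """Look up one symbol; only the base needs adjusting for randomization."""
    (offset,) = struct.unpack_from("<I", table, 4 * index)
    return base + kaslr_offset + offset

# Hypothetical symbol addresses.
addrs = [0xFFFF000008080000, 0xFFFF000008081000, 0xFFFF000008F00000]
base, table = build_relative_table(addrs)
print(hex(symbol_address(base, table, 2, kaslr_offset=0x200000)))
```

As a sanity check on the figure quoted above, 1.5 MB at 24 bytes per entry corresponds to roughly 65,000 symbols.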

With the enhancement mentioned above, a 48-bit VA kernel (the default for arm64 defconfig) can reside at any 64 KB offset in the first half of the vmalloc space, which means the addresses allow for 30 bits of entropy to be used in randomization.
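
The 30-bit figure follows from simple arithmetic, shown below together with one way a seed could be turned into an aligned virtual offset. This is a sketch under the assumption that the vmalloc area spans roughly half of the 48-bit kernel VA space; the constant and function names are invented for this example and do not mirror the kernel's code.

```python
VA_BITS = 48
ALIGN_SHIFT = 16                  # 64 KB alignment

# Assumption: vmalloc covers about half of the 2**48 byte kernel VA space,
# and the kernel may land anywhere in the first half of that area.
VMALLOC_BITS = VA_BITS - 1
RANGE_BITS = VMALLOC_BITS - 1
ENTROPY_BITS = RANGE_BITS - ALIGN_SHIFT

def virtual_offset(seed):
    """Map a seed to a 64 KB aligned offset into the first half of vmalloc."""
    return (seed & ((1 << ENTROPY_BITS) - 1)) << ALIGN_SHIFT

print(ENTROPY_BITS, hex(virtual_offset(0x0123456789ABCDEF)))
```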

Virtual randomization of the module region

To prevent modules leaking the virtual address of core kernel data structures, the module region can be randomized fully independently from the core kernel. To this end, a 128 MB virtual region is chosen at boot time, and all module allocations are served from this area. Since the modules and the core kernel are likely to be loaded far away from each other (more than 128 MB, which is the maximum range of relative jump instructions), we also need to implement support for module PLTs, which contain veneers (i.e., trampolines) to bridge the distance between the jump instructions and their targets. Since module PLTs may have a performance impact, it is also possible to choose the module region such that it intersects the .text section of the core kernel, so that jumps via PLT veneers are only required in the unlikely event that the module region runs out of space.
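
Whether a given call needs a veneer comes down to a range check on the branch displacement. The sketch below models that check for a relative branch with a range of +/- 128 MB; the function name and the sample addresses are made up for illustration.

```python
SZ_128M = 128 * 1024 * 1024

def needs_plt(branch_addr, target_addr):
    """Return True if target_addr is out of range for a direct branch.

    A relative branch encodes a signed 26-bit word offset, so it reaches
    displacements in the range [-128 MB, +128 MB).
    """
    disp = target_addr - branch_addr
    return not (-SZ_128M <= disp < SZ_128M)

kernel_func = 0xFFFF000008100000              # hypothetical core kernel symbol
near_module = kernel_func + 64 * 1024 * 1024  # module region overlapping .text
far_module = kernel_func + (1 << 40)          # independently randomized region

print(needs_plt(near_module, kernel_func), needs_plt(far_module, kernel_func))
```

This is the trade-off described above: a module region near the kernel's .text keeps most branches direct, while a fully independent region sends them through veneers.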

Virtual randomization of the linear region

The linear mapping covers all RAM pages in the order that they appear in the physical address space. Since the virtual area reserved for the linear mapping is typically much larger than the actual physical footprint of RAM (i.e., the distance between the first and the last usable RAM pages, including all holes between them), the placement of those pages inside the linear mapping can be randomized as well. This will make heap allocations via the linear mapping (i.e., kmalloc()) less predictable. Since there is a performance concern associated with the relative alignment between physical and virtual mappings (e.g., on 4 KB pages, RAM will be mapped at 1 GB granularity if the virtual and physical addresses modulo 1 GB are equal), this randomization is coarse grained, but still an improvement over a fully deterministic layout. (The size of the linear region is typically at least 256 GB.)
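
A coarse-grained placement of this kind can be sketched as picking one of the 1 GB aligned positions that still leave room for the whole RAM footprint. The function name, the use of a simple modulo and the example sizes are assumptions made for this illustration rather than a description of the kernel's own calculation.

```python
GB = 1 << 30

def linear_offset(seed, linear_size, ram_footprint, granule=GB):
    """Pick a granule-aligned offset for RAM inside the linear region."""
    spare = linear_size - ram_footprint
    if spare < granule:
        return 0                       # no room to move: stay deterministic
    slots = spare // granule + 1       # positions that keep all of RAM mapped
    return (seed % slots) * granule

# Hypothetical system: 256 GB linear region, 4 GB physical footprint.
offset = linear_offset(0xC0DEF00D, 256 * GB, 4 * GB)
print(offset // GB, "GB into the linear region")
```

With these example sizes there are 253 possible positions, or close to 8 bits, which shows how much coarser this is than the core kernel's virtual randomization.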

How to enable it

Randomization requires a good source of entropy, and arm64 does not have an architected means of obtaining entropy (e.g., via an instruction), nor does its early execution environment have access to platform specific peripherals that can supply such entropy. This means it is left to the bootloader to generate a KASLR seed, and pass it to the core kernel via the /chosen/kaslr-seed DT property.
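
For bootloaders or test setups that build the device tree themselves, the property is a 64-bit value expressed as two 32-bit cells, most significant cell first. The helper below formats such a line in device tree source syntax; the function name is invented for this sketch, and Python's secrets module merely stands in for whatever entropy source the bootloader really has.

```python
import secrets

def kaslr_seed_property(seed=None):
    """Format a 64-bit seed as a kaslr-seed line for the /chosen node."""
    if seed is None:
        seed = secrets.randbits(64)    # stand-in for a real entropy source
    hi, lo = seed >> 32, seed & 0xFFFFFFFF
    return f"kaslr-seed = <0x{hi:08x} 0x{lo:08x}>;"

print(kaslr_seed_property())
```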

For platforms that boot via UEFI, the UEFI stub in the arm64 kernel will attempt to locate the EFI_RNG_PROTOCOL, and invoke it to supply a kaslr-seed. On top of that, it will use this protocol to randomize the physical load address of the kernel Image. QEMU in UEFI mode supports this protocol if the virtio-rng-pci device is made available. Bare metal platforms like the Celloboard or QDF2432 implement this protocol natively as well.

To enable the KASLR feature, the kernel needs to be built with CONFIG_RANDOMIZE_BASE=y.