February 2017 – Work Of Ard

Rule #1 of crypto club: don’t roll your own

Kernel hackers are usually self-righteous bastards who think that they are smarter than everyone else (and I am no exception). Sometimes, it’s hard to fight the urge to reimplement something simply because you think you would have done a slightly better job. On the enterprise scale, this is usually referred to as not invented here syndrome (NIH), and in the past, I have worked for companies where this resulted in entire remote procedure call (RPC) protocol stacks being developed, including a home cooked interface definition language (IDL) and a compiler (yes, I am looking at you, TomTom).

The EDK2 open source project is also full of reinvented wheels, where everything from the build tools to string libraries have been implemented and even invented from scratch. But when it came to incorporating crypto into the code base, they did the right thing, and picked the OpenSSL library, even if this meant putting the burden on the developer to go and find the correct tarball and unpack it in the right place. (Due to license incompatibilities, merging the OpenSSL code into the EDK2 tree would render it undistributable.)

The bottom line, of course, is that you are not smarter than everyone else, and in fact, that there are very smart people out there whose livelihood depends on breaking your supposedly secure system. So instead of reimplementing existing crypto algorithms, or, god forbid, inventing ‘better’ ones, you can spend your time more wisely and learn about existing algorithms and how to use them correctly.

Rule #2 of crypto club: read the manual

Not all encryption modes are suitable for all purposes. For instance, symmetric stream ciphers such as RC4, or AES in CTR mode, should never reuse the same combination of key and initialization vector (IV). This makes stream ciphers mostly unsuitable for disk encryption, which typically derives its IV from the sector number, and sectors are typically written to more than once. (The reason is that, since the key stream is xor’ed with the plaintext to obtain the ciphertext, two ciphertexts encrypted with the same key and IV xor’ed with each other will produce the same value as the two plaintexts xor’ed together, which means updates to disk blocks are essentially visible in the clear. Ouch.)

Many other algorithms have similar limitations: DES had its weak keys, RSA needs padding to be safe, and DSA (as well as ElGamal encryption) should not reuse its k parameter, or its key can be trivially factored out.

Algorithm versus implementation

Unfortunately, we are not there yet. Even after having ticked all the boxes, we may still end up with a system that is insecure. One notable example is AES, which is superb in all other aspects, but, as Daniel J. Bernstein claimed in this paper in 2005, its implementation may be vulnerable to attacks.

In a nutshell, Daniel J. Bernstein’s paper shows that there is an exploitable correlation between the key and the response time of a network service that involves AES encryption, but only when the plaintext is known. This is due to the fact that the implementation performs data dependent lookups in precomputed tables, which are typically 4 – 8 KB in size (i.e., much larger than a typical cacheline), resulting in a variance in the response time.

This may sound peculiar, i.e., if the plaintext is known, what is there to attack, right? But the key itself is also confidential, and AES is also used in a number of MAC algorithms where the plaintext is usually not secret to begin with. Also, the underlying structure of the network protocol may allow the plaintext to be predicted with a reasonable degree of certainty.

For this reason, OpenSSL (which was the implementation under attack in the paper), has switched to time invariant AES implementations as much as possible.

Time invariant AES

On 64-bit ARM, we now have three separate time invariant implementations of AES, one based on the ARMv8 Crypto Extensions and two that are NEON based. On 32-bit ARM, however, the only time invariant AES implementation is the bit sliced NEON one, which is very inefficient when operating in sequential modes such as CBC encryption or CCM/CMAC authentication. (There is an ARMv8 Crypto Extensions implementation for 32-bit ARM as well, but that is currently only relevant for 32-bit kernels running on 64-bit hardware.)

So for Linux v4.11, I have implemented a generic, [mostly] time invariant AES cipher, that should eliminate variances in AES processing time that are correlated with the key. It achieves this by choosing a slightly slower algorithm that is equivalent to the table based AES, but uses only 256 bytes of lookup data (the actual AES S-box), and mixes some S-box values at fixed offsets with the first round key. Every time the key is used, these values need to be xor’ed again, which will pull the entire S-box into the D-cache, hiding the lookup latency of subsequent data dependent accesses.

So if you care more about security than about performance when it comes to networking, for instance, for unmonitored IoT devices that listen for incoming network connections all day, my recommendation is to disable the table based AES, and use the fixed time flavour instead.

# CONFIG_CRYPTO_AES_ARM is not set
CONFIG_CRYPTO_AES_TI=y

The priority based selection rules will still select the much faster NEON code when possible (provided that the CPU has a NEON unit), but this is dependent on the choice of chaining mode.

	Algorithm	Resolved as
Disk encryption	`xts(aes)`	`xts-aes-neonbs`
mac80211 CMAC	`cmac(aes)`	`cmac(aes-fixed-time)`
VPN	`ccm(aes)`	`ccm_base(ctr-aes-neonbs,cbcmac(aes-fixed-time))`

Update: 2019-02-15 – A full blown UEFI port for the Raspberry Pi 3 based on this code is now available in the Tianocore edk2-platforms repository.

Zen and the art of UEFI development

UEFI is an acquired taste. The EDK2 reference implementation has a very steep learning curve, and everything about it, from its build tools to the coding style, is eerily different, in a Twilight Zone kind of way.

UEFI as a firmware specification, however, has huge value: it defines abstractions for the interactions that occur between the OS and the firmware, which means [in theory] that development at either side can occur against the specification rather than against one of the many implementations. This allows things like universal OS installers, which is so common on x86 that people are sometimes surprised that this has always been a cause for headaches on ARM.

UEFI on the Pi

A prime example of this is the Raspberry Pi. It has a very peculiar hardware architecture, consisting of a Broadcom VideoCore 4 GPU, which is the primary core on the SoC, combined with one or more ARM CPUs. Revision 3 of the Raspberry Pi combines this GPU with 4 Cortex-A53 cores, which are low end 64-bit cores designed by ARM. The boot architecture matches the hardware architecture, in the sense that the GPU boots first, and reads a configuration file off a FAT partition on the SD card that describes how to proceed with booting the OS. A typical installation has a 32-bit ARM Linux kernel in the FAT partition, and a Raspbian installation (a variant of the Debian GNU/Linux distro compiled specially for the Raspberry Pi) on another SD partition, formatted as EXT4.

Using a standard Linux distro is impossible, which is unfortunate, given the recent effort in upstreaming SoC support for the Raspberry Pi 3. If we could only run UEFI on this board, we could boot a bog standard Ubuntu installer ISO and have an ordinary installation that boots via GRUB.

So over the past couple of months, I have been spending some of my spare cycles to porting EDK2 to the Raspberry Pi 3. This also involves a port of ARM Trusted Firmware, given that the Raspberry Pi 3 boots all its ARM cores in EL3 when configured to boot in 64-bit mode. It is a work in progress, and at the moment, it does little useful beyond booting the board into the UEFI Shell.

Building the secure firmware

Follow this link to my Raspberry Pi 3 branch of ARM Trusted Firmware, and build it using the following commands:

export CROSS_COMPILE=aarch64-linux-gnu-
export PRELOADED_BL33_BASE=0x20000
make PLAT=rpi3 fip all 

# add this so we can find the resulting image in the next step
export ATF_BUILD_DIR=$(pwd)/build/rpi3/release

This port is a minimal implementation of ARM Trusted Firmware, which pens up the secondary cores until the OS is ready to boot them via PSCI. It also implements PSCI System Reset via the SoC watchdog. Beyond that, it does the usual initialization of the secure world, and drops into EL2 to boot UEFI.

Building the UEFI firmware

Clone this repository and symlink it into an existing EDK2 working environment. Build it as follows:

build -a AARCH64 -t GCC5 -b DEBUG \
      -p OpenPlatformPkg/Platforms/RaspberryPi/RaspberryPi.dsc \
      -D ATF_BUILD_DIR=$ATF_BUILD_DIR

The resulting bootable image, containing both the secure and UEFI firmware, can now be found in the following file

Build/RaspberryPi-AARCH64/DEBUG_GCC5/FV/RPI_EFI.fd

Copy it to the FAT partition on the SD card, and update config.txt so it contains the following lines:

enable_uart=1
kernel=RPI_EFI.fd
arm_control=0x200

The DEBUG build of EDK2 is very noisy, so after inserting the SD and powering the device, you should see lots of output from the serial port (115200n8) in a matter of seconds.

Wishlist

For this UEFI port to do anything useful, we need driver support. In order of relevance, we need

USB host mode support
Graphics Output Protocol (GOP) support
wired Ethernet support (USB based)
SDHCI support
Random Number Generator (RNG) protocol [for KASLR]

The first two items would allow booting and installing a distro without use of the serial port at all, which would be a huge improvement to the user experience.

Contributions welcome!