UEFI driver pitfalls and PC-isms

Even though Intel originally created UEFI (still known by its TLA EFI at the time) for Itanium, x86 is by far the dominant architecture when it comes to UEFI deployments in the field. And even though the spec itself is remarkably portable to architectures such as ARM, there are a lot of x86 UEFI drivers out there that cut corners when it comes to spec compliance. There are a couple of reasons for this:

  • the x86 architecture is not as heterogeneous as other architectures, and while the form factor may vary, most implementations are essentially PCs;
  • the way the PC platform organizes its memory and especially its DMA happens to result in a configuration that is rather forgiving when it comes to UEFI spec violations.

UEFI drivers provided by third parties are mostly intended for plug-in PCI cards, and are distributed as binary option ROM images. There are very few open source UEFI drivers available (apart from the _HCI class drivers and some drivers for niche hardware available in Tianocore), and even if they were widely available, you would still need to get them into the flash ROM of your particular card, which is not a practice hardware vendors are eager to support.
This means the gap between theory and practice is larger than we would like, and this becomes apparent when trying to run such code on platforms that deviate significantly from a PC.

The theory

As an example, here is some code from the EDK2 EHCI (USB2) host controller driver.

  Status = PciIo->AllocateBuffer (PciIo, AllocateAnyPages,
                     EfiBootServicesData, Pages, &BufHost, 0);
  if (EFI_ERROR (Status)) {
    goto FREE_BITARRAY;
  }

  Bytes = EFI_PAGES_TO_SIZE (Pages);
  Status = PciIo->Map (PciIo, EfiPciIoOperationBusMasterCommonBuffer,
                     BufHost, &Bytes, &MappedAddr, &Mapping);
  if (EFI_ERROR (Status) || (Bytes != EFI_PAGES_TO_SIZE (Pages))) {
    goto FREE_BUFFER;
  }

  ...

  Block->BufHost  = BufHost;
  Block->Buf      = (UINT8 *) ((UINTN) MappedAddr);
  Block->Mapping  = Mapping;

This is a fairly straightforward way of using UEFI’s PCI DMA API, but there are a couple of things to note here:

  • PciIo->Map () may be called with the EfiPciIoOperationBusMasterCommonBuffer mapping type only if the memory was allocated using PciIo->AllocateBuffer ();
  • the physical address returned by PciIo->Map () in MappedAddr may deviate from both the virtual and physical addresses as seen by the CPU (note that UEFI maps VA to PA 1:1);
  • the size of the actual mapping may deviate from the requested size.
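
Note, too, that each of these calls has a counterpart that a compliant driver must invoke on the teardown path. As a minimal sketch of what that amounts to for the excerpt above (reusing its variable names), the buffer is unmapped before it is freed:

  // Unmap () is what performs any cache maintenance or bounce buffer
  // copying still pending for the mapping, so it must precede the call
  // to FreeBuffer () that returns the pages.
  Status = PciIo->Unmap (PciIo, Block->Mapping);
  ASSERT_EFI_ERROR (Status);

  Status = PciIo->FreeBuffer (PciIo, Pages, Block->BufHost);
  ASSERT_EFI_ERROR (Status);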

However, none of this matters on a PC, since its PCI is cache coherent and 1:1 mapped. So the following code will work just as well:

  Status = gBS->AllocatePages (AllocateAnyPages, EfiBootServicesData,
                  Pages, &BufHost);
  if (EFI_ERROR (Status)) {
    goto FREE_BITARRAY;
  }

  ...

  Block->BufHost  = BufHost;
  Block->Buf      = BufHost;

So let’s look at a couple of ways a non-PC platform can deviate from a PC, starting with the layout of its physical address space.

DRAM starts at address 0x0

On a PC, DRAM starts at address 0x0, and most of the 32-bit addressable physical region is used for memory. Not only does this mean that inadvertent NULL pointer dereferences from UEFI code may go entirely unnoticed (one example of this is the NVidia GT218 driver), it also means that PCI devices that only support 32-bit DMA (or need a little kick to support more than that) will always be able to work. In fact, most UEFI implementations for x86 explicitly limit PCI DMA to 4 GB, and most UEFI PCI drivers don’t bother to set the mandatory EFI_PCI_IO_ATTRIBUTE_DUAL_ADDRESS_CYCLE attribute for hardware that is capable of >32-bit DMA either.

On ARM systems, the amount of available 32-bit addressable RAM may be much smaller, or it may even be absent entirely. In the latter case, hardware that is only 32-bit DMA capable can only work if an IOMMU is present and wired into the PCI root bridge driver by the platform, or if DRAM is not mapped 1:1 in the PCI address space. But in general, it should be expected that ARM platforms use at least 40 bits of address space for DMA, and that drivers for 64-bit DMA capable peripherals enable this capability in the hardware.
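
On the driver side, enabling 64-bit DMA amounts to a single Attributes () call on the EFI_PCI_IO_PROTOCOL. A minimal sketch (error handling elided) of what a driver for 64-bit capable hardware should do before issuing any Map () calls:

  // Advertise that this device can generate 64-bit (dual address cycle)
  // DMA. Without this, a conforming root bridge driver will restrict the
  // device to 32-bit device addresses, bounce buffering if necessary.
  Status = PciIo->Attributes (PciIo, EfiPciIoAttributeOperationEnable,
                     EFI_PCI_IO_ATTRIBUTE_DUAL_ADDRESS_CYCLE, NULL);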

PCI DMA is cache coherent

Although not that common, it is possible and permitted by the UEFI spec for PCI DMA to be non cache coherent. This is completely transparent to the driver, provided that it uses the APIs correctly. For instance, PciIo->AllocateBuffer () will return an uncached buffer in this case, and the Map () and Unmap () methods will perform cache maintenance under the hood to keep the CPU’s and the device’s view of memory in sync. Obviously, this use case breaks spectacularly if you cut corners like in the second example above.
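
To make this concrete, here is a sketch of a one-shot transfer in which the device writes into a driver-owned buffer (Buffer and BufferSize are placeholders); on a non cache coherent platform, Map () and Unmap () are where the cache maintenance happens, so the driver must bracket the transfer with them rather than handing the device a raw pointer:

  UINTN                 Bytes;
  EFI_PHYSICAL_ADDRESS  DeviceAddress;
  VOID                  *Mapping;

  Bytes = BufferSize;

  // For a device-to-memory transfer, Map () may invalidate CPU caches
  // or set up a bounce buffer under the hood.
  Status = PciIo->Map (PciIo, EfiPciIoOperationBusMasterWrite, Buffer,
                     &Bytes, &DeviceAddress, &Mapping);
  if (EFI_ERROR (Status)) {
    return Status;
  }

  // Program DeviceAddress (not Buffer!) into the hardware, and wait for
  // the transfer to complete. (Device specifics omitted.)

  // Unmap () copies bounce buffered data back and/or performs the cache
  // maintenance that makes the DMA'd data visible to the CPU.
  Status = PciIo->Unmap (PciIo, Mapping);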

PCI memory is mapped 1:1 with the CPU

On a PC, the two sides of the PCI host bridge are mapped 1:1. As illustrated in the example above, this means you can essentially ignore the device or bus address returned from the PciIo->Map () call, and just program the CPU physical address into the DMA registers/rings/etc. However, non-PC systems may have much more extravagant PCI topologies, and so a compliant driver should use the appropriate APIs to obtain these addresses. Note that this is not limited to inbound memory accesses (DMA) but also applies to outbound accesses, and so a driver should not interpret BAR fields from the PCI config space directly, given that the CPU side mapping of that BAR may be at a different address altogether.
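
For outbound accesses, the EFI_PCI_IO_PROTOCOL already abstracts this away: rather than reading a BAR from config space and dereferencing the address, a driver should go through the Mem.Read ()/Mem.Write () accessors, which take a BAR index and leave any address translation to the protocol implementation. A sketch (the register offset and variable name are made up for illustration):

  UINT32  CapabilityReg;

  // Read a 32-bit register at offset 0x0 of BAR 0. The protocol resolves
  // the BAR and applies the host-to-PCI translation, so this works even
  // when the CPU side mapping of the BAR is at a different address.
  Status = PciIo->Mem.Read (PciIo, EfiPciIoWidthUint32,
                     0,               // BAR index
                     0x0,             // offset into the BAR
                     1,               // count
                     &CapabilityReg);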

PC has strongly ordered memory

Whatever. UEFI is uniprocessor anyway, and I don’t remember seeing any examples where this mattered.

Using encrypted memory for DMA

Interestingly, and luckily for us in the ARM world, there are other reasons why hardware vendors are forced to clean up their drivers: memory encryption. This case is actually rather similar to the non cache coherent DMA case, in the sense that the allocate, map and unmap actions all involve some extra work performed by the platform under the hood. Common DMA buffers are allocated from unencrypted memory, and mapping or unmapping involve decryption or encryption in place depending on the direction of the transfer (or bounce buffering if encryption in place is not possible, in which case the device address will deviate from the host address, like in the non-1:1 mapped PCI case above). Cutting corners here means that attempted DMA transfers will produce corrupt data, which is usually a strong motivator to get your code fixed.
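
The remedy is the same as in the cases above: keep both addresses around, and use each only on its own side of the host bridge. A sketch, reusing the Block fields from the first example (ProgramDmaAddress () and Controller are hypothetical stand-ins for the device-specific programming):

  // CPU side: the host address returned by AllocateBuffer () is the only
  // address the driver may dereference.
  ZeroMem (Block->BufHost, EFI_PAGES_TO_SIZE (Pages));

  // Device side: only the address returned by Map () may be programmed
  // into the hardware; it may differ from BufHost if a bounce buffer or
  // address translation is in use. ProgramDmaAddress () stands in for
  // the device-specific register writes.
  ProgramDmaAddress (Controller, (EFI_PHYSICAL_ADDRESS)(UINTN) Block->Buf);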

Conclusion

The bottom line is really that the UEFI APIs appear to be able to handle anything you throw at them when it comes to unconventional platform topologies, but this only works if you use them correctly, and having been tested on a PC doesn’t actually prove all that much in this regard.