Enable propolis to generate ACPI tables #999
Draft: glitzflitz wants to merge 16 commits into oxidecomputer:master from glitzflitz:acpi_fwcfg_reord
+2,799
−15
Add a TableLoader builder that generates the etc/table-loader file passed to guest firmware via fw_cfg. The etc/table-loader file contains a sequence of fixed-size linker/loader commands that instruct the guest to allocate memory for a set of fw_cfg files (e.g. ACPI tables), link the allocated memory by patching pointers, and calculate the ACPI checksums. Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
de06cf3 to
08bc195
Compare
Add builders for the basic ACPI tables: RSDP (ACPI 2.0+) that points to the XSDT, XSDT with 64-bit table pointers, and RSDT with 32-bit table pointers, all of which work with the table-loader mechanism in fw_cfg. These tables describe the ACPI table hierarchy to guest firmware. The builders produce raw table bytes with placeholder addresses and checksums that are fixed up by firmware using table-loader commands. Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
FADT describes fixed hardware features and points to the DSDT. The builder supports both standard and HW-reduced ACPI modes. DSDT contains AML bytecode describing system hardware. The builder provides methods to append AML data, which could be populated by an AML generation mechanism in future. Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
08bc195 to
0ca9c2b
Compare
Add a builder for the Multiple APIC Description Table (MADT) that describes the system's interrupt controllers. Supports adding local APIC, I/O APIC and interrupt source overrides for describing processor and interrupt controller topology. Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
Add builder for the HPET table that describes the HPET hardware to the guest. The table uses the bhyve HPET hardware ID (0x8086a201) and maps to the standard HPET MMIO address at 0xfed00000. Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
Add the FACS table that provides a memory region for firmware/OS handshaking. The table includes the GlobalLock field used for mutual exclusion between the OS and firmware during ACPI operations. We don't have support for handling GBL_EN yet [1], but the table is exposed anyway to match the behaviour of OVMF. [1]: oxidecomputer#837 Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
Define bytecode opcodes for AML generation per ACPI Specification Chapter 20 [1]. Includes namespace modifiers, named objects, data object prefixes, name path prefixes, local/argument references, control flow and logical/arithmetic operators. These constants will be used in next commits to generate AML byte code which would enable us to generate ACPI tables ourselves. [1]: https://uefi.org/specs/ACPI/6.5/20_AML_Specification.html Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
Implement NameSeg and NameString encoding per ACPI Specification Section 20.2.2 [1]. Single segments encode as 4 bytes padded with underscores, dual segments use DualNamePrefix and three or more use MultiNamePrefix with a count byte. Also implement EISA ID compression for hardware identification strings like "PNP0A08". [1]: https://uefi.org/specs/ACPI/6.4_A/20_AML_Specification.html#name-objects-encoding Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
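The two encodings named in this commit message can be sketched as follows. This is a minimal illustration, not the code from the commit: the function names `name_seg` and `eisa_id` are hypothetical, and the validation rules are simplified to what the ACPI specification requires for a single segment and a 7-character EISA ID.

```rust
/// Encode a single NameSeg: 1-4 characters, padded to 4 bytes with '_'.
/// The first character must be A-Z or '_'; the rest may also be digits.
/// (Hypothetical helper, not the PR's API.)
pub fn name_seg(name: &str) -> Option<[u8; 4]> {
    let b = name.as_bytes();
    if b.is_empty() || b.len() > 4 {
        return None;
    }
    let lead_ok = b[0].is_ascii_uppercase() || b[0] == b'_';
    let rest_ok = b[1..]
        .iter()
        .all(|c| c.is_ascii_uppercase() || c.is_ascii_digit() || *c == b'_');
    if !lead_ok || !rest_ok {
        return None;
    }
    let mut out = [b'_'; 4];
    out[..b.len()].copy_from_slice(b);
    Some(out)
}

/// Compress an EISA ID such as "PNP0A08" into its 4-byte form:
/// three letters packed at 5 bits each, then four hex digits.
/// The bytes are returned in the order they appear in the AML stream.
pub fn eisa_id(id: &str) -> Option<[u8; 4]> {
    let b = id.as_bytes();
    if b.len() != 7 || !b[..3].iter().all(|c| c.is_ascii_uppercase()) {
        return None;
    }
    let mut hex = [0u8; 4];
    for (i, c) in b[3..].iter().enumerate() {
        hex[i] = (*c as char).to_digit(16)? as u8;
    }
    // 'A' maps to 1, 'B' to 2, ... so each letter fits in 5 bits.
    let l = [b[0] - 0x40, b[1] - 0x40, b[2] - 0x40];
    Some([
        (l[0] << 2) | (l[1] >> 3),
        ((l[1] & 0x07) << 5) | l[2],
        (hex[0] << 4) | hex[1],
        (hex[2] << 4) | hex[3],
    ])
}
```

In an AML stream the four EISA ID bytes would normally follow a DWordPrefix (0x0C).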
Add AML bytecode generation to mainly support dynamically generating ACPI tables and control methods. The bytecode is built in a single pass by directly writing to the output buffer. AML scopes encode their length in a 1-4 byte PkgLength field at the start[1]. Since we don't know the final size until the scope's content is fully written, reserve 4 bytes when opening a scope upfront and splice in the actual encoded length when the scope closes. This avoids complexity of having to build an in memory tree and then walk it twice to measure and serialize. The RAII guards automatically close scopes and finalize the PkgLength on drop. Those guards hold a mutable borrow on the builder so the borrow checker won't let us close a parent while a child scope is still open. The limitation of this approach is that the content has to be written in output order but that is not a big issue for the use case of VM device descriptions. [1]: ACPI Specification Section 20.2.4 https://uefi.org/specs/ACPI/6.4_A/20_AML_Specification.html#package-length-encoding Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
Implement ResourceTemplateBuilder for constructing resource descriptors used in methods like _CRS. Supports QWord/DWord memory and I/O ranges, Word bus numbers and IRQ descriptors per ACPI Specification Section 6.4 [1]. [1]: https://uefi.org/specs/ACPI/6.4_A/06_Device_Configuration.html#resource-data-types-for-acpi Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
Export the public API for AML generation: AmlBuilder, the AmlWriter trait, the guard types (ScopeGuard, DeviceGuard, MethodGuard), EisaId and ResourceTemplateBuilder. This enables generating the dynamic bytecode used in tables like the DSDT. Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
Since now we have support to generate AML bytecode, add DSDT generation that provides the guest OS with device information via AML. The DSDT contains _SB.PCI0 describing the PCIe host bridge with ECAM configuration space and bus number resources, plus COM1-COM4 serial port devices with their IO ports and IRQs. Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
Add AT keyboard controller resources to allow guest to enumerate the i8042 controller. Only keyboard is added to match the OVMF's existing behaviour for now. Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
The OS calls _OSC on the PCIe host bridge to negotiate control of native PCIe features like hotplug, AER and PME. Without _OSC, Linux logs a warning about missing capability negotiation (_OSC: platform retains control of PCIe features (AE_NOT_FOUND)), and as per [1] Windows also won't enable any of the advanced PCI Express features through PCI Express Native Control. Control of AER seems to be optional, so it's not handed over to the guest for now. Also, to simplify the AML generation of _OSC itself, introduce some high-level wrappers around AML generation. [1]: https://learn.microsoft.com/en-us/windows-hardware/drivers/pci/enabling-pci-express-native-control Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
Combine all ACPI tables into the format expected by firmware(OVMF) by using fw_cfg's table-loader commands for address patching and checksum computation. Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
29dbb4f to
a568c53
Compare
Integrate the new ACPI table generation into propolis-standalone and propolis-server. Also replace hardcoded memory region addresses with constants that align with the ACPI table definitions. The PCIe ECAM base is kept the same as before at 0xe000_0000 (3.5 GiB) to match the existing i440fx chipset ECAM placement.

Guest physical memory map:
  0x0000_0000 - 0xbfff_ffff     Low RAM (up to 3 GiB)
  0xc000_0000 - 0xffff_ffff     PCI hole (1 GiB MMIO region)
    0xc000_0000 - 0xdfff_ffff   32-bit PCI MMIO
    0xe000_0000 - 0xefff_ffff   PCIe ECAM (256 MiB, 256 buses)
    0xfec0_0000                 IOAPIC
    0xfed0_0000                 HPET
    0xffe0_0000 - 0xffff_ffff   Bootrom (2 MiB)
  0x1_0000_0000+                High RAM + 64-bit PCI MMIO

e820 map as seen by guest:
  0x0000_0000 - 0x0009_ffff     Usable (640 KiB low memory)
  0x0010_0000 - 0xbeaf_ffff     Usable (~3 GiB main RAM)
  0xbeb0_0000 - 0xbfb6_cfff     Reserved (UEFI runtime/data)
  0xbfb6_d000 - 0xbfbf_efff     ACPI Tables + NVS
  0xbfbf_f000 - 0xbffd_ffff     Usable (top of low memory)
  0xbffe_0000 - 0xffff_ffff     Reserved (PCI hole)
  0x1_0000_0000 - highmem       Usable (high RAM above 4 GiB)

To stay on the safe side, only enable the new ACPI tables for newly launched VMs. Old VMs using OVMF tables keep using the same OVMF tables across multiple migrations. To verify this, add phd tests covering a new VM launched with native tables, native tables preserved through migration, and a VM launched from an old propolis without native tables staying on OVMF tables through multiple future migrations. Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
a568c53 to
55bb207
Compare
Background
As per #695, propolis currently relies on the `edk2-stable202105` version of EDK2 OVMF to provide the ACPI tables to the guest, as it was the last version that included static tables. Another limitation is that the guest only sees whatever OVMF decided to generate rather than what the hypervisor knows about the virtual/emulated hardware.
In newer versions, OVMF expects the VMM to generate a set of ACPI tables and expose them via the `fw_cfg` table-loader interface. Being able to generate ACPI tables also unlocks other opportunities, such as choosing which tables and control methods to expose, PCIe host bridge and switch emulation, and native PCIe hotplug support.
This PR addresses that limitation and adds a mechanism to let propolis generate its own ACPI tables.
Implementation
Overview
The series starts by implementing fw_cfg's table-loader mechanism, which enables passing static tables to guest firmware (OVMF). Then the basic static tables such as `RSDT`, `XSDT` and `RSDP` are added.
The second milestone is generating the AML bytecode. This is where some technical decisions need to be made, after evaluating the different options and their tradeoffs against the use case, about how to generate bytecode without introducing too much complexity.
At the end, everything is wired up to switch to the propolis-generated tables.
Details
The fw_cfg Interface
QEMU's fw_cfg interface provides a mechanism for the hypervisor to expose files to guest firmware. Propolis already had basic fw_cfg support for the e820 memory map and bootrom. The ACPI implementation builds on that foundation.
OVMF expects three specific fw_cfg files for ACPI tables:
The table-loader file contains a sequence of fixed-size commands that instruct OVMF to allocate memory, patch pointer fields and compute checksums. This is necessary because the tables contain absolute addresses that are only known after OVMF allocates memory for them.
In the proposed implementation in "Add fw_cfg table-loader helpers for ACPI table generation", `TableLoader` generates three command types:
ALLOCATE - reserves memory for a fw_cfg file with the specified alignment in a given zone.
ADD_POINTER - patches an address field in one file to point at another file's allocated location. The command specifies the source file, the destination file, the offset within the source and the pointer size.
ADD_CHECKSUM - computes a checksum over a byte range and writes it to a specified offset. ACPI tables use a simple byte sum that must equal zero.
The commands are used in Prepare the ACPI tables for generation
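As a rough illustration of what such commands look like on the wire, here is a minimal sketch. It assumes the 128-byte entry layout used by QEMU's BIOS linker/loader (a little-endian 32-bit command code followed by 56-byte NUL-padded file names); the function names, the file names in the usage note, and the field naming (`pointer_file`/`pointee_file`) are illustrative and are not taken from the PR's `TableLoader`.

```rust
const ENTRY_SIZE: usize = 128; // every command is padded to this size
const FILE_NAME_SIZE: usize = 56; // NUL-padded fw_cfg file name

pub const CMD_ALLOCATE: u32 = 1;
pub const CMD_ADD_POINTER: u32 = 2;
pub const CMD_ADD_CHECKSUM: u32 = 3;
pub const ZONE_HIGH: u8 = 1;

fn start(cmd: u32) -> Vec<u8> {
    let mut out = Vec::with_capacity(ENTRY_SIZE);
    out.extend_from_slice(&cmd.to_le_bytes());
    out
}

fn push_name(out: &mut Vec<u8>, name: &str) {
    assert!(name.len() < FILE_NAME_SIZE, "file name too long");
    let end = out.len() + FILE_NAME_SIZE;
    out.extend_from_slice(name.as_bytes());
    out.resize(end, 0);
}

fn finish(mut out: Vec<u8>) -> Vec<u8> {
    out.resize(ENTRY_SIZE, 0);
    out
}

/// ALLOCATE: reserve memory for `file` with the given alignment and zone.
pub fn allocate(file: &str, align: u32, zone: u8) -> Vec<u8> {
    let mut out = start(CMD_ALLOCATE);
    push_name(&mut out, file);
    out.extend_from_slice(&align.to_le_bytes());
    out.push(zone);
    finish(out)
}

/// ADD_POINTER: add the load address of `pointee_file` to the
/// `size`-byte field at `offset` inside `pointer_file`.
pub fn add_pointer(pointer_file: &str, pointee_file: &str, offset: u32, size: u8) -> Vec<u8> {
    let mut out = start(CMD_ADD_POINTER);
    push_name(&mut out, pointer_file);
    push_name(&mut out, pointee_file);
    out.extend_from_slice(&offset.to_le_bytes());
    out.push(size);
    finish(out)
}

/// ADD_CHECKSUM: checksum `length` bytes from `start_off` and store the
/// result byte at `result_offset`, all within `file`.
pub fn add_checksum(file: &str, result_offset: u32, start_off: u32, length: u32) -> Vec<u8> {
    let mut out = start(CMD_ADD_CHECKSUM);
    push_name(&mut out, file);
    out.extend_from_slice(&result_offset.to_le_bytes());
    out.extend_from_slice(&start_off.to_le_bytes());
    out.extend_from_slice(&length.to_le_bytes());
    finish(out)
}

/// The byte firmware ends up writing: the value that makes the sum of
/// all bytes in the range equal zero modulo 256.
pub fn acpi_checksum(bytes: &[u8]) -> u8 {
    let sum = bytes.iter().fold(0u8, |acc, b| acc.wrapping_add(*b));
    0u8.wrapping_sub(sum)
}
```

For example, `allocate("etc/acpi/tables", 64, ZONE_HIGH)` followed by `add_checksum("etc/acpi/tables", 9, 0, 36)` would ask firmware to place the file and then fix up the checksum byte (offset 9) of a table whose 36 bytes start at offset 0. The file name here is the conventional QEMU one and is an assumption about what this series uses.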
Static Table Generation
The simpler static tables that don't require AML bytecode are implemented first.
Since propolis does not have hotplug support yet, an SSDT is not required for now.
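To make the "placeholder addresses and checksums" idea concrete, here is a minimal sketch of a static table builder. It is not the PR's builder: `build_xsdt` is a hypothetical function, and the OEM and creator strings are placeholders chosen only to fill the fixed-width header fields. It lays out the standard 36-byte ACPI table header followed by zeroed 64-bit pointer slots, and returns the slot offsets that ADD_POINTER commands would later patch.

```rust
pub const SDT_HEADER_LEN: usize = 36;
pub const CHECKSUM_OFFSET: usize = 9;

/// Build an XSDT with `num_tables` zeroed 64-bit entries.
/// Returns the raw bytes plus the offset of every pointer slot.
pub fn build_xsdt(num_tables: usize) -> (Vec<u8>, Vec<usize>) {
    let total = SDT_HEADER_LEN + 8 * num_tables;
    let mut t = Vec::with_capacity(total);

    t.extend_from_slice(b"XSDT"); // signature
    t.extend_from_slice(&(total as u32).to_le_bytes()); // table length
    t.push(1); // revision
    t.push(0); // checksum placeholder, fixed up by firmware
    t.extend_from_slice(b"OEMID "); // OEM ID (6 bytes, placeholder)
    t.extend_from_slice(b"OEMTABLE"); // OEM table ID (8 bytes, placeholder)
    t.extend_from_slice(&1u32.to_le_bytes()); // OEM revision
    t.extend_from_slice(b"CRTR"); // creator ID (placeholder)
    t.extend_from_slice(&1u32.to_le_bytes()); // creator revision
    debug_assert_eq!(t.len(), SDT_HEADER_LEN);

    // One zeroed slot per referenced table; firmware adds the real
    // address once it has allocated memory for the tables.
    let mut slots = Vec::with_capacity(num_tables);
    for _ in 0..num_tables {
        slots.push(t.len());
        t.extend_from_slice(&0u64.to_le_bytes());
    }
    (t, slots)
}
```

Each returned slot offset pairs with one ADD_POINTER command of size 8, and `CHECKSUM_OFFSET` with one ADD_CHECKSUM command covering the whole table.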
AML generation and usage
The DSDT contains AML bytecode for describing devices, methods and resources. AML has a hierarchical structure with scopes containing devices which contain named objects and methods. The encoding uses variable length packages.
Possible approaches
QEMU uses a C-based approach with GArray buffers. Each AML construct is a function returning an Aml pointer that must be explicitly appended to its parent. The design is flexible but has caveats: for example, forgetting a manual `aml_append` call silently drops content, and there is no type safety around what can be nested. Since we are not bound by the limitations of C and have the borrow checker with us, we can do better.
crosvm defines a single `Aml` trait with many implementing types. Each construct is a separate struct collecting children in a Vec. The usage pattern is usually a macro followed by `to_aml_bytes()`, which recursively serializes the tree. Although this provides strong typing, it's a bit more complex and requires constructing the entire tree in memory before serialization. Package lengths use a two-pass approach of first measuring, then writing.
Firecracker follows a similar pattern to crosvm, with trait methods along with some additional error handling.
The acpi_tables crate used by cloud-hypervisor uses a dual-trait design that splits the problem in two: `Aml` for things that can be serialized and `AmlSink` as the destination. The sink abstraction is nice because the same tree can write to a Vec or feed a checksum calculator without changing the serialization code. It's structurally similar to crosvm and uses the same two-pass length encoding, which gets a bit complex when building nested hierarchies.
Approach in this series
"Introduce AML bytecode generation" adds RAII guards that automatically finalize package lengths when dropped.
The core abstraction is an `AmlBuilder` that owns a single byte buffer, plus guard types for Scope, Device and Method. Each guard holds a mutable borrow on the builder, so we get compile-time scope safety through the borrow checker. This way it's impossible to forget to close a scope.
Using a single buffer from `AmlBuilder` also avoids the overhead of dynamic dispatch present in the crosvm and acpi_tables approaches.
Guards borrow the builder mutably and write content directly to its buffer. When a guard is created, it writes the opcode, reserves 4 bytes for the package length (the maximum encoding size) and writes the name. When the guard drops, it calculates the actual package length, encodes it in 1-4 bytes and splices out the unused reserved bytes.
Usage looks structurally similar to the ASL code that is compiled to AML bytecode.
Conditional content is simply an if statement thanks to the RAII guards, which avoids the complexity of the Option wrappers needed in the other approaches mentioned above. The limitation of this design is that it's less composable: there is no easy way to return a "partial device tree" from a function or to store AML fragments for later use.
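The guard mechanism described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the PR's code: it reuses the names `AmlBuilder` and `ScopeGuard` from the description, but the method names (`scope`, `name_byte`, `finish`), the helper `encode_pkg_length` and the example function `build_example` are hypothetical, and only the Scope guard is modelled.

```rust
/// Encode a PkgLength for `payload` bytes of content; the encoded value
/// also counts the PkgLength bytes themselves.
fn encode_pkg_length(payload: usize) -> Vec<u8> {
    if payload + 1 <= 0x3F {
        return vec![(payload + 1) as u8];
    }
    for extra in 1..=3usize {
        let total = payload + 1 + extra;
        if total < (1usize << (4 + 8 * extra)) {
            let mut out = vec![((extra as u8) << 6) | (total & 0x0F) as u8];
            out.extend((0..extra).map(|i| ((total >> (4 + 8 * i)) & 0xFF) as u8));
            return out;
        }
    }
    panic!("package too large for PkgLength encoding");
}

pub struct AmlBuilder {
    buf: Vec<u8>,
}

pub struct ScopeGuard<'a> {
    builder: &'a mut AmlBuilder,
    len_at: usize, // where the 4 reserved PkgLength bytes start
}

impl AmlBuilder {
    pub fn new() -> Self {
        Self { buf: Vec::new() }
    }

    /// Open a Scope: opcode, 4 reserved length bytes, then the name.
    pub fn scope(&mut self, name: &[u8; 4]) -> ScopeGuard<'_> {
        self.buf.push(0x10); // ScopeOp
        let len_at = self.buf.len();
        self.buf.extend_from_slice(&[0; 4]);
        self.buf.extend_from_slice(name);
        ScopeGuard { builder: self, len_at }
    }

    pub fn finish(self) -> Vec<u8> {
        self.buf
    }
}

impl ScopeGuard<'_> {
    /// A nested scope mutably borrows this guard, so the parent cannot
    /// be closed or written to while the child is still open.
    pub fn scope(&mut self, name: &[u8; 4]) -> ScopeGuard<'_> {
        self.builder.scope(name)
    }

    /// Name(<name>, <byte constant>)
    pub fn name_byte(&mut self, name: &[u8; 4], value: u8) {
        self.builder.buf.push(0x08); // NameOp
        self.builder.buf.extend_from_slice(name);
        self.builder.buf.extend_from_slice(&[0x0A, value]); // BytePrefix
    }
}

impl Drop for ScopeGuard<'_> {
    fn drop(&mut self) {
        // Everything written after the reserved bytes is the payload.
        let payload = self.builder.buf.len() - (self.len_at + 4);
        let encoded = encode_pkg_length(payload);
        // Replace the 4 reserved bytes with the 1-4 byte encoding.
        drop(self.builder.buf.splice(self.len_at..self.len_at + 4, encoded));
    }
}

/// Conditional content is a plain `if`; guards close in reverse order.
pub fn build_example(with_uid: bool) -> Vec<u8> {
    let mut aml = AmlBuilder::new();
    {
        let mut sb = aml.scope(b"_SB_");
        let mut pci = sb.scope(b"PCI0");
        if with_uid {
            pci.name_byte(b"_UID", 0);
        }
    }
    aml.finish()
}
```

Because the child guard is dropped before its parent, the parent's reserved bytes always sit before any bytes that move during the child's splice, so the stored offset stays valid.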
Note about Package Length Encoding
The ACPI specification Section 20.2.4 defines a variable-length encoding for package sizes. A package length includes itself in the count, which creates a circular dependency: the length must be known to encode it, but the encoding affects the length. That is why a two-pass approach is often used, as done by the other implementations.
The implementation in "Introduce AML bytecode generation" simply reserves the maximum of 4 bytes when opening any scope and splices in the actual encoded length when the scope closes. This produces minimal output with a single pass through the data.
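The circular dependency is easy to resolve once the payload size is known: try each encoding width in turn and pick the first one whose total (payload plus the width itself) fits. A minimal sketch of that arithmetic, with `pkg_length_for_payload` being a hypothetical name rather than the function in this series:

```rust
/// Encode the PkgLength for a package whose content (everything after
/// the PkgLength field) is `payload_len` bytes long. Returns `None` if
/// the package exceeds the 28-bit limit of the encoding.
pub fn pkg_length_for_payload(payload_len: usize) -> Option<Vec<u8>> {
    for width in 1..=4usize {
        // The encoded value counts the PkgLength bytes themselves.
        let total = payload_len.checked_add(width)?;
        // One byte holds 6 bits; each extra byte adds 8 bits on top of
        // the 4 bits left in the lead byte.
        let max = if width == 1 {
            0x3F
        } else {
            (1usize << (4 + 8 * (width - 1))) - 1
        };
        if total > max {
            continue;
        }
        if width == 1 {
            return Some(vec![total as u8]);
        }
        // Lead byte: bits 7-6 = number of following bytes,
        // bits 3-0 = least significant nibble of the length.
        let mut out = vec![(((width - 1) as u8) << 6) | (total & 0x0F) as u8];
        let mut rest = total >> 4;
        for _ in 1..width {
            out.push((rest & 0xFF) as u8);
            rest >>= 8;
        }
        return Some(out);
    }
    None
}
```

A payload of 62 bytes still fits the one-byte form (total 63), while 63 bytes of payload needs the two-byte form and therefore encodes a total of 65.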
I'd be open to new ideas or going with another approach mentioned above as well :)
Wiring up new tables
The new table generation is controlled by a `native_acpi_tables` flag in the Board spec. Newly launched VMs have this set to `true` and get the newly generated tables via `fw_cfg`. VMs migrating from older propolis versions won't have this field in their spec, so it defaults to `false` and they keep using OVMF tables.
So existing VMs can safely migrate to the new propolis version without any guest-visible changes to their ACPI tables. Only VMs launched with the new version of propolis will use the new tables.
Testing
This is the dmesg of Linux when using the new tables. Now the standard OVMF bootrom can be used.
`GlobalLock` is not supported by propolis yet, so the warning appears with OVMF tables as well.
TODO: