
Commit cc8a608

doc: walkthroughs/VM.start: Update the xenguest chapter (domain build) (xapi-project#6272)
Update:

- No longer link to source repositories for xenguest that are not "official".

Update and extend the walkthrough of `VM.start`:

- Update the links to `xenopsd` functions from [xapi-project/xenopsd.git](https://github.com/xapi-project/xenopsd) to the current [xenopsd](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/) code in [xapi-project/xen-api.git](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/).
- Add the step for storing the platform data (vCPUs, vCPU affinity, etc.) in the domain's Xenstore tree. `xenguest` then uses this in the `build` phase to build the domain.
- Convert the paragraphs about `xenguest` into a dedicated chapter on `xenguest`.
- Add a summary of the data that `xenopsd` passes to `xenguest` for the domain build.
- Add a summary of the steps that `xenguest` takes to build the domain.
2 parents cf38f99 + 9ef7e19 commit cc8a608


1 file changed: +99 -26 lines changed


doc/content/xenopsd/walkthroughs/VM.start.md

Lines changed: 99 additions & 26 deletions
```diff
@@ -225,8 +225,8 @@ module and looks for scripts in the hardcoded path `/etc/xapi.d`.
 ## 2. create a Xen domain
 
 The `VM_create` micro-op calls the `VM.create` function in the backend.
-In the classic Xenopsd backend the
-[VM.create_exn](https://github.com/xapi-project/xenopsd/blob/b33bab13080cea91e2fd59d5088622cd68152339/xc/xenops_server_xen.ml#L633)
+In the classic Xenopsd backend, the
+[VM.create_exn](https://github.com/xapi-project/xen-api/blob/bae7526faeb2a02a2fe5b71410083983f4695963/ocaml/xenopsd/xc/xenops_server_xen.ml#L1421-L1586)
 function must
 
 1. check if we're creating a domain for a fresh VM or resuming an existing one:
```
```diff
@@ -237,7 +237,13 @@ function must
 because domain create often fails in low-memory conditions. This means the
 "reservation" is associated with our "session" with squeezed; if Xenopsd
 crashes and restarts the reservation will be freed automatically.
-3. create the Domain via the libxc hypercall
+3. create the Domain via the libxc hypercall `Xenctrl.domain_create`
+4. [call](
+https://github.com/xapi-project/xen-api/blob/bae7526faeb2a02a2fe5b71410083983f4695963/ocaml/xenopsd/xc/xenops_server_xen.ml#L1547)
+[generate_create_info()](
+https://github.com/xapi-project/xen-api/blob/bae7526faeb2a02a2fe5b71410083983f4695963/ocaml/xenopsd/xc/xenops_server_xen.ml#L1302-L1419)
+for storing the platform data (vCPUs, etc) the domain's Xenstore tree.
+`xenguest` then uses this in the `build` phase (see below) to build the domain.
 4. "transfer" the squeezed reservation to the domain such that squeezed will
 free the memory if the domain is destroyed later
 5. compute and set an initial balloon target depending on the amount of memory
```
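The order of these steps matters: memory is reserved with squeezed before `Xenctrl.domain_create` runs, and the reservation is transferred to the domain only once the domain exists. The C sketch below models just that ordering. All of its types and functions (`struct host`, `reserve_memory()`, `domain_create()`, `free_reservation()`, `vm_create()`) are hypothetical stand-ins for the squeezed RPCs, the hypercall and the Xenstore writes. They are not the xenopsd code, which is OCaml. The sketch also releases the reservation explicitly when domain creation fails. That is an assumption: the text only says that a crashed `xenopsd` loses its reservation through the squeezed session.

```c
#include <stdbool.h>

/* Hypothetical model of the host-side state touched by VM.create. */
struct host {
    long free_kib;      /* memory squeezed could still hand out */
    long reserved_kib;  /* memory held by open reservations */
    int  next_domid;
    bool fail_create;   /* simulate the domain-create hypercall failing */
};

struct domain {
    int  domid;
    long owned_kib;         /* memory whose reservation was transferred */
    bool platform_written;  /* platform data stored in Xenstore */
};

/* Hypothetical stand-in for asking squeezed for a reservation. */
static bool reserve_memory(struct host *h, long kib)
{
    if (h->free_kib < kib)
        return false;
    h->free_kib -= kib;
    h->reserved_kib += kib;
    return true;
}

static void free_reservation(struct host *h, long kib)
{
    h->reserved_kib -= kib;
    h->free_kib += kib;
}

/* Hypothetical stand-in for the domain-create hypercall: domid or -1. */
static int domain_create(struct host *h)
{
    return h->fail_create ? -1 : h->next_domid++;
}

/* Returns 0 and fills *d on success; on failure nothing stays reserved. */
int vm_create(struct host *h, long kib, struct domain *d)
{
    if (!reserve_memory(h, kib))     /* reserve before creating */
        return -1;
    int domid = domain_create(h);    /* create the empty domain */
    if (domid < 0) {
        free_reservation(h, kib);    /* assumption: release on failure */
        return -1;
    }
    d->domid = domid;
    d->platform_written = true;      /* stand-in for storing platform data */
    h->reserved_kib -= kib;          /* transfer the reservation ... */
    d->owned_kib = kib;              /* ... so it now belongs to the domain */
    return 0;
}
```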
```diff
@@ -253,10 +259,16 @@ function must
 
 ## 3. build the domain
 
-On a Xen system a domain is created empty, and memory is actually allocated
-from the host in the "build" phase via functions in *libxenguest*. The
-[VM.build_domain_exn](https://github.com/xapi-project/xenopsd/blob/b33bab13080cea91e2fd59d5088622cd68152339/xc/xenops_server_xen.ml#L994)
-function must
+On Xen, `Xenctrl.domain_create` creates an empty domain and
+returns the domain ID (`domid`) of the new domain to `xenopsd`.
+
+In the `build` phase, the `xenguest` program is called to create
+the system memory layout of the domain, set vCPU affinity and a
+lot more.
+
+The function
+[VM.build_domain_exn](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2024)
+must
 
 1. run pygrub (or eliloader) to extract the kernel and initrd, if necessary
 2. invoke the *xenguest* binary to interact with libxenguest.
```
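Step 2 runs `xenguest` as a separate process rather than calling `libxenguest` in-process. The sketch below shows the general pattern in C with POSIX `popen()`. It puts the domain ID on the helper's command line, runs the helper, and reads back the first line it prints. The helper name and the `-mode`/`-domid` options in the hypothetical `build_helper_cmd()` are placeholders and do not describe the real `xenguest` command line. In the test, `echo` stands in for the helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/* Build a helper command line for one domain build.
 * The option names are placeholders, not the real xenguest flags.
 * Returns -1 if the command did not fit into buf. */
int build_helper_cmd(char *buf, size_t len, const char *helper,
                     int domid, const char *mode)
{
    int n = snprintf(buf, len, "%s -mode %s -domid %d", helper, mode, domid);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Run the command through the shell, keep the first line of its output
 * (without the trailing newline) and return the wait status from pclose(),
 * or -1 if the process could not be started. */
int run_helper(const char *cmd, char *out, size_t outlen)
{
    FILE *p = popen(cmd, "r");
    if (p == NULL)
        return -1;
    if (fgets(out, (int)outlen, p) == NULL)
        out[0] = '\0';                  /* helper printed nothing */
    out[strcspn(out, "\n")] = '\0';     /* strip the newline, if any */
    return pclose(p);
}
```

Running the helper as a child process also keeps any global state inside `libxenguest` out of the long-running daemon, because that state lives only for one build.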
```diff
@@ -266,25 +278,86 @@ function must
 you would use after a reboot because some properties (such as maximum memory
 and vCPUs) as fixed on create.
 
-The xenguest binary was originally
-a separate binary for two reasons: (i) the libxenguest functions weren't
-threadsafe since they used lots of global variables; and (ii) the libxenguest
-functions used to have a different, incompatible license, which prevent us
-linking. Both these problems have been resolved but we still shell out to
-the xenguest binary.
-
-The xenguest binary has also evolved to configure more of the initial domain
-state. It also [reads Xenstore](https://github.com/xapi-project/ocaml-xen-lowlevel-libs/blob/master/xenguest-4.4/xenguest_stubs.c#L42)
-and configures
-
-- the vCPU affinity
-- the vCPU credit2 weight/cap parameters
-- whether the NX bit is exposed
-- whether the viridian CPUID leaf is exposed
-- whether the system has PAE or not
-- whether the system has ACPI or not
-- whether the system has nested HVM or not
-- whether the system has an HPET or not
+### 3.1 Interface to xenguest for building domains
+
+[xenguest](https://github.com/xenserver/xen.pg/blob/XS-8/patches/xenguest.patch)
+was originally created as a separate program due to issues with
+`libxenguest` that were fixed, but we still shell out to `xenguest`:
+
+- Wasn't threadsafe: fixed, but it still uses a per-call global struct
+- Incompatible licence, but now licensed under the LGPL.
+
+The `xenguest` binary has evolved to build more of the initial
+domain state. `xenopsd` passes it:
+
+- The domain type to build for (HVM, PHV or PV),
+- The `domid` of the created empty domain,
+- The amount of system memory of the domain,
+- The platform data (vCPUs, vCPU affinity, etc) using the Xenstore:
+- the vCPU affinity
+- the vCPU credit2 weight/cap parameters
+- whether the NX bit is exposed
+- whether the viridian CPUID leaf is exposed
+- whether the system has PAE or not
+- whether the system has ACPI or not
+- whether the system has nested HVM or not
+- whether the system has an HPET or not
+
+When called to build a domain, `xenguest` reads those and builds the VM accordingly.
```
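The platform flags listed above travel through Xenstore. `xenopsd` writes them into the domain's tree at create time, and `xenguest` reads them back when it builds the domain. Below is a minimal C sketch of the reading side. It uses a small in-memory key/value table as a stand-in for Xenstore, and each flag falls back to a default when its key is missing. The key names (`platform/acpi`, `platform/pae` and so on), the accepted values and the defaults are illustrative assumptions. They are not the exact paths or defaults that `xenguest` uses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a Xenstore directory: a flat list of key/value strings. */
struct kv { const char *key; const char *value; };

static const char *store_read(const struct kv *store, size_t n, const char *key)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(store[i].key, key) == 0)
            return store[i].value;
    return NULL;  /* key not present */
}

/* Hypothetical subset of the platform flags listed above. */
struct platform_flags { bool acpi, pae, nx, viridian, hpet, nested_hvm; };

/* "1" or "true" means set, any other value means clear;
 * a missing key yields the caller's default. */
static bool read_bool(const struct kv *store, size_t n, const char *key, bool dflt)
{
    const char *v = store_read(store, n, key);
    if (v == NULL)
        return dflt;
    return strcmp(v, "1") == 0 || strcmp(v, "true") == 0;
}

struct platform_flags get_platform_flags(const struct kv *store, size_t n)
{
    struct platform_flags f;
    /* Key names and defaults are illustrative assumptions. */
    f.acpi       = read_bool(store, n, "platform/acpi", true);
    f.pae        = read_bool(store, n, "platform/pae", true);
    f.nx         = read_bool(store, n, "platform/nx", true);
    f.viridian   = read_bool(store, n, "platform/viridian", false);
    f.hpet       = read_bool(store, n, "platform/hpet", false);
    f.nested_hvm = read_bool(store, n, "platform/nested-hvm", false);
    return f;
}
```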
```diff
+
+### 3.2 Workflow for allocating and populating domain memory
+
+Based on the given type, the `xenguest` program calls dedicated
+functions for the build process of given domain type.
+
+- For HVM, this function is `stub_xc_hvm_build()`.
+
+These domain build functions call these functions:
+
+1. `get_flags()` to get the platform data from the Xenstore
+2. `configure_vcpus()` which uses the platform data from the Xenstore to configure vCPU affinity and the credit scheduler parameters vCPU weight and vCPU cap (max % pCPU time for throttling)
+3. For HVM, `hvm_build_setup_mem` to:
+1. Decide the `e820` memory layout of the system memory of the domain
+including memory holes depending on PCI passthrough and vGPU flags.
+2. Load the BIOS/UEFI firmware images
+3. Store the final MMIO hole parameters in the Xenstore
+4. Call the `libxenguest` function
+[xc_dom_boot_mem_init()](https://github.com/xen-project/xen/blob/39c45caef271bc2b2e299217450cfda24c0c772a/tools/libs/guest/xg_dom_boot.c#L110-L126)
+to allocate and map the domain's system memory.
+For HVM domains, it calls
+[meminit_hvm()](https://github.com/xen-project/xen/blob/39c45caef271bc2b2e299217450cfda24c0c772a/tools/libs/guest/xg_dom_x86.c#L1348-L1648)
+to loop over the `vmemranges` of the domain for mapping the system RAM
+of the guest from the Xen hypervisor heap. Its goals are:
+
+- Attempt to allocate 1GB superpages when possible
+- Fall back to 2MB pages when 1GB allocation failed
+- Fall back to 4k pages when both failed
+
+It uses the hypercall
+[XENMEM_populate_physmap()](
+https://github.com/xen-project/xen/blob/39c45caef271bc2b2e299217450cfda24c0c772a/xen/common/memory.c#L1408-L1477)
+to perform memory allocation and to map the allocated memory
+to the system RAM ranges of the domain.
+The hypercall must:
+
+1. convert the arguments for allocating a page to hypervisor structures
+2. set flags and calls functions according to the arguments
+3. allocate the requested page at the most suitable place
+
+- depending on passed flags, allocate on a specific NUMA node
+- else, if the domain has node affinity, on the affine nodes
+- also in the most suitable memory zone within the NUMA node
+
+4. fall back to less desirable places if this fails
+
+- or fail for "exact" allocation requests
+
+5. split superpages if pages of the requested size are not available
+
+5. Call `construct_cpuid_policy()` to apply the `CPUID` `featureset` policy
```
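The goals listed for `meminit_hvm()` (1GB superpages when possible, else 2MB, else 4k) describe a per-chunk fallback loop. The C sketch below models that loop with a hypothetical pool holding a fixed number of free 1GB, 2MB and 4k extents. The sketch uses a larger extent only when three conditions hold: the current guest frame is aligned to it, enough of the range remains, and such an extent is still free. This is not the `libxenguest` code, which requests extents from Xen through `XENMEM_populate_physmap`. The sketch also ignores NUMA placement and memory holes.

```c
#include <stdbool.h>
#include <stdint.h>

#define ORDER_2M 9    /* 2MB = 2^9 4k pages */
#define ORDER_1G 18   /* 1GB = 2^18 4k pages */

/* Hypothetical allocator: free extents of each size that remain. */
struct pool { uint64_t free_1g, free_2m, free_4k; };

/* Number of extents of each size actually used. */
struct stats { uint64_t n_1g, n_2m, n_4k; };

static bool take(uint64_t *free_count)
{
    if (*free_count == 0)
        return false;
    (*free_count)--;
    return true;
}

/* Populate guest frames [pfn, pfn + nr), preferring the largest extent that
 * is aligned, fits in the remaining range and is still available.
 * Returns false if not even a 4k page can be allocated. */
bool populate_range(struct pool *p, uint64_t pfn, uint64_t nr, struct stats *s)
{
    const uint64_t pages_1g = 1ULL << ORDER_1G;
    const uint64_t pages_2m = 1ULL << ORDER_2M;
    const uint64_t end = pfn + nr;

    while (pfn < end) {
        uint64_t left = end - pfn;
        if (pfn % pages_1g == 0 && left >= pages_1g && take(&p->free_1g)) {
            s->n_1g++;                 /* 1GB superpage */
            pfn += pages_1g;
        } else if (pfn % pages_2m == 0 && left >= pages_2m && take(&p->free_2m)) {
            s->n_2m++;                 /* fall back to a 2MB superpage */
            pfn += pages_2m;
        } else if (take(&p->free_4k)) {
            s->n_4k++;                 /* fall back to a single 4k page */
            pfn += 1;
        } else {
            return false;              /* out of memory */
        }
    }
    return true;
}
```

Consider a range of one 1GB block plus three 2MB blocks plus five 4k pages, starting at frame 0 with enough free extents. It ends up as 1 + 3 + 5 extents. If no 1GB extent is available, a 1GB range at frame 0 is filled with 512 2MB extents instead.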
```diff
+
+For more details on the VM build step involving xenguest and Xen side see:
+https://wiki.xenproject.org/wiki/Walkthrough:_VM_build_using_xenguest
 
 ## 4. mark each VBD as "active"
 
```
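The new text lists placement rules for `XENMEM_populate_physmap`. It uses a specific NUMA node if the flags ask for one, else the domain's affine nodes. If that fails it falls back to less desirable nodes, or fails for "exact" requests. These rules can be modelled as a node-selection function. The C sketch below is a simplified assumption rather than Xen's allocator. It tracks only free 4k pages per node and takes the first node that fits in each tier. When a non-exact requested node is full, it falls back straight to any node. It ignores memory zones and superpage splitting.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_NODES 8
#define NO_NODE   (-1)

/* Hypothetical host view: free 4k pages on each NUMA node. */
struct numa_host { int nr_nodes; uint64_t free_pages[MAX_NODES]; };

static bool fits(const struct numa_host *h, int node, uint64_t pages)
{
    return node >= 0 && node < h->nr_nodes && h->free_pages[node] >= pages;
}

/* Choose a node for an extent of `pages` 4k pages.
 *   requested: node asked for by the caller, or NO_NODE
 *   exact:     do not fall back if the requested node is full
 *   affinity:  bitmask of the domain's affine nodes (0 = none)
 * Returns the chosen node, or NO_NODE if the allocation must fail. */
int pick_node(const struct numa_host *h, uint64_t pages,
              int requested, bool exact, uint32_t affinity)
{
    if (requested != NO_NODE) {
        if (fits(h, requested, pages))
            return requested;              /* the requested node */
        if (exact)
            return NO_NODE;                /* exact requests never fall back */
    } else {
        for (int n = 0; n < h->nr_nodes; n++)
            if ((affinity & (1u << n)) && fits(h, n, pages))
                return n;                  /* an affine node */
    }
    for (int n = 0; n < h->nr_nodes; n++)
        if (fits(h, n, pages))
            return n;                      /* fall back to any node */
    return NO_NODE;
}
```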