Commit f58c32a

docs: Add dedicated walk-throughs for VM.build and xenguest
Signed-off-by: Bernhard Kaindl <[email protected]>
1 parent 66e9509 commit f58c32a

5 files changed (+436, -107 lines)

Lines changed: 137 additions & 0 deletions
@@ -0,0 +1,137 @@
---
title: Domain.build
description:
  "Prepare the build of a VM: Wait for scrubbing, do NUMA placement, run xenguest."
---

## Overview

```mermaid
flowchart LR
subgraph xenopsd VM_build[
xenopsd&nbsp;thread&nbsp;pool&nbsp;with&nbsp;two&nbsp;VM_build&nbsp;micro#8209;ops:
During&nbsp;parallel&nbsp;VM_start,&nbsp;many&nbsp;threads&nbsp;run&nbsp;this&nbsp;in&nbsp;parallel!
]
direction LR
build_domain_exn[
VM.build_domain_exn
from thread pool Thread #1
] --> Domain.build
Domain.build --> build_pre
build_pre --> wait_xen_free_mem
build_pre -->|if NUMA/Best_effort| numa_placement
Domain.build --> xenguest[Invoke xenguest]
click Domain.build "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L1111-L1210" _blank
click build_domain_exn "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2222-L2225" _blank
click wait_xen_free_mem "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L236-L272" _blank
click numa_placement "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L862-L897" _blank
click build_pre "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L899-L964" _blank
click xenguest "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L1139-L1146" _blank

build_domain_exn2[
VM.build_domain_exn
from thread pool Thread #2] --> Domain.build2[Domain.build]
Domain.build2 --> build_pre2[build_pre]
build_pre2 --> wait_xen_free_mem2[wait_xen_free_mem]
build_pre2 -->|if NUMA/Best_effort| numa_placement2[numa_placement]
Domain.build2 --> xenguest2[Invoke xenguest]
click Domain.build2 "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L1111-L1210" _blank
click build_domain_exn2 "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2222-L2225" _blank
click wait_xen_free_mem2 "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L236-L272" _blank
click numa_placement2 "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L862-L897" _blank
click build_pre2 "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L899-L964" _blank
click xenguest2 "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L1139-L1146" _blank
end
```

[`VM.build_domain_exn`](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2024-L2248)
[calls](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2222-L2225)
[`Domain.build`](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L1111-L1210)
to call:
- `build_pre` to prepare the build of a VM:
  - Wait, if necessary, for the Xen memory scrubber to reclaim memory.
  - If the `xe` config `numa_placement` is set to `Best_effort`, invoke the NUMA placement algorithm
    (unless the VM has a hard affinity set).
- `xenguest` to invoke the [xenguest](xenguest) program to set up the domain's system memory.
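
The order of these calls can be modelled in a few lines of OCaml. This is a minimal sketch, not xenopsd code: `numa_policy`, `domain_build` and the step names are hypothetical, the policy type only mirrors the `Any` and `Best_effort` cases of `Xenops_server.numa_placement`, and each step is recorded as a string instead of calling Xen.

```ocaml
(* Hypothetical stand-in for the Any / Best_effort cases of
   Xenops_server.numa_placement; not the real xenopsd type. *)
type numa_policy = Any | Best_effort

(* Model of the order of steps in Domain.build: each step is
   recorded as a string instead of calling Xen. *)
let domain_build ~policy ~has_hard_affinity =
  let steps = ref [] in
  let step name = steps := name :: !steps in
  (* build_pre: wait for the scrubber, set timer mode and vCPUs *)
  step "wait_xen_free_mem";
  step "set timer mode";
  step "set number of vCPUs";
  (* optional NUMA placement, skipped for VMs with hard affinity *)
  (match policy with
   | Any -> ()
   | Best_effort -> if not has_hard_affinity then step "numa_placement");
  (* finally, the xenguest program builds the domain's system memory *)
  step "xenguest";
  List.rev !steps

let () =
  domain_build ~policy:Best_effort ~has_hard_affinity:false
  |> String.concat " -> "
  |> print_endline
(* prints: wait_xen_free_mem -> set timer mode -> set number of vCPUs -> numa_placement -> xenguest *)
```

With `~has_hard_affinity:true` or `~policy:Any`, the `numa_placement` step is skipped, matching the `build_pre` logic described below.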

## Domain Build Preparation using build_pre

[`Domain.build`](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L1111-L1210)
[calls](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L1137)
the [function `build_pre`](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L899-L964)
(which is also used for VM restore). It must:

1. [Call](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L902-L911)
   [wait_xen_free_mem](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L236-L272)
   to wait, if necessary, for the Xen memory scrubber to catch up with reclaiming memory (CA-39743).
2. Call the hypercall to set the timer mode.
3. Call the hypercall to set the number of vCPUs.
4. As described in the [NUMA feature description](../../toolstack/features/NUMA),
   when the `xe` configuration option `numa_placement` is set to `Best_effort`,
   invoke the `numa_placement` function, unless the VM has a hard affinity set:

   ```ml
   match !Xenops_server.numa_placement with
   | Any ->
       (* Any: no NUMA placement is attempted *)
       ()
   | Best_effort ->
       log_reraise (Printf.sprintf "NUMA placement") (fun () ->
           if has_hard_affinity then
             D.debug "VM has hard affinity set, skipping NUMA optimization"
           else
             (* memory is passed in bytes: xen_max_mib * 1048576 *)
             numa_placement domid ~vcpus
               ~memory:(Int64.mul memory.xen_max_mib 1048576L)
       )
   ```
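
Step 1 boils down to polling until enough memory has been scrubbed. The following is a minimal sketch of such a wait loop under stated assumptions, not the implementation of `wait_xen_free_mem`: the free-memory probe `get_free_kib`, the `sleep` function and the `max_tries` limit are hypothetical parameters, and in xenopsd the amount of free memory would be obtained from Xen.

```ocaml
(* Hypothetical sketch of a wait-for-scrubber loop. [get_free_kib]
   reports the currently free memory in KiB; [sleep] pauses between
   polls. Returns true as soon as [required_kib] is free, or false
   after [max_tries] polls. *)
let wait_for_free_mem ~get_free_kib ~sleep ~required_kib ~interval ~max_tries =
  let rec poll tries =
    if get_free_kib () >= required_kib then true
    else if tries >= max_tries then false
    else begin
      sleep interval;
      poll (tries + 1)
    end
  in
  poll 1

let () =
  (* Simulated scrubber: each poll, another 1024 KiB becomes free. *)
  let free = ref 0 in
  let get_free_kib () = free := !free + 1024; !free in
  let ok =
    wait_for_free_mem ~get_free_kib ~sleep:(fun _ -> ())
      ~required_kib:4096 ~interval:0.1 ~max_tries:10
  in
  Printf.printf "enough memory free: %b\n" ok
```

Passing `sleep` as a parameter keeps the sketch free of the `unix` library; a real program could pass `Unix.sleepf`.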

## NUMA placement

`build_pre` passes the `domid`, the number of vCPUs and the maximum memory
(`xen_max_mib`, converted to bytes) to the
[numa_placement](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L862-L897)
function, which runs the algorithm to find the best NUMA placement.

When the algorithm returns a NUMA node to use, `numa_placement` calls the Xen
hypercalls to set the vCPU affinity to this NUMA node. The planning step builds
a NUMA request from the VM's memory and vCPU count and passes it, together with
the per-node resources, to `Softaffinity.plan`:

```ml
let vm = NUMARequest.make ~memory ~vcpus in
let nodea =
  (* Combine the current per-node resources with those recorded
     by previous calls, using NUMAResource.min_memory *)
  match !numa_resources with
  | None ->
      Array.of_list nodes
  | Some a ->
      Array.map2 NUMAResource.min_memory (Array.of_list nodes) a
in
numa_resources := Some nodea ;
Softaffinity.plan ~vm host nodea
```
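
The data flow of this excerpt can be modelled with plain records. In this sketch, `node`, `min_memory`, `combine` and `plan` are hypothetical stand-ins for `NUMAResource`, `NUMAResource.min_memory` and `Softaffinity.plan`, and memory is counted in MiB. In particular, the greedy rule in `plan` (the node with the most free memory that can hold the whole VM) is an assumption for illustration, not the algorithm that `Softaffinity.plan` implements.

```ocaml
(* Hypothetical, simplified stand-in for NUMAResource: a node ID and
   its free memory in MiB. *)
type node = { id : int; free_mib : int }

(* Keep the lower free-memory value of the current and the previously
   recorded view of the same node. *)
let min_memory current previous =
  { current with free_mib = min current.free_mib previous.free_mib }

(* Mirrors the match on !numa_resources in the excerpt above. *)
let combine nodes previous =
  match previous with
  | None -> Array.of_list nodes
  | Some a -> Array.map2 min_memory (Array.of_list nodes) a

(* Assumed greedy rule: the node with the most free memory that can
   hold the whole VM, or None if no single node fits. *)
let plan ~memory_mib nodea =
  Array.fold_left
    (fun best n ->
      if n.free_mib < memory_mib then best
      else
        match best with
        | Some b when b.free_mib >= n.free_mib -> best
        | _ -> Some n)
    None nodea

let () =
  let nodes = [ { id = 0; free_mib = 8192 }; { id = 1; free_mib = 16384 } ] in
  let previous = Some [| { id = 0; free_mib = 8192 }; { id = 1; free_mib = 4096 } |] in
  match plan ~memory_mib:6144 (combine nodes previous) with
  | Some n -> Printf.printf "place on node %d\n" n.id
  | None -> print_endline "no single node fits"
```

Because the recorded view of node 1 shows only 4096 MiB free, the combined array reports 4096 MiB for it, so the 6144 MiB VM is placed on node 0.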

Because Xen's `auto_node_affinity` feature is enabled by default,
setting the vCPU affinity causes the Xen hypervisor to also set the
domain's NUMA node affinity, so that memory allocations are aligned
with the vCPU affinity of the domain.

Note: This is controlled by the Xen domain's
[auto_node_affinity](https://wiki.xenproject.org/wiki/NUMA_node_affinity_in_the_Xen_hypervisor)
feature flag, which can be overridden in the Xen hypervisor for specific VMs if needed.

Overriding it can be useful, for example, when there might not be enough
memory on the preferred NUMA node, but other NUMA nodes have enough free
memory among which the memory allocations can be distributed.

In terms of future NUMA design, it might be even more favourable to
have a strategy in `xenguest` where, in such cases, the superpages
of the preferred node are used first and a fallback to neighbouring
NUMA nodes only happens to the extent necessary.

The future allocation strategy would likely be passed to `xenguest`
using Xenstore, like the other platform parameters for the VM.

Summary: Setting the vCPU affinity tells the hypervisor that memory for
this domain should preferably be allocated from this NUMA node.
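
The possible future strategy described above (use the preferred node first, fall back to neighbouring nodes only as far as necessary) can be sketched as follows. This is a hypothetical model, not `xenguest` code: it counts memory in MiB rather than in superpages, and `allocate`, `free` and `neighbours` are made-up names.

```ocaml
(* Hypothetical sketch: take memory from [preferred] first, then from
   [neighbours] in the given order, only as much as still needed.
   [free.(n)] is the free memory (MiB) on node n. Returns the
   (node, MiB) allocations, or None if all nodes together lack memory. *)
let allocate ~free ~preferred ~neighbours ~need =
  let rec go remaining acc = function
    | _ when remaining = 0 -> Some (List.rev acc)
    | [] -> None
    | node :: rest ->
        let take = min remaining free.(node) in
        let acc = if take > 0 then (node, take) :: acc else acc in
        go (remaining - take) acc rest
  in
  go need [] (preferred :: neighbours)

let () =
  let free = [| 4096; 2048; 8192 |] in
  match allocate ~free ~preferred:0 ~neighbours:[ 2; 1 ] ~need:6000 with
  | Some allocs ->
      List.iter (fun (n, mib) -> Printf.printf "node %d: %d MiB\n" n mib) allocs
  | None -> print_endline "not enough memory"
```

Here, 4096 MiB come from the preferred node 0 and only the remaining 1904 MiB from node 2; node 1 is not touched.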

## Invoke the xenguest program

With the preparation in `build_pre` completed, `Domain.build`
[calls](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L1127-L1155)
the `xenguest` function to invoke the [xenguest](xenguest) program to build the domain.
Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
---
title: VM_build micro-op
linkTitle: VM_build μ-op
description: Overview of the VM_build μ-op (runs after the VM_create μ-op created the domain).
weight: 10
---

## Overview

On Xen, `Xenctrl.domain_create` creates an empty domain and
returns the domain ID (`domid`) of the new domain to `xenopsd`.

In the `build` phase, the `xenguest` program is called to create
the system memory layout of the domain, set vCPU affinity and a
lot more.

The [VM_build](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/lib/xenops_server.ml#L2255-L2271)
micro-op collects the VM build parameters and calls
[VM.build](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2290-L2291),
which calls
[VM.build_domain](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2250-L2288),
which calls
[VM.build_domain_exn](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2024-L2248),
which calls [Domain.build](Domain.build):

```mermaid
flowchart
subgraph xenopsd VM_build[xenopsd&nbsp;VM_build&nbsp;micro#8209;op]
direction LR
VM_build --> VM.build
VM.build --> VM.build_domain
VM.build_domain --> VM.build_domain_exn
VM.build_domain_exn --> Domain.build
click VM_build "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/lib/xenops_server.ml#L2255-L2271" _blank
click VM.build "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2290-L2291" _blank
click VM.build_domain "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2250-L2288" _blank
click VM.build_domain_exn "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2024-L2248" _blank
click Domain.build "../Domain.build/index.html"
end
```

The function
[VM.build_domain_exn](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2024)
must:

1. Run pygrub (or eliloader) to extract the kernel and initrd, if necessary.
2. [Call](https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2222-L2225)
   [Domain.build](Domain.build)
   to:
   - optionally run NUMA placement and
   - invoke [xenguest](VM.build/xenguest) to set up the domain memory.

   See the walk-through on [VM.build](VM.build) for more details on this phase.
3. Apply the `cpuid` configuration.
4. Store the current domain configuration on disk. It is important to know
   the difference between the configuration you started with and the configuration
   you would use after a reboot, because some properties (such as maximum memory
   and vCPUs) are fixed on create.
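
The reason for storing the configuration can be illustrated with a hypothetical comparison between the configuration the domain was created with and the one it would use after a reboot. The `domain_config` record and `pending_until_reboot` are made up for this sketch and do not reflect the state xenopsd persists; only the two properties named above, maximum memory and vCPUs, are modelled.

```ocaml
(* Hypothetical, reduced domain configuration; not xenopsd's real type. *)
type domain_config = { max_memory_mib : int; max_vcpus : int }

(* Both properties are fixed at domain creation: a difference between
   the running and the desired configuration only takes effect on reboot. *)
let pending_until_reboot ~running ~desired =
  [ ("max_memory_mib", running.max_memory_mib <> desired.max_memory_mib);
    ("max_vcpus", running.max_vcpus <> desired.max_vcpus) ]
  |> List.filter_map (fun (field, changed) -> if changed then Some field else None)

let () =
  let running = { max_memory_mib = 2048; max_vcpus = 2 } in
  let desired = { max_memory_mib = 4096; max_vcpus = 2 } in
  pending_until_reboot ~running ~desired
  |> String.concat ", "
  |> Printf.printf "changes pending until reboot: %s\n"
```

Without the stored configuration of the running domain, there would be nothing to compare the desired configuration against.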
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
---
title: Building a VM
description: After VM_create, VM_build builds the core of the domain (vCPUs, memory)
weight: 20
---

Walk-through documents for the `VM_build` phase:

```mermaid
flowchart
subgraph xenopsd VM_build[xenopsd&nbsp;VM_build&nbsp;micro#8209;op]
direction LR
VM_build --> VM.build
VM.build --> VM.build_domain
VM.build_domain --> VM.build_domain_exn
VM.build_domain_exn --> Domain.build
click VM_build "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/lib/xenops_server.ml#L2255-L2271" _blank
click VM.build "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2290-L2291" _blank
click VM.build_domain "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2250-L2288" _blank
click VM.build_domain_exn "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2024-L2248" _blank
end
```

{{% children description=true %}}
