Commit 68d4187

(docs) Describe the flows of setting NUMA node affinity in Xen by Xenopsd
Signed-off-by: Bernhard Kaindl <[email protected]>
1 parent: fd52035

5 files changed: +379 -11 lines changed

doc/content/lib/xenctrl/xc_domain_node_setaffinity.md

Lines changed: 69 additions & 11 deletions
@@ -1,13 +1,32 @@
 ---
 title: xc_domain_node_setaffinity()
-description: Set a Xen domain's NUMA node affinity
+description: Set a Xen domain's NUMA node affinity for memory allocations
+mermaid:
+  force: true
 ---

-`xc_domain_node_setaffinity()` controls the NUMA node affinity of a domain.
+`xc_domain_node_setaffinity()` controls the NUMA node affinity of a domain,
+but it only updates the Xen hypervisor domain's `d->node_affinity` mask.
+This mask is read by the Xen memory allocator as the 2nd preference for the
+NUMA node to allocate memory from for this domain.

-By default, Xen enables the `auto_node_affinity` feature flag,
-where setting the vCPU affinity also sets the NUMA node affinity for
-memory allocations to be aligned with the vCPU affinity of the domain.
+> [!info] Preferences of the Xen memory allocator:
+> 1. A NUMA node passed to the allocator directly takes precedence, if present.
+> 2. Then, if the allocation is for a domain, its `node_affinity` mask is tried.
+> 3. Finally, it falls back to spreading the pages over all remaining NUMA nodes.
+
+As this call has no practical effect on the Xen scheduler, vCPU affinities
+need to be set separately anyway.
+
+The domain's `auto_node_affinity` flag is enabled by default by Xen. This means
+that when setting vCPU affinities, Xen updates the `d->node_affinity` mask
+to consist of the NUMA nodes to which its vCPUs have affinity.
+
+See [xc_vcpu_setaffinity()](xc_vcpu_setaffinity) for more information
+on how `d->auto_node_affinity` is used to set the NUMA node affinity.
+
+Thus, so far, there is no obvious need to call `xc_domain_node_setaffinity()`
+when building a domain.

 Setting the NUMA node affinity using this call can be used,
 for example, when there might not be enough memory on the
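
As an editorial illustration of the description in the hunk above (not part of this commit's diff), the following minimal C sketch shows how a toolstack could call `xc_domain_node_setaffinity()` to prefer NUMA node 0 for a domain's memory allocations. The wrapper function `set_node0_affinity()` is hypothetical; `xc_interface_open()`, `xc_nodemap_alloc()` and `xc_domain_node_setaffinity()` are existing libxenctrl calls, assuming their current signatures.

```c
#include <stdio.h>
#include <stdlib.h>
#include <xenctrl.h>

/* Hypothetical helper with minimal error handling: restrict further memory
 * allocations for an existing domain to NUMA node 0. */
int set_node0_affinity(uint32_t domid)
{
    xc_interface *xch = xc_interface_open(NULL, NULL, 0);
    xc_nodemap_t nodemap;
    int rc = -1;

    if (!xch)
        return rc;

    nodemap = xc_nodemap_alloc(xch);    /* zero-filled NUMA node bitmap */
    if (nodemap) {
        nodemap[0] |= 1;                /* set bit 0: NUMA node 0 */

        /* Only updates d->node_affinity (and clears d->auto_node_affinity);
         * it does not move vCPUs and has no effect on the scheduler. */
        rc = xc_domain_node_setaffinity(xch, domid, nodemap);
        free(nodemap);
    }
    xc_interface_close(xch);
    return rc;
}
```

Passing a nodemap with all bits set instead re-enables `d->auto_node_affinity`, as described in the next hunk.
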
@@ -63,18 +82,57 @@ https://github.com/xen-project/xen/blob/master/xen/common/domain.c#L943-L970"
 This function implements the functionality of `xc_domain_node_setaffinity`
 to set the NUMA affinity of a domain as described above.
 If the new_affinity does not intersect the `node_online_map`,
-it returns `-EINVAL`, otherwise on success `0`.
+it returns `-EINVAL`. Otherwise, the result is a success, and it returns `0`.

 When the `new_affinity` is a specific set of NUMA nodes, it updates the NUMA
-`node_affinity` of the domain to these nodes and disables `auto_node_affinity`
-for this domain. It also notifies the Xen scheduler of the change.
+`node_affinity` of the domain to these nodes and disables `d->auto_node_affinity`
+for this domain. With `d->auto_node_affinity` disabled,
+[xc_vcpu_setaffinity()](xc_vcpu_setaffinity) no longer updates the NUMA affinity
+of this domain.
+
+If `new_affinity` has all bits set, it re-enables `d->auto_node_affinity`
+for this domain and calls
+[domain_update_node_aff()](https://github.com/xen-project/xen/blob/e16acd80/xen/common/sched/core.c#L1809-L1876)
+to re-set the domain's `node_affinity` mask to the NUMA nodes of the current
+hard and soft affinity of the domain's online vCPUs.
+
+### Flowchart in relation to xc_vcpu_setaffinity()
+
+The effect of `domain_set_node_affinity()` can be seen more clearly in this
+flowchart, which shows how `xc_vcpu_setaffinity()` is currently used to set
+the NUMA affinity of a new domain, but also shows how `domain_set_node_affinity()`
+relates to it:

-This sets the preference the memory allocator to the new NUMA nodes,
-and in theory, it could also alter the behaviour of the scheduler.
-This of course depends on the scheduler and its configuration.
+{{% include "xc_vcpu_setaffinity-xenopsd-notes.md" %}}
+{{% include "xc_vcpu_setaffinity-xenopsd.md" %}}
+
+`xc_domain_node_setaffinity` can be used to set the domain's `node_affinity`
+(which is normally set by `xc_vcpu_setaffinity`) to different NUMA nodes.
+
+#### No effect on the Xen scheduler
+
+Currently, the node affinity does not affect the Xen scheduler:
+If `d->node_affinity` is set before vCPU creation, the initial pCPU
+of a new vCPU is the first pCPU of the first NUMA node in the domain's
+`node_affinity`. This is further changed when one or more `cpupools` are set up.
+As this is only the initial pCPU of the vCPU, this alone does not change the
+scheduling of the Xen Credit scheduler, as it reschedules the vCPUs to other pCPUs.

 ## Notes on future design improvements

+### It may be possible to call it before vCPUs are created
+
+When done early, before vCPU creation, some domain-related data structures
+could be allocated using the domain's `d->node_affinity` NUMA node mask.
+
+With further changes in Xen and `xenopsd`, Xen could allocate the vCPU structs
+on the affine NUMA nodes of the domain.
+
+For this, `xenopsd` would have to call `xc_domain_node_setaffinity()`
+before vCPU creation, after having decided the domain's NUMA placement,
+preferably including claiming the required memory for the domain, to ensure
+that the domain will be populated from the same NUMA node(s).
+
 This call cannot influence the past: The `xenopsd`
 [VM_create](../../xenopsd/walkthroughs/VM.start.md#2-create-a-xen-domain)
 micro-ops calls `Xenctrl.domain_create`. It currently creates
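
The future-design note in the hunk above suggests calling `xc_domain_node_setaffinity()` before vCPU creation, after NUMA placement and a memory claim. The sketch below is a hypothetical illustration of that ordering, not current `xenopsd` behaviour; it only assumes the existing libxenctrl calls `xc_domain_claim_pages()`, `xc_domain_node_setaffinity()` and `xc_domain_max_vcpus()`.

```c
#include <xenctrl.h>

/* Hypothetical ordering sketch (not current xenopsd behaviour):
 * claim memory, record the NUMA placement, then create the vCPUs. */
int build_domain_numa_first(xc_interface *xch, uint32_t domid,
                            xc_nodemap_t placement_nodes,
                            unsigned long nr_pages, unsigned int nr_vcpus)
{
    int rc;

    /* 1. Claim the memory so that populating the domain cannot fail later. */
    rc = xc_domain_claim_pages(xch, domid, nr_pages);
    if (rc)
        return rc;

    /* 2. Record the NUMA placement in d->node_affinity while the domain
     *    still has no vCPUs, so later allocations can make use of it. */
    rc = xc_domain_node_setaffinity(xch, domid, placement_nodes);
    if (rc)
        return rc;

    /* 3. Create the vCPUs; with future Xen changes, their structs could be
     *    allocated on the NUMA nodes set above. */
    return xc_domain_max_vcpus(xch, domid, nr_vcpus);
}
```
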
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
---
title: Simplified flowchart of xc_vcpu_setaffinity()
description: See lib/xenctrl/xc_vcpu_setaffinity-xenopsd.md for an extended version
hidden: true
---
```mermaid
flowchart TD
subgraph libxenctrl
    xc_vcpu_setaffinity("<tt>xc_vcpu_setaffinity()")--hypercall-->xen
end
subgraph xen[Xen Hypervisor]
    direction LR
    vcpu_set_affinity("<tt>vcpu_set_affinity()</tt><br>set the vCPU affinity")
        -->check_auto_node{"Is the domain's<br><tt>auto_node_affinity</tt><br>enabled?"}
        --"yes<br>(default)"-->
        auto_node_affinity("Set the<br>domain's<br><tt>node_affinity</tt>
            mask as well<br>(used for further<br>NUMA memory<br>allocation)")

    click xc_vcpu_setaffinity
        "https://github.com/xen-project/xen/blob/7cf16387/tools/libs/ctrl/xc_domain.c#L199-L250" _blank
    click vcpu_set_affinity
        "https://github.com/xen-project/xen/blob/7cf16387/xen/common/sched/core.c#L1353-L1393" _blank
    click domain_update_node_aff
        "https://github.com/xen-project/xen/blob/7cf16387/xen/common/sched/core.c#L1809-L1876" _blank
    click check_auto_node
        "https://github.com/xen-project/xen/blob/7cf16387/xen/common/sched/core.c#L1840-L1870" _blank
    click auto_node_affinity
        "https://github.com/xen-project/xen/blob/7cf16387/xen/common/sched/core.c#L1867-L1869" _blank
end
```
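
To make the default path of the simplified flowchart above concrete, here is a small illustrative C sketch (not part of this commit): it hard-pins vCPU 0 of a domain and reads the resulting `node_affinity` back. The wrapper function name is hypothetical; the libxenctrl calls (`xc_cpumap_alloc()`, `xc_nodemap_alloc()`, `xc_vcpu_setaffinity()`, `xc_domain_node_getaffinity()`) are existing API, assuming their current signatures.

```c
#include <stdio.h>
#include <stdlib.h>
#include <xenctrl.h>

/* Hypothetical helper: hard-pin vCPU 0 of a domain and read the resulting
 * domain node_affinity back, to observe the default auto_node_affinity
 * behaviour shown in the flowchart above. */
int pin_vcpu0_and_show_node_affinity(xc_interface *xch, uint32_t domid)
{
    xc_cpumap_t hard = xc_cpumap_alloc(xch);   /* zero-filled pCPU bitmap */
    xc_cpumap_t soft = xc_cpumap_alloc(xch);   /* buffer for the soft map argument */
    xc_nodemap_t nodes = xc_nodemap_alloc(xch);
    int rc = -1;

    if (!hard || !soft || !nodes)
        goto out;

    hard[0] = 0x0f;   /* pCPUs 0-3, e.g. the cores of one NUMA node */

    /* Hard-pin vCPU 0. With auto_node_affinity enabled (the default),
     * Xen recalculates d->node_affinity from the vCPU affinities. */
    rc = xc_vcpu_setaffinity(xch, domid, 0, hard, soft, XEN_VCPUAFFINITY_HARD);
    if (rc)
        goto out;

    rc = xc_domain_node_getaffinity(xch, domid, nodes);
    if (!rc)
        printf("node_affinity[0] = 0x%02x\n", nodes[0]);

out:
    free(hard);
    free(soft);
    free(nodes);
    return rc;
}
```
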
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
---
title: Notes for the flowchart on the use of setaffinity for VM.start
hidden: true
---
In the flowchart, two code paths are shown in bold:
- The path taken when `Host.numa_affinity_policy` is at its default (off) in `xenopsd`.
- The default path of `xc_vcpu_setaffinity(XEN_VCPUAFFINITY_SOFT)` in Xen,
  taken when the domain's `auto_node_affinity` flag is enabled (the default),
  showing how, in this default case, the vCPU affinity update also updates
  the domain's `node_affinity`.

[xenguest](../../xenopsd/walkthroughs/VM.build/xenguest/) uses the Xenstore
to read the static domain configuration that it needs to build the domain.
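
The note above mentions that `xenguest` reads its static domain configuration from the Xenstore. As an illustrative sketch only, the snippet below reads a per-vCPU hard-affinity key with libxenstore; the exact key path is an assumption based on the `platform/vcpu/.../affinity` key shown in the flowcharts and may differ from what `xenguest` actually uses.

```c
#include <stdio.h>
#include <stdlib.h>
#include <xenstore.h>

/* Hypothetical helper: return the hard-affinity string for one vCPU of a
 * domain, or NULL if the key is absent. The key layout below is an
 * assumption, not verified against xenguest. Caller must free() the result. */
char *read_affinity_key(int domid, int vcpu)
{
    struct xs_handle *xsh = xs_open(0);
    char path[128];
    unsigned int len;
    char *value = NULL;

    if (!xsh)
        return NULL;

    /* Assumed key layout, mirroring "platform/vcpu/.../affinity" above. */
    snprintf(path, sizeof(path),
             "/local/domain/%d/platform/vcpu/%d/affinity", domid, vcpu);

    value = xs_read(xsh, XBT_NULL, path, &len);  /* NULL if the key is absent */
    xs_close(xsh);
    return value;
}
```
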
Lines changed: 175 additions & 0 deletions
@@ -0,0 +1,175 @@
---
title: Flowchart of the use of xc_vcpu_setaffinity() by xenopsd
description: Shows how xenopsd uses xc_vcpu_setaffinity() to set NUMA affinity
hidden: true
---
```mermaid
flowchart TD

subgraph VM.create["xenopsd VM.create"]

    %% Is xe vCPU-params:mask= set? If yes, write to Xenstore:

    is_xe_vCPUparams_mask_set?{"

        Is
        <tt>xe vCPU-params:mask=</tt>
        set? Example: <tt>1,2,3</tt>
        (Is used to enable vCPU<br>hard-affinity)

        "} --"yes"--> set_hard_affinity("Write hard-affinity to XenStore:
            <tt>platform/vcpu/#domid/affinity</tt>
            (xenguest will read this and other configuration data
            from Xenstore)")

end

subgraph VM.build["xenopsd VM.build"]

    %% Labels of the decision nodes

    is_Host.numa_affinity_policy_set?{
        Is<p><tt>Host.numa_affinity_policy</tt><p>set?}
    has_hard_affinity?{
        Is hard-affinity configured in <p><tt>platform/vcpu/#domid/affinity</tt>?}

    %% Connections from VM.create:
    set_hard_affinity --> is_Host.numa_affinity_policy_set?
    is_xe_vCPUparams_mask_set? == "no"==> is_Host.numa_affinity_policy_set?

    %% The Subgraph itself:

    %% Check Host.numa_affinity_policy

    is_Host.numa_affinity_policy_set?

    %% If Host.numa_affinity_policy is "best_effort":

    -- Host.numa_affinity_policy is<p><tt>best_effort -->

    %% If has_hard_affinity is set, skip numa_placement:

    has_hard_affinity?
        --"yes"-->exec_xenguest

    %% If has_hard_affinity is not set, run numa_placement:

    has_hard_affinity?
        --"no"-->numa_placement-->exec_xenguest

    %% If Host.numa_affinity_policy is off (default, for now),
    %% skip NUMA placement:

    is_Host.numa_affinity_policy_set?
        =="default: disabled"==>
        exec_xenguest
end

%% xenguest subgraph

subgraph xenguest

    exec_xenguest

    ==> stub_xc_hvm_build("<tt>stub_xc_hvm_build()")

    ==> configure_vcpus("<tt>configure_vcpus()")

    %% Decision
    ==> set_hard_affinity?{"
        Is <tt>platform/<br>vcpu/#domid/affinity</tt>
        set?"}

end

%% do_domctl Hypercalls

numa_placement
    --Set the NUMA placement using soft-affinity-->
    XEN_VCPUAFFINITY_SOFT("<tt>xc_vcpu_setaffinity(SOFT)")
    ==> do_domctl

set_hard_affinity?
    --yes-->
    XEN_VCPUAFFINITY_HARD("<tt>xc_vcpu_setaffinity(HARD)")
    --> do_domctl

xc_domain_node_setaffinity
    --Currently not used by the Xapi toolstack
    --> do_domctl

%% Xen subgraph

subgraph xen[Xen Hypervisor]

    subgraph domain_update_node_affinity["domain_update_node_affinity()"]
        domain_update_node_aff("<tt>domain_update_node_aff()")
        ==> check_auto_node{"Is domain's<br><tt>auto_node_affinity</tt><br>enabled?"}
        =="yes (default)"==>set_node_affinity_from_vcpu_affinities("
            Calculate the domain's <tt>node_affinity</tt> mask from vCPU affinity
            (used for further NUMA memory allocation for the domain)")
    end

    do_domctl{"do_domctl()<br>op->cmd=?"}
        ==XEN_DOMCTL_setvcpuaffinity==>
        vcpu_set_affinity("<tt>vcpu_set_affinity()</tt><br>set the vCPU affinity")
        ==>domain_update_node_aff
    do_domctl
        --XEN_DOMCTL_setnodeaffinity (not used currently)
        -->is_new_affinity_all_nodes?

    subgraph domain_set_node_affinity["domain_set_node_affinity()"]

        is_new_affinity_all_nodes?{new_affinity<br>is #34;all#34;?}

            --is #34;all#34;

            --> enable_auto_node_affinity("<tt>auto_node_affinity=1")
            --> domain_update_node_aff

        is_new_affinity_all_nodes?

            --not #34;all#34;

            --> disable_auto_node_affinity("<tt>auto_node_affinity=0")
            --> domain_update_node_aff
    end

    %% setting and getting the struct domain's node_affinity:

    disable_auto_node_affinity
        --node_affinity=new_affinity-->
        domain_node_affinity

    set_node_affinity_from_vcpu_affinities
        ==> domain_node_affinity@{ shape: bow-rect,label: "domain:&nbsp;node_affinity" }
        --XEN_DOMCTL_getnodeaffinity--> do_domctl

end
click is_Host.numa_affinity_policy_set?
"https://github.com/xapi-project/xen-api/blob/90ef043c1f3a3bc20f1c5d3ccaaf6affadc07983/ocaml/xenopsd/xc/domain.ml#L951-L962"
click numa_placement
"https://github.com/xapi-project/xen-api/blob/90ef043c/ocaml/xenopsd/xc/domain.ml#L862-L897"
click stub_xc_hvm_build
"https://github.com/xenserver/xen.pg/blob/65c0438b/patches/xenguest.patch#L2329-L2436" _blank
click get_flags
"https://github.com/xenserver/xen.pg/blob/65c0438b/patches/xenguest.patch#L1164-L1288" _blank
click do_domctl
"https://github.com/xen-project/xen/blob/7cf163879/xen/common/domctl.c#L282-L894" _blank
click domain_set_node_affinity
"https://github.com/xen-project/xen/blob/7cf163879/xen/common/domain.c#L943-L970" _blank
click configure_vcpus
"https://github.com/xenserver/xen.pg/blob/65c0438b/patches/xenguest.patch#L1297-L1348" _blank
click set_hard_affinity?
"https://github.com/xenserver/xen.pg/blob/65c0438b/patches/xenguest.patch#L1305-L1326" _blank
click xc_vcpu_setaffinity
"https://github.com/xen-project/xen/blob/7cf16387/tools/libs/ctrl/xc_domain.c#L199-L250" _blank
click vcpu_set_affinity
"https://github.com/xen-project/xen/blob/7cf16387/xen/common/sched/core.c#L1353-L1393" _blank
click domain_update_node_aff
"https://github.com/xen-project/xen/blob/7cf16387/xen/common/sched/core.c#L1809-L1876" _blank
click check_auto_node
"https://github.com/xen-project/xen/blob/7cf16387/xen/common/sched/core.c#L1840-L1870" _blank
click set_node_affinity_from_vcpu_affinities
"https://github.com/xen-project/xen/blob/7cf16387/xen/common/sched/core.c#L1867-L1869" _blank
```
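
As a final illustrative sketch (not `xenopsd` code), this is roughly what the `xc_vcpu_setaffinity(SOFT)` step of the flowchart above amounts to when a NUMA placement is applied to all vCPUs of a domain. The helper name is hypothetical, and it is assumed that the placement step already produced a cpumap of the chosen NUMA node's pCPUs.

```c
#include <stdlib.h>
#include <xenctrl.h>

/* Hypothetical helper: set the soft affinity of every vCPU of a domain to
 * the pCPUs chosen by the NUMA placement step. */
int apply_soft_placement(xc_interface *xch, uint32_t domid,
                         xc_cpumap_t placement_cpus, unsigned int nr_vcpus)
{
    /* Buffer for the hard-affinity argument; it is not applied because the
     * flags request XEN_VCPUAFFINITY_SOFT only. */
    xc_cpumap_t hard = xc_cpumap_alloc(xch);
    unsigned int v;
    int rc = hard ? 0 : -1;

    for (v = 0; v < nr_vcpus && !rc; v++)
        rc = xc_vcpu_setaffinity(xch, domid, v, hard, placement_cpus,
                                 XEN_VCPUAFFINITY_SOFT);

    free(hard);
    return rc;
}
```

With `auto_node_affinity` enabled (the default), each of these calls also makes Xen recompute the domain's `node_affinity` from the vCPU affinities, as shown in the `domain_update_node_affinity()` subgraph above.
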
