|
| 1 | +--- |
| 2 | +title: xc_domain_claim_pages() |
| 3 | +description: Stake a claim for further memory for a domain, and release it too. |
| 4 | +--- |
| 5 | + |
| 6 | +## Purpose |
| 7 | + |
| 8 | +The purpose of `xc_domain_claim_pages()` is to attempt to |
| 9 | +stake a claim on an amount of memory for a given domain which guarantees that |
| 10 | +memory allocations for the claimed amount will be successful. |
| 11 | + |
| 12 | +The domain can still attempt to allocate beyond the claim, but those allocations |
| 13 | +are not guaranteed to succeed and will fail once the domain's memory reaches its |
| 14 | +`max_mem` value. |
| 15 | + |
| 16 | +Each domain can only have one claim, and the domid is the key of the claim. |
| 17 | +Killing the domain also releases its claim. |
| 18 | + |
| 19 | +Depending on the given size argument, the remaining stake of the domain |
| 20 | +can be set initially, updated to the given amount, or reset to no claim (0). |
| 21 | + |
| 22 | +## Management of claims |
| 23 | + |
| 24 | +- The stake is centrally managed by the Xen hypervisor using a |
| 25 | + [Hypercall](https://wiki.xenproject.org/wiki/Hypercall). |
| 26 | +- Claims are not reflected in the amount of free memory reported by Xen. |
| 27 | + |
| 28 | +## Reporting of claims |
| 29 | + |
| 30 | +- `xl claims` reports the outstanding claims of the domains: |
| 31 | + > [!info] Sample output of `xl claims`: |
| 32 | + > ```js |
| 33 | + > Name ID Mem VCPUs State Time(s) Claimed |
| 34 | + > Domain-0 0 2656 8 r----- 957418.2 0 |
| 35 | + > ``` |
| 36 | +- `xl info` reports the host-wide outstanding claims: |
| 37 | + > [!info] Sample output from `xl info | grep outstanding`: |
| 38 | + > ```js |
| 39 | + > outstanding_claims : 0 |
| 40 | + > ``` |
| 41 | +
|
| 42 | +## Tracking of claims |
| 43 | +
|
| 44 | +Xen only tracks: |
| 45 | +- the outstanding claims of each domain and |
| 46 | +- the outstanding host-wide claims. |
| 47 | +
|
| 48 | +Claiming zero pages effectively cancels the domain's outstanding claim |
| 49 | +and is always successful. |
| 50 | +
|
| 51 | +> [!info] |
| 52 | +> - Allocations for outstanding claims are expected to always be successful. |
| 53 | +> - But this reduces the amount of outstanding claims of the domain. |
| 54 | +> - Freeing memory of the domain increases the domain's claim again: |
| 55 | +> - But, when a domain consumes its claim, it is reset. |
| 56 | +> - When the claim is reset, freed memory is no longer moved to the outstanding claims! |
| 57 | +> - It would have to get a new claim on memory to have spare memory again. |
| 58 | +
|
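The accounting rules above can be sketched as a small toy model. This is not Xen's implementation: the types `struct host` and `struct domain` and the functions `model_claim()`, `model_alloc()` and `model_free()` are hypothetical names invented for illustration, and the model only follows the behaviour described in this section (claims do not reduce the reported free memory, allocations consume the claim, and frees refill the claim only while it is non-zero). Keeping the previous claim when a new claim fails is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical types, not Xen's data structures. */
struct host {
    unsigned long free_pages;         /* free memory as reported by Xen */
    unsigned long outstanding_claims; /* host-wide outstanding claims */
};

struct domain {
    unsigned long tot_pages; /* pages currently allocated to the domain */
    unsigned long max_pages; /* the domain's max_mem limit, in pages */
    unsigned long claim;     /* outstanding claim, 0 means no claim */
};

/* Free pages that are not promised to any domain other than d. */
static unsigned long available_to(const struct host *h, const struct domain *d)
{
    assert(h->outstanding_claims <= h->free_pages);
    return h->free_pages - (h->outstanding_claims - d->claim);
}

/* Set, update or cancel (nr_pages == 0) the claim of d. */
static bool model_claim(struct host *h, struct domain *d, unsigned long nr_pages)
{
    if (nr_pages > available_to(h, d))
        return false; /* assumption of this sketch: the old claim is kept */
    h->outstanding_claims = h->outstanding_claims - d->claim + nr_pages;
    d->claim = nr_pages; /* claiming 0 pages always succeeds */
    return true;
}

/* Allocate n pages for d, consuming its claim first. */
static bool model_alloc(struct host *h, struct domain *d, unsigned long n)
{
    if (n > d->max_pages - d->tot_pages)
        return false; /* would exceed max_mem: always fails */
    if (n > available_to(h, d))
        return false; /* the rest is free but claimed by other domains */

    unsigned long consumed = n < d->claim ? n : d->claim;
    d->claim -= consumed;
    h->outstanding_claims -= consumed;
    h->free_pages -= n;
    d->tot_pages += n;
    return true;
}

/* Free n pages of d; the claim grows only while it is still non-zero. */
static void model_free(struct host *h, struct domain *d, unsigned long n)
{
    if (n > d->tot_pages)
        n = d->tot_pages;
    d->tot_pages -= n;
    h->free_pages += n;
    if (d->claim > 0) {
        d->claim += n;
        h->outstanding_claims += n;
    }
}
```

In this model, a domain that claims 60 of 100 free pages leaves only 40 pages for allocations by other domains, even though `free_pages` still reads 100. Once the domain has consumed its whole claim, `model_free()` returns pages to the host without restoring the claim, which matches the last two bullet points above.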
| 59 | +> [!warning] The domain's `max_mem` value is used to deny memory allocation |
| 60 | +> If an allocation would cause the domain to exceed its `max_mem` |
| 61 | +> value, it will always fail. |
| 62 | +
|
| 63 | +
|
| 64 | +## Implementation |
| 65 | +
|
| 66 | +Function signature of the libXenCtrl function to call the Xen hypercall: |
| 67 | +
|
| 68 | +```c |
| 69 | +long xc_memory_op(libxc_handle, XENMEM_claim_pages, struct xen_memory_reservation *) |
| 70 | +``` |
| 71 | +
|
| 72 | +The `struct xen_memory_reservation` passed to the hypercall is initialised as follows: |
| 73 | + |
| 74 | +```c |
| 75 | +struct xen_memory_reservation reservation = { |
| 76 | +    .nr_extents = nr_pages, /* number of pages to claim */ |
| 77 | +    .extent_order = 0, /* an order 0 means: 4k pages, only 0 is allowed */ |
| 78 | +    .mem_flags = 0, /* no flags, only 0 is allowed (at the moment) */ |
| 79 | +    .domid = domid /* numerical domain ID of the domain */ |
| 80 | +}; |
| 81 | +``` |
| 82 | + |
| 83 | +### Concurrency |
| 84 | + |
| 85 | +Xen protects the consistency of the stake of the domain |
| 86 | +using the domain's `page_alloc_lock` and the global `heap_lock` of Xen. |
| 87 | +These spin-locks prevent any "time-of-check-time-of-use" races. |
| 88 | +As the hypercall needs to take those spin-locks, it cannot be preempted. |
| 89 | + |
| 90 | +### Return value |
| 91 | + |
| 92 | +The call returns 0 if the hypercall successfully claimed the requested amount |
| 93 | +of memory, else it returns non-zero. |
| 94 | + |
| 95 | +## Current users |
| 96 | + |
| 97 | +### <tt>libxl</tt> and the <tt>xl</tt> CLI |
| 98 | + |
| 99 | +`libxl` passes a `struct xc_dom_image` to the |
| 100 | +[libxenguest](https://github.com/xen-project/xen/tree/master/tools/libs/guest) |
| 101 | +functions |
| 102 | +[meminit_hvm()](https://github.com/xen-project/xen/blob/de0254b9/tools/libs/guest/xg_dom_x86.c#L1348-L1649) |
| 103 | +and |
| 104 | +[meminit_pv()](https://github.com/xen-project/xen/blob/de0254b9/tools/libs/guest/xg_dom_x86.c#L1183-L1333). |
| 105 | +If its `claim_enabled` field is set, these functions first attempt to claim |
| 106 | +the to-be-allocated memory using a call to `xc_domain_claim_pages()`. |
| 107 | +Only then do they allocate the domain's main system memory using |
| 108 | +[xc_populate_physmap()](https://github.com/xen-project/xen/blob/de0254b9/xen/common/memory.c#L159-L314), which calls the hypercall |
| 109 | +to allocate and populate that memory. |
| 110 | +If the claim fails, they do not attempt to continue and return the error code |
| 111 | +of `xc_domain_claim_pages()`. |
| 112 | + |
| 113 | +Both functions also (unconditionally) reset the claim upon return. |
| 114 | + |
| 115 | +Even so, the `xl` CLI uses this functionality (unless disabled in `xl.conf`) |
| 116 | +so that a domain build does not run out of memory partway through |
| 117 | +the `meminit_hvm` and `meminit_pv` calls. |
| 118 | +Instead, these calls return an error immediately when the claim fails. |
| 119 | + |
| 120 | +This means that in case the claim fails, `xl` avoids: |
| 121 | +- The effort of allocating the memory, thereby not blocking it for other domains. |
| 122 | +- The effort of potentially needing to scrub the memory after the build failure. |
| 123 | + |
| 124 | +### xenguest |
| 125 | + |
| 126 | +While [xenguest](../../../xenopsd/walkthroughs/VM.build/xenguest) calls the |
| 127 | +[libxenguest](https://github.com/xen-project/xen/tree/master/tools/libs/guest) |
| 128 | +functions |
| 129 | +[meminit_hvm()](https://github.com/xen-project/xen/blob/de0254b9/tools/libs/guest/xg_dom_x86.c#L1348-L1649) |
| 130 | +and |
| 131 | +[meminit_pv()](https://github.com/xen-project/xen/blob/de0254b9/tools/libs/guest/xg_dom_x86.c#L1183-L1333) |
| 132 | +like `libxl` does, it does not set |
| 133 | +[struct xc_dom_image.claim_enabled](https://github.com/xen-project/xen/blob/de0254b9/tools/include/xenguest.h#L186), |
| 134 | +so it does not enable the first call to `xc_domain_claim_pages()` |
| 135 | +which would claim the amount of memory that these functions will |
| 136 | +attempt to allocate and populate for the domain. |
| 137 | + |
| 138 | +#### Future design ideas for improved NUMA support |
| 139 | + |
| 140 | +For improved support for [NUMA](../../../toolstack/features/NUMA/), `xenopsd` |
| 141 | +may want to call an updated version of this function before assigning a NUMA |
| 142 | +node to a new domain, so that the domain holds a stake on that node's memory |
| 143 | +before `xenguest` allocates memory for the domain. |
| 144 | + |
| 145 | +Further, as PV drivers `unmap` and `free` memory for grant tables to Xen and |
| 146 | +then re-allocate memory for those grant tables, `xenopsd` may want to try to |
| 147 | +stake a very small claim for the domain on the NUMA node of the domain so that |
| 148 | +Xen can increase this claim when the PV drivers `free` this memory and re-use |
| 149 | +the resulting claimed amount for allocating the grant tables. This would ensure |
| 150 | +that the grant tables are then allocated on the local NUMA node of the domain, |
| 151 | +avoiding remote memory accesses when accessing the grant tables from inside |
| 152 | +the domain. |
| 153 | + |
| 154 | +Note: In case the corresponding backend process in Dom0 is running on another |
| 155 | +NUMA node, it would access the domain's grant tables from a remote NUMA node, |
| 156 | +but this would enable a future improvement for Dom0, where it could prefer to |
| 157 | +run the corresponding backend process on the same or a neighbouring NUMA node. |