merge master to feature/easier-pool-join #6079

BengangY · 2024-10-24T07:19:45Z

merge master to feature/easier-pool-join

If you install the XAPI RPMs in your koji build environment (e.g. to build a package that depends on XAPI), you can no longer build XAPI itself, because its unit tests fail. They fail because they find xapi hooks installed by the previous version of XAPI, whereas normally none are present when the unit tests run.

Disable running XAPI hooks during unit tests: even if hooks are present, the tests are not expected to run them.

```
[exception] Unix.Unix_error(Unix.ENOENT, "connect", "")
Raised at Forkhelpers.execute_command_get_output_inner.(fun) in file "ocaml/forkexecd/lib/forkhelpers.ml", line 376, characters 10-19
Called from Xapi_stdext_pervasives__Pervasiveext.finally in file "ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml", line 24, characters 8-14
Re-raised at Xapi_stdext_pervasives__Pervasiveext.finally in file "ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml", line 39, characters 6-15
Called from Xapi_hooks.execute_hook.(fun) in file "ocaml/xapi/xapi_hooks.ml", line 77, characters 10-113
Called from Stdlib__Array.iter in file "array.ml", line 95, characters 31-48
Called from Xapi_host.destroy in file "ocaml/xapi/xapi_host.ml", line 1108, characters 2-98
Called from Dune__exe__Test_cluster_host.test_forget in file "ocaml/tests/test_cluster_host.ml", line 192, characters 2-42
Called from Alcotest_engine__Core.Make.protect_test.(fun) in file "src/alcotest-engine/core.ml", line 181, characters 17-23
Called from Alcotest_engine__Monad.Identity.catch in file "src/alcotest-engine/monad.ml", line 24, characters 31-35
```

Signed-off-by: Edwin Török <[email protected]>

To connect to nbd devices, nbd_client_manager will:

1. protect the operation with the /var/run/nonpersistent/nbd_client_manager file lock
2. check whether nbd is being used, with `nbd-client -check`
3. load the nbd kernel module with `modprobe nbd`
4. call `nbd-client` to connect to the nbd device

However, step 3 triggers systemd-udevd to run asynchronously, which opens and locks the same nbd devices, runs udev rules, etc. This introduces races with step 4, e.g. both processes want to open and lock the nbd device.

Note: the file lock in step 1 does NOT resolve the issue, as it only coordinates multiple nbd_client_manager processes.

To fix the issue:

- we patch nbd-client to report the device-busy error from the kernel to nbd_client_manager
- nbd_client_manager checks the nbd-client exit code and retries on device busy
- nbd_client_manager calls `udevadm settle` to wait for udevd to finish processing udev rules

Note: checking the nbd-client exit code is still necessary in case of races with others.

Signed-off-by: Lin Liu <[email protected]>
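The retry-on-busy behaviour described in this commit message can be illustrated with a small sketch. This is not the code from the patch (nbd_client_manager is a separate script); the `attempt` type, `connect_with_retry` and its parameters are hypothetical names, and the `connect` function stands in for running `nbd-client` and interpreting its exit code.

```ocaml
(* Outcome of one connection attempt. All names here are hypothetical. *)
type attempt = Connected | Device_busy | Failed of string

(* Call [connect] up to [max_attempts] times, retrying only when the device
   is reported busy. [wait] runs between attempts; a real caller might sleep
   or wait for udev to settle there. Returns the number of attempts used. *)
let connect_with_retry ~max_attempts ~wait ~connect =
  let rec go attempt =
    match connect () with
    | Connected ->
        Ok attempt
    | Failed msg ->
        Error msg
    | Device_busy when attempt < max_attempts ->
        wait () ;
        go (attempt + 1)
    | Device_busy ->
        Error "device still busy after all attempts"
  in
  go 1

(* Simulated client: busy twice (as if udevd held the device), then OK. *)
let () =
  let calls = ref 0 in
  let connect () =
    incr calls ;
    if !calls <= 2 then Device_busy else Connected
  in
  match connect_with_retry ~max_attempts:5 ~wait:(fun () -> ()) ~connect with
  | Ok n ->
      Printf.printf "connected after %d attempts\n" n
  | Error msg ->
      Printf.printf "failed: %s\n" msg
```

Only the busy outcome is retried; any other failure is returned immediately, which matches the idea of distinguishing device busy from other exit codes.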

networkd generates metrics for two users simultaneously:

* xapi db
* rrdd

Both of these read from the same shared file, but use non-overlapping stats. Having moved network metrics collection from xcp-rrdd itself into a plugin, these metrics were serialized twice: moving from networkd to the plugin, and from the plugin to the server. Instead, generate metrics in the plugin itself and drop this generation from networkd.

Signed-off-by: Andrii Sultanov <[email protected]>

Signed-off-by: Pau Ruiz Safont <[email protected]>

Gives more flexibility in tests. The results from the client are no longer printed, but they weren't needed for the test to pass anyway. Signed-off-by: Pau Ruiz Safont <[email protected]>

The existing behaviour, displaying stats, is now selected with the --perf parameter. Signed-off-by: Pau Ruiz Safont <[email protected]>

While this does not exercise the exact error that can happen in long migrations, it gets logged in a similar way. There's no easy way to trigger the issue; the best chance is to send a malformed response to trigger a Parse_error. I did modify the code in http_client and verified that the current code can produce the logging, with backtraces, when set up properly (like in the test client). Signed-off-by: Pau Ruiz Safont <[email protected]>

No functional difference Signed-off-by: Pau Ruiz Safont <[email protected]>

Taking measurements in practice doesn't lead to improved accuracy. Also change the tests so that more than one sample is collected and we can see how noisy the measurements really are.

Here's an example of a run, including the result before the change:

```
$ ./test_client.exe --perf
- 1 thread non-persistent connections: 4896.0 +/- 0.0 RPCs/sec
- 1 thread non-persistent connections (query): 4811.0 +/- 0.0 RPCs/sec
- 10 threads non-persistent connections: 7175.0 +/- 0.0 RPCs/sec
- 1 thread persistent connection: 16047.0 +/- 0.0 RPCs/sec
- 10 threads persistent connections: 7713.0 +/- 0.0 RPCs/sec
+ 1 thread non-persistent connections: 5042.0 +/- 247.5 RPCs/sec
+ 1 thread non-persistent connections (query): 5173.0 +/- 216.0 RPCs/sec
+ 10 threads non-persistent connections: 7678.0 +/- 2241.2 RPCs/sec
+ 1 thread persistent connection: 21814.0 +/- 2124.6 RPCs/sec
+ 10 threads persistent connections: 10154.0 +/- 2461.9 RPCs/sec
```

Signed-off-by: Pau Ruiz Safont <[email protected]>
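To make the `+/-` column concrete, here is a minimal sketch of summarising several samples as a mean and a spread. It assumes the spread is a sample standard deviation (n - 1 denominator); the test client may compute it differently. The function names and the sample values are made up for illustration and are not taken from the PR.

```ocaml
(* Arithmetic mean of a non-empty list of samples. *)
let mean xs = List.fold_left ( +. ) 0. xs /. float_of_int (List.length xs)

(* Sample standard deviation (n - 1 denominator).
   Returns 0 when there are fewer than two samples. *)
let stddev xs =
  let n = List.length xs in
  if n < 2 then 0.
  else
    let m = mean xs in
    let sum_sq = List.fold_left (fun acc x -> acc +. ((x -. m) ** 2.)) 0. xs in
    sqrt (sum_sq /. float_of_int (n - 1))

(* Format one line in the same shape as the output above. *)
let report name samples =
  Printf.sprintf "%s: %.1f +/- %.1f RPCs/sec" name (mean samples)
    (stddev samples)

let () =
  (* Made-up sample values, not measurements from the PR. *)
  print_endline
    (report "1 thread persistent connection" [20000.; 22000.; 24000.])
```

With a single sample the spread is necessarily 0.0, which is consistent with the `+/- 0.0` values shown before the change.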

Signed-off-by: Elijah Sadorra <[email protected]>

Signed-off-by: Andrii Sultanov <[email protected]>

…project#6028) Some long-running migrations stop because of a loss of connection; log more information when that happens. I couldn't find a way to get the backtrace printed nicely without adding too much code; keeping the change small also makes it backportable. I would prefer to log it at debug level, but the function doesn't expose that, and it would complicate backporting as well.

…ect#6047)

Folding over a list to add its elements to a set (which is initially empty) is operationally equivalent to calling of_list (of the set), but potentially less efficient. The implementation of of_list only uses "add" for small lists, e.g. the cases for lists [x_1; x_2; ...; x_N] for all N in range 2 <= N <= 5 are matched literally and expanded to:

add x_N (... (add x_2 (singleton x_1)))

However, larger lists are first sorted and the underlying tree representing the set is constructed directly.

Signed-off-by: Colin James <[email protected]>

Folding over a list to add its elements to a set (which is initially empty) is operationally equivalent to calling of_list (of the set), but potentially less efficient. The implementation of of_list only uses `add` for small lists, e.g. the cases for lists `[x_1; x_2; ...; x_N]` for all `N` in range `2 <= N <= 5` are matched literally and expanded to:

```
add x_N (... (add x_2 (singleton x_1)))
```

However, larger lists are first sorted and the underlying tree representing the set is constructed directly.

---

This is a stray change I cherry-picked from another branch.
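The equivalence claimed above can be checked with a short, self-contained sketch using the standard library's `Set.Make`. The module name `IntSet` and the two helper functions are illustrative and do not appear in the xapi code.

```ocaml
module IntSet = Set.Make (Int)

(* Build the set by folding [add] over the list, starting from [empty]. *)
let set_by_fold xs =
  List.fold_left (fun acc x -> IntSet.add x acc) IntSet.empty xs

(* Build the same set with [of_list]. *)
let set_by_of_list xs = IntSet.of_list xs

let () =
  let xs = [5; 3; 8; 3; 1; 9; 2; 7; 5] in
  let a = set_by_fold xs and b = set_by_of_list xs in
  (* Both constructions yield the same set. *)
  assert (IntSet.equal a b) ;
  Printf.printf "%d distinct elements\n" (IntSet.cardinal a)
```

The two results are equal as sets; the difference described in the commit message is only in how the underlying tree gets built for longer lists.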

This detects some unused bindings and a mutable field. Signed-off-by: Pau Ruiz Safont <[email protected]>

Also change the interface and explain the meaning behind the values. Signed-off-by: Pau Ruiz Safont <[email protected]>

The data structure is meant to serialize the offset and length of a piece of disk, not its data. This also frontloads the possible conversion failure to the creation of the data structure. Signed-off-by: Pau Ruiz Safont <[email protected]>

This detects some unused bindings and a mutable field. Chunked also got documented, and its interface was changed to make it easier to understand.

This reverts commit c27b1d4. Signed-off-by: Edwin Török <[email protected]>

This reverts commit af68185. Signed-off-by: Edwin Török <[email protected]>

The code to extract vdis from geneva / zurich releases has been unused for years Signed-off-by: Pau Ruiz Safont <[email protected]>

…project#6021)

To connect to nbd devices, nbd_client_manager will:

1. protect the operation with the /var/run/nonpersistent/nbd_client_manager file lock
2. check whether nbd is being used, with `nbd-client -check`
3. load the nbd kernel module with `modprobe nbd`
4. call `nbd-client` to connect to the nbd device

However, step 3 triggers systemd-udevd to run asynchronously, which opens and locks the same nbd devices, runs udev rules, etc. This introduces races with step 4, e.g. both processes want to open and lock the nbd device.

Note: the file lock in step 1 does NOT resolve the issue, as it only coordinates multiple nbd_client_manager processes.

To fix the issue:

- we patch nbd-client to report the device-busy error from the kernel to nbd_client_manager
- nbd_client_manager checks the nbd-client exit code and retries on device busy

Workaround for CA-400339. This'll allow us to get a proper fix in, without having to rush that change.

The code to extract vdis from geneva / zurich releases has been unused for years.

This comment was amusing:

```
(* XXX: this is totally wrong: *)
```

Signed-off-by: Pau Ruiz Safont <[email protected]>

)

networkd generates metrics for two users simultaneously:

* xapi db
* rrdd

Both of these read from the same shared file, but use non-overlapping stats. Having moved network metrics collection from xcp-rrdd itself into a plugin, these metrics were serialized twice: moving from networkd to the plugin, and from the plugin to the server. Instead, generate metrics in the plugin itself and drop this generation from networkd.

* New function Uuidx.make_v7_uuid, with the idea being that ordering v7 UUIDs alphabetically will also order them by creation time. This requires uuidm v0.9.9, as that contains the code for constructing a v7 UUID from a time and some random bytes.
* There is a function for generating v7 from known inputs, for the purpose of unit testing. Arguably it is pointless to have unit tests for third-party code, but the tests were written to test code that was submitted to uuidm only later, and I'm always loath to delete tests.

Signed-off-by: Robin Newton <[email protected]>

* New function Uuidx.make_v7_uuid, with the idea being that ordering v7 UUIDs alphabetically will also order them by creation time.
* The values produced by Uuidx.make_uuid_urnd hadn't necessarily been valid UUIDs, since the variant and version fields were being filled in randomly. This is now fixed so that it returns v4 UUIDs.
* There are a couple of functions for generating v4 and v7 from known inputs, for the purpose of unit testing. (The v4 function is mainly there so I could check the setting of variant and version fields by comparing the output with that which Python's UUID module produces.)
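To show why v7 UUIDs sort by creation time, here is a stand-alone sketch that lays out the 16 bytes by hand: a 48-bit big-endian millisecond timestamp first, then the version and variant bits, then random bits. It deliberately uses neither uuidm nor Uuidx, whose actual signatures are not shown in this PR; `v7_bytes` and `to_string` are hypothetical helpers written only for this illustration.

```ocaml
(* Build the 16 bytes of a v7 UUID from a Unix time in milliseconds and ten
   caller-supplied random bytes. Hypothetical helper, not the uuidm API. *)
let v7_bytes ~time_ms ~(random : string) =
  if String.length random <> 10 then
    invalid_arg "v7_bytes: need exactly 10 random bytes" ;
  let b = Bytes.create 16 in
  (* Bytes 0-5: 48-bit big-endian timestamp. *)
  for i = 0 to 5 do
    let shift = 8 * (5 - i) in
    Bytes.set_uint8 b i
      (Int64.to_int (Int64.shift_right_logical time_ms shift) land 0xff)
  done ;
  let r i = Char.code random.[i] in
  (* Byte 6: version 7 in the high nibble, random bits in the low nibble. *)
  Bytes.set_uint8 b 6 (0x70 lor (r 0 land 0x0f)) ;
  Bytes.set_uint8 b 7 (r 1) ;
  (* Byte 8: variant bits 10 on top, six random bits below. *)
  Bytes.set_uint8 b 8 (0x80 lor (r 2 land 0x3f)) ;
  for i = 3 to 9 do
    Bytes.set_uint8 b (i + 6) (r i)
  done ;
  Bytes.to_string b

(* Canonical lowercase 8-4-4-4-12 hex form. *)
let to_string u =
  let buf = Buffer.create 36 in
  String.iteri
    (fun i c ->
      if i = 4 || i = 6 || i = 8 || i = 10 then Buffer.add_char buf '-' ;
      Buffer.add_string buf (Printf.sprintf "%02x" (Char.code c)))
    u ;
  Buffer.contents buf

let () =
  (* Earlier timestamp with all-ones random bits, later one with all zeros. *)
  let earlier = v7_bytes ~time_ms:1L ~random:(String.make 10 '\xff') in
  let later = v7_bytes ~time_ms:2L ~random:(String.make 10 '\000') in
  print_endline (to_string earlier) ;
  print_endline (to_string later) ;
  assert (String.compare (to_string earlier) (to_string later) < 0)
```

Because the timestamp occupies the leading bytes, string comparison orders UUIDs created in different milliseconds by time regardless of the random part; for UUIDs created within the same millisecond, the order depends on the remaining bits.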

Signed-off-by: Konstantina Chremmou <[email protected]>

Signed-off-by: Rob Hoes <[email protected]>

Signed-off-by: Vincent Liu <[email protected]>

…poses. (xapi-project#6059)

Signed-off-by: Rob Hoes <[email protected]>

The pvsproxy socket is available in both /opt/ and /run. Since /run is a more sensible location for a socket, use that one to allow the other to be removed in the future. Signed-off-by: Ross Lagerwall <[email protected]>

The function is now passed a plain fd rather than a Buf_io.t ("bio") value, as it does not actually use the buffered channel and just reads from the fd directly (using `Http.read_http_request_header`). There used to be an older version of `request_from_bio` that had an option to read requests in a different way and that did use Buf_io. This was called the "slow path" and was removed in bc2ff45 in favour of the current "fast path". This further clean-up opportunity was missed at that time. Signed-off-by: Rob Hoes <[email protected]>

The function `check_reusable_inner` used Buf_io to read a fixed-length HTTP response and then discarded the buffer. This is functionally the same as using `Unixext.really_read_string`, so do that instead. Signed-off-by: Rob Hoes <[email protected]>

At this point, the only function in the entire code base that reads from a Buf_io.t is `Http_svr.read_body` (apart from a test for Buf_io). However, it only does so if the buffer is not empty, and falls back to reading directly from the fd if it is. And since nothing else reads from a Buf_io, the buffer is always empty... Signed-off-by: Rob Hoes <[email protected]>

The main difference between the BufIO and FdIO cases was that the former called `assert_credentials_ok` with the `callback` in its `~fn` parameter, while the latter executed the `callback` directly after the credentials check. The function `assert_credentials_ok` either calls `fn` or raises an exception. Well, nearly... It did not actually call `fn` in the unix socket case, where checks are bypassed. This looks unintended and this patch corrects it. This only affects the following handlers in xapi, which use BufIO and require RBAC checks: post_remote_db_access, post_remote_db_access_v2, get_wlb_report, get_wlb_diagnostics, get_audit_log. I guess those were simply never used on the unix socket. The other thing that happens when using `~fn` is that the function `Rbac_audit.allowed_post_fn_ok` is called after `~fn`. This writes an "ALLOWED OK" line to the audit log. I don't see a reason not to do the same in all cases. The outcome is that now both cases of `add_handler` do the same and only the channel types are different. In the following commit the two handler types are joined into a single one, which is now easier. Signed-off-by: Rob Hoes <[email protected]>
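The corrected behaviour described above can be pictured with a toy model. Everything in this sketch is a hypothetical stand-in: the `origin` type, the `allowed` flag, the `audit_log` list and this simplified `assert_credentials_ok` do not match the real xapi signatures. It only illustrates the rule "either call `fn` and record the audit line, or raise", with the unix socket case skipping the check but still calling `fn`.

```ocaml
(* Hypothetical stand-ins; the real xapi types and signatures differ. *)
type origin = Unix_socket | Network of {allowed: bool}

exception Permission_denied

(* Stand-in for the audit log: newest entry first. *)
let audit_log : string list ref = ref []

(* Either runs [fn] and then records an audit line, or raises.
   The unix socket case bypasses the permission check but still runs [fn]. *)
let assert_credentials_ok ~fn origin =
  let run () =
    fn () ;
    audit_log := "ALLOWED OK" :: !audit_log
  in
  match origin with
  | Unix_socket ->
      run ()
  | Network {allowed= true} ->
      run ()
  | Network {allowed= false} ->
      raise Permission_denied

let () =
  let ran = ref 0 in
  let fn () = incr ran in
  assert_credentials_ok ~fn Unix_socket ;
  assert_credentials_ok ~fn (Network {allowed= true}) ;
  ( try assert_credentials_ok ~fn (Network {allowed= false})
    with Permission_denied -> ()
  ) ;
  Printf.printf "callback ran %d times, %d audit lines\n" !ran
    (List.length !audit_log)
```

In this model the callback and the audit line always go together, which is the property that lets the two handler cases share one code path.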

HTTP handlers of type BufIO did not actually read through the buffer at all. Instead, they all asserted that the buffer was empty and then simply used the file descriptor. All HTTP handlers now use file descriptors directly. The handler type simply becomes: `type 'a handler = Http.Request.t -> Unix.file_descr -> 'a -> unit` Signed-off-by: Rob Hoes <[email protected]>

Signed-off-by: Rob Hoes <[email protected]>

The pvsproxy socket is available in both /opt/ and /run. Since /run is a more sensible location for a socket, use that one to allow the other to be removed in the future.

If you install the XAPI RPMs in your koji build environment (e.g. to build a package that depends on XAPI), you can no longer build XAPI itself, because its unit tests fail. They fail because they find xapi hooks installed by the previous version of XAPI, whereas normally none are present when the unit tests run.

Disable running XAPI hooks during unit tests: even if hooks are present, the tests are not expected to run them.

```
[exception] Unix.Unix_error(Unix.ENOENT, "connect", "")
Raised at Forkhelpers.execute_command_get_output_inner.(fun) in file "ocaml/forkexecd/lib/forkhelpers.ml", line 376, characters 10-19
Called from Xapi_stdext_pervasives__Pervasiveext.finally in file "ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml", line 24, characters 8-14
Re-raised at Xapi_stdext_pervasives__Pervasiveext.finally in file "ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml", line 39, characters 6-15
Called from Xapi_hooks.execute_hook.(fun) in file "ocaml/xapi/xapi_hooks.ml", line 77, characters 10-113
Called from Stdlib__Array.iter in file "array.ml", line 95, characters 31-48
Called from Xapi_host.destroy in file "ocaml/xapi/xapi_host.ml", line 1108, characters 2-98
Called from Dune__exe__Test_cluster_host.test_forget in file "ocaml/tests/test_cluster_host.ml", line 192, characters 2-42
Called from Alcotest_engine__Core.Make.protect_test.(fun) in file "src/alcotest-engine/core.ml", line 181, characters 17-23
Called from Alcotest_engine__Monad.Identity.catch in file "src/alcotest-engine/monad.ml", line 24, characters 31-35
```

Cleaning up the attic... It turned out that the `Buf_io` module, while used in the type signature of every HTTP handler, was not used for anything useful anymore. Step by step rationale in the commit messages, so best reviewed commit by commit.

When these metrics were collected internally, Xenctrl was queried every 5 seconds. After being split into plugins, they started querying domains (and other information) only on startup, so couldn't pick up new VMs and report their metrics without restarting. Signed-off-by: Andrii Sultanov <[email protected]>

Remove all the gating on cluster_health enabled as an experimental feature now that it is enabled by default. Signed-off-by: Vincent Liu <[email protected]>

Remove all the gating on cluster_health enabled as an experimental feature now that it is enabled by default.

xapi-project#6067) When these metrics were collected internally, Xenctrl was queried every 5 seconds. After being split into plugins, they started querying domains (and other information) only on startup, so couldn't pick up new VMs and report their metrics without restarting.

SHA256 and SHA1 certificates' fingerprints do not get populated when the database is upgraded, so empty values need to be detected and amended on startup. Signed-off-by: Pau Ruiz Safont <[email protected]> Signed-off-by: Steven Woods <[email protected]>

This allows the CA certificate to be removed from the DB even if the certificate file does not exist. Signed-off-by: Steven Woods <[email protected]>

@stormi

…project#6006) Also CP-51527: Add --force option to pool-uninstall-ca-certificate. Addresses the issues raised here by @stormi xapi-project#5955

Without it, stats for bond's interfaces are not identified correctly. Fixes: bd4dda5 (IH-715 - rrdp-netdev: Remove double (de)serialization) Signed-off-by: Andrii Sultanov <[email protected]>

…api-project#6075) Without it, stats for bond's interfaces are not identified correctly. Fixes: bd4dda5 (IH-715 - rrdp-netdev: Remove double (de)serialization)

…ol-join Merge master to easier-pool-join

edwintorok and others added 30 commits October 2, 2024 15:12

http-lib: add backtrace to logs on connection without response

008a813

Signed-off-by: Pau Ruiz Safont <[email protected]>

http-lib: convert bash script to cram tests

ed96146

Gives more flexibility in tests. The results from the client are no longer printed, but they weren't needed for the test to pass anyway. Signed-off-by: Pau Ruiz Safont <[email protected]>

http-lib: prepare test client for more commands

3bef65f

The existing behaviour, displaying stats, is now selected with the --perf parameter. Signed-off-by: Pau Ruiz Safont <[email protected]>

http-lib: use let@ for perf testing of the client

cda6194

No functional difference Signed-off-by: Pau Ruiz Safont <[email protected]>

CA-399256: Ensure AD domain name check is case insensitive

aa09cb8

Signed-off-by: Elijah Sadorra <[email protected]>

fixup! IH-715 - rrdp-netdev: Remove double (de)serialization

2715234

Signed-off-by: Andrii Sultanov <[email protected]>

CA-399256: Ensure AD domain name check is case insensitive (xapi-proj…

01b6205

…ect#6047)

maintenance: write interface files for vhd-tool

6631d38

This detects some unused bindings and a mutable field. Signed-off-by: Pau Ruiz Safont <[email protected]>

maintenance: add interface to vhd-tool's Chunked

c4a9e25

Also change the interface and explain the meaning behind the values. Signed-off-by: Pau Ruiz Safont <[email protected]>

maintenance: add interface files for vhd-tool (xapi-project#6052)

7670247

This detects some unused bindings and a mutable field. Chunked also got documented, and its interface was changed to make it easier to understand.

Revert "CP-48676: Don't check resuable pool session validity by default"

310429c

This reverts commit c27b1d4. Signed-off-by: Edwin Török <[email protected]>

Revert "CP-48676: Reuse pool sessions on slave logins."

76008ce

This reverts commit af68185. Signed-off-by: Edwin Török <[email protected]>

maintenance: remove unused code from stream_vdi

ab2acfc

The code to extract vdis from geneva / zurich releases has been unused for years Signed-off-by: Pau Ruiz Safont <[email protected]>

Revert changes causing deadlock (xapi-project#6053)

bb41b46

Workaround for CA-400339. This'll allow us to get a proper fix in, without having to rush that change.

maintenance: remove unused code from stream_vdi (xapi-project#6054)

97565ca

The code to extract vdis from geneva / zurich releases has been unused for years.

This comment was amusing:

```
(* XXX: this is totally wrong: *)
```

chore: update datamodel versions

445ef24

Signed-off-by: Pau Ruiz Safont <[email protected]>

chore: update datamodel versions (xapi-project#6055)

9eb5740

kc284 and others added 26 commits October 16, 2024 16:12

Python command correction.

3a727d2

Signed-off-by: Konstantina Chremmou <[email protected]>

Remove unused Http_svr.Chunked module

80528e0

Signed-off-by: Rob Hoes <[email protected]>

chore: Fix some grammatical errors in cluster alerts

1113299

Signed-off-by: Vincent Liu <[email protected]>

Small correction to PR#6058 and a C# SDK addition for client L10n pur…

34352ac

…poses. (xapi-project#6059)

chore: Fix some grammatical errors in cluster alerts (xapi-project#6062)

ad018ce

buf_io: remove unused function input_line

8465e1b

Signed-off-by: Rob Hoes <[email protected]>

Access pvsproxy via a socket in /run

5770f42

The pvsproxy socket is available in both /opt/ and /run. Since /run is a more sensible location for a socket, use that one to allow the other to be removed in the future. Signed-off-by: Ross Lagerwall <[email protected]>

xmlrpc_client: remove us of Buf_io

a49ae63

The function `check_reusable_inner` used Buf_io to read a fixed-length HTTP response and then discarded the buffer. This is functionally the same as using `Unixext.really_read_string`, so do that instead. Signed-off-by: Rob Hoes <[email protected]>

Remove now-unused Buf_io and associated tests

8e02455

Signed-off-by: Rob Hoes <[email protected]>

Access pvsproxy via a socket in /run (xapi-project#6063)

3ae129d

The pvsproxy socket is available in both /opt/ and /run. Since /run is a more sensible location for a socket, use that one to allow the other to be removed in the future.

CP-51683: Make Cluster_health non-exp feature

c647985

Remove all the gating on cluster_health enabled as an experimental feature now that it is enabled by default. Signed-off-by: Vincent Liu <[email protected]>

CP-51683: Make Cluster_health non-exp feature (xapi-project#6023)

daa9938

Remove all the gating on cluster_health enabled as an experimental feature now that it is enabled by default.

CP-51527: Add --force option to pool-uninstall-ca-certificate

ed90086

This allows the CA certificate to be removed from the DB even if the certificate file does not exist. Signed-off-by: Steven Woods <[email protected]>

CA-398341: Populate fingerprints of CA certificates on startup (xapi-…

97aa03f

…project#6006) Also CP-51527: Add --force option to pool-uninstall-ca-certificate. Addresses the issues raised here by @stormi xapi-project#5955

CA-400924 - networkd: Add bonds to devs in network_monitor_thread

98384e8

Without it, stats for bond's interfaces are not identified correctly. Fixes: bd4dda5 (IH-715 - rrdp-netdev: Remove double (de)serialization) Signed-off-by: Andrii Sultanov <[email protected]>

CA-400924 - networkd: Add bonds to devs in network_monitor_thread (x…

46f0f42

…api-project#6075) Without it, stats for bond's interfaces are not identified correctly. Fixes: bd4dda5 (IH-715 - rrdp-netdev: Remove double (de)serialization)

Merge branch 'master' into private/bengangy/merge-master-to-easier-po…

0f093b8

…ol-join Merge master to easier-pool-join

robhoes approved these changes Oct 24, 2024

minglumlu approved these changes Oct 24, 2024

minglumlu merged commit d10a8c0 into xapi-project:feature/easier-pool-join Oct 24, 2024
15 checks passed

BengangY deleted the private/bengangy/merge-master-to-easier-pool-join branch February 5, 2025 01:59
