-
Notifications
You must be signed in to change notification settings - Fork 292
merge master to feature/easier-pool-join #6079
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
minglumlu
merged 59 commits into
xapi-project:feature/easier-pool-join
from
BengangY:private/bengangy/merge-master-to-easier-pool-join
Oct 24, 2024
Merged
merge master to feature/easier-pool-join #6079
minglumlu
merged 59 commits into
xapi-project:feature/easier-pool-join
from
BengangY:private/bengangy/merge-master-to-easier-pool-join
Oct 24, 2024
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
If you install the XAPI RPMs in your koji build environment (e.g. to build a package that depends on XAPI) then you couldn't build XAPI again anymore because its unit tests were failing. They were failing because they found some xapi hooks installed by the previous version of XAPI, whereas normally there'd be none when the unit tests are running. Disable running XAPI hooks during unit test, even if present we are not expected to run them. ``` [exception] Unix.Unix_error(Unix.ENOENT, "connect", "") Raised at Forkhelpers.execute_command_get_output_inner.(fun) in file "ocaml/forkexecd/lib/forkhelpers.ml", line 376, characters 10-19 Called from Xapi_stdext_pervasives__Pervasiveext.finally in file "ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml", line 24, characters 8-14 Re-raised at Xapi_stdext_pervasives__Pervasiveext.finally in file "ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml", line 39, characters 6-15 Called from Xapi_hooks.execute_hook.(fun) in file "ocaml/xapi/xapi_hooks.ml", line 77, characters 10-113 Called from Stdlib__Array.iter in file "array.ml", line 95, characters 31-48 Called from Xapi_host.destroy in file "ocaml/xapi/xapi_host.ml", line 1108, characters 2-98 Called from Dune__exe__Test_cluster_host.test_forget in file "ocaml/tests/test_cluster_host.ml", line 192, characters 2-42 Called from Alcotest_engine__Core.Make.protect_test.(fun) in file "src/alcotest-engine/core.ml", line 181, characters 17-23 Called from Alcotest_engine__Monad.Identity.catch in file "src/alcotest-engine/monad.ml", line 24, characters 31-35 ``` Signed-off-by: Edwin Török <[email protected]>
to connect to nbd devices, nbd_client_manager will 1. protect the operation with /var/run/nonpersistent/nbd_client_manager file lock 2. check whether nbd is being used by `nbd-client -check` 3. load nbd kernel module by `modprobe nbd` 4. call `nbd-client` to connect to nbd device However, step 3 will trigger systemd-udevd run asyncly, which would open and lock the same nbd devices, run udev rules, etc. This introduce races with step 4, e.g. both process want to open and lock the nbd device. Note: the file lock in step 1 does NOT resovle the issue here, as it only coordinate multiple nbd_client_manager processes. To fix the issue, - we patch nbd-client to report the device busy from kernel to nbd_client_manager - nbd_client_manager should check nbd-client exit code, and retry on device busy - nbd_client_manager call `udevadm settle` to wait for udevd parsing udev rules Note: checking nbd-client exit code is still necessary in case of racing with others Signed-off-by: Lin Liu <[email protected]>
networkd generates metrics for two users simultaneously: * xapi db * rrdd Both of these read from the same shared file, but use non-overlapping stats. Having moved network metrics collection from xcp-rrdd itself into a plugin, these metrics were serialized twice - moving from networkd to the plugin and from the plugin to the server. Instead generate metrics in the plugin itself and drop this generation from networkd. Signed-off-by: Andrii Sultanov <[email protected]>
Signed-off-by: Pau Ruiz Safont <[email protected]>
Gives more flexibility in tests. Now the results from the client aren't printed, but weren't important to pass the test anyway. Signed-off-by: Pau Ruiz Safont <[email protected]>
Current behaviour for displaying stats is done with the --perf parameter Signed-off-by: Pau Ruiz Safont <[email protected]>
While this does not exercise the exact error that can happen in long migrations, it gets logged in a similar way. There's no easy way to trigger the issue, the best chance is to send a malformed response to trigger a Parse_error. I did modify the code in http_client and verified that current code can produce the logging, with backtraces successfully, when set up properly (like in the test client) Signed-off-by: Pau Ruiz Safont <[email protected]>
No functional difference Signed-off-by: Pau Ruiz Safont <[email protected]>
Taking measurements in practice doesn't lead to improved accuracy. Also change the tests so more than one sample is collected and can know how noisy the measurements really are. Here's an example of a run, including the result before the change: ``` $ ./test_client.exe --perf - 1 thread non-persistent connections: 4896.0 +/- 0.0 RPCs/sec - 1 thread non-persistent connections (query): 4811.0 +/- 0.0 RPCs/sec - 10 threads non-persistent connections: 7175.0 +/- 0.0 RPCs/sec - 1 thread persistent connection: 16047.0 +/- 0.0 RPCs/sec - 10 threads persistent connections: 7713.0 +/- 0.0 RPCs/sec + 1 thread non-persistent connections: 5042.0 +/- 247.5 RPCs/sec + 1 thread non-persistent connections (query): 5173.0 +/- 216.0 RPCs/sec + 10 threads non-persistent connections: 7678.0 +/- 2241.2 RPCs/sec + 1 thread persistent connection: 21814.0 +/- 2124.6 RPCs/sec + 10 threads persistent connections: 10154.0 +/- 2461.9 RPCs/sec ``` Signed-off-by: Pau Ruiz Safont <[email protected]>
Signed-off-by: Elijah Sadorra <[email protected]>
Signed-off-by: Andrii Sultanov <[email protected]>
…project#6028) Some long-running migrations stop because of a loss of connection, log more information when it happens. I couldn't find a way to get the backtrace to be printed in a nice way without adding too much code, this also makes the change backportable. I would prefer to log it at a debug level, but the function doesn't expose it, and it would complicate backpoerting as well.
Folding over a list to add its elements to a set (which is initially empty) is operationally equivalent to calling of_list (of the set), but potentially less efficient. The implementation of of_list only uses "add" for small lists, e.g. the cases for lists [x_1; x_2; ...; x_N] for all N in range 2 <= N <= 5 are matched literally and expanded to: add x_N (... (add x_1 (singleton x_0))) However, larger lists are first sorted and the underlying tree representing the set is constructed directly. Signed-off-by: Colin James <[email protected]>
Folding over a list to add its elements to a set (which is initially empty) is operationally equivalent to calling of_list (of the set), but potentially less efficient. The implementation of of_list only uses `add` for small lists, e.g. the cases for lists `[x_1; x_2; ...; x_N]` for all `N` in range `2 <= N <= 5` are matched literally and expanded to: ``` add x_N (... (add x_1 (singleton x_0))) ``` However, larger lists are first sorted and the underlying tree representing the set is constructed directly. --- This is a stray change I cherry-picked from another branch.
This detects some unused bindings and a mutable field. Signed-off-by: Pau Ruiz Safont <[email protected]>
Also change the interface and explain the meaning behind the values. Signed-off-by: Pau Ruiz Safont <[email protected]>
The datastructure is mean to serialize the offset and length of a piece of disk, not its data. This also frontloads the possible conversion failure to the creation of the datastructure. Signed-off-by: Pau Ruiz Safont <[email protected]>
This detects some unused bindings and a mutable field. Chunked got also documented and changed the interface to make it more understandable to use.
This reverts commit c27b1d4. Signed-off-by: Edwin Török <[email protected]>
This reverts commit af68185. Signed-off-by: Edwin Török <[email protected]>
The code to extract vdis from geneva / zurich releases has been unused for years Signed-off-by: Pau Ruiz Safont <[email protected]>
…project#6021) to connect to nbd devices, nbd_client_manager will 1. protect the operation with /var/run/nonpersistent/nbd_client_manager file lock 2. check whether nbd is being used by `nbd-client -check` 3. load nbd kernel module by `modprobe nbd` 4. call `nbd-client` to connect to nbd device However, step 3 will trigger systemd-udevd run asyncly, which would open and lock the same nbd devices, run udev rules, etc. This introduce races with step 4, e.g. both process want to open and lock the nbd device. Note: the file lock in step 1 does NOT resovle the issue here, as it only coordinate multiple nbd_client_manager processes. To fix the issue, - we patch nbd-client to report the device busy from kernel to nbd_client_manager - nbd_client_manager should check nbd-client exit code, and retry on device busy
Workaround for CA-400339. This'll allow us to get a proper fix in, without having to rush that change.
The code to extract vdis from geneva / zurich releases has been unused for years This comment was amusing: ``` (* XXX: this is totally wrong: *) ```
Signed-off-by: Pau Ruiz Safont <[email protected]>
) networkd generates metrics for two users simultaneously: * xapi db * rrdd Both of these read from the same shared file, but use non-overlapping stats. Having moved network metrics collection from xcp-rrdd itself into a plugin, these metrics were serialized twice - moving from networkd to the plugin and from the plugin to the server. Instead generate metrics in the plugin itself and drop this generation from networkd.
* New function Uuidx.make_v7_uuid, with the idea being that ordering v7 UUIDs alphabetically will also order them by creation time. This requires uuidm v0.9.9, as that contains the code for constructing a v7 UUID from a time and some random bytes. * There is a function for generating v7 from known inputs, for the purpose of unit testing. Arguably this is pointless to have unit tests for third-party code, but the tests were written to test code that was submitted to uuidm only later, and I'm always loathe to delete tests. Signed-off-by: Robin Newton <[email protected]>
* New function Uuidx.make_v7_uuid, with the idea being that ordering v7 UUIDs alphabetically will also order them by creation time * The values produced by Uuidx.make_uuid_urnd hadn't necessarily been valid UUIDs, since the variant and version fields were being filled in randomly - this is now fixed so that it returns v4 UUIDs. * There are a couple of functions for generating v4 and v7 from known inputs, for the purpose of unit testing. (The v4 function is mainly there so I could check the setting of variant and version fields by comparing the output with that which Python's UUID module produces.)
Signed-off-by: Konstantina Chremmou <[email protected]>
Signed-off-by: Rob Hoes <[email protected]>
Signed-off-by: Vincent Liu <[email protected]>
Signed-off-by: Rob Hoes <[email protected]>
The pvsproxy socket is available in both /opt/ and /run. Since /run is a more sensible location for a socket, use that one to allow the other to be removed in the future. Signed-off-by: Ross Lagerwall <[email protected]>
The function is now passed a plain fd rather than a Buf_io.t ("bio") value, as it does not actually use the buffered channel and just read from the fd directly (using `Http.read_http_request_header`). There used to be an older version if `request_from_bio` that had an option to read requests in a different way and that did use Buf_io. This was called the "slow path" and was removed in bc2ff45 in favour of the current "fast path". This further clean-up opportunity was missed at that time. Signed-off-by: Rob Hoes <[email protected]>
The function `check_reusable_inner` used Buf_io to read a fixed-length HTTP response and then discarded the buffer. This is functionally the same as using `Unixext.really_read_string`, so do that instead. Signed-off-by: Rob Hoes <[email protected]>
At this point, the only function in the entire code base that read from a Buf_io.t is `Http_svr.read_body` (apart from a test for Buf_io). However, it only does so if the buffer is not empty and falls back to reading directly from the fd is not. And since nothing else reads from a Buf_io, the buffer is always empty... Signed-off-by: Rob Hoes <[email protected]>
The main difference between the BufIO and FdIO cases was that the former calls `assert_credentials_ok` with the `callback` in its `~fn` parameter, while the latter executed the `callback` directly after the credentials check. The function `assert_credentials_ok` either calls `fn` or raises an exception. Well, nearly... It did not actually call `fn` in the unix socket case, where checks are bypassed. This looks unintended and this patch corrects it. This only affects the following handlers in xapi, which use BufIO and require RBAC checks: post_remote_db_access, post_remote_db_access_v2, get_wlb_report, get_wlb_diagnostics, get_audit_log. I guess those were simply never used on the unix socket. The other thing that happens when using `~fn` is that the function `Rbac_audit.allowed_post_fn_ok` is called after `~fn`. This writes an "ALLOWED OK" line to the audit log. I don't see a reason not to do the same in all cases. The outcome is that now both cases of `add_handler` do the same and only the channel types are different. In the following commit the two handler types are joining into a single one, which is now easier. Signed-off-by: Rob Hoes <[email protected]>
HTTP handlers of type BufIO did not actually read from through the buffer at all. Instead, they all assert that the buffer is empty and then simply use the file descriptor. All HTTP handlers now directly use file descriptors. The handler type simply becomes: type 'a handler = Http.Request.t -> Unix.file_descr -> 'a -> unit Signed-off-by: Rob Hoes <[email protected]>
Signed-off-by: Rob Hoes <[email protected]>
The pvsproxy socket is available in both /opt/ and /run. Since /run is a more sensible location for a socket, use that one to allow the other to be removed in the future.
If you install the XAPI RPMs in your koji build environment (e.g. to build a package that depends on XAPI) then you couldn't build XAPI again anymore because its unit tests were failing. They were failing because they found some xapi hooks installed by the previous version of XAPI, whereas normally there'd be none when the unit tests are running. Disable running XAPI hooks during unit test, even if present we are not expected to run them. ``` [exception] Unix.Unix_error(Unix.ENOENT, "connect", "") Raised at Forkhelpers.execute_command_get_output_inner.(fun) in file "ocaml/forkexecd/lib/forkhelpers.ml", line 376, characters 10-19 Called from Xapi_stdext_pervasives__Pervasiveext.finally in file "ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml", line 24, characters 8-14 Re-raised at Xapi_stdext_pervasives__Pervasiveext.finally in file "ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml", line 39, characters 6-15 Called from Xapi_hooks.execute_hook.(fun) in file "ocaml/xapi/xapi_hooks.ml", line 77, characters 10-113 Called from Stdlib__Array.iter in file "array.ml", line 95, characters 31-48 Called from Xapi_host.destroy in file "ocaml/xapi/xapi_host.ml", line 1108, characters 2-98 Called from Dune__exe__Test_cluster_host.test_forget in file "ocaml/tests/test_cluster_host.ml", line 192, characters 2-42 Called from Alcotest_engine__Core.Make.protect_test.(fun) in file "src/alcotest-engine/core.ml", line 181, characters 17-23 Called from Alcotest_engine__Monad.Identity.catch in file "src/alcotest-engine/monad.ml", line 24, characters 31-35 ```
Cleaning up the attic... It turned out that the `Buf_io` module, while used in the type signature of every HTTP handler, was not used for anything useful anymore. Step by step rationale in the commit messages, so best reviewed commit by commit.
When these metrics were collected internally, Xenctrl was queried every 5 seconds. After being split into plugins, they started querying domains (and other information) only on startup, so couldn't pick up new VMs and report their metrics without restarting. Signed-off-by: Andrii Sultanov <[email protected]>
Remove all the gating on cluster_health enabled as an experimental feature now that it is enabled by default. Signed-off-by: Vincent Liu <[email protected]>
Remove all the gating on cluster_health enabled as an experimental feature now that it is enabled by default.
xapi-project#6067) When these metrics were collected internally, Xenctrl was queried every 5 seconds. After being split into plugins, they started querying domains (and other information) only on startup, so couldn't pick up new VMs and report their metrics without restarting.
SHA256 and SHA1 certificates' fingerprints do not get populated when the database is upgraded, so empty values need to be detected and amended on startup. Signed-off-by: Pau Ruiz Safont <[email protected]> Signed-off-by: Steven Woods <[email protected]>
This allows the CA certificate to be removed from the DB even if the certificate file does not exist. Signed-off-by: Steven Woods <[email protected]>
…project#6006) Also CP-51527: Add --force option to pool-uninstall-ca-certificate. Addresses the issues raised here by @stormi xapi-project#5955
Without it, stats for bond's interfaces are not identified correctly. Fixes: bd4dda5 (IH-715 - rrdp-netdev: Remove double (de)serialization) Signed-off-by: Andrii Sultanov <[email protected]>
…api-project#6075) Without it, stats for bond's interfaces are not identified correctly. Fixes: bd4dda5 (IH-715 - rrdp-netdev: Remove double (de)serialization)
…ol-join Merge master to easier-pool-join
robhoes
approved these changes
Oct 24, 2024
minglumlu
approved these changes
Oct 24, 2024
d10a8c0
into
xapi-project:feature/easier-pool-join
15 checks passed
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
merge master to feature/easier-pool-join