CA-404020: Do not fail when removing a non-existing datasource #6199
Conversation
The latest changes in the metrics daemon change how archived and live metrics are synchronised, and archived sources are created less often. This meant that in some cases, like SR.forget, operations failed: they could query for existing sources, but the call to remove them failed because the metrics only existed in live form, not in archived form. (What happens with the live one? Does it disappear when the SR is forgotten?) Signed-off-by: Pau Ruiz Safont <[email protected]>
Also removes some traversals when reading from XML data Signed-off-by: Pau Ruiz Safont <[email protected]>
Signed-off-by: Pau Ruiz Safont <[email protected]>
inner rrd n
)
rrd removals_required
List.sort_uniq String.compare ds_names
I think we don't need to sort here? Then still use Utils.setify ds_names?
We want to drop setify from as many places as possible in the codebase; it has a cost of O(n^2). sort_uniq should be faster in most cases, and since we don't need to maintain the order of the elements, the latter can be used.
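For reference, a minimal sketch of the two deduplication approaches; the setify body below only illustrates the quadratic pattern and is not the exact Utils.setify source:

(* Quadratic dedup: each List.mem walks the accumulator, O(n^2) overall. *)
let setify xs =
  List.fold_left (fun acc x -> if List.mem x acc then acc else x :: acc) [] xs

(* O(n log n) dedup, usable when element order does not matter. *)
let dedup_names ds_names = List.sort_uniq String.compare ds_names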
If we want to avoid O(n^2) behaviour then we could use List.sort instead of List.sort_uniq, and see how many consecutive equal elements we find: those are the duplicates that we need to remove.
List.filter_map + List.filter is still O(n^2).
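A sketch of that idea, assuming the names are plain strings: sort once, then make a single pass that separates the unique names from the consecutive duplicates.

(* Sort, then scan: consecutive equal names are the duplicates to remove. *)
let split_duplicates names =
  let sorted = List.sort String.compare names in
  let rec go uniq dups = function
    | [] -> (List.rev uniq, List.rev dups)
    | x :: (y :: _ as rest) when String.equal x y -> go uniq (x :: dups) rest
    | x :: rest -> go (x :: uniq) dups rest
  in
  go [] [] sorted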
Would you please share how archived and live metrics are synchronized? With this fix, is it possible for some entry to be synchronized from live to archived after an attempt to remove it?
The change is about datasources, not metrics. Archived datasources hold metadata when xapi asks to disable or enable a datasource. This bit of information is not stored in the data structures of the live datasource, and its lifetime is also independent: it can't easily be stored in the live datastructure because the archived metadata needs to outlive the live datasource when it stops reporting.
Remove can be called at any time, so data races are theoretically possible. In practice I haven't observed a mechanism that synchronises live datasources with archival / default ones. Datasources about SRs are in a specially awkward place because they are currently reported as host metrics. This means that once they exist as live datasources they are never removed from memory, because there's no mechanism for that. There is a mechanism to manage SR datasources independently, and a lot of code for it, but it's currently unused. VMs do not have this problem.
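For context, the tolerant removal the PR title describes follows this shape (a hedged sketch, not the PR's exact code; the ds record and its ds_name field are stand-ins for the real rrd types):

(* Hypothetical datasource record; the real type lives in the rrd library. *)
type ds = {ds_name: string}

(* Tolerant removal: filtering out an absent name is a no-op rather than an
   error, so callers such as SR.forget no longer fail when the archived
   datasource was never created. *)
let remove_ds name dss = List.filter (fun ds -> ds.ds_name <> name) dss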
The optimization discussed to avoid List.filter + List.filter_map could be continued in another PR; this PR is already an improvement over what was there before.
The problem here, as @psafont suggested, is that SR datasources are currently reported as host metrics.
That's a limitation of the current design, and fixing it needs careful readjusting of all the SR-related metrics: all the names need to be changed (do multiple hosts report about the same SR?). I would avoid trying to fix it here, as it would add quite a lot of risk to the change.