mempool allocator can return with no allocation even if memory is available #14508
Comments
There is another bug. I will not create a separate issue as the effect is the same (allocation will fail with memory available). Assume the following: level 0 is empty, level 1 is 3/4 free, and 1 chunk is allocated.
FYI @andyross
I have pushed the fix.
Heh, never mind. You already have a patch up. You take it :)
Yeah, we can't do it this way. The fix effectively wraps the whole allocation in a lock, as I understand it. That's counter to the original design intent. Maybe try flagging the existence of a race somehow with a "last thread to enter allocator" state variable or something, as a cue to retry?
Hi @andyross, if you check again you will notice that the amount of code under the lock has not really increased. What was added is (essentially) a check of whether the free list on level n is empty, and you need to do that under the lock anyway, as otherwise the contents of the list may be corrupted. The costly alloc and break operations are already under a lock.

Yes, there is an alternative. Each time the allocator state is updated, a token could be changed. The block allocator could take the token, do a barrier, try allocating, and in case of nomem grab the current token and retry if it changed. I was thinking about this solution, but the main problem is that it is unbounded. The second thing is that, as I wrote above, the change brings under the lock only the checks of which levels are empty (which is a really cheap operation). The third thing is that such solutions are prone to bugs. The current state is already away from the original design, as locks were introduced to handle various race conditions. Please review the code again; the number of operations added under the lock is really small.
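To illustrate, a rough sketch of that token scheme (everything here is invented for illustration, not code from any actual patch):

```c
#include <stdatomic.h>
#include <stddef.h>

struct pool;                             /* stand-in for the real pool type */

/* The existing allocation path, which can spuriously report nomem when it
 * races with a concurrent allocator (returns NULL in that case). */
extern void *pool_alloc_racy(struct pool *p, size_t size);

/* Bumped, under the pool lock, every time a free list is modified. */
static atomic_uint pool_gen;

void *pool_alloc_with_token(struct pool *p, size_t size)
{
	for (;;) {                       /* unbounded -- the drawback above */
		/* take the token; the seq_cst load doubles as the barrier */
		unsigned int token = atomic_load(&pool_gen);

		void *data = pool_alloc_racy(p, size);
		if (data != NULL) {
			return data;
		}
		/* nomem is trustworthy only if nobody touched the lists */
		if (atomic_load(&pool_gen) == token) {
			return NULL;
		}
		/* token changed: our level scan may have been stale; retry */
	}
}
```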
I swear I see it holding the lock around a for() loop, no? Can you split the patch to separate the fixes from each other, and move the changes to formatting and naming into an early refactoring patch? I'll take a closer look. But the core requirement here is that the lock needs to be held for the absolute minimum amount of time.

FWIW: I just very quickly threw together a "retry" version of the fix. It seems to pass tests, and looks like this: https://gist.github.com/andyross/e94fbb8f2635e26c9c714106dd4006d5

Obviously it involves extra CPU work in the case of contention, but the worst-case latency is 100% unchanged. I'm bailing out after four tries right now, which IMHO is acceptable (trying and failing to get memory in the face of that much contention is almost certainly a legitimate runtime failure!). But even that could be handled one level up at the kernel mempool layer, where we can block. We could provide a "block new allocators" and "wakeup" signal at that level, which might be a better idea still.

It's not 100% clear to me if this patch alone fixes the issues you saw or if there's more complexity. But broadly this is, I think, where we want to be as far as design.
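(In case the gist goes away: the retry shape described above is roughly the following. This is a schematic with invented names, not the gist's actual contents.)

```c
#include <stddef.h>

struct pool;                              /* stand-in for the real pool type */

/* The existing allocation path; NULL can mean either genuine exhaustion or
 * a lost race with a concurrent allocator. */
extern void *pool_alloc_racy(struct pool *p, size_t size);

#define ALLOC_TRIES 4                     /* bail out after four attempts */

void *pool_alloc_bounded(struct pool *p, size_t size)
{
	for (int i = 0; i < ALLOC_TRIES; i++) {
		void *data = pool_alloc_racy(p, size);

		if (data != NULL) {
			return data;
		}
		/* A concurrent alloc/free may have reshuffled the free lists
		 * between our level scan and the list pop; calling again
		 * redoes the scan.  No new locking is added, so worst-case
		 * latency is unchanged. */
	}
	return NULL;   /* nomem under this much contention is a real failure */
}
```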
@andyross under the for() loop you have … I would say what I said before: nobody would like to use a very fast tool that gives incorrect results.
Please please please don't start fights. I'm trying to understand your patch, I promise. Calm down. I can't merge what I don't understand. Please split the patch. I see at least three separate issues (I think):
Numbers 1. and 3. seem like easy things to review and approve, making the discussion about #2 a little more obvious. |
This was not my intention. I will try to be more careful with comments next time.
I will. Be back later today. |
Describe the bug
I was wondering if #14504 could happen if memory is actually available. I reviewed the code again and I think it can. This is a theoretical scenario; I have not yet tried to replicate it.
The `_sys_mem_pool_block_alloc` function has two loops. The first looks for the level at which free blocks are available (`free_l`) and the level at which we would like to make the allocation (`alloc_l`). The second allocates and, if the allocation was made at a higher level, splits the allocated block.

Assume preconditions where no allocation has been made and only the biggest block is available (i.e. only level 0 is non-empty). Assume we want to allocate a small block from a bottom level (anything but the biggest block, actually).
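To make the shape concrete, here is a highly simplified paraphrase (not the actual Zephyr source; the types and helpers are invented, and the only property that matters is that loop 1 runs without the lock held):

```c
#include <stddef.h>

/* Invented minimal types -- a schematic of the structure under discussion. */
struct level {
	void *free;                    /* head of this level's free list */
};

struct pool {
	struct level *levels;
	int n_levels;
	size_t max_sz;                 /* block size at level 0 */
};

static size_t block_size(const struct pool *p, int l)
{
	size_t sz = p->max_sz;

	while (l-- > 0) {
		sz /= 4;               /* each level quarters the blocks */
	}
	return sz;
}

/* Hypothetical helpers: each takes and releases the pool lock internally,
 * so the lock is never held across a whole loop. */
extern void *block_alloc_locked(struct pool *p, int l);
extern void *block_break_locked(struct pool *p, int l, void *block);

void *pool_alloc(struct pool *p, size_t size)
{
	int free_l = -1, alloc_l = -1;

	/* Loop 1 (no lock held): pick alloc_l, the deepest level whose
	 * blocks still fit `size`, and free_l, the deepest non-empty level
	 * at or above it.  This answer can go stale before loop 2 runs. */
	for (int i = 0; i < p->n_levels && block_size(p, i) >= size; i++) {
		alloc_l = i;
		if (p->levels[i].free != NULL) {
			free_l = i;
		}
	}
	if (free_l < 0) {
		return NULL;           /* looks like genuine nomem */
	}

	/* Loop 2: pop a block at free_l, then split it down to alloc_l.
	 * If another context emptied free_l after our scan, the pop fails
	 * and we return NULL without ever rescanning -- the race described
	 * in this report. */
	void *data = block_alloc_locked(p, free_l);

	for (int l = free_l; data != NULL && l < alloc_l; l++) {
		data = block_break_locked(p, l, data); /* keep 1/4, free 3/4 */
	}
	return data;
}
```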
Imagine the following race. Context A (e.g. a thread) traverses the first loop. As nothing has been broken down yet, it finds that the only non-empty level (i.e. with free blocks) is level 0. It also finds the level at which it would like to make the allocation, which will be greater than zero; its exact value is not really important.

After finding the two indexes, but before context A can take the lock and do any allocation, a high-priority context B (e.g. an interrupt) is scheduled and takes over the CPU. It does the same traversal as context A and finds that only level 0 is non-empty. It takes the lock and enters the second loop. Only level 0 had a non-empty list of free blocks, so it will loop once. Block allocation succeeds, data is not NULL, and since we wanted a smaller chunk the block is broken down, filling in the lists on higher levels. Note that level zero had only one block, so allocating it made the list empty. Context B is done and context A is scheduled back. It resumes just before traversing the second loop. It found that only level 0 had a non-empty list, so it will loop only once. It tries to allocate but fails, as the list is empty (context B took the only available element). As this was the only loop cycle, the loop finishes with nothing allocated: data is NULL and success is reported (see the other bug).
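Condensed into a timeline (schematic):

```c
/* Interleaving that loses the only level-0 block:
 *
 *   A: loop 1 (no lock)  -> free_l = 0 (the one big block), alloc_l > 0
 *   B: preempts; loop 1  -> free_l = 0
 *   B: lock; pops the level-0 block; splits it down, populating the
 *      free lists of levels 1..alloc_l; unlock.  Level 0 is now empty.
 *   A: resumes with its stale free_l = 0; lock; pop fails (list empty)
 *   A: returns data == NULL, never rescanning the now non-empty levels
 */
```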
To Reproduce
Steps to reproduce the behavior:
See description
Expected behavior
Allocation should always return a block if memory is available.
Impact
showstopper
Screenshots or console output
N/A
Environment (please complete the following information):
ncs zephyr `745326266d9c32ba3aa67d5af1f727059d9a88e3` (from `d9876be` upstream)

Additional context
N/A