
mempool allocator can return with no allocation even if memory is available #14508


Closed
pdunaj opened this issue Mar 14, 2019 · 10 comments · Fixed by #14514
Labels: area: Kernel, bug, priority: medium

Comments

@pdunaj
Collaborator

pdunaj commented Mar 14, 2019

Describe the bug
I was wondering if #14504 could happen even when memory is actually available. I reviewed the code again and I think it can. This is a theoretical scenario; I have not yet tried to reproduce it.

_sys_mem_pool_block_alloc has two loops. The first finds the level at which free blocks are available (free_l) and the level at which we would like to make the allocation (alloc_l). The second performs the allocation and, if the allocation is done at a higher level, splits the allocated block.

Assume preconditions where no allocation has been made and only the biggest block is available (i.e. only level 0 is non-empty). Assume we want to allocate a small block from the bottom level (anything but the biggest block, actually).
Imagine the following race. Context A (e.g. a thread) traverses the first loop. As nothing has been broken down yet, it finds that the only non-empty level (i.e. with free blocks) is level 0. It also finds the level at which it would like to allocate; that level will be greater than zero, but its exact value is not important.
After finding the two indexes, but before context A can take the lock and do any allocation, a high-priority context B (e.g. an interrupt) is scheduled and takes over the CPU. It performs the same traversal as context A and finds that only level 0 is non-empty. It takes the lock and enters the second loop. Only level 0 had a non-empty list of free blocks, so it loops once. Block allocation succeeds, data is not NULL, and since a smaller chunk was requested the block is broken down, filling in the lists on the higher levels. Note that level 0 had only one block, so allocating it left the list empty. Context B is done and context A is scheduled back. It resumes just before the second loop. It found earlier that only level 0 had a non-empty list, so it loops only once. It tries to allocate but fails because the list is now empty (context B took the only available element). As this was the only iteration, the loop finishes with nothing allocated: data is NULL and success is reported (see the other bug).
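To make the window concrete, here is a minimal, self-contained toy model of the two-phase structure described above. It is not the Zephyr code: the names (toy_pool, toy_alloc), the per-level free counters and the four-way split are invented for illustration, and the lock is only indicated in comments.

```c
#define N_LEVELS 4

struct toy_pool {
	/* number of free blocks per level; level 0 = biggest blocks */
	int free_count[N_LEVELS];
};

/* Returns the level allocated from, or -1 for "no memory". */
int toy_alloc(struct toy_pool *p, int want_level)
{
	int free_l = -1;

	/* Loop 1 (runs WITHOUT the pool lock): find the first level,
	 * up to want_level, that currently has a free block.
	 */
	for (int i = 0; i <= want_level; i++) {
		if (p->free_count[i] > 0) {
			free_l = i;
			break;
		}
	}
	if (free_l < 0) {
		return -1;
	}

	/* RACE WINDOW: a higher-priority context can run this same code
	 * here, take the only block on free_l and split it, moving the
	 * free memory to levels that loop 1 has already decided on.
	 */

	/* Loop 2 (the real code takes the lock around each step): take a
	 * block from free_l and split it down to want_level.
	 */
	for (int i = free_l; i <= want_level; i++) {
		if (p->free_count[i] == 0) {
			return -1;	/* stale free_l: give up too early */
		}
		p->free_count[i]--;	/* take one block at this level */
		if (i < want_level) {
			/* split it: four children appear one level down */
			p->free_count[i + 1] += 4;
		}
	}
	return want_level;
}
```

With free_count = {1, 0, 0, 0}, if context B runs toy_alloc(p, 2) inside the marked window, context A's second loop starts from its stale free_l = 0, finds that level empty, and returns -1 even though levels 1 and 2 now hold free blocks.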

To Reproduce
Steps to reproduce the behavior:
See description

Expected behavior
Allocation should always return a block if memory is available.

Impact
showstopper

Screenshots or console output
N/A

Environment (please complete the following information):
ncs zephyr 745326266d9c32ba3aa67d5af1f727059d9a88e3 (from d9876be upstream)

Additional context
N/A

@pdunaj added the bug and area: Kernel labels on Mar 14, 2019
@pdunaj
Collaborator Author

pdunaj commented Mar 14, 2019

There is another bug; I will not create a separate issue as the effect is the same (allocation will fail while memory is available).

Assume the following: level 0 is empty, level 1 is 3/4 free, and 1 chunk is allocated.
Reproduction is as follows. Context A starts traversing the first loop. It checks level 0, sees that it is empty, and continues. At this point a high-priority context B takes over and performs a free. When the free is done, all blocks on level 1 are merged and put back into level 0 as one big block. Context B finishes and context A is scheduled back. It continues the loop at level 1 and sees that there are no free blocks on this level nor on any higher level. After the loop is done it finishes with -ENOMEM.
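In terms of the toy model sketched in the issue description, a hypothetical free path with merging shows this second miss as well (again invented names; the merge condition is simplified to "all four siblings free"):

```c
/* Companion to the toy model above: freeing a block may merge a complete
 * set of four siblings back into one parent block one level up.
 */
void toy_free(struct toy_pool *p, int level)
{
	p->free_count[level]++;
	while (level > 0 && p->free_count[level] == 4) {
		p->free_count[level] = 0;	/* four siblings...     */
		p->free_count[level - 1]++;	/* ...become one parent */
		level--;
	}
}
```

Timeline: the pool starts at {0, 3, 0, 0} with one level-1 block in use; context A's first loop checks level 0 and sees it empty; context B calls toy_free(p, 1), merging everything back to {1, 0, 0, 0}; context A resumes at level 1, finds every remaining level empty, and fails (-1 in the toy, -ENOMEM in the real code) although a whole level-0 block is free.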

@pdunaj
Collaborator Author

pdunaj commented Mar 14, 2019

FYI @andyross

@pdunaj
Collaborator Author

pdunaj commented Mar 14, 2019

I have pushed the fix.
Shame I did not see the problem the previous time I reviewed this code... :(

@andyross
Collaborator

Heh, never mind. You already have a patch up. You take it :)

@andyross
Collaborator

Yeah, we can't do it this way. The fix effectively wraps the whole allocation in a lock, as I understand it. That's counter to the original design intent. Maybe try flagging the existence of a race somehow with a "last thread to enter allocator" state variable or something as a cue to retry?

@pdunaj
Collaborator Author

pdunaj commented Mar 14, 2019

Hi @andyross, if you check again you will notice that the amount of code under the lock has not really increased. What was added is (essentially) a check whether the free list at level n is empty. And you need to do that under a lock anyway, as otherwise the content of the list may be corrupted. The costly operations of allocating and breaking blocks are already under a lock.

Yes, there is an alternative. Each time the allocator state is updated, a token could be changed. The block allocator could take the token, do a barrier, try allocating and, in case of nomem, grab the current token and retry if it changed. I was thinking about this solution, but the main problem is that it is unbounded. Second, as I wrote above, the change brings under the lock only the checks for which levels are empty (which is a really cheap operation). Third, such solutions are prone to bugs.

The current state is already away from the original design, as locks were introduced to handle various race conditions. Please review the code again; the number of operations added under the lock is really small.
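For illustration, the token alternative described above could look roughly like the following, reusing toy_alloc from the sketch in the issue description. This is a hypothetical sketch, not proposed code; the names (pool_update_token, note_pool_update) are invented.

```c
#include <stdatomic.h>

/* Bumped on every allocator state change (alloc, split, free, merge). */
static atomic_uint pool_update_token;

void note_pool_update(void)
{
	atomic_fetch_add(&pool_update_token, 1);
}

int toy_alloc_tokened(struct toy_pool *p, int want_level)
{
	unsigned int seen;
	int level;

	do {
		seen = atomic_load(&pool_update_token);
		level = toy_alloc(p, want_level);

		/* Success, or a failure with an untouched pool, is final. */
		if (level >= 0 || atomic_load(&pool_update_token) == seen) {
			return level;
		}
		/* The token moved, so the failure may be the race: retry.
		 * Note the loop is unbounded, which is the objection above.
		 */
	} while (1);
}
```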

@andyross
Collaborator

I swear I see it holding the lock around a for() loop, no? Can you split the patch to separate the fixes from each other, and move the changes to formatting and naming into an early refactoring patch? I'll take a closer look.

But the core requirement here is that the lock needs to be held for absolutely minimum amounts of time.

FWIW: I just very quickly threw together a "retry" version of the fix. It seems to pass tests, and looks like this: https://gist.github.com/andyross/e94fbb8f2635e26c9c714106dd4006d5

Obviously it involves extra CPU work in the case of contention, but the worst case latency is 100% unchanged. I'm bailing out after four tries right now, which IMHO is acceptable (trying and failing to get memory in the face of that much contention is almost certainly a legitimate runtime failure!).

But even that could be handled one level up at the kernel mempool layer, where we can block. We could provide a "block new allocators" and "wakeup" signal at that level, which might be a better idea still.

It's not 100% clear to me if this patch alone fixes the issues you saw or if there's more complexity. But broadly, I think this is where we want to be as far as design goes.
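The overall shape of the bounded-retry approach (a sketch of the idea only, not the contents of the gist; the bound of four tries is taken from the comment above, and the wrapper reuses toy_alloc from the sketch in the issue description):

```c
#define ALLOC_MAX_TRIES 4

/* Re-run the scan-then-allocate sequence a bounded number of times.  A
 * failure caused by the race is cured by rescanning, because the
 * concurrent split or merge left the memory on some other level; a
 * failure that persists across all tries is treated as out-of-memory.
 */
int toy_alloc_retry(struct toy_pool *p, int want_level)
{
	for (int tries = 0; tries < ALLOC_MAX_TRIES; tries++) {
		int level = toy_alloc(p, want_level);

		if (level >= 0) {
			return level;
		}
	}
	return -1;
}
```

As noted above, only contended callers pay for the extra passes; the worst-case latency of a single attempt is unchanged.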

@pdunaj
Collaborator Author

pdunaj commented Mar 14, 2019

@andyross under the for loop you have block_alloc, which first checks if any block is available at the level. Only the last block_alloc is expensive.
Token check and retry is unbounded. You have no guarantee when a started allocation will finish. Also, you cannot simply bail out when memory is available in the system (although I agree the probability of this is low, I don't agree it is a legitimate failure; e.g. a low-priority background task can be preempted by everything in the system, increasing the probability of the failure). In the end this will bring more problems than it solves.

I would say what I said before: nobody would like to use a very fast tool that gives incorrect results.

@andyross
Collaborator

andyross commented Mar 14, 2019

I would say what I said before: nobody would like to use a very fast tool that gives incorrect results.

Please please please don't start fights. I'm trying to understand your patch, I promise. Calm down. I can't merge what I don't understand.

Please split the patch. I see at least three separate issues (I think):

  1. There's a wrong return value from z_sys_mem_pool_block_alloc() when it exhausts the levels but data is still NULL.
  2. There's a race you're fixing
  3. You're moving comments around and making formatting changes to unrelated code.

Numbers 1. and 3. seem like easy things to review and approve, making the discussion about #2 a little more obvious.

@rljordan-zz added the priority: medium label on Mar 15, 2019
@pdunaj
Collaborator Author

pdunaj commented Mar 15, 2019

don't start fights

This was not my intention. I will try to be more careful with comments next time.

Please split the patch.

I will.
I will also try to benchmark the influence of the change and put some numbers. When I got back home I realized this was the thing I should do in the first place.

Be back later today.
