mempool can result in OOM while memory is available #15154

Closed
pdunaj opened this issue Apr 3, 2019 · 2 comments · Fixed by #15155
Labels: area: Kernel, bug, priority: medium

Comments


pdunaj commented Apr 3, 2019

Describe the bug
This is a known issue. The problem is that we are being hit by it frequently.

When looking for a block to hand out, the mempool allocator takes the smallest available block and, if it is too large, breaks it into smaller pieces. The problem is that the lock is released during this operation. This can lead to memory depletion if a low-priority task grabs the large block but cannot finish the split because a high-priority task (or an ISR) takes the CPU and wants to allocate too.

The chance of hitting this is quite high, as the mempool is constantly breaking and merging blocks (which also causes a huge performance drop, by the way!).

Expanding the memory will not really help. This can happen regardless of the heap size as long as you have at least 5 places that can allocate and preempt each other.
It can also happen if you have 2 contexts but 3/4 of the memory is taken, 3 contexts but 2/4 of the memory is taken, etc. (i.e. only one large block is left to split and two contexts compete for it).

To Reproduce
Steps to reproduce the behavior:
Allocate from various contexts of mixed priorities. This is a race, so it does not reproduce deterministically; a rough sketch of such a setup is shown below.
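
A minimal sketch of the race, assuming the system heap is enabled (CONFIG_HEAP_MEM_POOL_SIZE) and using a k_timer expiry function as the ISR-context allocator. Names and sizes are illustrative only, not taken from our application:

    /* Hedged repro sketch: a preemptible thread and a timer expiry function
     * (ISR context) both allocate from the system heap. If the timer fires
     * while the thread is in the middle of the block split, the ISR-side
     * k_malloc can fail even though most of the heap is free.
     */
    #include <zephyr.h>

    static void timer_alloc(struct k_timer *timer)
    {
        /* Runs in interrupt context and may preempt main() mid-split. */
        void *p = k_malloc(16);

        if (p != NULL) {
            k_free(p);
        } /* else: the spurious OOM this issue is about */
    }

    K_TIMER_DEFINE(alloc_timer, timer_alloc, NULL);

    void main(void)
    {
        k_timer_start(&alloc_timer, K_MSEC(1), K_MSEC(1));

        while (1) {
            /* Thread-context allocation competing for the last large block. */
            void *p = k_malloc(16);

            if (p != NULL) {
                k_free(p);
            }
        }
    }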

Expected behavior
Allocation should succeed as long as memory is available; no races.

Impact
Showstopper.

Screenshots or console output
N/A

Environment (please complete the following information):
ncs zephyr: 0bf5263b0522bb5cfac84eefdfdee86dc2c67e3e (upstream d3bb3cf)

Additional context
The call stack at the OOM points to the ISR. I added a stats collector that shows 288 of the 512 B heap are allocated. I suspect at least one k_malloc is happening.

#0  foo() at /home/pdunaj/work/ncs/zephyr/kernel/mempool.c:55
#1  0x0000b53a in z_sys_mem_pool_block_alloc (p=p@entry=0x2000a318 <_heap_mem_pool>, size=size@entry=16,
    level_p=0x200063d8 <_interrupt_stack+1464>, level_p@entry=0x200063a8 <_interrupt_stack+1416>, block_p=0x2e971 <k_mem_pool_alloc+76>,
    block_p@entry=0x200063ac <_interrupt_stack+1420>, data_p=<optimized out>, data_p@entry=0x200063d8 <_interrupt_stack+1464>)
    at /home/pdunaj/work/ncs/zephyr/lib/os/mempool.c:299
#2  0x0002e970 in k_mem_pool_alloc (p=p@entry=0x2000a318 <_heap_mem_pool>, block=block@entry=0x200063d8 <_interrupt_stack+1464>,
    size=size@entry=16, timeout=timeout@entry=0) at /home/pdunaj/work/ncs/zephyr/kernel/mempool.c:80
#3  0x0002eb6a in k_mem_pool_malloc (pool=pool@entry=0x2000a318 <_heap_mem_pool>, size=size@entry=12)
    at /home/pdunaj/work/ncs/zephyr/kernel/mempool.c:185
#4  0x0002ecf4 in k_malloc (size=size@entry=12) at /home/pdunaj/work/ncs/zephyr/kernel/mempool.c:241
#5  0x00005f08 in new_wheel_event () at ../src/events/wheel_event.h:28
#6  data_ready_handler (dev=<optimized out>, trig=<optimized out>) at ../src/hw_interface/wheel.c:57
#7  0x0002d456 in qdec_nrfx_event_handler (event=...) at /home/pdunaj/work/ncs/zephyr/drivers/sensor/qdec_nrfx/qdec_nrfx.c:155
#8  0x0000eb94 in nrfx_qdec_irq_handler () at /home/pdunaj/work/ncs/zephyr/ext/hal/nordic/nrfx/drivers/src/nrfx_qdec.c:76
#9  0x00019ec6 in _isr_wrapper () at /home/pdunaj/work/ncs/zephyr/arch/arm/core/isr_wrapper.S:120
pdunaj added the bug and area: Kernel labels on Apr 3, 2019

pdunaj commented Apr 3, 2019

This is a regression caused by my change 41e90630d74110a8d9d9e74681b43af7fb6a59a6:

    pool_irq_unlock(p, key);
    key = pool_irq_lock(p);
    data = block_break(p, data, from_l, lsizes);

Previously, the first block_break happened before the IRQ lock was released, giving 3/4 of the block back to the pool. I guess we will have to move this back.
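
For reference, a rough reconstruction of the earlier ordering (hedged, from memory rather than the old source): the split runs while the lock is still held, so the leftover sub-blocks are already back on the free lists before any preempting context can run.

    /* Hedged reconstruction of the pre-regression ordering, not the exact
     * old code: block_break() executes under the lock, so the unused 3/4
     * of the block is returned to the pool before anyone else can observe
     * an empty free list. */
    data = block_break(p, data, from_l, lsizes);
    pool_irq_unlock(p, key);
    key = pool_irq_lock(p);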

I see two other things that could perhaps be done:

  • At the moment the heap consists of one large block that gets split. Maybe we could add a config option that allows configuring the heap as N blocks of a given SIZE. If somebody wants to be able to allocate the entire heap in one go, N would be 1. We could set N to the number of known allocating contexts. Our allocations are small, so N x SIZE would not cause excessive memory usage (a rough sketch follows this list).
  • Avoid split/merge on alloc/free. I actually wanted to create a separate issue for that. At the moment we pay a lot of CPU time for these operations, causing our report rate to drop from 1000 to 850 reports per second. The algorithm could keep the heap broken down and merge it only when a large allocation is needed. The lock would be taken during the break operation.
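
As a sketch of the first idea: something similar can already be approximated today with a dedicated pool whose minimum and maximum block sizes are equal, so no block is ever split or merged. The names and sizes below are illustrative, assuming the k_mem_pool API in this Zephyr version:

    /* Hedged sketch: a pool of N equal-sized blocks. With minsz == maxsz
     * there is nothing to break or merge, so the race window and the
     * split/merge CPU cost both disappear. Values are illustrative only. */
    #include <zephyr.h>

    #define EVT_BLOCK_SIZE  32   /* large enough for the biggest event */
    #define EVT_BLOCK_COUNT 8    /* >= number of allocating contexts */

    /* K_MEM_POOL_DEFINE(name, minsz, maxsz, nmax, align) */
    K_MEM_POOL_DEFINE(evt_pool, EVT_BLOCK_SIZE, EVT_BLOCK_SIZE,
                      EVT_BLOCK_COUNT, 4);

    static inline void *evt_alloc(size_t size)
    {
        /* k_mem_pool_malloc() does not block, so it is usable from ISRs. */
        return k_mem_pool_malloc(&evt_pool, size);
    }

Memory obtained this way is released with k_free(), just like memory from k_malloc().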


pdunaj commented Apr 3, 2019

Fix created...

nashif added the priority: medium label on Apr 3, 2019