mempool can result in OOM while memory is available #15154

Closed
pdunaj opened this issue Apr 3, 2019 · 2 comments · Fixed by #15155
Labels: area: Kernel, bug, priority: medium

Comments


pdunaj commented Apr 3, 2019

Describe the bug
This is a known issue. The problem is that we are being hit by it frequently.

When looking for a block to hand out, the mempool allocator takes the smallest available block and, if it is too large, breaks it into smaller pieces. The problem is that the lock is released during this operation. This can lead to memory depletion if a low-priority task grabs the large block but cannot finish the split because a high-priority task (or an ISR) takes the CPU and wants to allocate too.

The chance of hitting this is quite high, as the mempool is constantly breaking and merging blocks (which also causes a huge performance drop, by the way!).

Expanding the memory will not really help. This can happen regardless of the heap size as long as you have at least 5 places that can allocate and preempt each other.
It can also happen if you have 2 contexts but 3/4 of the memory is taken, 3 contexts but 2/4 of the memory is taken, etc. (i.e. only one large block is left to split and two contexts compete for it).

To Reproduce
Steps to reproduce the behavior:
Allocate from various contexts of mixed priorities. This is a race, so it does not reproduce deterministically; a rough sketch of such a setup is shown below.
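
A minimal sketch of the race, assuming the system heap is enabled (CONFIG_HEAP_MEM_POOL_SIZE) and using a k_timer expiry function as the ISR-context allocator. Names and sizes are illustrative only, not taken from our application:

    /* Hedged repro sketch: a preemptible thread and a timer expiry function
     * (ISR context) both allocate from the system heap. If the timer fires
     * while the thread is in the middle of the block split, the ISR-side
     * k_malloc can fail even though most of the heap is free.
     */
    #include <zephyr.h>

    static void timer_alloc(struct k_timer *timer)
    {
        /* Runs in interrupt context and may preempt main() mid-split. */
        void *p = k_malloc(16);

        if (p != NULL) {
            k_free(p);
        } /* else: the spurious OOM this issue is about */
    }

    K_TIMER_DEFINE(alloc_timer, timer_alloc, NULL);

    void main(void)
    {
        k_timer_start(&alloc_timer, K_MSEC(1), K_MSEC(1));

        while (1) {
            /* Thread-context allocation competing for the last large block. */
            void *p = k_malloc(16);

            if (p != NULL) {
                k_free(p);
            }
        }
    }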

Expected behavior
Allocation should succeed as long as memory is available; no races.

Impact
Showstopper.

Screenshots or console output
N/A

Environment (please complete the following information):
ncs zephyr: 0bf5263b0522bb5cfac84eefdfdee86dc2c67e3e (upstream d3bb3cf)

Additional context
The call stack at the OOM points to the ISR. I added a stats collector that shows 288 of the 512 B heap are allocated. I suspect at least one k_malloc is happening.

#0  foo() at /home/pdunaj/work/ncs/zephyr/kernel/mempool.c:55
#1  0x0000b53a in z_sys_mem_pool_block_alloc (p=p@entry=0x2000a318 <_heap_mem_pool>, size=size@entry=16,
    level_p=0x200063d8 <_interrupt_stack+1464>, level_p@entry=0x200063a8 <_interrupt_stack+1416>, block_p=0x2e971 <k_mem_pool_alloc+76>,
    block_p@entry=0x200063ac <_interrupt_stack+1420>, data_p=<optimized out>, data_p@entry=0x200063d8 <_interrupt_stack+1464>)
    at /home/pdunaj/work/ncs/zephyr/lib/os/mempool.c:299
#2  0x0002e970 in k_mem_pool_alloc (p=p@entry=0x2000a318 <_heap_mem_pool>, block=block@entry=0x200063d8 <_interrupt_stack+1464>,
    size=size@entry=16, timeout=timeout@entry=0) at /home/pdunaj/work/ncs/zephyr/kernel/mempool.c:80
#3  0x0002eb6a in k_mem_pool_malloc (pool=pool@entry=0x2000a318 <_heap_mem_pool>, size=size@entry=12)
    at /home/pdunaj/work/ncs/zephyr/kernel/mempool.c:185
#4  0x0002ecf4 in k_malloc (size=size@entry=12) at /home/pdunaj/work/ncs/zephyr/kernel/mempool.c:241
#5  0x00005f08 in new_wheel_event () at ../src/events/wheel_event.h:28
#6  data_ready_handler (dev=<optimized out>, trig=<optimized out>) at ../src/hw_interface/wheel.c:57
#7  0x0002d456 in qdec_nrfx_event_handler (event=...) at /home/pdunaj/work/ncs/zephyr/drivers/sensor/qdec_nrfx/qdec_nrfx.c:155
#8  0x0000eb94 in nrfx_qdec_irq_handler () at /home/pdunaj/work/ncs/zephyr/ext/hal/nordic/nrfx/drivers/src/nrfx_qdec.c:76
#9  0x00019ec6 in _isr_wrapper () at /home/pdunaj/work/ncs/zephyr/arch/arm/core/isr_wrapper.S:120
pdunaj added the bug and area: Kernel labels on Apr 3, 2019

pdunaj commented Apr 3, 2019

This is a regression caused by my change 41e90630d74110a8d9d9e74681b43af7fb6a59a6:

    pool_irq_unlock(p, key);
    key = pool_irq_lock(p);
    data = block_break(p, data, from_l, lsizes);

Previously, the first block_break happened before the IRQ lock was released, giving 3/4 of the block back to the pool. I guess we will have to move this back.
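
For reference, a rough reconstruction of the earlier ordering (hedged, from memory rather than the old source): the split runs while the lock is still held, so the leftover sub-blocks are already back on the free lists before any preempting context can run.

    /* Hedged reconstruction of the pre-regression ordering, not the exact
     * old code: block_break() executes under the lock, so the unused 3/4
     * of the block is returned to the pool before anyone else can observe
     * an empty free list. */
    data = block_break(p, data, from_l, lsizes);
    pool_irq_unlock(p, key);
    key = pool_irq_lock(p);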

I see two other things that could perhaps be done:

  • At the moment the heap consists of one large block that gets split. Maybe we could add a config option that allows configuring the heap as N blocks of a given SIZE. If somebody wants to be able to allocate the entire heap in one go, N would be 1. We could set N to the number of known allocating contexts. Our allocations are small, so N x SIZE would not cause excessive memory usage (a rough sketch follows this list).
  • Avoid split/merge on alloc/free. I actually wanted to create a separate issue for that. At the moment we pay a lot of CPU time for these operations, causing our report rate to drop from 1000 to 850 reports per second. The algorithm could keep the heap broken down and merge it only when a large allocation is needed. The lock would be taken during the break operation.
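
As a sketch of the first idea: something similar can already be approximated today with a dedicated pool whose minimum and maximum block sizes are equal, so no block is ever split or merged. The names and sizes below are illustrative, assuming the k_mem_pool API in this Zephyr version:

    /* Hedged sketch: a pool of N equal-sized blocks. With minsz == maxsz
     * there is nothing to break or merge, so the race window and the
     * split/merge CPU cost both disappear. Values are illustrative only. */
    #include <zephyr.h>

    #define EVT_BLOCK_SIZE  32   /* large enough for the biggest event */
    #define EVT_BLOCK_COUNT 8    /* >= number of allocating contexts */

    /* K_MEM_POOL_DEFINE(name, minsz, maxsz, nmax, align) */
    K_MEM_POOL_DEFINE(evt_pool, EVT_BLOCK_SIZE, EVT_BLOCK_SIZE,
                      EVT_BLOCK_COUNT, 4);

    static inline void *evt_alloc(size_t size)
    {
        /* k_mem_pool_malloc() does not block, so it is usable from ISRs. */
        return k_mem_pool_malloc(&evt_pool, size);
    }

Memory obtained this way is released with k_free(), just like memory from k_malloc().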


pdunaj commented Apr 3, 2019

Fix created...

nashif added the priority: medium label on Apr 3, 2019