Skip to content

I2S transfers causes exception/crash in xtensa/Intel S1000 #13223

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
sathishkuttan opened this issue Feb 10, 2019 · 31 comments · Fixed by #15430
Closed

I2S transfers causes exception/crash in xtensa/Intel S1000 #13223

sathishkuttan opened this issue Feb 10, 2019 · 31 comments · Fixed by #15430
Assignees
Labels
area: I2S area: Xtensa Xtensa Architecture bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug

Comments

@sathishkuttan
Copy link
Collaborator

Describe the bug
When a memory slab is defined using the K_MEM_SLAB_DEFINE macro, exceptions/crashes are seen on an Intel S1000 board

To Reproduce
Steps to reproduce the behavior:

  1. merge the changes in PR tests: intel_s1000: Fix broken tests for Intel S1000 CRB #13222
  2. make the following changes in tests/boards/intel_s1000_crb/src/i2s_test.c
    • comment out the i2s_mem_slab and audio_buffers definitions
    • replace them with K_MEM_SLAB_DEFINE(i2s_mem_slab, BLOCK_SIZE_BYTES, NUM_I2S_BUFFERS, 4);
    • To see the exceptions (instead of a silent crash). change the thread start delay at by changing the thread definition at the end of the file as PRIORITY, 0, K_MSEC(100));
  3. Disable logger by setting CONFIG_LOG=n in tests/boards/intel_s1000_crb/prj.conf
  4. build and run the test on an Intel S1000 CRB

Expected behavior
LEDs blink, DMA transfers successful, I2S transfers successful.

Screenshots or console output

*** Unhandled exception ****
Exception cause 12 (instr PIF data error):
  EPC1     : 0xbe00334f EXCSAVE1 : 0xbe004d6a EXCVADDR : 0x00000000
Program state (PS):
  INTLEVEL : 05 EXCM    : 0 UM  : 1 RING : 0 WOE : 1
  OWB      : 03 CALLINC : 2
Current thread ID = 0xbe00fd98
Faulting instruction address = 0xdeaddead
Fatal fault in ISR! Spinning...

Environment (please complete the following information):

  • Development OS: Linux
  • Toolchain: xt-xcc
  • Commit SHA used: 268572f

Additional context
If an explicit k_mem_slab object is defined and initialized using k_mem_slab_init, there is no exception/crash.

@sathishkuttan sathishkuttan added the bug The issue is a bug, or the PR is fixing a bug label Feb 10, 2019
@nashif nashif assigned dcpleung and unassigned nashif Feb 10, 2019
@nashif nashif added the priority: medium Medium impact/importance bug label Feb 10, 2019
@nashif
Copy link
Member

nashif commented Feb 10, 2019

@dcpleung can you please take a look. I still have no jtag access.

@dcpleung
Copy link
Member

I could not re-produce this on commit f81346c. Also the commit 268572f you mentioned is not on master. Could you try it again with the latest master?

@galak galak added the area: Xtensa Xtensa Architecture label Feb 11, 2019
@sathishkuttan
Copy link
Collaborator Author

I can reproduce the exception with

@dcpleung
Copy link
Member

dcpleung commented Mar 4, 2019

I looked into it a bit more. The exception seems to be caused by any DMA transfers. When I commented out the K_THREAD_DEFINE() in dma_test.c, there were no longer exception. When I re-enabled the DMA thread, exceptions came back.

Also, when I disabled ASM2, turned on CONFIG_DEBUG and CONFIG_NO_OPTIMIZATIONS, __device_i2s3_cavs->config was constantly corrupted after dw_dma_transfer_start() (__device_i2s3_cavs->config.name and __device_i2s3_cavs->config.init no longer pointed to valid string and function).

From these observations, I don't believe it is an issue with memory slab. Using K_MEM_SLAB_DEFINE() instead of defining your own memory slab variables moves the memory block into another section. This lessens the cushioning between data, and any data corruption is going to affect more variables (in other words, the same size of corruption now covers more variables).

@dcpleung
Copy link
Member

dcpleung commented Mar 6, 2019

I disabled the DMA test and found out that, even without the K_MEM_SLAB_DEFINE() modifying, it was still causing exception. It was traced to the I2S transfers, and I have the I2C and DMA thread disabled. The I2S transfer was corrupting data (after a few iterations) in the UART device struct. This causes the _impl_uart_poll_out() to raise double fault exception when it was trying to do dev->driver_api.

@dcpleung dcpleung changed the title mem_slab: use of K_MEM_SLAB_DEFINE causes exception/crash in xtensa/Intel S1000 I2S transfers causes exception/crash in xtensa/Intel S1000 Mar 6, 2019
@dcpleung dcpleung assigned sathishkuttan and unassigned dcpleung Mar 6, 2019
@dcpleung
Copy link
Member

dcpleung commented Mar 6, 2019

Assigning this back to driver and test owners.

@galak
Copy link
Collaborator

galak commented Mar 26, 2019

Any updates?

@rljordan-zz rljordan-zz assigned rgundi and unassigned nashif Mar 27, 2019
@rgundi
Copy link
Contributor

rgundi commented Apr 1, 2019

@dcpleung @sathishkuttan This is really weird but i am seeing exactly the opposite behaviour with the HEAD at 766edd4. The static K_MEM_SLAB_DEFINE works whereas the dynamic k_mem_slab doesn't. k_mem_slab_alloc fails the first time itself (line number 621 in file drivers/i2s/i2s_cavs.c) citing "insufficient memory". I didn't see any exception though. Can you guys please confirm this behaviour at your end?

PS: I made 2 changes to the current HEAD.

  1. intel_s1000 build was broken because of an issue with create_board_img.py. I have raised a PR to fix this (boards: intel_s1000_crb: declare some variables as global #15081).
  2. To make use of K_MEM_SLAB_DEFINE, I changed i2s_test.c as per Sathish's suggestions but also realized I had to comment out the k_mem_slab_init in test_i2s_bidirectional_transfer_configure(). May be Sathish missed mentioning that.

@rgundi
Copy link
Contributor

rgundi commented Apr 2, 2019

This is getting trickier. Looks like there are some timing related issues in the i2s test/driver. All my analysis below is for K_MEM_SLAB_DEFINE.
I happened to introduce a couple of printk statements in test_i2s_bidirectional_transfer_configure function for debug and i could see that there was a k_mem_slab_alloc failure after sometime. On debugging further, i could see that the allocs were being called multiple times (more than 4 times) before the k_mem_slab_free is called in i2s_dma_tx_callback. Also, k_mem_slab_alloc is happening in i2s_dma_rx_callback which is a little unusual since it is an ISR. @sathishkuttan, please have a look as the driver and the test both have tx and rx happening simultaneously and it is difficult to analyze what's happening.

@dcpleung
Copy link
Member

dcpleung commented Apr 2, 2019

I just tested it again on my setup.

  • On master, it failed every time due to allocation errors.
  • With the K_MEM_SLAB_DEFINE() changes above, it ran 100+ times of I2S transfers without errors. (This is with CONFIG_LOG=y)
  • With the K_MEM_SLAB_DEFINE() changes above and CONFIG_LOG=n, the first I2S transfer would succeed, but subsequent transfers failed with allocation errors.

From what I see here, the K_MEM_SLAB_DEFINE() change simply exposes the underlying issue and has nothing to do with the issue itself.

Also I haven't seen any exception or crashes.

@dcpleung

This comment has been minimized.

@dcpleung
Copy link
Member

dcpleung commented Apr 2, 2019

When CONFIG_STACK_SENTINEL=y is set, there would be stack check errors running the I2S transfer test. This seems to be caused by xthal_dcache_region_writeback() and xthal_dcache_region_invalidate(). When I cleared the macros SOC_DCACHE_FLUSH() and SOC_DCACHE_INVALIDATE(), there were no stack check errors and I2S transfers were working fine. So this might actually caused by the Xtensa HAL.

@rgundi
Copy link
Contributor

rgundi commented Apr 3, 2019

When CONFIG_STACK_SENTINEL=y is set, there would be stack check errors running the I2S transfer test. This seems to be caused by xthal_dcache_region_writeback() and xthal_dcache_region_invalidate(). When I cleared the macros SOC_DCACHE_FLUSH() and SOC_DCACHE_INVALIDATE(), there were no stack check errors and I2S transfers were working fine. So this might actually caused by the Xtensa HAL.

Yes, verified your observations to be correct. Also, the stack check errors don't go away upon doubling the i2s_test thread's as well as the interrupt stack's size to 4096.
However, when i reintroduced the printks that i was talking about earlier, even with CONFIG_STACK_SENTINEL=y, i did NOT see any failure. So, just to state explicitly so that it's clear:

  1. If i make CONFIG_STACK_SENTINEL=n with the printks, it FAILS.
  2. If i make CONFIG_STACK_SENTINEL=y with the printks, it PASSES.
  3. If i make CONFIG_STACK_SENTINEL=y with NO printks, it FAILS.
  4. If i make CONFIG_STACK_SENTINEL=n with NO printks, it PASSES.
    All this analysis is with K_MEM_SLAB_DEFINE and CONFIG_LOG=y.

This to me, looks like something timing dependent. May be it messes up some sequence in the code which then results in some kind of a stack corruption.

@dcpleung
Copy link
Member

dcpleung commented Apr 3, 2019

I don't have your printk changes so I cannot reproduce on my side. Here is what I have been trying today @ https://github.com/dcpleung/zephyr/tree/sue_creek_i2s_test. I had to comment out the SPI flash test because the SPI flash on my board is bad (can't even talk to it via dediprog). With or without CONFIG_STACK_SENTINEL and CONFIG_LOG (4 cases), the app worked without exceptions.

@rgundi
Copy link
Contributor

rgundi commented Apr 4, 2019

I have uploaded my branch at https://github.com/rgundi/zephyr/tree/i2s_crash. The various results are in the attached file.
I2s issue.xlsx

I did not see any silent failures/exceptions with these tests. All I saw was Stack failures and/or mem_alloc failures.

@dcpleung
Copy link
Member

dcpleung commented Apr 4, 2019

The two i2c_read() errors were -EAGAIN, which means there was nothing to read. IMHO, the test should retry instead of failing, as this is a recoverable error.

@dcpleung
Copy link
Member

dcpleung commented Apr 5, 2019

@rgundi Could you try https://github.com/dcpleung/zephyr/commits/sue_creek_i2s_test again? I have updated the branch with a few new patches. I tried the different scenarios in your xlsx file and they are all passing with my changes.

@sathishkuttan
Copy link
Collaborator Author

I happened to introduce a couple of printk statements

@rgundi printk should not be used once the interface has started because it can delay the reloading of DMA resulting in transfer errors.

@rgundi @dcpleung based on some of your observations, my hypothesis would be that the crashes/exceptions are a result of unintended eviction of data from the cache. Since the entire internal memory is is configured with writeback data cache attribute, some data structures that are used by the program may share the same cache line as the mem_slab buffers. When data cache is invalidated, the data structure contents that have not been written back to memory will be evicted and replaced by the contents of memory causing non-deterministic behavior.
Can you test by mapping the DACACHE_INVALIDATE macro to xthal_dcache_region_writeback_inv instead of xthal_dcache_region_invalidate?

@rgundi
Copy link
Contributor

rgundi commented Apr 8, 2019

@rgundi Could you try https://github.com/dcpleung/zephyr/commits/sue_creek_i2s_test again? I have updated the branch with a few new patches. I tried the different scenarios in your xlsx file and they are all passing with my changes.

Tested with your branch and i concur with your findings. All the scenarios passed for me as well. I had to change your latest patch a bit though as i was not able to get JTAG working reliably with that patch.

Can you test by mapping the DACACHE_INVALIDATE macro to xthal_dcache_region_writeback_inv instead of xthal_dcache_region_invalidate?

I see that tests specific to DMA were failing but the I2S bidirectional transfer was successful. This testing was with JTAG connected.

@rgundi
Copy link
Contributor

rgundi commented Apr 9, 2019

I ran 2 experiments.

  1. Experiments on Latest HEAD (38ecf5b)
    a) Pristine HEAD - fails with mem error (see below)
    Testing I2S bidirectional transfer
    RX Start failed with -12 error
    [00:00:03.950,000] soc: Reference clock frequency: 38400000 Hz
    [00:00:03.950,000] dma_cavs: Device DMA_0 initialized
    [00:00:03.950,000] dev_i2s_cavs: Device I2S_1 initialized
    [00:00:03.950,000] dev_i2s_cavs: Device I2S_2 initialized
    [00:00:03.950,000] dev_i2s_cavs: Device I2S_3 initialized
    [00:00:03.950,000] spi_dw: CS control inhibited (no GPIO device)
    [00:00:03.960,000] main: Sample app running on: xtensa Intel S1000 CRB
    [00:00:03.960,000] test_flash: SPI flash driver was found!
    [00:00:03.960,000] dev_i2s_cavs: MCLK/BCLK is not an integer, using M/N divider
    [00:00:03.960,000] dev_i2s_cavs: MCLK/BCLK is not an integer, using M/N divider
    [00:00:03.980,000] dev_i2s_cavs: buffer alloc from mem_slab failed (-12)

    b) HEAD + cache_writeback_inv -> same failure as above.
    c) HEAD + K_MEM_SLAB_DEFINE + Sentinel -> Stack corruption
    d) HEAD + K_MEM_SLAB_DEFINE + Sentinel + cache_writeback_inv -> Passes.

  2. Experiments on the commit ID 69b6cc8 (with workaround for Build failure when using XCC toolchain  #13710) on which Sathish had last seen the exception
    a) 69b6cc8 - This is with dynamic k_mem_slab. I see a failure here itself. See below.
    Testing I2S bidirectional transfer
    [00:00:05.738,078] dev_i2s_cavs: Stopping DMA channel 6 for TX stream
    [00:00:05.738,234] dev_i2s_cavs: Stopping RX stream & DMA channel 7
    [00:00:06.243,014] dev_i2s_cavs: Starting DMA Ch7
    [00:00:06.247,046] dev_i2s_cavs: buffer alloc from slab 0xbe010570 err -12
    [00:00:06.247,047] dev_i2s_cavs: Stopping RX stream & DMA channel 7
    [00:00:06.250,918] dev_i2s_cavs: DMA status 00000000 channel 6 k_msgq_get ret -80
    [00:00:06.250,919] dev_i2s_cavs: Stopping DMA channel 6 for TX stream
    i2s_read failed with -11 error
    b) 69b6cc8 with K_MEM_SLAB_DEFINE - I see an exception with or without cache_writeback_inv modification. Code changes to cache do not seem to solve the exception.

@rgundi
Copy link
Contributor

rgundi commented Apr 9, 2019

So, my inference is that there are/were 3 separate problems.

  1. The dynamic k_mem_slab usage in I2S test has been giving some problem from before itself. We need to separately see what that issue is.
  2. With respect to using the static K_MEM_SLAB_DEFINE, there was some exception seen earlier which is not there anymore on the main line. I suggest we drop it for the time being. We can look into it if and when it resurfaces.
  3. There is probably an issue with the way we used xthal_dcache_region_invalidate API which is causing stack corruptions. These corruptions are not seen if we use xthal_dcache_region_writeback_inv API instead. Sathish described a possible reason for this in one of his earlier comments.

@rgundi
Copy link
Contributor

rgundi commented Apr 11, 2019

  1. The dynamic k_mem_slab usage in I2S test has been giving some problem from before itself. We need to separately see what that issue is.

This is easy to reproduce. Further characterization in progress.

  1. With respect to using the static K_MEM_SLAB_DEFINE, there was some exception seen earlier which is not there anymore on the main line. I suggest we drop it for the time being. We can look into it if and when it resurfaces.

There's exception on the commit ID 69b6cc8 (with workaround for #13710) even after enabling XTENSA_ASM2. That means we don't have an answer to why there's no exception seen on the latest HEAD.

  1. There is probably an issue with the way we used xthal_dcache_region_invalidate API which is causing stack corruptions. These corruptions are not seen if we use xthal_dcache_region_writeback_inv API instead. Sathish described a possible reason for this in one of his earlier comments.

We realized xthal_dcache_region_writeback_inv was not the right API to use and debugged further. Looks like aligning i2s_mem_slab to XCHAL_DCACHE_LINESIZE solves the issue. Below are the map file snippets around i2s_mem_slab for both the cases.

With alignment:
.noinit 0xbe00a6c0 0x38a0
*(SORT(.noinit))
(SORT(.noinit.))
.noinit.53 0xbe00a6c0 0xc00 ../app/libapp.a(i2s_test.c.obj)
0xbe00a6c0 _k_mem_slab_buf_i2s_mem_slab
.noinit.175 0xbe00b2c0 0x400 ../app/libapp.a(i2c_test.c.obj)
0xbe00b2c0 _k_thread_stack_i2c_thread_id
.noinit.272 0xbe00b6c0 0x400 ../app/libapp.a(dma_test.c.obj)
0xbe00b6c0 _k_thread_stack_dma_thread_id
.noinit.187 0xbe00bac0 0x800 ../app/libapp.a(i2s_test.c.obj)
0xbe00bac0 _k_thread_stack_i2s_thread_id

Without alignment:
.noinit 0xbe00a6a0 0x38a0
*(SORT(.noinit))
(SORT(.noinit.))
.noinit.175 0xbe00a6a0 0x400 ../app/libapp.a(i2c_test.c.obj)
0xbe00a6a0 _k_thread_stack_i2c_thread_id
.noinit.272 0xbe00aaa0 0x400 ../app/libapp.a(dma_test.c.obj)
0xbe00aaa0 _k_thread_stack_dma_thread_id
.noinit.53 0xbe00aea0 0xc00 ../app/libapp.a(i2s_test.c.obj)
0xbe00aea0 _k_mem_slab_buf_i2s_mem_slab
.noinit.187 0xbe00baa0 0x800 ../app/libapp.a(i2s_test.c.obj)
0xbe00baa0 _k_thread_stack_i2s_thread_id

When there's no alignment to XCHAL_DCACHE_LINESIZE, invalidating the i2s_mem_slab buffer may also end up invalidating the thread stacks surrounding this object (dma_thread and i2s_thread stacks). Whenever these threads are accessed the next time, garbage data will be pulled from the memory hence resulting in a non-deterministic behavior. So, wherever we have a buffer which would get invalidated (mostly the Rx buffers for dma, i2s etc), we would need to align it to XCHAL_DCACHE_LINESIZE.

@dcpleung
Copy link
Member

I tested against 7352d2b (which is 69b6cc8 plus fixes in #13710). I was getting exception via k_mem_slab_free due to invalid pointers in slab->wait_q. The exception went away and the test was working after aligning the buffers to 64 bytes.

@sathishkuttan
Copy link
Collaborator Author

sathishkuttan commented Apr 11, 2019

We realized xthal_dcache_region_writeback_inv was not the right API

That was not the realization. It's the right API if the problem is due to valid data eviction from the cache upon invalidation.
There are a bunch of experiments done and observations made, yet no confirmed root cause. So it is early to say what is the right solution and what is not.

I tested against 7352d2b (which is 69b6cc8 plus fixes in #13710). I was getting exception via k_mem_slab_free due to invalid pointers in slab->wait_q. The exception went away and the test was working after aligning the buffers to 64 bytes.

In order to root cause, can you post the memory addresses of the buffers as well as the k_mem_slab structure to see if there were in the same cache "line"?

@dcpleung
Copy link
Member

dcpleung commented Apr 11, 2019

With K_MEM_SLAB_DEFINE onto top of 7352d2b:

.noinit         0xbe00aba0     0x38a0
 *(.noinit)
 *(.noinit.*)
 .noinit.168    0xbe00aba0      0x400 ../app/libapp.a(i2c_test.c.obj)
                0xbe00aba0                _k_thread_stack_i2c_thread_id
 .noinit.272    0xbe00afa0      0x400 ../app/libapp.a(dma_test.c.obj)
                0xbe00afa0                _k_thread_stack_dma_thread_id
 .noinit.54     0xbe00b3a0      0xc00 ../app/libapp.a(i2s_test.c.obj)
                0xbe00b3a0                _k_mem_slab_buf_i2s_mem_slab
 .noinit.187    0xbe00bfa0      0x800 ../app/libapp.a(i2s_test.c.obj)
                0xbe00bfa0                _k_thread_stack_i2s_thread_id
 .noinit.51     0xbe00c7a0       0xa0 libzephyr.a(log_core.c.obj)

_k_mem_slab_buf_i2s_mem_slab is not aligned to cache lines, and the last buffer occupies the same cache line as top of stack for i2s_thread_id (which is _k_thread_stack_i2s_thread_id). The_k_mem_slab_buf_i2s_mem_slab occupies 0xbe00b3a0 to 0xbe00bf9f.

Without K_MEM_SLAB_DEFINE:

 *(.bss)
 .bss           0xbe0101e0      0x158 ../app/libapp.a(i2c_test.c.obj)
                0xbe0101e0                _k_thread_obj_i2c_thread_id
 *fill*         0xbe010338	  0x8
 .bss           0xbe010340	0x230 ../app/libapp.a(dma_test.c.obj)
                0xbe01034c                _k_thread_obj_dma_thread_id
 .bss           0xbe010570      0xd80 ../app/libapp.a(i2s_test.c.obj)
                0xbe01058c                _k_thread_obj_i2s_thread_id
 .bss           0xbe0112f0      0x194 libzephyr.a(log_core.c.obj)
                0xbe011304                log_strdup_pool
                0xbe01132c                logging_thread
(xt-gdb) p &audio_buffers
$2 = (char (*)[3072]) 0xbe0106f0 <audio_buffers_352>

Where the audio_buffers starts from 0xbe0106f0 and ends at 0xbe0112ef (since size +0xc00) according to GDB (but does not appear inside the map file). The audio_buffers is cache line aligned in this case.

@sathishkuttan
Copy link
Collaborator Author

Good info.

_k_mem_slab_buf_i2s_mem_slab is not aligned to cache lines, and the last buffer occupies the same cache line as top of stack for i2s_thread_id (which is _k_thread_stack_i2s_thread_id).

for the record, can you update the comment with the start & end address of audio_buffers for the case "With K_MEM_SLAB_DEFINE"?

@dcpleung
Copy link
Member

Good info.

_k_mem_slab_buf_i2s_mem_slab is not aligned to cache lines, and the last buffer occupies the same cache line as top of stack for i2s_thread_id (which is _k_thread_stack_i2s_thread_id).

for the record, can you update the comment with the start & end address of audio_buffers for the case "With K_MEM_SLAB_DEFINE"?

There is no audio_buffers with K_MEM_SLAB_DEFINE as the buffer creation is taken care of by the macro, which in this case is the _k_mem_slab_buf_i2s_mem_slab.

@sathishkuttan
Copy link
Collaborator Author

There is no audio_buffers with K_MEM_SLAB_DEFINE as the buffer creation is taken care of by the macro, which in this case is the _k_mem_slab_buf_i2s_mem_slab.

OK, I just wanted the address of the buffers and didn't meant to say specifically the audio_buffers. I missed the _k_mem_slab_buf_i2s_mem_slab earlier.

_k_mem_slab_buf_i2s_mem_slab is not aligned to cache lines, and the last buffer occupies the same cache line as top of stack for i2s_thread_id (which is _k_thread_stack_i2s_thread_id). The_k_mem_slab_buf_i2s_mem_slab occupies 0xbe00b3a0 to 0xbe00bf9f.

Agree, this shows that the 1st 32 bytes of a cache line will be that of the buffer and the 2nd 32 bytes of the same cache line will be that of the stack.

Where the audio_buffers starts from 0xbe0106f0 and ends at 0xbe0112ef (since size +0xc00) according to GDB (but does not appear inside the map file). The audio_buffers is cache line aligned in this case.

It's not aligned to cache line, though. 0xf0 falls in a 48 byte offset, so in this case what ever data structure is allocated to the memory following the end of the buffer (ie., starting 0xbe0112f0) will share the same cache line. Similarly, any data structure allocated to the memory before the start of the buffer (ie., in the 48 bytes from 0xbe0106c0 to 0xbe0106ef) will share the same cache line as the 1st 16 bytes of the buffer.

@dcpleung
Copy link
Member

_k_mem_slab_buf_i2s_mem_slab is not aligned to cache lines, and the last buffer occupies the same cache line as top of stack for i2s_thread_id (which is _k_thread_stack_i2s_thread_id). The_k_mem_slab_buf_i2s_mem_slab occupies 0xbe00b3a0 to 0xbe00bf9f.

Agree, this shows that the 1st 32 bytes of a cache line will be that of the buffer and the 2nd 32 bytes of the same cache line will be that of the stack.

This would explain why turning on CONFIG_STACK_SENTINEL would cause the stack check to fail. It is possible that the sentinel was set (at top of stack as intended) but never committed to the memory. So the sentinel value only exists in the data cache. When that cache line was invalidated, entire line was read back from memory on next read. The top of stack then contained the value in memory instead of what was set for stack sentinel. So the next time task swapping was commenced, the check for sentinel would fail, and thus an exception was raised by the scheduler. Without stack sentinel, whatever occupied that space would get incorrect value.

Where the audio_buffers starts from 0xbe0106f0 and ends at 0xbe0112ef (since size +0xc00) according to GDB (but does not appear inside the map file). The audio_buffers is cache line aligned in this case.

It's not aligned to cache line, though. 0xf0 falls in a 48 byte offset, so in this case what ever data structure is allocated to the memory following the end of the buffer (ie., starting 0xbe0112f0) will share the same cache line. Similarly, any data structure allocated to the memory before the start of the buffer (ie., in the 48 bytes from 0xbe0106c0 to 0xbe0106ef) will share the same cache line as the 1st 16 bytes of the buffer.

With CONFIG_LOG=y, logs might have been corrupted a bit (due to log_strdup_pool) depending on how log is using the pool.

Note that as long as K_MEM_SLAB_DEFINE() is used, the _k_thread_stack_i2s_thread_id is immediately after _k_mem_slab_buf_i2s_mem_slab whether CONFIG_LOG is turned on or not. This probably reflects the earlier comment where there were exceptions regardless of CONFIG_LOG.

@dcpleung
Copy link
Member

Also, with 7352d2b and K_MEM_SLAB_DEFINE(), I was always getting double fault exception via GDB. When I changed the MEMCTL value to bypass cache (0x2222fff2, instead of 0x4242fff2), I was no longer getting any exception and the test was running fine. Note that this was done without forced cache alignment. In other words, vanilla 7352d2b, K_MEM_SLAB_DEFINE() and MEMCTL were the only two changes.

@rgundi
Copy link
Contributor

rgundi commented Apr 12, 2019

Upon configuring the memmap_cache_attr to 0x4212FFF2, I do not see the issue anymore with both static K_MEM_SLAB_DEFINE and the dynamic k_mem_slab (latest HEAD a942fcc). The cache attribute is set to "write through" for the memory addresses in this case.

dcpleung added a commit to dcpleung/zephyr that referenced this issue Apr 12, 2019
The i2s_cavs.c driver manipulates cache lines before commencing
any DMA transfers. With write-back cache, if the DMA receive
buffer is not aligned to the cache lines, the data around
the buffer will be invalidated and may never written to memory.
Since the driver takes an external memory slab as buffer and
there is no easy way to force cache line alignment on
the application side, set the cached region to write-through
to avoid potential issue.

Fixes zephyrproject-rtos#13223

Signed-off-by: Daniel Leung <[email protected]>
nashif pushed a commit that referenced this issue Apr 12, 2019
The i2s_cavs.c driver manipulates cache lines before commencing
any DMA transfers. With write-back cache, if the DMA receive
buffer is not aligned to the cache lines, the data around
the buffer will be invalidated and may never written to memory.
Since the driver takes an external memory slab as buffer and
there is no easy way to force cache line alignment on
the application side, set the cached region to write-through
to avoid potential issue.

Fixes #13223

Signed-off-by: Daniel Leung <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: I2S area: Xtensa Xtensa Architecture bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants