ASoC: SOF: debug: change runtime PM get/put to correct usuage #237

keyonjie · 2018-11-01T02:00:18Z

this should fix #233

ASoC: SOF: debug: change runtime PM get/put to correct usuage

When doing debugFS entry read, we should make sure it won't enter
suspend during the reading, and the reading won't interact the runtime
PM(e.g. if the idle timeout is expired when should enter runtime suspend
ASSP) neither.

Here change to use get_noresume() to make sure not resuming ADSP for
debugFS reading, and put_sync_autosuspend() to ensure we can enter
runtime suspend after reading finished if needed.

Signed-off-by: Keyon Jie [email protected]

xiulipan · 2018-11-01T11:35:11Z

@keyonjie
In fail cases, the DSP are not able to power off, so this fixes could work.
when runtime pm power off the DSP, then we try to read from memory window. What will happen?

plbossart

Humm, in the case "For audio suspend case, reading back old value is reasonable", do you mean to say values that may not be up-to-date but weren't previously read? Or that the debugs info will always be consistent but not necessarily up-to-date. Not sure what you meant by 'old' and whether the data can be inconsistent/misleading?

lgirdwood · 2018-11-01T16:55:14Z

@plbossart I think "old" means the current values before the FW crash (that prevents runtime PM transitions). Iirc, a FW panic would prevent any D3 entry or core PM so that the SRAM contents should be valid here. @keyonjie can you confirm if this is what you can see.

keyonjie · 2018-11-02T05:43:30Z

@plbossart @lgirdwood the value depends on what is in DSP MMIO(memory window) at that moment, for DSP crashed or suspended cases, it might be the valid or invalid(e.g. 0xffffffff), but that is what we can try to read back, for analyzing.

The word "old value" is not accurate, now corrected.

xiulipan · 2018-11-02T10:32:50Z

@keyonjie
Will this have regression about the error trace disappear? If we did not power up the DSP when we try to get etrace, will it show 0xffffffff as it is invalid.

keyonjie · 2018-11-05T01:32:33Z

@xiulipan when we try to get etrace at suspend, it might show invalid or 0xffffffff, but I don't want to call it regression.
If we want to get etrace, we can do that after ADSP is waked up again, e.g. open alsamixer. So this is only design style choice, if we want(or can bear) to disturb the normal running of ADSP when reading debugFS.

xiulipan · 2018-11-05T03:20:14Z

@keyonjie
The previous design is to wake up DSP as we want to read the memory window. But every wake up will clean the memory window. So the previous run log will be cleaned after DSP suspend.
But for most debug cases, we want to check what caused the last run to exit or error, the logs are cleared by DSP power on.

For this PR, we can not get the last run log as well.

To achieve the goal to read error trace all the time, we should have more change about trace.

keyonjie · 2018-11-05T05:16:16Z

@xiulipan agree, this PR can't fix the error trace be cleared after resume issue.

lgirdwood · 2018-11-05T10:08:42Z

@keyonjie @xiulipan lets have a debubFS file that can be used to enable/disable runtime PM. That way we can disable D3 entry from cmd line if needed.

plbossart · 2018-11-05T16:19:04Z

@lgirdwood @ranj063 don't we already have existing pm_runtime hooks in /sys to disable D3 entry?

keyonjie · 2018-11-06T04:02:21Z

@lgirdwood @plbossart we already have that in /sys/bus/pci/device/xxx/power/control for that purpose.

lgirdwood · 2018-11-06T13:49:44Z

@keyonjie great, lets document that then under debug in public wiki

plbossart · 2018-11-06T20:05:10Z

@keyonjie I can't figure out if we need this patch merged, or if it contains a design flaw. Can you clarify what the status is, the discussion is not self-explanatory any longer.

ranj063 · 2018-11-06T22:29:00Z

@lgirdwood @plbossart we already have that in /sys/bus/pci/device/xxx/power/control for that purpose.

its actually /sys/devices/pci*/00*0e/power/control

ranj063 · 2018-11-06T22:31:59Z

@plbossart I think we might still need this patch to solve the problem of not being able to read debugfs when dsp panic occurs.

we do not use get/put while reading trace debugfs, so this seems OK to me.

keyonjie · 2018-11-08T05:43:32Z

@plbossart as Ranjani stated, I think better to apply this, it helps on dsp panic trace/debug, and no harm(or it actually improve to avoid interaction to DSP running during debugFS reading on suspend) for other cases.

plbossart · 2018-11-08T14:38:53Z

maybe a stupid comment but wouldn't pm_runtime_get_noresume and put_autosuspend be appropriate here? Nothing would happen if the device is already sleeping or crashed.

ranj063 · 2018-11-08T18:56:09Z

@plbossart @keyonjie I think put_autosuspend will still cause errors.

But get_noresume() and put_noidle() will just increment and decrement the usage counters and it might be useful to have this to prevent the dsp from entering D3 (if it were in D0 before) during debugfs read.

keyonjie · 2018-11-09T02:02:49Z

@ranj063 sounds good to use get_noresume() which will help prevent dsp to enter D3,
but not suitable to use put_noidle() fore return as it won't decrease to 0 where we need it actually.

@plbossart comparing to put_autosuspend(), I think we should use put_sync_autosuspend(), the latter one will use sync mode which may lead to suspend ASAP which means our debugFS reading won't influence runtime PM too much.

@lgirdwood I think you are familiar with this, any comments?

keyonjie · 2018-11-09T03:04:12Z

@lgirdwood @plbossart @ranj063 updated as stated above, please help review.

ranj063 · 2018-11-09T03:41:49Z

@keyonjie if the dsp has crashed and you request a put_sync_autosuspend(), what happens then?

keyonjie · 2018-11-09T04:27:43Z

@keyonjie if the dsp has crashed and you request a put_sync_autosuspend(), what happens then?

it just put the count we got above, if it is planed to go suspend, then let it go suspend.
It don't change the behavior, that means, keep what it should do as that without the debugFS read, keep what the driver will do autosuspend or not at DSP panic.

plbossart · 2018-11-09T15:00:51Z

There are a couple of points that are completely unclear to me.

using pm_runtime_get_noresume will only not have any effect if the DSP is in D3. If it is in D0, it will prevent entry in D3. What about when the DSP is crashed? What is the effect? Does this solve the problem that was initially reported?

When you use autosuspend you typically use mark_last_busy to provide a reference point. If you use pm_runtime_put_sync_autosuspend, when exactly will the transition to D3 take place. And what's the difference with pm_runtime_put_sync if you haven't used the mark-last-busy (which possibly makes no sense if you are already in D3)?

If we don't have the answers then I'll need experimental evidence that this patch solves the initial problem and has no side effects before merging it.

keyonjie · 2018-11-13T09:52:26Z

There are a couple of points that are completely unclear to me.

using pm_runtime_get_noresume will only not have any effect if the DSP is in D3. If it is in D0, it will prevent entry in D3. What about when the DSP is crashed? What is the effect? Does this solve the problem that was initially reported?

Currently, if DSP is crashed, usually we are still in D0 in driver side(we may do improvement in future, e.g. unload/reload modules to try to recover, or disable runtime PM when DSP panic detected..), the patch here will solved the #233, I have tried we can read etrace at this case, we actually treat it as normal D0.

When you use autosuspend you typically use mark_last_busy to provide a reference point. If you use pm_runtime_put_sync_autosuspend, when exactly will the transition to D3 take place. And what's the difference with pm_runtime_put_sync if you haven't used the mark-last-busy (which possibly makes no sense if you are already in D3)?

I am not sure about this, I intent to don't mark-busy here which means don't change the normal audio device runtime suspend/resume per my understanding, just raise question on PM ML.

If we don't have the answers then I'll need experimental evidence that this patch solves the initial problem and has no side effects before merging it.

I had done experiment:
for DSP panic case, issue #233 is fixed, we can read etrace and other memory windows values.
for D0 case, no regression found.
for D3 case, we can also read back entries without resuming DSP.

plbossart · 2018-11-13T15:08:48Z

When you use autosuspend you typically use mark_last_busy to provide a reference point. If you use pm_runtime_put_sync_autosuspend, when exactly will the transition to D3 take place. And what's the difference with pm_runtime_put_sync if you haven't used the mark-last-busy (which possibly makes no sense if you are already in D3)?

I am not sure about this, I intent to don't mark-busy here which means don't change the normal audio device runtime suspend/resume per my understanding, just raise question on PM ML.

Can you add a TODO comment in the code to track this is an open point so that we can merge this first and deal with this later.

keyonjie · 2018-11-14T01:11:47Z

When you use autosuspend you typically use mark_last_busy to provide a reference point. If you use pm_runtime_put_sync_autosuspend, when exactly will the transition to D3 take place. And what's the difference with pm_runtime_put_sync if you haven't used the mark-last-busy (which possibly makes no sense if you are already in D3)?

I am not sure about this, I intent to don't mark-busy here which means don't change the normal audio device runtime suspend/resume per my understanding, just raise question on PM ML.

Can you add a TODO comment in the code to track this is an open point so that we can merge this first and deal with this later.

Sure, let me add this now.

When doing debugFS entry read, we should make sure it won't enter suspend during the reading, and the reading won't interact the runtime PM(e.g. if the idle timeout is expired when should enter runtime suspend ASSP) neither. Here change to use get_noresume() to make sure not resuming ADSP for debugFS reading, and put_sync_autosuspend() to ensure we can enter runtime suspend after reading finished if needed. Signed-off-by: Keyon Jie <[email protected]>

keyonjie · 2018-11-14T01:46:33Z

@plbossart updated.

Currently without THP being enabled, MAX_ORDER via FORCE_MAX_ZONEORDER gets reduced to 11, which falls below HUGETLB_PAGE_ORDER for certain 16K and 64K page size configurations. This is problematic which throws up the following warning during boot as pageblock_order via HUGETLB_PAGE_ORDER order exceeds MAX_ORDER. WARNING: CPU: 7 PID: 127 at mm/vmstat.c:1092 __fragmentation_index+0x58/0x70 Modules linked in: CPU: 7 PID: 127 Comm: kswapd0 Not tainted 5.12.0-rc1-00005-g0221e3101a1 #237 Hardware name: linux,dummy-virt (DT) pstate: 20400005 (nzCv daif +PAN -UAO -TCO BTYPE=--) pc : __fragmentation_index+0x58/0x70 lr : fragmentation_index+0x88/0xa8 sp : ffff800016ccfc00 x29: ffff800016ccfc00 x28: 0000000000000000 x27: ffff800011fd4000 x26: 0000000000000002 x25: ffff800016ccfda0 x24: 0000000000000002 x23: 0000000000000640 x22: ffff0005ffcb5b18 x21: 0000000000000002 x20: 000000000000000d x19: ffff0005ffcb3980 x18: 0000000000000004 x17: 0000000000000001 x16: 0000000000000019 x15: ffff800011ca7fb8 x14: 00000000000002b3 x13: 0000000000000000 x12: 00000000000005e0 x11: 0000000000000003 x10: 0000000000000080 x9 : ffff800011c93948 x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000007000 x5 : 0000000000007944 x4 : 0000000000000032 x3 : 000000000000001c x2 : 000000000000000b x1 : ffff800016ccfc10 x0 : 000000000000000d Call trace: __fragmentation_index+0x58/0x70 compaction_suitable+0x58/0x78 wakeup_kcompactd+0x8c/0xd8 balance_pgdat+0x570/0x5d0 kswapd+0x1e0/0x388 kthread+0x154/0x158 ret_from_fork+0x10/0x30 This solves the problem via keeping FORCE_MAX_ZONEORDER unchanged with or without THP on 16K and 64K page size configurations, making sure that the HUGETLB_PAGE_ORDER (and pageblock_order) would never exceed MAX_ORDER. Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Anshuman Khandual <[email protected]> Acked-by: Catalin Marinas <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>

GitHub: `README.md` improvements

Currently there is no synchronisation between finalize_pkvm() and kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if kvm_arm_init() fails resulting in the following warning on all the CPUs and eventually a HYP panic: | kvm [1]: IPA Size Limit: 48 bits | kvm [1]: Failed to init hyp memory protection | kvm [1]: error initializing Hyp mode: -22 | | <snip> | | WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50 | Modules linked in: | CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237 | Hardware name: FVP Base RevC (DT) | pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--) | pc : _kvm_host_prot_finalize+0x30/0x50 | lr : __flush_smp_call_function_queue+0xd8/0x230 | | Call trace: | _kvm_host_prot_finalize+0x3c/0x50 | on_each_cpu_cond_mask+0x3c/0x6c | pkvm_drop_host_privileges+0x4c/0x78 | finalize_pkvm+0x3c/0x5c | do_one_initcall+0xcc/0x240 | do_initcall_level+0x8c/0xac | do_initcalls+0x54/0x94 | do_basic_setup+0x1c/0x28 | kernel_init_freeable+0x100/0x16c | kernel_init+0x20/0x1a0 | ret_from_fork+0x10/0x20 | Failed to finalize Hyp protection: -22 | dtb=fvp-base-revc.dtb | kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540! | kvm [95]: nVHE call trace: | kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8 | kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac | kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160 | kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4 | kvm [95]: ---[ end nVHE call trace ]--- | kvm [95]: Hyp Offset: 0xfffe8db00ffa0000 | Kernel panic - not syncing: HYP panic: | PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800 | FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000 | VCPU:0000000000000000 | CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237 | Hardware name: FVP Base RevC (DT) | Workqueue: rpciod rpc_async_schedule | Call trace: | dump_backtrace+0xec/0x108 | show_stack+0x18/0x2c | dump_stack_lvl+0x50/0x68 | dump_stack+0x18/0x24 | panic+0x138/0x33c | nvhe_hyp_panic_handler+0x100/0x184 | new_slab+0x23c/0x54c | ___slab_alloc+0x3e4/0x770 | kmem_cache_alloc_node+0x1f0/0x278 | __alloc_skb+0xdc/0x294 | tcp_stream_alloc_skb+0x2c/0xf0 | tcp_sendmsg_locked+0x3d0/0xda4 | tcp_sendmsg+0x38/0x5c | inet_sendmsg+0x44/0x60 | sock_sendmsg+0x1c/0x34 | xprt_sock_sendmsg+0xdc/0x274 | xs_tcp_send_request+0x1ac/0x28c | xprt_transmit+0xcc/0x300 | call_transmit+0x78/0x90 | __rpc_execute+0x114/0x3d8 | rpc_async_schedule+0x28/0x48 | process_one_work+0x1d8/0x314 | worker_thread+0x248/0x474 | kthread+0xfc/0x184 | ret_from_fork+0x10/0x20 | SMP: stopping secondary CPUs | Kernel Offset: 0x57c5cb460000 from 0xffff800080000000 | PHYS_OFFSET: 0x80000000 | CPU features: 0x00000000,1035b7a3,ccfe773f | Memory Limit: none | ---[ end Kernel panic - not syncing: HYP panic: | PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800 | FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000 | VCPU:0000000000000000 ]--- Fix it by checking for the successfull initialisation of kvm_arm_init() in finalize_pkvm() before proceeding any futher. Fixes: 87727ba ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege") Cc: Will Deacon <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Oliver Upton <[email protected]> Cc: James Morse <[email protected]> Cc: Suzuki K Poulose <[email protected]> Cc: Zenghui Yu <[email protected]> Signed-off-by: Sudeep Holla <[email protected]> Acked-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Oliver Upton <[email protected]>

mengdonglin mentioned this pull request Nov 1, 2018

Invalid DMA trace stream tag causing FW oops thesofproject/sof#443

Closed

xiulipan mentioned this pull request Nov 1, 2018

could not access debugfs when DSP panic or IPC failed #233

Closed

xiulipan requested review from plbossart and lgirdwood November 1, 2018 05:37

keyonjie force-pushed the sof-pr branch from 0224a0d to 0efb0b1 Compare November 1, 2018 10:12

lgirdwood approved these changes Nov 1, 2018

View reviewed changes

plbossart reviewed Nov 1, 2018

View reviewed changes

plbossart force-pushed the topic/sof-dev branch from 2af9599 to 7e41d53 Compare November 1, 2018 16:18

keyonjie force-pushed the sof-pr branch from 0efb0b1 to d5f11b0 Compare November 2, 2018 05:10

keyonjie force-pushed the sof-pr branch from d5f11b0 to 3cd2bc0 Compare November 2, 2018 05:43

xiulipan mentioned this pull request Nov 2, 2018

verbose tracing doesn't refresh thesofproject/sof#526

Closed

keyonjie force-pushed the sof-pr branch from 3cd2bc0 to 52bbed4 Compare November 9, 2018 03:00

keyonjie changed the title ~~ASoC: SOF: debug: don't hold runtime PM for debugFS entries reading~~ ASoC: SOF: debug: change runtime PM get/put to correct usuage Nov 9, 2018

keyonjie force-pushed the sof-pr branch 2 times, most recently from 68ac58a to 211d0a6 Compare November 14, 2018 01:44

plbossart merged commit 3c1f692 into thesofproject:topic/sof-dev Nov 14, 2018

keyonjie mentioned this pull request Nov 20, 2018

ASoC: SOF: trace: fix trace doesn't update issue #248

Closed

plbossart mentioned this pull request Feb 4, 2019

[RFC]ASoC: SOF: remove runtime PM calls during debugfs read #620

Closed

aiChaoSONG pushed a commit to aiChaoSONG/linux that referenced this pull request May 6, 2021

Merge pull request thesofproject#237 from ojeda/contribute

912beef

GitHub: `README.md` improvements

ASoC: SOF: debug: change runtime PM get/put to correct usuage #237

ASoC: SOF: debug: change runtime PM get/put to correct usuage #237

Uh oh!

Conversation

keyonjie commented Nov 1, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

xiulipan commented Nov 1, 2018

Uh oh!

plbossart left a comment

Choose a reason for hiding this comment

Uh oh!

lgirdwood commented Nov 1, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

keyonjie commented Nov 2, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

xiulipan commented Nov 2, 2018

Uh oh!

keyonjie commented Nov 5, 2018

Uh oh!

xiulipan commented Nov 5, 2018

Uh oh!

keyonjie commented Nov 5, 2018

Uh oh!

lgirdwood commented Nov 5, 2018

Uh oh!

plbossart commented Nov 5, 2018

Uh oh!

keyonjie commented Nov 6, 2018

Uh oh!

lgirdwood commented Nov 6, 2018

Uh oh!

plbossart commented Nov 6, 2018

Uh oh!

ranj063 commented Nov 6, 2018

Uh oh!

ranj063 commented Nov 6, 2018

Uh oh!

keyonjie commented Nov 8, 2018

Uh oh!

plbossart commented Nov 8, 2018

Uh oh!

ranj063 commented Nov 8, 2018

Uh oh!

keyonjie commented Nov 9, 2018

Uh oh!

keyonjie commented Nov 9, 2018

Uh oh!

ranj063 commented Nov 9, 2018

Uh oh!

keyonjie commented Nov 9, 2018

Uh oh!

plbossart commented Nov 9, 2018

Uh oh!

keyonjie commented Nov 13, 2018

Uh oh!

plbossart commented Nov 13, 2018

Uh oh!

keyonjie commented Nov 14, 2018

Uh oh!

keyonjie commented Nov 14, 2018

Uh oh!

Uh oh!

keyonjie commented Nov 1, 2018 •

edited

Loading

lgirdwood commented Nov 1, 2018 •

edited

Loading

keyonjie commented Nov 2, 2018 •

edited

Loading