-
Notifications
You must be signed in to change notification settings - Fork 134
ASoC: SOF: debug: change runtime PM get/put to correct usuage #237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@keyonjie |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Humm, in the case "For audio suspend case, reading back old value is reasonable", do you mean to say values that may not be up-to-date but weren't previously read? Or that the debugs info will always be consistent but not necessarily up-to-date. Not sure what you meant by 'old' and whether the data can be inconsistent/misleading?
2af9599
to
7e41d53
Compare
@plbossart I think "old" means the current values before the FW crash (that prevents runtime PM transitions). Iirc, a FW panic would prevent any D3 entry or core PM so that the SRAM contents should be valid here. @keyonjie can you confirm if this is what you can see. |
@plbossart @lgirdwood the value depends on what is in DSP MMIO(memory window) at that moment, for DSP crashed or suspended cases, it might be the valid or invalid(e.g. 0xffffffff), but that is what we can try to read back, for analyzing. The word "old value" is not accurate, now corrected. |
@keyonjie |
@xiulipan when we try to get etrace at suspend, it might show invalid or 0xffffffff, but I don't want to call it regression. |
@keyonjie For this PR, we can not get the last run log as well. To achieve the goal to read error trace all the time, we should have more change about trace. |
@xiulipan agree, this PR can't fix the error trace be cleared after resume issue. |
@lgirdwood @ranj063 don't we already have existing pm_runtime hooks in /sys to disable D3 entry? |
@lgirdwood @plbossart we already have that in /sys/bus/pci/device/xxx/power/control for that purpose. |
@keyonjie great, lets document that then under debug in public wiki |
@keyonjie I can't figure out if we need this patch merged, or if it contains a design flaw. Can you clarify what the status is, the discussion is not self-explanatory any longer. |
its actually /sys/devices/pci*/00*0e/power/control |
@plbossart I think we might still need this patch to solve the problem of not being able to read debugfs when dsp panic occurs. we do not use get/put while reading trace debugfs, so this seems OK to me. |
@plbossart as Ranjani stated, I think better to apply this, it helps on dsp panic trace/debug, and no harm(or it actually improve to avoid interaction to DSP running during debugFS reading on suspend) for other cases. |
maybe a stupid comment but wouldn't pm_runtime_get_noresume and put_autosuspend be appropriate here? Nothing would happen if the device is already sleeping or crashed. |
@plbossart @keyonjie I think put_autosuspend will still cause errors. But get_noresume() and put_noidle() will just increment and decrement the usage counters and it might be useful to have this to prevent the dsp from entering D3 (if it were in D0 before) during debugfs read. |
@ranj063 sounds good to use get_noresume() which will help prevent dsp to enter D3, @plbossart comparing to put_autosuspend(), I think we should use put_sync_autosuspend(), the latter one will use sync mode which may lead to suspend ASAP which means our debugFS reading won't influence runtime PM too much. @lgirdwood I think you are familiar with this, any comments? |
@lgirdwood @plbossart @ranj063 updated as stated above, please help review. |
@keyonjie if the dsp has crashed and you request a put_sync_autosuspend(), what happens then? |
it just put the count we got above, if it is planed to go suspend, then let it go suspend. |
There are a couple of points that are completely unclear to me. using pm_runtime_get_noresume will only not have any effect if the DSP is in D3. If it is in D0, it will prevent entry in D3. What about when the DSP is crashed? What is the effect? Does this solve the problem that was initially reported? When you use autosuspend you typically use mark_last_busy to provide a reference point. If you use pm_runtime_put_sync_autosuspend, when exactly will the transition to D3 take place. And what's the difference with pm_runtime_put_sync if you haven't used the mark-last-busy (which possibly makes no sense if you are already in D3)? If we don't have the answers then I'll need experimental evidence that this patch solves the initial problem and has no side effects before merging it. |
Currently, if DSP is crashed, usually we are still in D0 in driver side(we may do improvement in future, e.g. unload/reload modules to try to recover, or disable runtime PM when DSP panic detected..), the patch here will solved the #233, I have tried we can read etrace at this case, we actually treat it as normal D0.
I am not sure about this, I intent to don't mark-busy here which means don't change the normal audio device runtime suspend/resume per my understanding, just raise question on PM ML.
I had done experiment: |
Can you add a TODO comment in the code to track this is an open point so that we can merge this first and deal with this later. |
Sure, let me add this now. |
68ac58a
to
211d0a6
Compare
When doing debugFS entry read, we should make sure it won't enter suspend during the reading, and the reading won't interact the runtime PM(e.g. if the idle timeout is expired when should enter runtime suspend ASSP) neither. Here change to use get_noresume() to make sure not resuming ADSP for debugFS reading, and put_sync_autosuspend() to ensure we can enter runtime suspend after reading finished if needed. Signed-off-by: Keyon Jie <[email protected]>
@plbossart updated. |
Currently without THP being enabled, MAX_ORDER via FORCE_MAX_ZONEORDER gets reduced to 11, which falls below HUGETLB_PAGE_ORDER for certain 16K and 64K page size configurations. This is problematic which throws up the following warning during boot as pageblock_order via HUGETLB_PAGE_ORDER order exceeds MAX_ORDER. WARNING: CPU: 7 PID: 127 at mm/vmstat.c:1092 __fragmentation_index+0x58/0x70 Modules linked in: CPU: 7 PID: 127 Comm: kswapd0 Not tainted 5.12.0-rc1-00005-g0221e3101a1 #237 Hardware name: linux,dummy-virt (DT) pstate: 20400005 (nzCv daif +PAN -UAO -TCO BTYPE=--) pc : __fragmentation_index+0x58/0x70 lr : fragmentation_index+0x88/0xa8 sp : ffff800016ccfc00 x29: ffff800016ccfc00 x28: 0000000000000000 x27: ffff800011fd4000 x26: 0000000000000002 x25: ffff800016ccfda0 x24: 0000000000000002 x23: 0000000000000640 x22: ffff0005ffcb5b18 x21: 0000000000000002 x20: 000000000000000d x19: ffff0005ffcb3980 x18: 0000000000000004 x17: 0000000000000001 x16: 0000000000000019 x15: ffff800011ca7fb8 x14: 00000000000002b3 x13: 0000000000000000 x12: 00000000000005e0 x11: 0000000000000003 x10: 0000000000000080 x9 : ffff800011c93948 x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000007000 x5 : 0000000000007944 x4 : 0000000000000032 x3 : 000000000000001c x2 : 000000000000000b x1 : ffff800016ccfc10 x0 : 000000000000000d Call trace: __fragmentation_index+0x58/0x70 compaction_suitable+0x58/0x78 wakeup_kcompactd+0x8c/0xd8 balance_pgdat+0x570/0x5d0 kswapd+0x1e0/0x388 kthread+0x154/0x158 ret_from_fork+0x10/0x30 This solves the problem via keeping FORCE_MAX_ZONEORDER unchanged with or without THP on 16K and 64K page size configurations, making sure that the HUGETLB_PAGE_ORDER (and pageblock_order) would never exceed MAX_ORDER. Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Anshuman Khandual <[email protected]> Acked-by: Catalin Marinas <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
GitHub: `README.md` improvements
Currently there is no synchronisation between finalize_pkvm() and kvm_arm_init() initcalls. The finalize_pkvm() proceeds happily even if kvm_arm_init() fails resulting in the following warning on all the CPUs and eventually a HYP panic: | kvm [1]: IPA Size Limit: 48 bits | kvm [1]: Failed to init hyp memory protection | kvm [1]: error initializing Hyp mode: -22 | | <snip> | | WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50 | Modules linked in: | CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237 | Hardware name: FVP Base RevC (DT) | pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--) | pc : _kvm_host_prot_finalize+0x30/0x50 | lr : __flush_smp_call_function_queue+0xd8/0x230 | | Call trace: | _kvm_host_prot_finalize+0x3c/0x50 | on_each_cpu_cond_mask+0x3c/0x6c | pkvm_drop_host_privileges+0x4c/0x78 | finalize_pkvm+0x3c/0x5c | do_one_initcall+0xcc/0x240 | do_initcall_level+0x8c/0xac | do_initcalls+0x54/0x94 | do_basic_setup+0x1c/0x28 | kernel_init_freeable+0x100/0x16c | kernel_init+0x20/0x1a0 | ret_from_fork+0x10/0x20 | Failed to finalize Hyp protection: -22 | dtb=fvp-base-revc.dtb | kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540! | kvm [95]: nVHE call trace: | kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8 | kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac | kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160 | kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4 | kvm [95]: ---[ end nVHE call trace ]--- | kvm [95]: Hyp Offset: 0xfffe8db00ffa0000 | Kernel panic - not syncing: HYP panic: | PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800 | FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000 | VCPU:0000000000000000 | CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237 | Hardware name: FVP Base RevC (DT) | Workqueue: rpciod rpc_async_schedule | Call trace: | dump_backtrace+0xec/0x108 | show_stack+0x18/0x2c | dump_stack_lvl+0x50/0x68 | dump_stack+0x18/0x24 | panic+0x138/0x33c | nvhe_hyp_panic_handler+0x100/0x184 | new_slab+0x23c/0x54c | ___slab_alloc+0x3e4/0x770 | kmem_cache_alloc_node+0x1f0/0x278 | __alloc_skb+0xdc/0x294 | tcp_stream_alloc_skb+0x2c/0xf0 | tcp_sendmsg_locked+0x3d0/0xda4 | tcp_sendmsg+0x38/0x5c | inet_sendmsg+0x44/0x60 | sock_sendmsg+0x1c/0x34 | xprt_sock_sendmsg+0xdc/0x274 | xs_tcp_send_request+0x1ac/0x28c | xprt_transmit+0xcc/0x300 | call_transmit+0x78/0x90 | __rpc_execute+0x114/0x3d8 | rpc_async_schedule+0x28/0x48 | process_one_work+0x1d8/0x314 | worker_thread+0x248/0x474 | kthread+0xfc/0x184 | ret_from_fork+0x10/0x20 | SMP: stopping secondary CPUs | Kernel Offset: 0x57c5cb460000 from 0xffff800080000000 | PHYS_OFFSET: 0x80000000 | CPU features: 0x00000000,1035b7a3,ccfe773f | Memory Limit: none | ---[ end Kernel panic - not syncing: HYP panic: | PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800 | FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000 | VCPU:0000000000000000 ]--- Fix it by checking for the successfull initialisation of kvm_arm_init() in finalize_pkvm() before proceeding any futher. Fixes: 87727ba ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege") Cc: Will Deacon <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Oliver Upton <[email protected]> Cc: James Morse <[email protected]> Cc: Suzuki K Poulose <[email protected]> Cc: Zenghui Yu <[email protected]> Signed-off-by: Sudeep Holla <[email protected]> Acked-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Oliver Upton <[email protected]>
this should fix #233
ASoC: SOF: debug: change runtime PM get/put to correct usuage
When doing debugFS entry read, we should make sure it won't enter
suspend during the reading, and the reading won't interact the runtime
PM(e.g. if the idle timeout is expired when should enter runtime suspend
ASSP) neither.
Here change to use get_noresume() to make sure not resuming ADSP for
debugFS reading, and put_sync_autosuspend() to ensure we can enter
runtime suspend after reading finished if needed.
Signed-off-by: Keyon Jie [email protected]