1.15 Android mystery crash thread #17364
Is that a null deref or could it be a divide by zero or something? I guess that wouldn't crash. -[Unknown] |
Here's more, first an assert, that now reports the game:
Also got another identical one on another device.
There are very few reports so far, so these are really not statistically significant in any way. |
There's a hang (ANR), where we can see multiple callstacks:
Though given that one is in onPause, dunno how serious it is. It's quite common to see ANRs related to onPause. |
A potentially interesting one, though it could just be an OOM (actually, if so, it should have asserted in PushPool):
memcpy(void*, void const* pass_object_size0, unsigned long) DrawBuffer::Flush(bool) This is really weird, the flush is from:
And the only background animation that doesn't already flush is FloatingSymbolsAnimation. |
One in the jit:
I don't see how this could crash unless currentMIPS was stomped on? |
Crash in push_back: either OOM, or not in a render pass, similar to the SetNoBlendAndMask crash we've seen in past mystery threads. |
GPU/Common/SoftwareTransformCommon.cpp:847, which is indsOut[2] = i * 2 + 2; Maybe some bad arithmetic? Or just not enough space in the buffer. I guess Execute_Spline can potentially generate a lot of verts here. Have we ever seen a game legitimately using splines + lines? Or actually, I guess this is the flush before actually drawing the spline, but it seems likely that the previous command might also have been a spline. |
Here's an easy one, will fix:
|
Data race in HLE plugin input data map. Fixing. |
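For illustration, a data race on a shared map is usually closed by taking one mutex around every read and write. This is a minimal sketch under that assumption, not the actual fix: `PluginDataStore`, `Set` and `Get` are hypothetical names that do not come from the PPSSPP source.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>

// Hypothetical stand-in for a map that is shared between two threads.
class PluginDataStore {
public:
	// Writers take the lock for the whole insert/update.
	void Set(int key, uint8_t value) {
		std::lock_guard<std::mutex> guard(lock_);
		data_[key] = value;
	}

	// Readers take the same lock and never insert; a missing key yields fallback.
	uint8_t Get(int key, uint8_t fallback = 0) const {
		std::lock_guard<std::mutex> guard(lock_);
		auto it = data_.find(key);
		return it == data_.end() ? fallback : it->second;
	}

private:
	mutable std::mutex lock_;
	std::map<int, uint8_t> data_;
};
```

Using `find` instead of `operator[]` on the read path matters here, because `operator[]` would modify the map from a reader.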
CoreTiming, crash during deleting events on exit. Hard to say, memory corruption? |
|
These are in the memcpy loop in TextureReplacer::NotifyTextureDecoded, at GPU/Common/TextureReplacer.cpp:718. Don't know how that can crash... |
I don't see how this can crash.. |
|
Found the first one that seems related to the new parallel shader compiles..
|
I think this one is because IsFull doesn't leave space for another block, and we don't check it again before we call ProxyBlock. |
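A rough sketch of that failure mode and one way around it, assuming a fixed-size cache where each block may also need one proxy block. `BlockCache`, `kMaxBlocks` and the method bodies are hypothetical and only illustrate the idea of reserving headroom in `IsFull` and re-checking at allocation time.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical block cache: each compiled block may also need one proxy block.
struct BlockCache {
	static constexpr size_t kMaxBlocks = 4;
	size_t numBlocks = 0;

	// Reports full as soon as there is no room left for a block plus its proxy.
	bool IsFull() const { return numBlocks + 2 > kMaxBlocks; }

	// Returns the index of the new block, or -1 if there is no space left.
	int AllocateBlock() {
		if (numBlocks >= kMaxBlocks)
			return -1;
		return (int)numBlocks++;
	}

	// Re-checks capacity itself instead of trusting an earlier IsFull() call.
	int ProxyBlock() { return AllocateBlock(); }
};
```

Either measure alone would avoid writing past the end; doing both keeps the proxy path safe even if a caller forgets to check `IsFull` first.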
This doesn't make too much sense, this is the line, and we've already checked that it's not empty a few lines above:
I should also mention that many of these have single-digit occurrence counts, and fixing them all for 1.15.1 is not really necessary (some are very hard to root cause). |
… queue. Mainly paranoia, but might help with the mutex crash from #17364
Maybe somehow w/h being crazy values? Or
I can only think of a use-after-free, but that shouldn't be possible... so memory corruption?
I guess also has to be memory corruption... although it's likely a PPSSPP issue, at least one of these could feasibly be from a device with failing RAM or something. -[Unknown] |
==================================================================== Below here, only 1.15.1 crashes. ====================================================================
```
#00 pc 0x0000000000630a10 !libppsspp_jni.so (GPU_Vulkan::~GPU_Vulkan()+56)
#1 pc 0x0000000000630b90 !libppsspp_jni.so (GPU_Vulkan::~GPU_Vulkan()+16)
#2 pc 0x00000000006bb5c0 !libppsspp_jni.so (GPU_Shutdown()+100)
#3 pc 0x0000000000601a48 !libppsspp_jni.so (PSP_Shutdown()+132)
#4 pc 0x00000000007bb9a8 !libppsspp_jni.so (EmuScreen::~EmuScreen()+48)
#5 pc 0x00000000007bbaa0 !libppsspp_jni.so (EmuScreen::~EmuScreen()+16)
#6 pc 0x0000000000cb85b0 !libppsspp_jni.so (ScreenManager::shutdown()+80)
#7 pc 0x00000000007b7248 !libppsspp_jni.so (NativeShutdown()+28)
#8 pc 0x00000000007ac6b8 !libppsspp_jni.so (Java_org_ppsspp_ppsspp_NativeApp_shutdown+600)
#9 pc 0x00000000000094c0 /oat/arm64/base.odex (art_jni_trampoline+144)
```
apparently draw_ can be nullptr here in some shutdown scenarios. Oh well, fixing. |
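The shape of such a fix is typically a null guard in the teardown path. The sketch below only illustrates that pattern: `FakeDrawContext`, `FakeGPU` and `ReleaseResources` are made-up stand-ins, not the real GPU_Vulkan code.

```cpp
#include <cassert>

// Hypothetical stand-in for the draw context.
struct FakeDrawContext {
	int releaseCalls = 0;
	void ReleaseResources() { releaseCalls++; }
};

// Hypothetical stand-in for a GPU backend that holds a draw context pointer.
class FakeGPU {
public:
	explicit FakeGPU(FakeDrawContext *draw) : draw_(draw) {}
	~FakeGPU() { Shutdown(); }

	// Safe to call when the draw context is already gone; returns whether
	// anything was released.
	bool Shutdown() {
		if (!draw_)
			return false;
		draw_->ReleaseResources();
		draw_ = nullptr;
		return true;
	}

private:
	FakeDrawContext *draw_;
};
```

Clearing the pointer after release also makes a second `Shutdown` call (for example from the destructor) a no-op.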
Many of the crashes above no longer seem to be happening, thanks to the fixes. There's a new-looking crash in GPU_Vulkan::~GPU_Vulkan(), in a few varieties. This is becoming the new top crash. Unfortunately the stack traces aren't loading (only the top frame, which is visible in the stack title); it's often like that on Play right after a release. (Finally got it, see above.) Other than that, it's mainly the usual Vulkan lost devices and weird ones left. There is still a DrawUP crash which should be fixable if only I understood it. Overall looking good, but I only have very few early crashes so far. Should have better data tomorrow. |
Ah, the new OpenGL assert pays off (replaces a bunch of different crashes in the GL rendermanager): (DrawEngineGLES.cpp:DoFlush:254): [render_->IsInRenderPass()] (ULUS10025 Burnout Legends) Assert! Although, I haven't been able to trigger it yet in the game...
|
GPU/GPUCommonHW.cpp:801 An oldie but goodie. These have always been happening occasionally and usually we're screwed anyway, but maybe we should introduce a GPU-induced bluescreen, because I think we can either detect these from the memory exceptions fairly easily, or directly through not-too-expensive checks.
|
So, somehow the floating symbol animation can cause a crash? Or on this device the Vulkan device is so broken that the first draw crashes. Actually, something might linger? Because the flush below is from MiscScreens.cpp:163, which is the flush at the very start of FloatingSymbolsAnimation... Weird! Device is Xiaomi jasmine_sprout (Mi A2), Android 10 (SDK 29).
Another one (1.15): GPU/Common/DrawEngineCommon.cpp:625
|
Wow, here's a quite special assert. Clearly a modded game, but that's quite an allocation it's trying to do:
I think the GLPushBuffer might need some work.. |
Hm, this doesn't sound good. Race condition? (Hashmaps.h:Insert:72): [false] (ULUS10277 Castlevania The Dracula X Chronicles) DenseHashMap: Duplicate key of size 8 inserted
|
Can GetFileLoader return nullptr?
if (!info_->GetFileLoader()->Exists()) { |
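If the answer is yes, the check above would need a guard before the call. A minimal sketch, assuming a missing loader should be treated like a missing file; `FakeFileLoader`, `FakeGameInfo` and `FileExists` are hypothetical and only mirror the shape of the line quoted above.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a file loader.
struct FakeFileLoader {
	bool exists;
	bool Exists() const { return exists; }
};

// Hypothetical stand-in for the info object; the loader may be absent.
struct FakeGameInfo {
	std::shared_ptr<FakeFileLoader> loader;
	FakeFileLoader *GetFileLoader() const { return loader.get(); }
};

// Treats a missing loader the same as a missing file.
bool FileExists(const FakeGameInfo &info) {
	FakeFileLoader *fl = info.GetFileLoader();
	return fl != nullptr && fl->Exists();
}
```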
Looking deep within the rarest issues at this point. There are a few variations of this:
|
These ones are pretty rare but have occurred for a very long time (definitely not new):
Baffling to me because we recently read from the location it's trying to write the op to, so I don't see how it can crash. |
Alright, time for 1.15.2 crashes. Most of these are obscure and not likely to be common, and are of course not specifically 1.15.2 regressions.
SettingInfoMessage nullptr check
AsyncIOManager locking issue
VirtualDiscFileSystem
```
#00 pc 0x00000000004c5850 arm64_v8a.apk!libppsspp_jni.so (VirtualDiscFileSystem::GetFileInfo(std::__ndk1::basic_string, std::__ndk1::allocator>)+1228)
#1 pc 0x00000000007c6fa8 arm64_v8a.apk!libppsspp_jni.so (ReadFileToString(IFileSystem*, char const*, std::__ndk1::basic_string, std::__ndk1::allocator>*, std::__ndk1::mutex*)+4096)
#2 pc 0x00000000007c5974 arm64_v8a.apk!libppsspp_jni.so (GameInfoWorkItem::Run()+1304)
#3 pc 0x0000000000799b70 arm64_v8a.apk!libppsspp_jni.so (WorkerThreadFunc(GlobalThreadContext*, TaskThreadContext*)+4096)
#4 pc 0x000000000079b680 arm64_v8a.apk!libppsspp_jni.so (void* std::__ndk1::__thread_proxy>, void (*)(GlobalThreadContext*, TaskThreadContext*), GlobalThreadContext*, TaskThreadContext*>>(void*)+48)
```
InitSwapchain thingy
```
#00 pc 0x000000000002114c /system/lib64/libvulkan.so (vulkan::driver::GetPhysicalDeviceSurfaceCapabilitiesKHR(VkPhysicalDevice_T*, VkSurfaceKHR_T*, VkSurfaceCapabilitiesKHR*)+48)
#1 pc 0x0000000000768468 split_config.arm64_v8a.apk!libppsspp_jni.so (VulkanContext::InitSwapchain()+80)
#2 pc 0x00000000007b2358 split_config.arm64_v8a.apk!libppsspp_jni.so (AndroidVulkanContext::InitFromRenderThread(ANativeWindow*, int, int, int, int)+192)
#3 pc 0x00000000007b03e8 split_config.arm64_v8a.apk!libppsspp_jni.so (Java_org_ppsspp_ppsspp_NativeActivity_runVulkanRenderLoop+264)
#4 pc 0x0000000000009198 oat/arm64/base.odex (art_jni_trampoline+152)
```
|
This trace looks wrong, not sure how it gets to Read_U16:
Are the other locking ones ANRs? The ShaderManagerGLES::ApplyFragmentShader() one - not sure how it could race, hmm. The cache is loaded during start and not in the background (this is GL, after all.) And there wouldn't be two callers to ApplyFragmentShader() at the same time. Memory corruption somehow within -[Unknown] |
Yeah, it seems confused, really not sure how that could have happened. Maybe the linker partially merged the functions, or something. Not all of the lock related ones are ANRs, though I'll mark more clearly when I add more. Yeah the ApplyFragmentShader / DenseHashMap is quite baffling to me as well... |
Here's an oldie but goodie that I want to figure out at some point, it's not new but is bubbling up towards the top as other stuff is getting fixed:
|
Probing the depths of single-digit crash instances in 1.15.3... A Java NullPointerException we might be able to avoid:
```
Exception java.lang.RuntimeException:
  at android.app.ActivityThread.performDestroyActivity (ActivityThread.java:5950)
  at android.app.ActivityThread.handleDestroyActivity (ActivityThread.java:5995)
  at android.app.servertransaction.DestroyActivityItem.execute (DestroyActivityItem.java:47)
  at android.app.servertransaction.ActivityTransactionItem.execute (ActivityTransactionItem.java:45)
  at android.app.servertransaction.TransactionExecutor.executeLifecycleState (TransactionExecutor.java:176)
  at android.app.servertransaction.TransactionExecutor.execute (TransactionExecutor.java:97)
  at android.app.ActivityThread$H.handleMessage (ActivityThread.java:2438)
  at android.os.Handler.dispatchMessage (Handler.java:106)
  at android.os.Looper.loopOnce (Looper.java:226)
  at android.os.Looper.loop (Looper.java:313)
  at android.app.ActivityThread.main (ActivityThread.java:8669)
  at java.lang.reflect.Method.invoke
  at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run (RuntimeInit.java:571)
  at com.android.internal.os.ZygoteInit.main (ZygoteInit.java:1135)
Caused by java.lang.NullPointerException:
  at org.ppsspp.ppsspp.NativeActivity.onDestroy (NativeActivity.java:757)
  at android.app.Activity.performDestroy (Activity.java:8571)
  at android.app.Instrumentation.callActivityOnDestroy (Instrumentation.java:1364)
  at android.app.ActivityThread.performDestroyActivity (ActivityThread.java:5937)
```
ExpandLines still seems to have an edge case or two left:
weird glslang crash:
|
This can't possibly be new in 1.15.4, but I haven't seen it before:
Looks like we're missing some range check for block transfers. We really should have caught this with an assert, or ignored the out-of-bounds part of the copy. |
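As a sketch of the second option (ignoring the out-of-bounds part of the copy), a transfer can be clamped to whatever fits in both the source and the destination. `ClampedCopy` is a hypothetical helper that works on plain byte vectors rather than emulated memory, so it only illustrates the bounds arithmetic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copies at most `bytes` bytes, dropping whatever falls outside either buffer.
// Returns the number of bytes actually copied.
size_t ClampedCopy(std::vector<uint8_t> &dst, size_t dstOff,
                   const std::vector<uint8_t> &src, size_t srcOff, size_t bytes) {
	// A start offset at or past the end leaves nothing valid to copy.
	if (srcOff >= src.size() || dstOff >= dst.size())
		return 0;
	size_t n = std::min(bytes, std::min(src.size() - srcOff, dst.size() - dstOff));
	std::memcpy(dst.data() + dstOff, src.data() + srcOff, n);
	return n;
}
```

Returning the copied size lets a caller log or assert when the requested and actual sizes differ, which covers the "should have caught this with an assert" case as well.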
Interesting error in the tilt setup screen:
This one will be resolved by the upcoming UI event refactor though. |
Well, this could happen if a block transfer spans a memory mirror in certain ways, as I realized somewhat recently. For example, suppose I'm copying the 4 bytes at 0x041FFFFE-0x04200002 to 0x04100010-0x04100014. This may crash, depending on how it's copied, since a single access crossing mirrors is unsafe. That said, maybe it's not this, I'm just not sure how else it wouldn't trip something else. -[Unknown] |
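One way to make that example concrete is to split a range into pieces that each stay inside a single mirror before copying. This is only a sketch: `SplitAtMirrors` is a hypothetical helper, and the 2 MB mirror size is an assumption chosen so that the boundary falls at 0x04200000 as in the example.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Assumed mirror size for this sketch: the range repeats every 2 MB.
constexpr uint32_t kMirrorSize = 0x00200000;

// Splits [addr, addr + size) into (start, length) pieces that never cross
// a mirror boundary, so each piece can be copied with a single access.
std::vector<std::pair<uint32_t, uint32_t>> SplitAtMirrors(uint32_t addr, uint32_t size) {
	std::vector<std::pair<uint32_t, uint32_t>> pieces;
	while (size > 0) {
		uint32_t untilBoundary = kMirrorSize - (addr % kMirrorSize);
		uint32_t chunk = size < untilBoundary ? size : untilBoundary;
		pieces.emplace_back(addr, chunk);
		addr += chunk;
		size -= chunk;
	}
	return pieces;
}
```

With these assumptions, the 4-byte source range starting at 0x041FFFFE splits into two 2-byte pieces, one on each side of 0x04200000, while the destination range at 0x04100010 stays in one piece.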
This is different from #19522? |
I've started the slow rollout, and we got our first crash already:
addr2line: ppsspp/Common/UI/ScrollView.cpp:206
EDIT: A bit later, this is by far the most prominent new crash. Fix implemented!
Link to browse files as 1.15, so line numbers match: https://github.com/hrydgard/ppsspp/tree/4a9227504219bbc64e444ba7f0e306746e5a806d