ffi_call inside jitted function gives incorrect results #27101
Replies: 3 comments 2 replies
-
I don't have any suggestions off the top of my head. It'll be much easier to give tips if you can share a minimal reproducing example, so please do!
-
So I failed to reproduce it in a minimal way, but ultimately I seem to have resolved it. Probably just some misunderstanding on my part about how to jit things properly. FWIW, I have a class Foo(nnx.Module) that has a member instance of a class Bar(nnx.Module). A few things I observed:
So yeah, I guess I jitted it wrong (a shame that I can do it multiple ways and they all run but give different results). It was surprising, though, that at least one variant gave different results on CUDA vs non-CUDA. Maybe just some UB shaking out differently. /shrug
-
So even after this, I was still getting invalid results for some of my scripts/jobs. After lots of head-on-table banging, I think I found the core issue: I was doing cudaMemcpy and cudaMemset operations in my .cu host function to initialize outputs before launching the kernel. While cudaStreamSynchronize() fails hard when called there, these synchronizing calls do not, so I was left with undefined behavior. This is my best guess/understanding, anyway. I moved the initialization logic into the kernel and it seems to be smooth sailing now. Just FYI in case anyone else stumbles on this.
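For anyone landing here later, this is a minimal sketch of what that fix can look like, not the actual code from this thread. All names (`scale_and_sum_kernel`, `launch_on_stream`, `demo_passes`) are hypothetical, and `stream` stands in for whatever stream the caller hands to your host function. It shows two options: have the kernel write every output element itself, so no host-side initialization is needed, and, as an alternative the post above did not try, enqueue the initialization on the same stream with `cudaMemsetAsync` instead of the synchronous `cudaMemset`.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel with two outputs.
// `out` is fully written by the kernel, so it needs no host-side init.
// `total` is accumulated with atomics, so it must be zero before the launch.
__global__ void scale_and_sum_kernel(const float* in, float* out, float* total,
                                     int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    out[i] = 2.0f * in[i];   // every element is written inside the kernel
    atomicAdd(total, in[i]); // relies on `total` having been zeroed first
  }
}

// Hypothetical launcher: everything is enqueued on the caller's stream.
// No cudaMemset / cudaMemcpy (synchronous variants) and no stream sync here.
cudaError_t launch_on_stream(cudaStream_t stream, const float* in, float* out,
                             float* total, int n) {
  // Stream-ordered zeroing: runs before the kernel on the same stream.
  cudaError_t err = cudaMemsetAsync(total, 0, sizeof(float), stream);
  if (err != cudaSuccess) return err;
  const int threads = 256;
  const int blocks = (n + threads - 1) / threads;
  scale_and_sum_kernel<<<blocks, threads, 0, stream>>>(in, out, total, n);
  return cudaGetLastError();
}

// Standalone check of the launcher. This demo owns its stream, so
// synchronizing it here is fine; the launcher itself never synchronizes.
bool demo_passes() {
  const int n = 1024;
  const size_t bytes = n * sizeof(float);
  std::vector<float> h_in(n, 1.0f), h_out(n, -1.0f);
  float h_total = -1.0f;
  float *d_in = nullptr, *d_out = nullptr, *d_total = nullptr;
  cudaStream_t stream;
  if (cudaStreamCreate(&stream) != cudaSuccess) return false;
  bool ok = cudaMalloc(&d_in, bytes) == cudaSuccess &&
            cudaMalloc(&d_out, bytes) == cudaSuccess &&
            cudaMalloc(&d_total, sizeof(float)) == cudaSuccess;
  if (ok) {
    cudaMemcpyAsync(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice, stream);
    ok = launch_on_stream(stream, d_in, d_out, d_total, n) == cudaSuccess;
    cudaMemcpyAsync(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost, stream);
    cudaMemcpyAsync(&h_total, d_total, sizeof(float), cudaMemcpyDeviceToHost,
                    stream);
    ok = ok && cudaStreamSynchronize(stream) == cudaSuccess;
  }
  for (int i = 0; ok && i < n; ++i) ok = (h_out[i] == 2.0f);
  ok = ok && (h_total == static_cast<float>(n));
  cudaFree(d_in);
  cudaFree(d_out);
  cudaFree(d_total);
  cudaStreamDestroy(stream);
  return ok;
}
```

Calling `demo_passes()` from a small `main` and compiling with `nvcc` should be enough to try it on a machine with a GPU. In a real FFI handler, the stream would come from the call context rather than `cudaStreamCreate`; the point of the sketch is only that initialization and kernel launch share one stream, so their ordering does not depend on a synchronizing call.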
-
I’m writing a custom CUDA kernel that is executed via ffi_call() and returns 2 tensors. It is called from within another function (actually a method on an nnx Module, if that matters). It works perfectly if I don’t jit the method/module, but the results are wrong if I do. Any pro tips for what I might be doing wrong? I haven’t done it yet, but I can try to make and share a minimal reproducing example.
Thanks!