File transfer sometimes doesn't work #1572
Comments
I finally figured this out. Repeatedly sending medium-sized templates over the network with a node memory of 256 MB can cause problems. These buffers are cached, in a cache that is rather large. Because of that, the off-heap memory never gets cleared. After a while, the off-heap memory fills up and you get an OutOfMemoryError. But that error is silently ignored, which is a problem, because
I inserted some code at the right place to get this error.
I will go to sleep for now, it's 4am and I'm done. I will open a PR to fix this, probably tomorrow, after a good night's sleep and a clear head.
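As a minimal sketch of how an error can vanish like this (this is not the actual CloudNet or Netty code; `SwallowedErrorDemo`, `allocateAsync` and `describeOutcome` are hypothetical names, and the error is simulated): when work runs inside a `CompletableFuture`, a thrown `OutOfMemoryError` is captured by the future instead of being printed. Unless someone inspects the future, nothing shows up in the log.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SwallowedErrorDemo {

  // Hypothetical stand-in for a buffer allocation running on another thread.
  // The OutOfMemoryError is simulated; no memory is actually exhausted.
  static CompletableFuture<byte[]> allocateAsync() {
    return CompletableFuture.supplyAsync(() -> {
      throw new OutOfMemoryError("simulated off-heap exhaustion");
    });
  }

  // Inspects the future so that a captured error becomes visible.
  static String describeOutcome(CompletableFuture<?> future) {
    try {
      future.get(5, TimeUnit.SECONDS);
      return "completed normally";
    } catch (ExecutionException e) {
      // get() wraps the original throwable, including Errors, as the cause
      Throwable cause = e.getCause();
      return cause.getClass().getSimpleName() + ": " + cause.getMessage();
    } catch (InterruptedException | TimeoutException e) {
      return "no result: " + e;
    }
  }

  public static void main(String[] args) {
    // Fire and forget: the error is stored in the future and never logged.
    CompletableFuture<byte[]> ignored = allocateAsync();

    // Only an explicit check surfaces it.
    System.out.println(describeOutcome(allocateAsync()));
  }
}
```

Attaching a handler such as `whenComplete` that logs the throwable at the point where the future is created is one way to make this kind of failure visible.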
I have lied, netty does not cache buffers. It is possible to reduce the max size of these buffers, or even disable them with the For the node this can be done with
For the wrapper this can be done by specifying the jvmArgs in the task.
Stacktrace
Actions to reproduce
I have a setup where a minigame server downloads the Map to play on when the game starts.
This is done by calling TemplateStorage#openZipInputStreamAsync from the server main thread. Sometimes the call never returns, causing the minigame server to be stuck in a waiting state.
As soon as that happens, all other requests also never get answered.
Because of that, as soon as it breaks once (usually once or twice a day), I have to restart the entire node to get the servers to work again.
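Until the root cause is fixed, the waiting side can at least avoid hanging forever. The following is a minimal sketch, assuming the async call hands back a `CompletableFuture`; `neverCompletes` is a hypothetical stand-in for the stuck download, not the real TemplateStorage API, and `awaitWithTimeout` is an illustrative helper name. It uses `CompletableFuture#orTimeout` (Java 9+) to fail the future after a deadline.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutGuardDemo {

  // Hypothetical stand-in for a zip stream request whose response never arrives.
  static CompletableFuture<InputStream> neverCompletes() {
    return new CompletableFuture<>();
  }

  // Waits for the future, but gives up after the given number of milliseconds.
  // Note: orTimeout completes the passed future exceptionally on expiry.
  static String awaitWithTimeout(CompletableFuture<InputStream> future, long millis) {
    try {
      future.orTimeout(millis, TimeUnit.MILLISECONDS).join();
      return "completed";
    } catch (CompletionException e) {
      // join() wraps the TimeoutException (or any other failure) as the cause
      return e.getCause() instanceof TimeoutException ? "timed out" : "failed";
    }
  }

  public static void main(String[] args) {
    // The stuck request is abandoned after 200 ms instead of blocking forever.
    System.out.println(awaitWithTimeout(neverCompletes(), 200));

    // A request that answers in time is unaffected by the guard.
    CompletableFuture<InputStream> answered =
        CompletableFuture.completedFuture(new ByteArrayInputStream(new byte[0]));
    System.out.println(awaitWithTimeout(answered, 200));
  }
}
```

A timeout only keeps the minigame server responsive; it does not repair whatever state breaks in the node, so later requests would presumably still go unanswered.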
This, to me, seems like an issue in the node.
I believe the node somehow tries to send the packet to the wrapper, but the packet never makes it all the way. The curious thing is that any zip request packet from another server also never makes it back to the wrapper. There must be some sort of state in the node that breaks.
Or some sort of deadlock, though I don't believe that. Maybe a race condition breaking a state? To me it seems that the packets should get sent to the wrapper (as per logging messages) so can this be a netty bug?
But that also seems unlikely, because other packets like ChannelMessages work fine.
I have no clue as to what exactly is happening here, those are just the things I thought of as of now.
I have done some debugging on my own to try and get more information about this issue. This issue is somewhat recent (I haven't always updated to the latest snapshots), but I don't think it is much older than 6 months, because until I updated the node everything worked fine.
My logging customizations are here.
The exact build I am running is here.
My exact build has some more customizations, but mostly to module loading, nothing that should impact packet handling/file transfer except disabling zip compression.
CloudNet version
[19.01 15:11:08.538] INFO :
[19.01 15:11:08.538] INFO : CloudNet Blizzard 4.0.0-RC12-SNAPSHOT f18671a
[19.01 15:11:08.538] INFO : Discord: https://discord.cloudnetservice.eu/
[19.01 15:11:08.538] INFO :
[19.01 15:11:08.538] INFO : ClusterId: ae0bbf39--431d--857e2580ae82
[19.01 15:11:08.538] INFO : NodeId: Node-1
[19.01 15:11:08.538] INFO : Head-NodeId: Node-1
[19.01 15:11:08.538] INFO : CPU usage: (P/S) .18/11.45/100%
[19.01 15:11:08.538] INFO : Node services memory allocation (U/R/M): 5532/5532/16384 MB
[19.01 15:11:08.538] INFO : Threads: 55
[19.01 15:11:08.538] INFO : Heap usage: 198/256MB
[19.01 15:11:08.538] INFO : JVM: Eclipse Adoptium 23 (OpenJDK 64-Bit Server VM 23.0.1+11)
[19.01 15:11:08.538] INFO : Update Repo: CloudNetService/launchermeta, Update Branch: beta (development mode)
[19.01 15:11:08.538] INFO :
Other
This is what the logging with my build should look like:
minigame server
node
This is what the log looks like when it breaks:
minigame server
node
I filtered the node logs and removed the listener debug messages, otherwise the logs would be very cluttered.
The minigame server logs are from the .wrapper/logs directory
I have also created heap dumps of the problematic servers and saved the logs, should any questions arise here.
Issue uniqueness
EDIT1
Creating thousands of zip requests hasn't broken it for me yet, so this is difficult to reproduce.
The 1-2 breakages/day happen from ~15 requests spread over the day
EDIT2
I uploaded the relevant log files
working minigame
working node
broken minigame
broken node