Skip to content

cleanup code to remove race conditions #13

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Sep 24, 2015
Merged

cleanup code to remove race conditions #13

merged 5 commits into from
Sep 24, 2015

Conversation

whyrusleeping
Copy link
Collaborator

This is a moderate rewrite of the bindings to clean up some race conditions, and remove unnecessary async stuff. It also has roughly double the throughput as before.

@jbenet
Copy link
Owner

jbenet commented Sep 23, 2015

I don't recall why this was all there. thought the C++ lib was not threadsafe-- is it?

think should stress it hard. probably could use https://github.com/jbenet/go-stream-muxer/ to stress it.

@jbenet
Copy link
Owner

jbenet commented Sep 23, 2015

this repo doesnt run tests and thus i don't feel safe merging this. maybe add travis ci testing with some stress stuff first?

@jbenet
Copy link
Owner

jbenet commented Sep 23, 2015

  • enable it on travis
  • add .travis.yml
  • add stress test.

@jbenet
Copy link
Owner

jbenet commented Sep 23, 2015

I find it very concerning that:

  • in go 1.4 the tests take ~1min to run.
  • in go 1.5 the tests timeout after 10min

See:

either there's an issue with 1.5, or there's possibly a deadlock.

@jbenet
Copy link
Owner

jbenet commented Sep 23, 2015

Maybe it was travis? /still concerned

@jbenet
Copy link
Owner

jbenet commented Sep 23, 2015

seems to have deadlocked on travis again. hit 10m timeout in both 1.4 and 1.5

@jbenet
Copy link
Owner

jbenet commented Sep 23, 2015

And again.

@RichardLitt RichardLitt mentioned this pull request Sep 23, 2015
65 tasks
@whyrusleeping
Copy link
Collaborator Author

@jbenet strange, i did 'go test -count 50' and the first 8 or so passed normally, but the 9th errored with an EOF.

Wasnt able to reproduce a hang though... (but I should mention that the stress test eats my machines resources like no other)

@whyrusleeping
Copy link
Collaborator Author

so, completely unable to repro a hang locally, but I can repro a random EOF on the reads.

the EOF is caused by an EINVSOCK being returned, looking what returns that from the code tells me that the socket i'm trying to read from either does not exist, or has been closed (same error message for both, aint that great?)

My naive guess (still reading through the c++ code) is that somehow the 'close' packet gets received by the reader before some of the final data, causing the EINVSOCK to be returned before my reads actually finish

@whyrusleeping
Copy link
Collaborator Author

also:

thought the C++ lib was not threadsafe-- is it?

He says its completely threadsafe here: http://udt.sourceforge.net/udt4/doc/t-intro.htm
I'm hoping hes right.

@whyrusleeping
Copy link
Collaborator Author

I'm going to have to mark my suspicions about the close confirmed. I made the reader send a small message back to the writer once its read the expected amount and I no longer see that error.

@whyrusleeping
Copy link
Collaborator Author

I lowered the number of test connections. It was using over 3GB of ram on my machine, and when i applied additional stress to the system (running other traffic over localhost and stressing my cpu a bit) the amount of ram used went up even more.

I assume that travis doesnt give us that much ram to work with and we may have sunk into swapdeath causing the hangs we saw.

@whyrusleeping
Copy link
Collaborator Author

@jbenet ping!

jbenet added a commit that referenced this pull request Sep 24, 2015
cleanup code to remove race conditions
@jbenet jbenet merged commit 27ec03d into master Sep 24, 2015
@jbenet jbenet deleted the fix-race-err branch September 24, 2015 18:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants