-
Notifications
You must be signed in to change notification settings - Fork 9
MultiplexTrace is critically broken #42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
In other words, MultiplexTrace lacks the ability of discerning elements that are multiplexed (both a state and a next_state) and those that are only next_state. |
I noticed this issue while working on a way to store information about episodes while still allowing the use of fixed-sized replay buffers. This is currently impossible with the Episodes struct. I believe this issue can be solved by systematically using this approach. In a few words, it consists in tracking the start and end index of the traces in an episode. The reason I originally started implementing that is to allow MultiStepBatchSampler to work on episodes that do not end at a terminal state (among other reasons). A related PR will be published soon. |
I've seen this issue before, I think. It's effectively a cold start problem, right? I have wondered from time to time whether from a Trace (but perhaps not from an RL algorithm) perspective, it's easier to think about |
How I'd like to rework this is to do (for the two episodes above):
where u is a placeholder for non-multiplex traces to keep the indexes aligned. And there's a Deque of episodes that tracks
indexing and consequently sampling would be disallowed for indexes 5 and 10. How to do this index rejection efficiently I don't know yet but I'll figure it out. |
I found a critical bug in MutliplexTrace:
MWE:
these loops mimic what happens during the run loop when two episodes of length 4 are pushed to the trace (5 being the terminal state). Now indexing
t
at index 5 should return(state = 1, next_state = 2)
but instead it returns(state = 5, next_state = 1)
: it mixes two episodes. You can also see thatlength(t) == 9
even though it should be 8.This is a huge problem, it makes the learning of basically every algorithm in RL.jl incorrect. @jeremiahpslewis
The text was updated successfully, but these errors were encountered: