Skip to content

adding an EpisodesBuffer #43

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 40 commits into from
Jul 7, 2023
Merged

adding an EpisodesBuffer #43

merged 40 commits into from
Jul 7, 2023

Conversation

HenriDeh
Copy link
Member

@HenriDeh HenriDeh commented Jun 30, 2023

This PR adds an EpisodesBuffer.

Why is it useful/necessary? First there is the issue raised in #42 that shows how necessary it is to track episodes. Second, this also allows the use of multi-step batch sampling non-episodic environments that do not stop at a terminal state (they don't have any); or the early stopping of any environment due to say a limited number of steps. Currently multisteps are truncated using the terminal flag.

This PR is not over though. It is just a concept implementation of a wrapper to Traces (EpisodesBuffer) that will store a queue of Episode objects. An Episode is just the start and end indices (kind of a View) of one episode, its length, and whether it is terminated.

Termination is automatically detected when partial insertion happens. Indeed, in the run loop, before acting for the first time, only a (state, action) pair is pushed to the trajectory, then it's (reward, terminal, next_state, next_action).

This is the choice I made at the moment BUT, it is not compatible with the current run loop as the partial inserting only happens for the very first episode. Then at the first step of any other episode, it pushes (reward_prev_ep, terminal_prev_ep, state_new_ep, action_new_ep), that is, the last state of the previous episode is not pushed. This is related to the problem we discussed in #913.

The other way I thought is instead of detecting with partial inserting, we add program PostEpisodeStage to send a termination status to the last episode (/!\ terminal != terminated). I did not choose this because it kinda means that RLTrajectories must rely on whatever package that uses it to send the correct information. Whereas the partial insertion detection is self-contained.

In any case, my proposal is to accompany this PR with another one in RL.jl that would change the run loop as follows.

  1. PreEpisode: cache and push first state and action
  2. start the episode loop
  3. PreAct: nothing, we already know state and action
  4. act!
  5. PostAct: empty current cache then cache reward, terminal, next_state
  6. plan! (with regard to the issue raised in #913
  7. go back to 1 unless reset condition.

This way, plan! is executed at the end of the loop and the trajectory is always complete. As explained in #42, the completeness of the trajectory is ensured by inserting dummy elements between episodes, at the indices that should be excluded from sampling.

This is a big PR and it already took me a lot of time, I'd like to know what you think @jeremiahpslewis before I move on with the...

...Remaining stuff to do:

  • Rework sampling to avoid sampling (state = last_state_of_ep1, next_state = first_state_of_ep2).
  • Prioritized sampling compatibility.
  • Building a Trajectory automatically wraps the container in an EpisodesBuffer.
  • (How are episodes detected if no MultiplexTraces is present ? ) I think we can safely assume that for all intended purposes of this package there will always be a state + next state as it is core to all RL algorithms.
  • How to make this work with MultiThreadedEnvs ? MTE does not even work with the current implementation of Trajectories.
  • Should we support append! ? > Only appending another trajectory can work as we cannot infer the valid indices from a named tuple Answer is no because MultiplexTraces do not currently support appending. Appending is just not generally supported at the moment and needs to be implemented in a dedicated PR. Appending will be mostly useful for distributed agents that have their own trajectories to send to a main trajectory.
  • Bump version
  • CircularSARTTraces should not have next_action.

All of the following are to be done in the RL.jl PR.

  • Change docs wherever applicable
  • Rework the run loop and change compat for Trajectories in RLCore.
  • Check performance impact (+ can we make this more efficient).

@HenriDeh HenriDeh linked an issue Jun 30, 2023 that may be closed by this pull request
@codecov
Copy link

codecov bot commented Jun 30, 2023

Codecov Report

Merging #43 (65978c9) into main (8a5ac1b) will increase coverage by 0.68%.
The diff coverage is 82.26%.

@@            Coverage Diff             @@
##             main      #43      +/-   ##
==========================================
+ Coverage   69.94%   70.63%   +0.68%     
==========================================
  Files          13       15       +2     
  Lines         579      630      +51     
==========================================
+ Hits          405      445      +40     
- Misses        174      185      +11     
Impacted Files Coverage Δ
src/ReinforcementLearningTrajectories.jl 100.00% <ø> (ø)
src/common/CircularArraySLARTTraces.jl 87.50% <0.00%> (-12.50%) ⬇️
src/common/CircularPrioritizedTraces.jl 84.37% <ø> (ø)
src/common/CircularArraySARTTraces.jl 85.71% <50.00%> (-14.29%) ⬇️
src/trajectory.jl 61.90% <60.00%> (-10.32%) ⬇️
src/samplers.jl 63.88% <66.66%> (+4.51%) ⬆️
src/traces.jl 84.74% <83.33%> (+0.93%) ⬆️
src/common/CircularArraySARSATTraces.jl 85.71% <85.71%> (ø)
src/episodes.jl 92.00% <92.00%> (ø)
src/normalization.jl 63.09% <100.00%> (+1.55%) ⬆️

... and 1 file with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@HenriDeh
Copy link
Member Author

HenriDeh commented Jul 4, 2023

I changed the approach. Regarding sampleable steps, the EB now simply keeps a bit array with 1s where steps are valid samples, 0s otherwise. This array is then used as weights with the weighted sampling functionalities of StatsBase, basically valid steps all have weight 1 and invalid steps weight 0. This certainly adds overhead to sampling but I'm not expecting it to be significant.

Copy link
Member

@jeremiahpslewis jeremiahpslewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look like a really elegant solution to a tricky problem. Just one comment about namespaces, above.

@HenriDeh
Copy link
Member Author

HenriDeh commented Jul 6, 2023

Latest commits include changes that were needed to work with the new run loop. These are

  • MultiplexTraces other than :state should be pushed to the trajectory at the end of the episode using a PartialNamedTuple struct. This is to ensure no shift occurs between traces.
  • SARTTraces was renamed to SARSATTraces because it includes next_actions, which is not generally needed. This is breaking for experiments that do need the next_action.
  • SARTTraces now do not have a multiplex trace for actions, only for state (it should be called SARSTTraces but it would break everyexperiment that uses it.

@HenriDeh HenriDeh marked this pull request as ready for review July 6, 2023 15:45
@HenriDeh HenriDeh requested a review from jeremiahpslewis July 6, 2023 15:45
@HenriDeh
Copy link
Member Author

HenriDeh commented Jul 6, 2023

All is ready I think. Can you review this a second time ? Then I'll merge and release 0.2 in order to then work on RL.jl to integrate this.

@jeremiahpslewis
Copy link
Member

Latest commits include changes that were needed to work with the new run loop. These are

  • MultiplexTraces other than :state should be pushed to the trajectory at the end of the episode using a PartialNamedTuple struct. This is to ensure no shift occurs between traces.
  • SARTTraces was renamed to SARSATTraces because it includes next_actions, which is not generally needed. This is breaking for experiments that do need the next_action.
  • SARTTraces now do not have a multiplex trace for actions, only for state (it should be called SARSTTraces but it would break everyexperiment that uses it.

I would fix the naming now so it's accurate, given that we haven't had concerns about other breaking changes in the past (clearly we'll need to change the version number of downstream projects once we incorporate 0.2), but if all we need to do on the Experiments side is a find and replace, we'll thank ourselves later that we have an 'honest' / 'clear' API... ;)

@HenriDeh
Copy link
Member Author

HenriDeh commented Jul 7, 2023

Latest commits include changes that were needed to work with the new run loop. These are

  • MultiplexTraces other than :state should be pushed to the trajectory at the end of the episode using a PartialNamedTuple struct. This is to ensure no shift occurs between traces.
  • SARTTraces was renamed to SARSATTraces because it includes next_actions, which is not generally needed. This is breaking for experiments that do need the next_action.
  • SARTTraces now do not have a multiplex trace for actions, only for state (it should be called SARSTTraces but it would break everyexperiment that uses it.

I would fix the naming now so it's accurate, given that we haven't had concerns about other breaking changes in the past (clearly we'll need to change the version number of downstream projects once we incorporate 0.2), but if all we need to do on the Experiments side is a find and replace, we'll thank ourselves later that we have an 'honest' / 'clear' API... ;)

Fair enough. I applied that change. I'll merge as soon as tests are finished. Thanks for your comments.

@HenriDeh HenriDeh merged commit e57329f into main Jul 7, 2023
@HenriDeh HenriDeh deleted the EpisodeContainer branch July 7, 2023 08:49
@HenriDeh
Copy link
Member Author

HenriDeh commented Jul 7, 2023

@JuliaRegistrator register

@JuliaRegistrator
Copy link

Comments on pull requests will not trigger Registrator, as it is disabled. Please try commenting on a commit or issue.

@HenriDeh HenriDeh restored the EpisodeContainer branch July 7, 2023 09:25
@HenriDeh HenriDeh deleted the EpisodeContainer branch July 7, 2023 09:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MultiplexTrace is critically broken
3 participants