adding an EpisodesBuffer #43

HenriDeh · 2023-06-30T09:54:26Z

This PR adds an EpisodesBuffer.

Why is it useful/necessary? First, the issue raised in #42 shows how necessary it is to track episodes. Second, it allows multi-step batch sampling in non-episodic environments, which never stop at a terminal state (they don't have any), and in any environment that is stopped early, say because of a limited number of steps. Currently, multi-step samples are truncated using the terminal flag.

This PR is not finished, though. It is just a concept implementation of a wrapper around Traces (EpisodesBuffer) that stores a queue of Episode objects. An Episode is just the start and end indices of one episode (a kind of view), its length, and whether it has terminated.
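To make the bookkeeping concrete, here is a minimal Python sketch of such an episode record. It is only an illustration of the idea described above, not the package's Julia implementation; the names `Episode`, `start`, `end` and `terminated` are assumptions chosen for the example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Episode:
    """Hypothetical record: a view on one episode inside a flat trace container."""
    start: int                 # index of the first step of the episode
    end: int                   # index of the last step stored so far (inclusive)
    terminated: bool = False   # whether the episode has ended

    @property
    def length(self) -> int:
        return self.end - self.start + 1


# The wrapper would keep a queue of such records next to the underlying traces.
episodes = deque()
episodes.append(Episode(start=0, end=4, terminated=True))  # a finished 5-step episode
episodes.append(Episode(start=5, end=7))                   # an episode still in progress
lengths = [ep.length for ep in episodes]
```

Storing only indices keeps the wrapper cheap: the data stays in the traces, and the queue just says which index ranges belong together.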

Termination is detected automatically when a partial insertion happens. Indeed, in the run loop, before acting for the first time, only a (state, action) pair is pushed to the trajectory; every later push is (reward, terminal, next_state, next_action).
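The detection rule can be sketched as follows. This is a hedged Python illustration, assuming that "partial" simply means "pushed without a reward"; the class `EpisodeTracker` and its fields are hypothetical and do not mirror the actual EpisodesBuffer code.

```python
class EpisodeTracker:
    """Hypothetical tracker: a push without a reward marks the start of a new episode."""

    def __init__(self):
        self.steps = []           # flat storage of everything that was pushed
        self.episode_starts = []  # index in `steps` where each episode begins

    def push(self, **fields):
        partial = "reward" not in fields  # only (state, action) were given
        if partial:
            # A partial insertion closes the previous episode and opens a new one.
            self.episode_starts.append(len(self.steps))
        self.steps.append(fields)


tracker = EpisodeTracker()
tracker.push(state=0, action=1)                              # partial: episode 1 starts
tracker.push(reward=1.0, terminal=False, state=1, action=0)
tracker.push(reward=0.0, terminal=True, state=2, action=0)
tracker.push(state=0, action=1)                              # partial: episode 2 starts
```

The point of the design is that the buffer needs no outside signal: the shape of the pushed item is enough to know where one episode ends and the next begins.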

This is the choice I made for now, BUT it is not compatible with the current run loop, because the partial insertion only happens for the very first episode. At the first step of any other episode, the loop pushes (reward_prev_ep, terminal_prev_ep, state_new_ep, action_new_ep); that is, the last state of the previous episode is not pushed. This is related to the problem we discussed in #913.

The other way I thought of is, instead of detecting episodes through partial insertion, to use the PostEpisodeStage to send a termination status to the last episode (/!\ terminal != terminated). I did not choose this because it means that RLTrajectories must rely on whatever package uses it to send the correct information, whereas partial insertion detection is self-contained.

In any case, my proposal is to accompany this PR with another one in RL.jl that would change the run loop as follows.

PreEpisode: cache and push first state and action
start the episode loop:
  1. PreAct: nothing, we already know state and action
  2. act!
  3. PostAct: empty current cache then cache reward, terminal, next_state
  4. plan! (with regard to the issue raised in #913)
  5. go back to 1 unless reset condition.

This way, plan! is executed at the end of the loop and the trajectory is always complete. As explained in #42, the completeness of the trajectory is ensured by inserting dummy elements between episodes, at the indices that should be excluded from sampling.
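The proposed ordering can be sketched in Python as below. This is an illustration under stated assumptions, not RL.jl code: `CountdownEnv`, `plan` and `run` are hypothetical stand-ins, and the sketch assumes the cached step is pushed right after `plan!`, which the proposal does not spell out.

```python
import random


class CountdownEnv:
    """Toy environment (hypothetical): the state counts down to 0, which is terminal."""

    def __init__(self, start=3):
        self.start = start
        self.state = start

    def reset(self):
        self.state = self.start
        return self.state

    def step(self, action):
        self.state -= 1
        return 1.0, self.state == 0, self.state  # reward, terminal, next_state


def plan(state, rng):
    return rng.choice([0, 1])  # stand-in for the policy


def run(env, n_episodes, rng):
    trajectory = []
    for _ in range(n_episodes):
        # PreEpisode: cache and push the first state and action (partial insertion).
        state = env.reset()
        action = plan(state, rng)
        trajectory.append({"state": state, "action": action})
        while True:
            # PreAct: nothing to do, state and action are already known.
            reward, terminal, next_state = env.step(action)  # act!
            # PostAct: start a fresh cache with reward, terminal, next_state.
            cache = {"reward": reward, "terminal": terminal, "state": next_state}
            # plan! runs last, so every pushed step is complete.
            action = plan(next_state, rng)
            cache["action"] = action
            trajectory.append(cache)  # assumption: push happens after plan!
            if terminal:  # reset condition
                break
    return trajectory


traj = run(CountdownEnv(start=3), n_episodes=2, rng=random.Random(0))
```

In this sketch every episode begins with a partial (state, action) entry and its last state is always stored, which is the property the current loop lacks.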

This is a big PR and it has already taken me a lot of time. I'd like to know what you think, @jeremiahpslewis, before I move on with the remaining work.

Remaining stuff to do:

Rework sampling to avoid sampling (state = last_state_of_ep1, next_state = first_state_of_ep2).
Prioritized sampling compatibility.
Building a Trajectory automatically wraps the container in an EpisodesBuffer.
(How are episodes detected if no MultiplexTraces is present?) I think we can safely assume that, for all intended purposes of this package, there will always be a state and a next state, as this is core to all RL algorithms.
~~How to make this work with MultiThreadedEnvs?~~ MTE does not even work with the current implementation of Trajectories.
Should we support append!? ~~Only appending another trajectory can work as we cannot infer the valid indices from a named tuple~~ The answer is no, because MultiplexTraces do not currently support appending. Appending is not generally supported at the moment and needs to be implemented in a dedicated PR. It will be mostly useful for distributed agents that have their own trajectories to send to a main trajectory.
Bump version
CircularSARTTraces should not have next_action.

All of the following are to be done in the RL.jl PR.

Change docs wherever applicable
Rework the run loop and change compat for Trajectories in RLCore.
Check performance impact (+ can we make this more efficient).

codecov · 2023-06-30T10:27:42Z

Codecov Report

Merging #43 (65978c9) into main (8a5ac1b) will increase coverage by 0.68%.
The diff coverage is 82.26%.

@@            Coverage Diff             @@
##             main      #43      +/-   ##
==========================================
+ Coverage   69.94%   70.63%   +0.68%     
==========================================
  Files          13       15       +2     
  Lines         579      630      +51     
==========================================
+ Hits          405      445      +40     
- Misses        174      185      +11

Impacted Files	Coverage Δ
src/ReinforcementLearningTrajectories.jl	`100.00% <ø> (ø)`
src/common/CircularArraySLARTTraces.jl	`87.50% <0.00%> (-12.50%)`	⬇️
src/common/CircularPrioritizedTraces.jl	`84.37% <ø> (ø)`
src/common/CircularArraySARTTraces.jl	`85.71% <50.00%> (-14.29%)`	⬇️
src/trajectory.jl	`61.90% <60.00%> (-10.32%)`	⬇️
src/samplers.jl	`63.88% <66.66%> (+4.51%)`	⬆️
src/traces.jl	`84.74% <83.33%> (+0.93%)`	⬆️
src/common/CircularArraySARSATTraces.jl	`85.71% <85.71%> (ø)`
src/episodes.jl	`92.00% <92.00%> (ø)`
src/normalization.jl	`63.09% <100.00%> (+1.55%)`	⬆️

... and 1 file with indirect coverage changes

HenriDeh · 2023-07-04T13:28:42Z

I changed the approach. Regarding sampleable steps, the EB (EpisodesBuffer) now simply keeps a bit array with 1s where steps are valid samples and 0s otherwise. This array is then used as weights with the weighted sampling functionality of StatsBase: valid steps all have weight 1 and invalid steps have weight 0. This certainly adds overhead to sampling, but I'm not expecting it to be significant.
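The masking idea can be illustrated with a short Python sketch. It uses the standard library's `random.choices` as a stand-in for StatsBase's weighted sampling, and the mask values are made up for the example.

```python
import random

# 1 marks a step that may be sampled, 0 a step that must be skipped
# (for example an index sitting at the boundary between two episodes).
sampleable = [1, 1, 1, 0, 1, 1, 0]

rng = random.Random(42)
indices = list(range(len(sampleable)))
# Weighted sampling with replacement: zero-weight indices are never drawn.
batch = rng.choices(indices, weights=sampleable, k=32)
```

Because the mask is the only thing the sampler consults, marking a step invalid is a single write, at the cost of a weighted draw instead of a uniform one.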

jeremiahpslewis

Looks like a really elegant solution to a tricky problem. Just one comment about namespaces, above.

test/samplers.jl

HenriDeh · 2023-07-06T15:35:03Z

Latest commits include changes that were needed to work with the new run loop. These are:

MultiplexTraces other than :state should be pushed to the trajectory at the end of the episode using a PartialNamedTuple struct. This ensures that no shift occurs between traces.
The former SARTTraces was renamed to SARSATTraces because it includes next_action, which is not generally needed. This is breaking for experiments that do need the next_action.
SARTTraces now has a multiplex trace for the state only, not for actions (it should be called SARSTTraces, but that would break every experiment that uses it).

HenriDeh · 2023-07-06T15:47:41Z

All is ready, I think. Can you review this a second time? Then I'll merge and release 0.2 so that I can then work on integrating this into RL.jl.

src/episodes.jl

jeremiahpslewis · 2023-07-06T15:55:10Z

> Latest commits include changes that were needed to work with the new run loop. These are
>
> MultiplexTraces other than :state should be pushed to the trajectory at the end of the episode using a PartialNamedTuple struct. This is to ensure no shift occurs between traces.
>
> SARTTraces was renamed to SARSATTraces because it includes next_actions, which is not generally needed. This is breaking for experiments that do need the next_action.
>
> SARTTraces now do not have a multiplex trace for actions, only for state (it should be called SARSTTraces but it would break every experiment that uses it).

I would fix the naming now so it's accurate, given that we haven't had concerns about other breaking changes in the past (clearly we'll need to change the version number of downstream projects once we incorporate 0.2), but if all we need to do on the Experiments side is a find and replace, we'll thank ourselves later that we have an 'honest' / 'clear' API... ;)

HenriDeh · 2023-07-07T08:45:50Z

Fair enough. I applied that change. I'll merge as soon as tests are finished. Thanks for your comments.

HenriDeh · 2023-07-07T08:50:50Z

@JuliaRegistrator register

JuliaRegistrator · 2023-07-07T08:50:52Z

Comments on pull requests will not trigger Registrator, as it is disabled. Please try commenting on a commit or issue.

HenriDeh added 12 commits June 19, 2023 19:08

recreate episodes

909c364

merge

b926a33

implement append traces with multiplex

50587be

Merge branch 'append' into EpisodeContainer

85f6e3d

add Episodesbuffer

9804d45

include

8b4bb20

add RLTraj.capacity and remove old Episode stuff

71bf1c5

new capacity definition

e05243f

add tests

57163f3

dummy samples

c1d6b8d

remove pointer artifact

3f52521

remove old episode tests

e8a740f

HenriDeh linked an issue Jun 30, 2023 that may be closed by this pull request

MultiplexTrace is critically broken #42

Closed

fix tests

fb81276

HenriDeh added 7 commits July 3, 2023 15:51

Rework episodes

d73316e

typing

258c5e0

StatsBase and sampling

748e876

StatsBase

c6007b4

no export Episodes

2586d55

test sampling

0a2424d

Priorities compat

9f7ec72

fix tests

2fd8db2

jeremiahpslewis requested changes Jul 5, 2023

test/samplers.jl (resolved)

HenriDeh added 5 commits July 6, 2023 12:52

Autowrap and normalization compat

178ef24

bump version

d180a8d

update readme

8ccbf95

update readme

2571796

Merge branch 'EpisodeContainer' of https://github.com/JuliaReinforcem…

767f764

…entLearning/ReinforcementLearningTrajectories.jl into EpisodeContainer

HenriDeh added 10 commits July 6, 2023 14:45

increase codecov

63eb4c9

rename to SARSA trace

3a58315

partial insertion forcing

6fffbd1

add trajectory key fetcher

3495151

add a comment

01e8117

fix sarsa capacity

6dbe801

move partial

2b2eb4a

deal with partialnt indexation

824765f

add tests for partial namedtuple

3a05be0

Test without partialnt

ca511f7

sample namespace

65978c9

HenriDeh marked this pull request as ready for review July 6, 2023 15:45

HenriDeh requested a review from jeremiahpslewis July 6, 2023 15:45

jeremiahpslewis requested changes Jul 6, 2023

src/episodes.jl (outdated, resolved)

HenriDeh added 3 commits July 7, 2023 10:36

drop commented code

8484578

rename to SARST

b81d18f

rename traces

8205e9e

HenriDeh merged commit e57329f into main Jul 7, 2023

HenriDeh deleted the EpisodeContainer branch July 7, 2023 08:49

HenriDeh restored the EpisodeContainer branch July 7, 2023 09:25

HenriDeh deleted the EpisodeContainer branch July 7, 2023 09:26
