Description
This happens for any game that sets an "action_mask" that is not all ones, for example when creating the BaseEnv:
    if not self._continuous:
        action_mask = np.ones(self.discrete_action_num, 'int8')
        # Here I set action 2 to be invalid:
        action_mask[2] = 0
    else:
        action_mask = None
    obs = {'observation': obs, 'action_mask': action_mask, 'to_play': -1}
    return BaseEnvTimestep(obs, rew, done, info)
This results in the following error:
    Traceback (most recent call last):
      File "./zoo/custom/pkgir/config/pjk_disc_gumbel_muzero_config.py", line 93, in <module>
        train_muzero([main_config, create_config], seed=0, max_env_step=max_env_step)
      File "/home/LightZero-main/lzero/entry/train_muzero.py", line 174, in train_muzero
        train_data = replay_buffer.sample(batch_size, policy)
      File "/home/LightZero-main/lzero/mcts/buffer/game_buffer_muzero.py", line 76, in sample
        batch_target_policies_non_re = self._compute_target_policy_non_reanalyzed(
      File "/home/LightZero-main/lzero/mcts/buffer/game_buffer_muzero.py", line 681, in _compute_target_policy_non_reanalyzed
        batch_target_policies_non_re = np.asarray(batch_target_policies_non_re)
    ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (128, 6) + inhomogeneous part.
    Exception ignored in: <function MuZeroEvaluator.__del__ at 0x7f8bebff93a0>
After reading the code in game_buffer_muzero.py around line 661, I found that in the branch
    if self._cfg.env_type == 'not_board_games':
the legal actions are not processed, whereas in the board-game case they are.
So I guess action_mask is not supported for the not_board_games scenario?
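
To illustrate what I think is going wrong, here is a minimal sketch in plain NumPy. It is not LightZero code. It assumes (I have not verified this) that with a mask the stored search distribution only covers the legal actions, while padding entries cover the full action space, so the batch ends up with rows of different lengths. The helper name expand_to_full_policy and the example values are hypothetical.

```python
import numpy as np


def expand_to_full_policy(distribution, action_mask):
    """Scatter a distribution over legal actions into the full action space.

    Hypothetical helper: illegal actions receive probability 0.
    """
    action_mask = np.asarray(action_mask)
    legal_actions = np.flatnonzero(action_mask)  # indices where the mask is 1
    if len(distribution) != len(legal_actions):
        raise ValueError("distribution length must match the number of legal actions")
    full_policy = np.zeros(action_mask.shape[0], dtype=np.float32)
    full_policy[legal_actions] = distribution
    return full_policy


# Hypothetical example: 4 discrete actions, action 2 masked out.
action_mask = np.array([1, 1, 0, 1], dtype=np.int8)
masked_distribution = [0.5, 0.25, 0.25]       # length 3: legal actions only
padding_distribution = [0.0, 0.0, 0.0, 0.0]   # length 4: full action space

# Two different lengths in one batch is what makes the array inhomogeneous.
lengths = {len(masked_distribution), len(padding_distribution)}

# After expansion every row has the same length and stacks cleanly.
batch = np.stack([
    expand_to_full_policy(masked_distribution, action_mask),
    np.asarray(padding_distribution, dtype=np.float32),
])
print(lengths, batch.shape)
```

If this assumption is right, applying a mapping like this in the not_board_games branch, similar to what the board-game branch appears to do with its legal actions, would keep every target policy at the full action-space size.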