
Button Press Topdown Wall Expert Policy gives 0 reward but solves task #481


Open
reginald-mclean opened this issue May 21, 2024 · 1 comment

@reginald-mclean
Collaborator

No description provided.

@andrewwwj

It keeps returning 0 reward because `SawyerButtonPressTopdownWallEnvV2.compute_reward` computes the `hamacher_product` of `tcp_closed` and `near_button` regardless of `tcp_to_obj`. Here `tcp_closed` is 0: the left and right fingers are far enough apart that their distance is clipped to 1.0 in `SawyerXYZEnv._get_curr_obs_combined_no_goal`. The only other term, `reward += 5 * button_pressed`, is added only once `tcp_to_obj` drops to 0.03 or less:

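```python
# tcp_closed is 0 while the fingers are open, so this term stays 0.
reward = 5 * reward_utils.hamacher_product(tcp_closed, near_button)
# The press bonus is added only within 3 cm of the button.
if tcp_to_obj <= 0.03:
    reward += 5 * button_pressed
```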

This makes the reward effectively sparse.
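The behaviour described above can be reproduced with a minimal standalone sketch. It assumes that the gripper observation is the finger distance divided by 0.1 and clipped to [0, 1], and that `tcp_closed = 1 - obs[3]`. It also passes `near_button` and `button_pressed` in as plain numbers instead of computing them with Metaworld's `reward_utils.tolerance`. `hamacher_product` re-implements the standard Hamacher t-norm a·b / (a + b − a·b). `gripper_obs` and `topdown_wall_reward` are hypothetical helper names, not Metaworld code.

```python
import numpy as np


def hamacher_product(a: float, b: float) -> float:
    """Hamacher t-norm a*b / (a + b - a*b); defined as 0 when a == b == 0."""
    if not (0.0 <= a <= 1.0 and 0.0 <= b <= 1.0):
        raise ValueError("a and b must lie in [0, 1]")
    denominator = a + b - a * b
    return (a * b) / denominator if denominator > 0 else 0.0


def gripper_obs(finger_left: np.ndarray, finger_right: np.ndarray) -> float:
    """Assumed gripper observation: finger distance / 0.1, clipped to [0, 1]."""
    distance = np.linalg.norm(finger_right - finger_left)
    return float(np.clip(distance / 0.1, 0.0, 1.0))


def topdown_wall_reward(obs3, tcp_to_obj, near_button, button_pressed):
    """Hypothetical sketch of the reported reward (not Metaworld's code)."""
    tcp_closed = 1.0 - obs3  # assumption: 0 whenever the fingers are wide open
    reward = 5 * hamacher_product(tcp_closed, near_button)
    if tcp_to_obj <= 0.03:
        reward += 5 * button_pressed
    return reward


# Fingers 12 cm apart: 0.12 / 0.1 clips to 1.0, so tcp_closed is 0.
obs3 = gripper_obs(np.array([0.0, -0.06, 0.0]), np.array([0.0, 0.06, 0.0]))
for tcp_to_obj, near in [(0.30, 0.1), (0.10, 0.6), (0.05, 0.9)]:
    print(tcp_to_obj, topdown_wall_reward(obs3, tcp_to_obj, near, 0.0))
```

Every printed reward is 0.0. The Hamacher product is 0 whenever either argument is 0, so no `near_button` value can lift the first term while the gripper stays open. The signal only becomes nonzero inside the 3 cm press zone.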

Instead, I computed the reward with `tcp_opened`, as `SawyerButtonPressEnvV2` does, and used a somewhat looser distance threshold:

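```python
# Gate on the gripper being open instead of closed.
tcp_opened = max(obs[3], 0.0)
reward = 5 * reward_utils.hamacher_product(tcp_opened, near_button)
# Looser threshold: the press bonus is added within 7 cm of the button.
if tcp_to_obj <= 0.07:
    reward += 5 * button_pressed
```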

This gave dense rewards all the way to success in `SawyerButtonPressTopdownWallEnvV2` as well.
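The modified reward can be sketched the same way. This is again a standalone approximation. `tcp_opened = max(obs[3], 0.0)` and the 0.07 threshold come from the comment. `hamacher_product`, the numeric `near_button`/`button_pressed` inputs, and the hypothetical `proposed_reward` helper are stand-ins, not Metaworld's own code. The Hamacher product of 1 and b is b, so with the fingers fully open (obs[3] = 1.0) the reward tracks `near_button` directly.

```python
def hamacher_product(a: float, b: float) -> float:
    """Hamacher t-norm a*b / (a + b - a*b); defined as 0 when a == b == 0."""
    if not (0.0 <= a <= 1.0 and 0.0 <= b <= 1.0):
        raise ValueError("a and b must lie in [0, 1]")
    denominator = a + b - a * b
    return (a * b) / denominator if denominator > 0 else 0.0


def proposed_reward(obs3, tcp_to_obj, near_button, button_pressed):
    """Hypothetical sketch: gated by tcp_opened, press bonus within 7 cm."""
    tcp_opened = max(obs3, 0.0)
    reward = 5 * hamacher_product(tcp_opened, near_button)
    if tcp_to_obj <= 0.07:
        reward += 5 * button_pressed
    return reward


# Fingers fully open (obs[3] == 1.0): reward == 5 * near_button.
for tcp_to_obj, near in [(0.30, 0.25), (0.10, 0.5), (0.05, 0.75)]:
    print(tcp_to_obj, proposed_reward(1.0, tcp_to_obj, near, 0.0))
# prints 0.3 1.25, then 0.1 2.5, then 0.05 3.75
```

The reward now rises steadily as the TCP approaches the button. The wider 7 cm window also lets the `button_pressed` term start contributing earlier in the approach.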
