Button Press Topdown Wall Expert Policy gives 0 reward but solves task #481

reginald-mclean · 2024-05-21T14:12:40Z

No description provided.

andrewwwj · 2025-05-14T14:37:37Z

It keeps returning 0 reward because SawyerButtonPressTopdownWallEnvV2.compute_reward simply takes the hamacher_product of tcp_closed and near_button, regardless of tcp_to_obj. Here tcp_closed comes out as 0, because the left and right fingers are so far apart that their distance is clipped to 1.0 in SawyerXYZEnv._get_curr_obs_combined_no_goal. The only other term, reward += 5 * button_pressed, is not added until tcp_to_obj reaches 0.03:

reward = 5 * reward_utils.hamacher_product(tcp_closed, near_button)
if tcp_to_obj <= 0.03:
    reward += 5 * button_pressed

which results in some sort of "sparse" rewards.
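For reference, a short note on why the first term vanishes, assuming hamacher_product implements the standard Hamacher product (this formula is my reading and is not quoted from the code in this thread):

```latex
H(a, b) =
\begin{cases}
  \dfrac{ab}{a + b - ab}, & a + b - ab > 0, \\[1ex]
  0, & \text{otherwise},
\end{cases}
\qquad a, b \in [0, 1]
\quad\Longrightarrow\quad
H(0, b) = 0 \ \text{for every } b .
```

So with tcp_closed = 0 the product is 0 no matter how large near_button is, and the reward stays at 0 until the tcp_to_obj <= 0.03 branch is taken.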

Instead, when I computed the reward using tcp_opened, as SawyerButtonPressEnvV2 does, together with a slightly looser threshold, such as:

tcp_opened = max(obs[3], 0.0)
reward = 5 * reward_utils.hamacher_product(tcp_opened, near_button)
if tcp_to_obj <= 0.07:
    reward += 5 * button_pressed

it gave dense rewards until success in SawyerButtonPressTopdownWallEnvV2 as well.
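To make the comparison concrete, here is a minimal standalone sketch of the two variants. It does not import Metaworld: hamacher_product below is a stand-in written from the standard Hamacher product formula, the function names reward_current and reward_proposed are hypothetical, and the input values are made up for illustration rather than taken from a real rollout.

```python
def hamacher_product(a: float, b: float) -> float:
    """Stand-in for reward_utils.hamacher_product (assumed formula: ab / (a + b - ab))."""
    denom = a + b - a * b
    return (a * b) / denom if denom > 0 else 0.0


def reward_current(tcp_closed, near_button, button_pressed, tcp_to_obj):
    """Variant quoted above: gated on tcp_closed, bonus only within 0.03."""
    reward = 5 * hamacher_product(tcp_closed, near_button)
    if tcp_to_obj <= 0.03:
        reward += 5 * button_pressed
    return reward


def reward_proposed(tcp_opened, near_button, button_pressed, tcp_to_obj):
    """Variant suggested above: gated on tcp_opened, bonus within 0.07."""
    reward = 5 * hamacher_product(tcp_opened, near_button)
    if tcp_to_obj <= 0.07:
        reward += 5 * button_pressed
    return reward


# Illustrative (made-up) mid-episode values: gripper fully open, so
# tcp_closed is 0 as reported in this thread, and the TCP is 0.05 from the button.
gripper_open = 1.0                    # plays the role of obs[3]
tcp_closed = 0.0
tcp_opened = max(gripper_open, 0.0)
near_button, button_pressed, tcp_to_obj = 0.8, 0.5, 0.05

print("current :", reward_current(tcp_closed, near_button, button_pressed, tcp_to_obj))
print("proposed:", reward_proposed(tcp_opened, near_button, button_pressed, tcp_to_obj))
```

With these inputs the current variant returns 0 while the proposed one is positive, which matches the sparse versus dense behaviour described above.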
