
[Question] Rationale for Absence of Explicit Gradient Clipping in SAC Implementation #2123


Closed
liruiluo opened this issue Apr 24, 2025 · 4 comments
Labels: question


@liruiluo

❓ Question

Hi Stable-Baselines3 team,

I've been studying the SAC implementation within the library and comparing it to other algorithms like PPO. I noticed that PPO includes an explicit max_grad_norm parameter for gradient clipping, which is often helpful for stabilizing training.

However, when looking through the SAC documentation and the source code (specifically the train method and optimizer steps for the actor and critics), I couldn't find an equivalent explicit mechanism or parameter for gradient clipping being applied by default during the main policy and value function updates.

My Question:

Could you please shed some light on the design decision behind not including explicit gradient clipping (like max_grad_norm) in the SAC actor and critic updates?

- Is gradient clipping generally considered less critical for the stability or performance of SAC compared to algorithms like PPO?
- Are there alternative mechanisms within the SAC algorithm (perhaps related to entropy maximization or the target network updates) that implicitly provide sufficient stabilization, making explicit clipping unnecessary?
- Or is it simply an implementation choice left to the user to add if needed via custom policies or callbacks?
Understanding the reasoning here would help me better appreciate the nuances of the SAC algorithm and its robust implementation in Stable-Baselines3.
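To make the question concrete, here is a minimal sketch (a toy network with a placeholder loss, not SB3's actual train loop) of the PPO-style global-norm clipping step I am asking about; the `max_grad_norm` value is purely illustrative:

```python
import torch as th
import torch.nn as nn

# Toy stand-ins for an actor/critic network, optimizer, and loss.
net = nn.Linear(4, 2)
optimizer = th.optim.Adam(net.parameters(), lr=3e-4)

def gradient_step(batch, max_grad_norm=None):
    loss = net(batch).pow(2).mean()  # placeholder loss
    optimizer.zero_grad()
    loss.backward()
    if max_grad_norm is not None:
        # The PPO-style global-norm clipping step that SAC's updates omit.
        th.nn.utils.clip_grad_norm_(net.parameters(), max_grad_norm)
    optimizer.step()

gradient_step(th.randn(8, 4), max_grad_norm=0.5)  # illustrative threshold
```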


@liruiluo liruiluo added the question label Apr 24, 2025
@liruiluo liruiluo changed the title from "[Question] question title" to "[Question] Rationale for Absence of Explicit Gradient Clipping in SAC Implementation" Apr 24, 2025
@araffin
Member

araffin commented Apr 24, 2025

Hello,

> Could you please shed some light on the design decision behind not including explicit gradient clipping

The answer is pretty simple: we followed the original implementation.
And yes, in general it seems that explicit clipping is not needed to get good performance.

However, I was also planning to give it a try (with https://github.com/araffin/sbx) to see if it would have an influence (no significant difference so far).

> …that implicitly provide sufficient stabilization, making explicit clipping unnecessary?

Smaller learning rate, bigger batch size, policy delay, smaller polyak coefficient, Adam beta parameters, ...
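As a rough illustration (the values below are placeholders, not tuned recommendations), most of these knobs map directly onto the SAC constructor; policy delay is the exception, as in SB3 it is exposed on TD3 rather than on SAC:

```python
from stable_baselines3 import SAC

# Illustrative stabilization-oriented settings, not recommendations.
model = SAC(
    "MlpPolicy",
    "Pendulum-v1",
    learning_rate=1e-4,  # smaller learning rate
    batch_size=512,      # bigger batch size
    tau=0.002,           # smaller polyak coefficient for target updates
    policy_kwargs=dict(
        # Adam beta parameters, passed through to the policy's optimizer
        optimizer_kwargs=dict(betas=(0.9, 0.999)),
    ),
)
```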

@liruiluo
Author


Thank you for the clear explanation, @araffin! It makes sense that the implementation follows the original paper and that explicit clipping often isn't necessary for general performance with SAC.

Interestingly, in my own experiments specifically using TQC from the sbx library, I've consistently observed very high actor and critic losses on the peg-unplug-side-v2 task. This observation is what led me to try adding gradient clipping (max_grad_norm), treating it as a potential stabilization measure specifically for what seems to be a challenging environment for the algorithm.

Perhaps this suggests that while not universally required, gradient clipping might still prove beneficial for enhancing stability in certain particularly demanding environments, or maybe even more so for TQC compared to standard SAC under such conditions.
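For reference, the usual way to add this kind of clipping in the JAX/optax stack that sbx builds on is to chain a global-norm clip in front of the optimizer. The sketch below shows that generic pattern; the threshold is illustrative, and this is not necessarily how sbx wires its optimizers internally:

```python
import optax

# Generic optax recipe: clip the global gradient norm before Adam.
max_grad_norm = 10.0  # illustrative threshold
tx = optax.chain(
    optax.clip_by_global_norm(max_grad_norm),
    optax.adam(learning_rate=3e-4),
)
```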

I appreciate you sharing that you're also exploring this in sbx – I'd be interested to hear if you encounter similar environment-specific behaviors in your tests.

@araffin
Member

araffin commented May 5, 2025

> This observation is what led me to try adding gradient clipping (max_grad_norm), treating it as a potential stabilization measure specifically for what seems to be a challenging environment for the algorithm.

Did it improve the performance in your case?

@liruiluo liruiluo closed this as completed May 6, 2025
@liruiluo
Author

liruiluo commented May 6, 2025

> > This observation is what led me to try adding gradient clipping (max_grad_norm), treating it as a potential stabilization measure specifically for what seems to be a challenging environment for the algorithm.
>
> Did it improve the performance in your case?

No...
