[Question] Rationale for Absence of Explicit Gradient Clipping in SAC Implementation #2123
Comments
Hello,
The answer is pretty simple: we followed the original implementation. That said, I was also planning to give it a try (with https://github.com/araffin/sbx) to see whether it has any influence (no significant difference so far). There are other knobs that already act as stabilizers: smaller learning rate, bigger batch size, policy delay, smaller polyak coefficient, Adam beta parameters, ...
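One of the stabilizers mentioned above is the polyak (soft) target-network update that SAC performs every gradient step. A minimal pure-Python sketch of the idea (not the SB3 source; parameter values are illustrative, with `tau=0.005` matching SB3's default):

```python
# Polyak (soft) target update: target <- tau * online + (1 - tau) * target.
# A small tau makes the bootstrap targets move slowly, damping oscillations
# in the critic loss -- one reason explicit gradient clipping is rarely needed.
def polyak_update(online_params, target_params, tau=0.005):
    """Blend online parameters into the target parameters."""
    return [tau * o + (1.0 - tau) * t for o, t in zip(online_params, target_params)]

target = [0.0, 0.0]
online = [1.0, -1.0]
target = polyak_update(online, target)  # target moves only 0.5% toward online
```

In SB3 this corresponds to the `tau` argument of `SAC`; a smaller `tau` slows the targets further.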
Thank you for the clear explanation, @araffin! It makes sense that the implementation follows the original paper and that explicit clipping often isn't necessary for general performance with SAC.

Interestingly, in my own experiments using TQC from the sbx library, I've consistently observed very high actor and critic losses on the peg-unplug-side-v2 task. This observation is what led me to try adding gradient clipping (max_grad_norm) as a potential stabilization measure for what seems to be a challenging environment for the algorithm. Perhaps this suggests that, while not universally required, gradient clipping might still improve stability in certain particularly demanding environments, or perhaps more so for TQC than for standard SAC under such conditions.

I appreciate you sharing that you're also exploring this in sbx. I'd be interested to hear if you encounter similar environment-specific behavior in your tests.
Did it improve the performance in your case?
No... |
❓ Question
Hi Stable-Baselines3 team,
I've been studying the SAC implementation within the library and comparing it to other algorithms like PPO. I noticed that PPO includes an explicit max_grad_norm parameter for gradient clipping, which is often helpful for stabilizing training.
However, when looking through the SAC documentation and the source code (specifically the train method and optimizer steps for the actor and critics), I couldn't find an equivalent explicit mechanism or parameter for gradient clipping being applied by default during the main policy and value function updates.
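For context, what PPO's `max_grad_norm` does (via `torch.nn.utils.clip_grad_norm_`) is compute the global L2 norm over all parameter gradients and rescale them when it exceeds the threshold. A hedged pure-Python sketch of that computation, with gradients flattened to a plain list for illustration:

```python
import math

def clip_grad_norm(grads, max_norm):
    """Rescale grads so their global L2 norm is at most max_norm.

    Mirrors the math of torch.nn.utils.clip_grad_norm_, but on a flat
    list of floats rather than parameter tensors.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads, total_norm

# A gradient of norm 5.0 gets rescaled to unit norm:
grads, norm = clip_grad_norm([3.0, 4.0], max_norm=1.0)
```

In a torch training loop this step would sit between `loss.backward()` and `optimizer.step()`, which is also where a user-added clip for SAC would go.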
My Question:
Could you please shed some light on the design decision behind not including explicit gradient clipping (like max_grad_norm) in the SAC actor and critic updates?
Is gradient clipping generally considered less critical for the stability or performance of SAC compared to algorithms like PPO?
Are there alternative mechanisms within the SAC algorithm (perhaps related to entropy maximization or the target network updates) that implicitly provide sufficient stabilization, making explicit clipping unnecessary?
Or is it simply an implementation choice left to the user to add if needed via custom policies or callbacks?
Understanding the reasoning here would help me better appreciate the nuances of the SAC algorithm and its implementation in Stable-Baselines3.
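On the entropy-maximization point above: SAC's automatic temperature tuning already pushes back against degenerate policies, which is one candidate implicit stabilizer. A hedged numeric sketch of the temperature update (a single gradient-descent step on the standard temperature loss from the SAC paper; the values below are illustrative, not SB3 internals):

```python
import math

def update_temperature(log_alpha, mean_log_prob, target_entropy, lr=3e-4):
    """One gradient step on L(log_alpha) = -log_alpha * (mean_log_prob + target_entropy).

    When the policy is more deterministic than the target entropy
    (mean_log_prob + target_entropy > 0), log_alpha increases, raising
    the entropy bonus and pushing the policy back toward exploration.
    """
    grad = -(mean_log_prob + target_entropy)  # d L / d log_alpha
    return log_alpha - lr * grad

log_alpha = 0.0
target_entropy = -6.0  # common heuristic: -action_dim, here for a 6-D action space
# Policy entropy (~ -8.0) is below the target (-6.0), so alpha should rise:
log_alpha = update_temperature(log_alpha, mean_log_prob=8.0, target_entropy=target_entropy)
alpha = math.exp(log_alpha)
```

In SB3 this mechanism corresponds to `ent_coef="auto"`, where `target_entropy` defaults to the negative action dimension.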