auth_fail closely followed by authorize_success on Streams EventBus #969

ColoradoKiwi · 2025-01-28T18:29:28Z

Hi,
In our custom streams app with a hidden CCP we are getting some cases where an agent is logged in and working, servicing contacts, and the "AUTH_FAIL" event gets fired on the Streams EventBus (we do connect.core.getEventBus().subscribe(...)).
We have an event handler for auth_fail that forces the agent to log in again (on the assumption, based on the docs, that if auth_fail occurs, the user's authenticated session with Connect is no longer valid).

In troubleshooting these occurrences, we often see from Streams logs the auth_fail being closely followed (within 1-2 seconds) by a subsequent "authorize_success" (not mentioned in the docs, but available in code). However with our current approach, its too late as we have already prompted the user for to re-sign-in.

We are very familiar with the streams/connect cookie expiring each hour, and prior to expiry, streams doing the continual/auto refresh throughout the day/user-working-session (which usually works fine).

In cases where this occurs, the streams logs are indicating that the Expiry time is still in the future, and yet the auth_fail / 401 is still being returned, quickly followed by the authorize_success. See example log sequence below...
This example is a Europe user (CET timezone) - note the timestamps - auth_fail fires at UTC 13:44:20 (yet the suggested cookie expiry is logged as 15:09:20 CET - i.e. UTC 14:09:20 - still 25 minutes in the future. Then 1 second later at UTC 13:44:21 the authorize_success event is fired.

[2025-01-17T13:44:04.505Z] [TRACE] [...]: Publishing event: client_side_logs
[2025-01-17T13:44:10.710Z] [DEBUG] [...]: DEBUG AMZ_WEB_SOCKET_MANAGER::Sending aws/ping deep heartbeat
[2025-01-17T13:44:10.711Z] [DEBUG] [...]: DEBUG AMZ_WEB_SOCKET_MANAGER::Sending heartbeat
[2025-01-17T13:44:10.824Z] [DEBUG] [...]: DEBUG AMZ_WEB_SOCKET_MANAGER::Heartbeat response received
[2025-01-17T13:44:10.848Z] [DEBUG] [...]: DEBUG AMZ_WEB_SOCKET_MANAGER::aws/ping deep heartbeat received
[2025-01-17T13:44:20.114Z] [TRACE] [...]: [agent-snapshot.expiration-tracker]: Snapshot items expired
[2025-01-17T13:44:20.115Z] [TRACE] [...]: [agent-snapshot.snapshot-store]: Snapshot cleanup complete
[2025-01-17T13:44:20.303Z] [INFO] [...]: Auth token expires at Fri Jan 17 2025 15:09:20 GMT+0100 (Central European Standard Time) Start refreshing token with retry.
[2025-01-17T13:44:20.426Z] [TRACE] [...]: Publishing event: auth_fail <<= AUTH_FAIL firing here
[2025-01-17T13:44:20.453Z] [WARN] [...]: ...: Amazon Connect Streams Bus notified Connection Broken - event: auth_fail
<< User notification to re-sign-in is popped by our app here >>

[2025-01-17T13:44:20.425Z] [ERROR] [...]: Authorization failed with code 401
[2025-01-17T13:44:20.711Z] [DEBUG] [...]: DEBUG AMZ_WEB_SOCKET_MANAGER::Sending aws/ping deep heartbeat
[2025-01-17T13:44:20.711Z] [DEBUG] [...]: DEBUG AMZ_WEB_SOCKET_MANAGER::Sending heartbeat
[2025-01-17T13:44:20.829Z] [DEBUG] [...]: DEBUG AMZ_WEB_SOCKET_MANAGER::Heartbeat response received
[2025-01-17T13:44:20.847Z] [DEBUG] [...]: DEBUG AMZ_WEB_SOCKET_MANAGER::aws/ping deep heartbeat received
[2025-01-17T13:44:21.152Z] [TRACE] [...]: Publishing event: authorize_success <== auth success firing here (1 second later)
[2025-01-17T13:44:23.716Z] [TRACE] [...]: Publishing event: agent::update
[2025-01-17T13:44:23.716Z] [TRACE] [...]: Publishing event: agent::refresh
[2025-01-17T13:44:23.717Z] [TRACE] [...]: Publishing event: broadcast
[2025-01-17T13:44:23.717Z] [TRACE] [...]: Publishing event: broadcast
[2025-01-17T13:44:23.717Z] [TRACE] [...]: Publishing event: broadcast
[2025-01-17T13:44:23.717Z] [TRACE] [...]: Publishing event: broadcast
[2025-01-17T13:44:23.715Z] [TRACE] [...]: AWSClient: <-- Operation 'getAgentSnapshot' succeeded.

Can you indicate what scenario's might cause this behavior? In many/most customers we never see this behavior, and for the customers reporting it, it happens sporadically across different users.

We are considering holding on to the AUTH_FAIL event, and subscribing to the authorization_success event also, and only acting on the AUTH_FAIL if we don't see a subsequent authorize_success being fired in the next ~10 seconds (or so).
But prior to making that change would like to know what scenario's might be causing the above behavior.

Any insight appreciated. Thanks.

ColoradoKiwi · 2025-02-12T12:38:12Z

Nudging.
Can we get input from Connect team on what scenario's would / or are causing this behavior?
We've gone ahead with adding the work-around noted above, but still would like to understand why or what scenario's cause AUTH_FAIL to get fired when the Cookie is clearly not expired yet, and then almost immediately followed by AUTHORIZE_SUCCESS.

Its not easily re-creatable at will, but continues to sporadically happen.

Any insight appreciated!
Thanks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

auth_fail closely followed by authorize_success on Streams EventBus #969

auth_fail closely followed by authorize_success on Streams EventBus #969

ColoradoKiwi commented Jan 28, 2025

ColoradoKiwi commented Feb 12, 2025 •

edited

Loading

auth_fail closely followed by authorize_success on Streams EventBus #969

auth_fail closely followed by authorize_success on Streams EventBus #969

Comments

ColoradoKiwi commented Jan 28, 2025

ColoradoKiwi commented Feb 12, 2025 • edited Loading

ColoradoKiwi commented Feb 12, 2025 •

edited

Loading