Batch DescribeLogGroups calls #1717

duhminick · 2025-06-04T16:28:01Z

Description of the issue

Currently, DescribeLogGroups (DLG) is used to determine the retention policy for a log group, with one call per log group. Customers with a large number of agent deployments and/or many configured log groups can therefore be throttled.

CloudWatch Logs has updated the DLG operation to accept multiple log groups in a single request. The agent should use the batched form to help mitigate DLG throttling.

Description of changes

- The AWS SDK has already been updated.
- Modified the existing goroutine that processes the DLG channel.
- The updated routine now reads from the DLG channel and stores each item in a buffer that later becomes the batch.
- The batch is processed and reset when it reaches 50 items (the maximum number of log groups DLG accepts) or when the 5-second ticker fires.
  - The ticker is an unfortunate necessity: at startup, the agent does not know whether all log groups are ready yet, so we cannot rely on the agent configuration to know when every log group has been queued.
- Batch processing calls DLG, then checks whether each returned retention policy matches the configured one. If it does not match, the group is placed into the existing PutRetentionPolicy (PRP) channel, and the existing logic updates the retention policy.

License

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Tests

Unit tests
Integration tests

Requirements

Before committing the code, please do the following:

Run make fmt and make fmt-sh
Run make lint

Lint

plugins/outputs/cloudwatchlogs/internal/pusher/target.go

Co-authored-by: Jeffrey Chien <[email protected]>

plugins/outputs/cloudwatchlogs/internal/pusher/target_test.go


jefchien · 2025-06-13T17:23:38Z

plugins/outputs/cloudwatchlogs/internal/pusher/target.go

-				m.logger.Errorf("failed to describe log group retention for target %v: %v", target, err)
-				time.Sleep(m.calculateBackoff(attempt))
-				continue
+	t := time.NewTicker(5 * time.Second)


I'm sure it's fine, but what's the reasoning behind using a ticker rather than a timer? Are we anticipating DescribeLogGroups calls minutes after startup?

I think it is possible, but maybe this covers the scenario I'm thinking of:

amazon-cloudwatch-agent/logs/logs.go

Line 174 in 47683ec

func (l *LogAgent) checkRetentionAlreadyAttempted(retention int, logGroup string) int {

Another scenario: I don't think it's safe to assume that DLG will only need to be called once, which is what a one-shot timer would imply. The system could be slow to initialize the targets, so a timer could miss those log groups.

plugins/outputs/cloudwatchlogs/internal/pusher/target_test.go

Paramadon · 2025-06-13T20:55:22Z

plugins/outputs/cloudwatchlogs/internal/pusher/target.go

-	retentionChannelSize = 100
+	retentionChannelSize    = 100
+	cacheTTL                = 5 * time.Second
+	logGroupIdentifierLimit = 50


Why 50? I'm guessing this is the API limit?

Yep, it's the API limit :/
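Where identifiers are already held in a slice, the same 50-per-request cap can be enforced by chunking. This is an alternative sketch, not the PR's streaming approach; `chunkIdentifiers` is a hypothetical helper.

```go
package main

import "fmt"

// logGroupIdentifierLimit is the per-request cap confirmed in the review thread.
const logGroupIdentifierLimit = 50

// chunkIdentifiers splits names into consecutive slices of at most limit
// entries, so each slice fits in a single DescribeLogGroups request.
func chunkIdentifiers(names []string, limit int) [][]string {
	if limit <= 0 {
		return nil
	}
	// Pre-allocate the outer slice: ceil(len(names) / limit) chunks.
	chunks := make([][]string, 0, (len(names)+limit-1)/limit)
	for start := 0; start < len(names); start += limit {
		end := start + limit
		if end > len(names) {
			end = len(names)
		}
		chunks = append(chunks, names[start:end])
	}
	return chunks
}

func main() {
	names := make([]string, 0, 120)
	for i := 0; i < 120; i++ {
		names = append(names, fmt.Sprintf("/app/group-%d", i))
	}
	for _, c := range chunkIdentifiers(names, logGroupIdentifierLimit) {
		fmt.Println(len(c)) // prints 50, 50, then 20
	}
}
```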

Batch DescribeLogGroups calls

fd94cc0

Lint

duhminick force-pushed the dominic-batch-dlg-pr branch from aa7b3c2 to fd94cc0 June 4, 2025 16:36

duhminick added the ready for testing label (indicates this PR is ready for integration tests to run) Jun 4, 2025

duhminick marked this pull request as ready for review June 4, 2025 20:42

duhminick requested a review from a team as a code owner June 4, 2025 20:42

duhminick added 2 commits June 9, 2025 15:23

Merge branch 'main' into dominic-batch-dlg-pr

d3b0fe1

Merge branch 'main' into dominic-batch-dlg-pr

a8020a8

movence reviewed Jun 12, 2025

plugins/outputs/cloudwatchlogs/internal/pusher/target.go (resolved)

plugins/outputs/cloudwatchlogs/internal/pusher/target.go (outdated, resolved)

duhminick and others added 3 commits June 12, 2025 11:56

Revert retention channel size for DLG/PRP

4ea1bb6

Merge branch 'main' into dominic-batch-dlg-pr

ebcb7bb

Format

166c9d1

jefchien reviewed Jun 12, 2025

plugins/outputs/cloudwatchlogs/internal/pusher/target.go (outdated, resolved)

plugins/outputs/cloudwatchlogs/internal/pusher/target.go (outdated, resolved)

duhminick and others added 2 commits June 12, 2025 14:39

Pre-allocate the identifiers array

0792a3d

Co-authored-by: Jeffrey Chien <[email protected]>

Add null check for RetentionInDays

dfe6c5c
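The nil check matters because, assuming the CloudWatch Logs behavior where a log group whose retention is "Never expire" comes back without a retention value, the field is a pointer that may be nil. The Go sketch below is hypothetical: `describedLogGroup` is a local stand-in rather than the SDK type, and treating a non-positive configured retention as "nothing to enforce" is an assumption loosely modeled on the later "Skip invalid retention" commit.

```go
package main

import "fmt"

// describedLogGroup is a hypothetical stand-in for one entry of a
// DescribeLogGroups response. RetentionInDays is a pointer because it is
// absent for groups with no retention policy ("Never expire").
type describedLogGroup struct {
	LogGroupName    *string
	RetentionInDays *int64
}

// needsRetentionUpdate reports whether a PutRetentionPolicy call is needed to
// bring the group in line with the configured retention (in days).
func needsRetentionUpdate(g describedLogGroup, configured int64) bool {
	if configured <= 0 {
		// Assumption: a non-positive configured value means nothing to enforce.
		return false
	}
	if g.RetentionInDays == nil {
		// No policy on the group yet, so it cannot match; never dereference nil.
		return true
	}
	return *g.RetentionInDays != configured
}

func main() {
	name := "/app/web"
	seven := int64(7)
	fmt.Println(needsRetentionUpdate(describedLogGroup{LogGroupName: &name}, 7))                          // true
	fmt.Println(needsRetentionUpdate(describedLogGroup{LogGroupName: &name, RetentionInDays: &seven}, 7))  // false
	fmt.Println(needsRetentionUpdate(describedLogGroup{LogGroupName: &name, RetentionInDays: &seven}, 30)) // true
}
```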

jefchien reviewed Jun 12, 2025

plugins/outputs/cloudwatchlogs/internal/pusher/target_test.go (resolved)

plugins/outputs/cloudwatchlogs/internal/pusher/target.go (resolved)

Skip invalid retention & add over limit tests

2be444c

duhminick force-pushed the dominic-batch-dlg-pr branch from 4ab4530 to 2be444c June 12, 2025 23:26

jefchien reviewed Jun 13, 2025

plugins/outputs/cloudwatchlogs/internal/pusher/target.go (outdated, resolved)

plugins/outputs/cloudwatchlogs/internal/pusher/target.go (resolved)

duhminick added 2 commits June 13, 2025 15:14

Revert retention check

6fc438c

Bump DLG limit to 50

c3019aa

duhminick force-pushed the dominic-batch-dlg-pr branch from 962dd0b to c3019aa June 13, 2025 15:24

Merge branch 'main' into dominic-batch-dlg-pr

be352fc

jefchien reviewed Jun 13, 2025

duhminick and others added 3 commits June 13, 2025 18:10

Revert some test changes + parallel

a9c8794

Merge branch 'main' into dominic-batch-dlg-pr

6806569

Call PutRetentionPolicy instead of direct channel usage

4a2885c
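Routing mismatched groups through a method instead of writing to the channel directly keeps the channel an internal detail of the manager. The sketch below is hypothetical: `retentionManager`, its `prp` field, and the string-typed `PutRetentionPolicy` signature are assumptions, with the buffer size borrowed from `retentionChannelSize = 100` in the diff above.

```go
package main

import "fmt"

// retentionManager is a hypothetical reduction of the target manager: batch
// processing hands mismatched groups to PutRetentionPolicy instead of sending
// on prp itself, so only the manager knows the channel exists.
type retentionManager struct {
	prp chan string
}

func newRetentionManager(size int) *retentionManager {
	return &retentionManager{prp: make(chan string, size)}
}

// PutRetentionPolicy enqueues a log group for the PutRetentionPolicy worker.
// Like a plain channel send, it blocks when the buffer is full.
func (m *retentionManager) PutRetentionPolicy(group string) {
	m.prp <- group
}

func main() {
	m := newRetentionManager(100)
	for _, g := range []string{"/app/a", "/app/b"} {
		m.PutRetentionPolicy(g)
	}
	close(m.prp)
	// Stand-in for the existing worker that drains the PRP channel.
	for g := range m.prp {
		fmt.Println("would call PutRetentionPolicy for", g)
	}
}
```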

jefchien approved these changes Jun 13, 2025

Paramadon reviewed Jun 13, 2025

Paramadon approved these changes Jun 13, 2025

Merge branch 'main' into dominic-batch-dlg-pr

39199ac

Merge branch 'main' into dominic-batch-dlg-pr

c665bce

sky333999 merged commit 89fed6f into main Jun 13, 2025
1 check passed

sky333999 deleted the dominic-batch-dlg-pr branch June 13, 2025 21:38
