Fix multithreading s3 memory leak #1495

ddelange · 2025-04-01T19:24:56Z

storages.s3 has a memory leak

this PR adds up to two thread-safe boto3 resources per storage (one signed, one unsigned) instead of two per thread. it also adds a 1-hour time-to-live cache for the boto3 resources, so that they are periodically recreated to avoid this memory leak.
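
A minimal sketch of the time-to-live idea described above, using only the standard library. It is not the code from this PR: `TTLResource`, `factory`, and `clock` are hypothetical names chosen for illustration, and `factory` stands in for whatever builds the boto3 resource. Whether the object being cached is itself safe to share across threads is outside the scope of this sketch.

```python
import threading
import time


class TTLResource:
    """Hold one shared object and rebuild it once it is older than `ttl` seconds."""

    def __init__(self, factory, ttl=3600.0, clock=time.monotonic):
        self._factory = factory  # hypothetical callable that builds the resource
        self._ttl = ttl
        self._clock = clock  # injectable so the expiry can be tested
        self._lock = threading.Lock()
        self._value = None
        self._expires_at = None  # None means "nothing cached yet"

    def get(self):
        # The lock guarantees that only one thread rebuilds an expired object.
        with self._lock:
            now = self._clock()
            if self._expires_at is None or now >= self._expires_at:
                # Drop the old object and create a fresh one.
                self._value = self._factory()
                self._expires_at = now + self._ttl
            return self._value
```

All threads that call `get()` within the same hour receive the same object. Once the TTL has passed, the next caller replaces it, so the old object can be garbage collected as soon as nothing else references it.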

the memory leak on infra level (6 day view): [screenshot]

memray --leaks profile consistently shows a long-lived s3 client as the culprit (just a random snapshot from the entire profile): [screenshot]

jschneier · 2025-04-02T02:30:04Z

Thanks for opening. I don't think we can collapse UNSIGNED down like this. The reason that was written at all is that if you want to upload something you need to use a signed version regardless. Maintaining a single storage API that can do both uploading and generating URLs requires hiding the complexity within the class.

ddelange · 2025-04-02T07:04:09Z

storages/backends/s3.py

@@ -691,10 +674,7 @@ def url(self, name, parameters=None, expire=None, http_method=None):
        params["Bucket"] = self.bucket.name
        params["Key"] = name

-        connection = (
-            self.connection if self.querystring_auth else self.unsigned_connection
-        )


hi @jschneier 👋

I didn't see any point in code where self.querystring_auth gets altered after init, and I didn't see any other point in code where self.unsigned_connection is used, so this seemed like a safe refactor to me (the class instance only ever uses either connection, or unsigned_connection, so this can be collapsed). I didn't look at parent classes though.

ah, but the same is not true for connection! you're right, will adjust 👍
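
A rough sketch of the constraint discussed in this thread, assuming a hypothetical `SketchStorage` class and a hypothetical `make_connection(signed)` factory (neither is django-storages code). Uploads always go through the signed connection, while URL generation picks signed or unsigned based on `querystring_auth`, so a single instance can need both.

```python
class SketchStorage:
    """Hypothetical stand-in for a storage class, for illustration only."""

    def __init__(self, make_connection, querystring_auth=True):
        self.querystring_auth = querystring_auth
        self._make_connection = make_connection  # hypothetical factory
        self._connections = {}  # at most two entries: signed and unsigned

    def _get(self, signed):
        # Create each kind of connection lazily, at most once per instance.
        if signed not in self._connections:
            self._connections[signed] = self._make_connection(signed)
        return self._connections[signed]

    @property
    def connection(self):
        # Used for uploads and other authenticated calls.
        return self._get(True)

    @property
    def unsigned_connection(self):
        # Only needed when URLs should carry no signature.
        return self._get(False)

    def connection_for_url(self):
        # Same selection as the removed lines in the diff above.
        if self.querystring_auth:
            return self.connection
        return self.unsigned_connection
```

With `querystring_auth=False`, `connection_for_url()` returns the unsigned connection, but `connection` is still required for uploads, which is why the two cannot be collapsed into one.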

ddelange · 2025-04-02T09:49:16Z

This is green now on my fork 👍

ddelange · 2025-04-03T05:11:12Z

this has been live since April 2nd and as you can see, we are now consistently below 50% memory 🎉

pip install 'django-storages[s3]@https://github.com/jschneier/django-storages/archive/4623ca742f6d40d7aeb3d9b4d47f8bcbe3163fc5.zip'

ddelange · 2025-04-03T05:27:31Z

storages/backends/s3.py

+                max_pool_connections=64,  # thread-safe
+                tcp_keepalive=True,
+                retries={"max_attempts": 6, "mode": "adaptive"},


fyi, these values have been empirically chosen using a c6 family EC2 machine talking directly to s3 from the AWS network using this code. increasing threads beyond 64 starts to become unstable and does not yield a total throughput increase.

jschneier · 2025-04-28

Any idea what the defaults are?

People run on all kinds of hardware and I'm mostly inclined to say this should be configured by the user if they want.

ddelange · 2025-04-28

the default is no retries, no keepalive, 10 conns. the above is the sensible default for multithreaded applications like django servers imo:

- 10 conns is a bottleneck for IO bound views (this used to not be an issue because, before this PR, each thread created its own boto3 resource).
- tcp_keepalive is a huge speedup at no additional cost.
- s3 infrastructure (especially aws) is prone to intermittent failures at high throughput.

people can pass their own client_config if the need arises, but adding these doesn't put load on people's hardware and only makes things more robust and production-ready.
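
To illustrate the "defaults that the user can override" argument, here is a small sketch using plain dictionaries. The three values are taken from the diff above. `DEFAULT_CLIENT_OPTIONS` and `resolve_client_options` are hypothetical names, and the PR itself may merge settings differently (for example via a botocore `Config` object).

```python
import copy

# Values proposed in the diff above. The dict itself is a hypothetical container.
DEFAULT_CLIENT_OPTIONS = {
    "max_pool_connections": 64,
    "tcp_keepalive": True,
    "retries": {"max_attempts": 6, "mode": "adaptive"},
}


def resolve_client_options(user_options=None):
    """Return the defaults, with any user-supplied keys taking precedence."""
    # Deep copy so callers cannot mutate the module-level defaults.
    options = copy.deepcopy(DEFAULT_CLIENT_OPTIONS)
    options.update(user_options or {})
    return options
```

A user who prefers a smaller pool would call `resolve_client_options({"max_pool_connections": 10})` and keep the other defaults. Under the assumption that the keys match the keyword arguments of `botocore.config.Config`, the result could then be passed on as `Config(**options)`.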

ddelange · 2025-04-03T12:10:21Z

Alright, this last commit took some trial and error. The test that the commit adds was essential, and prompted that change.

ddelange · 2025-04-04T07:13:12Z

Hi @jschneier 👋

This is ready for review

ddelange · 2025-04-14T11:48:13Z

storages/backends/s3.py

+try:
+    from functools import cached_property
+except ImportError:  # python_version<='3.7'
+    try:
+        from backports.cached_property import cached_property
+    except (ImportError, ModuleNotFoundError) as e:
+        msg = "Could not import backports.cached_property. Did you run 'pip install django-storages[s3]'?"  # noqa: E501
+        raise ImproperlyConfigured(msg) from e


@jschneier is the plan to support python 3.7 much longer? it has been end of life for almost 2 years now

jschneier · 2025-04-28

It was supported because I had not yet dropped Django 3.2, but that is now done.

jschneier · 2025-04-28T01:05:27Z

Thanks for this PR. My main hesitation is taking a dependency on a 3rd party library, namely cachetools in this case. Generally speaking I prefer not to do that for all the typical reasons.

Is there no other way to solve the problem, even in a less good way?

…into memory-leak

* 'master' of https://github.com/jschneier/django-storages:
  Add moto5 support (jschneier#1464)
  Drop support for Django3.2 and Django4.1 (jschneier#1505)
  [ci] Update CI to use Ubuntu 22.04 for testing (jschneier#1502)
  Bump version for release (jschneier#1497)
  Release version 1.14.6 (jschneier#1496)
  [s3] Default `url_protocol` to `https:` if set to None (jschneier#1483)

ddelange · 2025-04-28T08:09:59Z

Hi @jschneier 👋

I've added a commit to remove dependency on cachetools (and cached_property). ready for review again!

ddelange · 2025-05-15T06:24:15Z

@jschneier kind reminder :)

Fix multithreading s3 memory leak

bb59029

ddelange force-pushed the memory-leak branch from 4fd1e22 to bb59029 on April 1, 2025 20:07

ddelange added 5 commits April 1, 2025 23:32

Fix CI

c1016d1

Run black

6a36077

Remove threading test

faa924d

Demolish

ae1cdd9

Override AWS_QUERYSTRING_AUTH

c941ae7

ddelange commented Apr 2, 2025

ddelange added 4 commits April 2, 2025 12:31

PR suggestion

27fa7c3

Run black

5c9c2b2

Fix more tests

9336408

Separate cachetools import

872a66a

ddelange mentioned this pull request Apr 2, 2025

Fix multithreading s3 memory leak ddelange/django-storages#1

Closed

ddelange commented Apr 3, 2025

ddelange force-pushed the memory-leak branch 4 times, most recently from 29907eb to 7a9a7d7 on April 3, 2025 11:54

Maintain a separate ttl_cache for each instance

b834c75

ddelange force-pushed the memory-leak branch from 7a9a7d7 to b834c75 on April 3, 2025 12:09

ddelange force-pushed the memory-leak branch 5 times, most recently from 0dd808f to 348bee8 on April 4, 2025 07:10

Fix pickling and add docs

4623ca7

ddelange force-pushed the memory-leak branch from 348bee8 to 4623ca7 on April 4, 2025 07:12

ddelange commented Apr 14, 2025

ddelange added 2 commits April 28, 2025 11:06

Remove dependency on cachetools

821d51a

ddelange force-pushed the memory-leak branch from e6d5133 to ce39fc3 on April 28, 2025 08:07

ddelange requested a review from jschneier April 28, 2025 08:10

ddelange added 3 commits April 28, 2025 18:31

Improve diff

fc51725

Bring back code comments

707d630

Simplify

5ee1760

ddelange mentioned this pull request Jun 10, 2025

Excessive memory usage on multithreading boto/boto3#1670

Open

ddelange added 2 commits June 23, 2025 12:52

Add reference to boto3 docs

6eab2c5

Break up long line

fb49bce
