
Reading local Athena cache throws RuntimeError when multi-threading #3136

Closed
@david-mateo

Describe the bug

The Athena cache manager iterates over a dictionary of cache entries when reading the cache. If other threads update the cache while that iteration is in progress, the reading thread can fail with RuntimeError: dictionary changed size during iteration.
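The failure mode can be shown without threads at all. The sketch below is not awswrangler code: it uses a plain dict with made-up keys, and the insert inside the loop stands in for a write that another thread lands between two iteration steps.

```python
def iterate_while_mutating() -> str:
    # Toy cache with hypothetical keys; not the library's data structure.
    cache = {f"query-{i}": f"id-{i}" for i in range(3)}
    try:
        for key in cache:
            # Stand-in for a concurrent writer adding an entry mid-iteration.
            cache[f"new-{key}"] = "id-new"
    except RuntimeError as exc:
        return str(exc)
    return "no error"


message = iterate_while_mutating()
print(message)  # dictionary changed size during iteration
```

With real threads the same error only appears when a write happens to interleave with the read, which is why it shows up intermittently under load.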

How to Reproduce

Run enough awswrangler.athena.read_sql_query calls in parallel, with the cache enabled and a different query in each call, to trigger a RuntimeError in _LocalMetadataCacheManager.get_queries.
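A rough reproduction sketch follows. It assumes valid AWS credentials and an existing Athena database; the helper names, the query text, the worker count and the cache lifetime are all illustrative choices, not values from this report. Because the error depends on thread timing, a single run may not trigger it.

```python
from concurrent.futures import ThreadPoolExecutor


def build_queries(n: int) -> list[str]:
    # Distinct query strings, so each call adds its own cache entry.
    return [f"SELECT {i} AS value" for i in range(n)]


def run_in_parallel(queries: list[str], database: str, workers: int = 32) -> list:
    # Imported lazily so the helpers above can be used without awswrangler installed.
    import awswrangler as wr

    def run(sql: str):
        return wr.athena.read_sql_query(
            sql,
            database=database,
            ctas_approach=False,
            # Enables the query cache; 900 seconds is an arbitrary choice.
            athena_cache_settings={"max_cache_seconds": 900},
        )

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() re-raises any exception from a worker, including the RuntimeError.
        return list(pool.map(run, queries))
```

Calling run_in_parallel(build_queries(200), database="my_database"), where my_database is a placeholder, issues 200 different queries from 32 threads; increasing either number should make the race more likely.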

Expected behavior

The presence of a lock on the writing part of the cache manager introduced in #2299 seems to indicate that awswrangler.athena.read_sql_query is intended to be thread-safe when using the local cache.
I would expect thread-safety to be extended to reading the cache by using the same lock on the reading method.
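As a minimal sketch of that expectation, the toy class below takes the same lock on the read path as on the write path and returns a snapshot of the entries. LockedQueryCache and its fields are hypothetical and only mirror the method name mentioned above; this is not the library's implementation.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LockedQueryCache:
    """Toy stand-in for a local metadata cache; not awswrangler's class."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._entries: dict[str, dict[str, str]] = {}

    def update(self, items: dict[str, dict[str, str]]) -> None:
        with self._lock:
            self._entries.update(items)

    def get_queries(self) -> list[dict[str, str]]:
        # Same lock as update(): the dict cannot change size while it is copied.
        with self._lock:
            return list(self._entries.values())


cache = LockedQueryCache()


def worker(worker_id: int) -> None:
    for i in range(200):
        key = f"{worker_id}-{i}"
        cache.update({key: {"QueryExecutionId": key}})
        cache.get_queries()


with ThreadPoolExecutor(max_workers=8) as pool:
    # list() surfaces any exception raised inside a worker.
    list(pool.map(worker, range(8)))

print(len(cache.get_queries()))  # 1600
```

Returning a copied list keeps the lock held only for the duration of the copy, so callers can iterate over the result without blocking writers.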

Your project

No response

Screenshots

No response

OS

Linux

Python version

3.12.2

AWS SDK for pandas version

3.11.0

Additional context

Related to #2296.

Labels: bug (Something isn't working)
