Description
Describe the bug
In distributed mode, reading a large number of small S3 objects (e.g. 20M files) is slow and may eventually fail.
This is caused by the list objects call, which is not currently parallelised and represents a bottleneck.
As a side note, no information is surfaced to the user during the list objects call, so they have no visibility into why the job is hanging. Additional logging would improve that.
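To illustrate the logging gap: a minimal sketch of what periodic progress logging inside the pagination loop could look like. `list_with_progress` is a hypothetical helper, not an existing awswrangler function; it assumes the caller already has a botocore page iterator.

```python
import logging

logger = logging.getLogger("awswrangler.s3._list")


def list_with_progress(response_iterator, log_every=1000):
    # Hypothetical helper: collect keys from ListObjectsV2 pages while
    # periodically logging progress so long listings are visible to the user.
    keys = []
    for page_number, page in enumerate(response_iterator, start=1):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
        if page_number % log_every == 0:
            logger.info(
                "Listed %d pages (%d objects so far)", page_number, len(keys)
            )
    return keys
```

With the default page size of 1000 seen in the debug logs, a 20M-object listing spans ~20,000 pages, so even coarse per-N-pages logging would surface meaningful progress.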
How to Reproduce
Setup
- 20 million files with .csv extension
- each file contains 5 lines with 4 columns
Script
import logging

import awswrangler as wr

logging.getLogger("awswrangler").setLevel(logging.DEBUG)

df = wr.s3.read_csv("s3://test/small-files/input/20m-csv-partition/")
print(df.head())
Logs
2023-02-02 01:00:33,152 - awswrangler._config - DEBUG - Applying default config argument verify with value None.
2023-02-02 01:00:33,155 - botocore.credentials - INFO - Found credentials in shared credentials file: ~/.aws/credentials
2023-02-02 01:00:33,268 - awswrangler.s3._list - DEBUG - args: {'Bucket': 'test', 'Prefix': 'small-files/input/20m-csv-partition/', 'PaginationConfig': {'PageSize': 1000}}
2023-02-02 01:00:33,550 - awswrangler.s3._list - DEBUG - Skipping empty file: s3://test/small-files/input/20m-csv-partition/
2023-02-02 02:00:16,026 - __main__ - INFO - File "<stdin>", line 8, in <module>
File "/opt/amazon/python3.9-ray/lib/python3.9/site-packages/awswrangler/s3/_read_text.py", line 281, in read_csv
return _read_text_format(
File "/opt/amazon/python3.9-ray/lib/python3.9/site-packages/awswrangler/s3/_read_text.py", line 91, in _read_text_format
paths: List[str] = _path2list(
File "/opt/amazon/python3.9-ray/lib/python3.9/site-packages/awswrangler/s3/_list.py", line 31, in _path2list
paths: List[str] = list_objects( # type: ignore
File "/opt/amazon/python3.9-ray/lib/python3.9/site-packages/awswrangler/s3/_list.py", line 358, in list_objects
return [path for paths in result_iterator for path in paths]
File "/opt/amazon/python3.9-ray/lib/python3.9/site-packages/awswrangler/s3/_list.py", line 358, in <listcomp>
return [path for paths in result_iterator for path in paths]
File "/opt/amazon/python3.9-ray/lib/python3.9/site-packages/awswrangler/s3/_list.py", line 110, in _list_objects
for page in response_iterator: # pylint: disable=too-many-nested-blocks
File "/opt/amazon/python3.9-ray/lib/python3.9/site-packages/botocore/paginate.py", line 269, in __iter__
response = self._make_request(current_kwargs)
File "/opt/amazon/python3.9-ray/lib/python3.9/site-packages/botocore/paginate.py", line 357, in _make_request
return self._method(**current_kwargs)
File "/opt/amazon/python3.9-ray/lib/python3.9/site-packages/botocore/client.py", line 530, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/opt/amazon/python3.9-ray/lib/python3.9/site-packages/botocore/client.py", line 960, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (ExpiredToken) when calling the ListObjectsV2 operation: The provided token has expired.
2023-02-02 02:00:16,026 - __main__ - WARNING - botocore.exceptions.ClientError: An error occurred (ExpiredToken) when calling the ListObjectsV2 operation: The provided token has expired.
The script eventually fails with an expired token: per the timestamps above, the single-threaded ListObjectsV2 pagination runs for roughly an hour (01:00 to 02:00) before the credentials expire.
Expected behavior
No response
Your project
No response
Screenshots
No response
OS
Unix
Python version
3.9
AWS SDK for pandas version
3.0.0rc2
Tasks
- Create 20M CSV files in S3 bucket (APG)
- Test with ray.data.read_csv and compare performance
- Consider delegating path listing to Ray, or see if we can replicate the same logic
- Explore parallelising S3 list objects call
  - Ray implementation: https://github.com/ray-project/ray/blob/master/python/ray/data/datasource/file_meta_provider.py#L189
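One possible shape for the parallel-listing task: shard the prefix into narrower prefixes and paginate each shard concurrently. This is a rough sketch, not the awswrangler implementation; `shard_prefixes` and `list_objects_parallel` are hypothetical names, and sharding by a fixed alphabet only helps when the key naming under the prefix actually spans those characters.

```python
import string
from concurrent.futures import ThreadPoolExecutor


def shard_prefixes(prefix, alphabet=string.ascii_lowercase + string.digits):
    # Split one prefix into narrower prefixes so each shard can be
    # listed by a separate worker. Assumes keys start with a character
    # from `alphabet`; keys outside it would be missed.
    return [prefix + ch for ch in alphabet]


def list_objects_parallel(client, bucket, prefix, max_workers=16):
    # Each worker paginates one shard with ListObjectsV2; results are
    # flattened at the end. `client` is a boto3 S3 client (or anything
    # exposing get_paginator, which makes this easy to test).
    def list_shard(shard_prefix):
        keys = []
        paginator = client.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=bucket, Prefix=shard_prefix):
            keys.extend(obj["Key"] for obj in page.get("Contents", []))
        return keys

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(list_shard, shard_prefixes(prefix))
    return [key for shard in results for key in shard]
```

ListObjectsV2 pagination within a single prefix is inherently sequential (each page needs the previous continuation token), so parallelism has to come from splitting the key space as above, as the linked Ray file_meta_provider does.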