Description
Describe the bug, including details regarding any error messages, version, and platform.
When constructing an S3FileSystem object with a region explicitly specified, Arrow still triggers AWS EC2 metadata lookups unless they are explicitly (and globally) disabled with the AWS_EC2_METADATA_DISABLED environment variable.
This goes against the documented (and, I believe, intended) behavior of the region kwarg:
AWS region to connect to. If not set, the AWS SDK will attempt to determine the region using heuristics such as environment variables, configuration profile, EC2 metadata, or default to ‘us-east-1’ when SDK version <1.8.
import pyarrow.fs

pyarrow.fs.S3FileSystem(
    access_key='key',
    secret_key='secret',
    endpoint_override='https://my.appliance.uri',
    region='region',
)
Using py-spy, I can observe the following stack:
Aws::Http::CurlHttpClient::MakeRequest (libaws-cpp-sdk-core.so)
Aws::Internal::AWSHttpResourceClient::GetResourceWithAWSWebServiceResult[abi:cxx11] (libaws-cpp-sdk-core.so)
Aws::Internal::EC2MetadataClient::GetCurrentRegion[abi:cxx11] (libaws-cpp-sdk-core.so)
Aws::Client::ClientConfiguration::ClientConfiguration (libaws-cpp-sdk-core.so)
Aws::S3::S3ClientConfiguration::S3ClientConfiguration (libaws-cpp-sdk-s3.so)
__gnu_cxx::new_allocator<arrow::fs::S3FileSystem::Impl>::construct<arrow::fs::S3FileSystem::Impl, arrow::fs::S3Options const&, arrow::io::IOContext const&> (libarrow.so.1801.0.0)
arrow::fs::S3FileSystem::S3FileSystem (libarrow.so.1801.0.0)
arrow::fs::S3FileSystem::Make (libarrow.so.1801.0.0)
S3FileSystem___init__ (pyarrow/_s3fs.cpython-311-x86_64-linux-gnu.so)
This is caused by the default construction of S3ClientConfiguration in ClientBuilder.
On our machines (which aren't in AWS and have no IMDS running) this takes 6+ seconds.
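For anyone who wants to check the delay on their own machine, here is a rough timing sketch. `time_call` and `make_filesystem` are hypothetical helper names introduced for illustration. The credentials, endpoint and region are the same placeholders as in the snippet above.

```python
import time


def time_call(factory):
    """Call a zero-argument callable and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = factory()
    elapsed = time.perf_counter() - start
    return result, elapsed


def make_filesystem():
    """Build the S3FileSystem from the report (placeholder values)."""
    # Imported lazily so that time_call can be used without pyarrow installed.
    import pyarrow.fs

    return pyarrow.fs.S3FileSystem(
        access_key="key",
        secret_key="secret",
        endpoint_override="https://my.appliance.uri",
        region="region",
    )
```

Calling `time_call(make_filesystem)` on a host outside AWS should show most of the elapsed time being spent in the constructor, if the metadata lookup is the cause.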
A workaround would be something like the following:
#ifdef ARROW_S3_HAS_S3CLIENT_CONFIGURATION
Aws::S3::S3ClientConfiguration client_config_{Aws::Client::ClientConfigurationInitValues{false}};
#else
Aws::Client::ClientConfiguration client_config_{Aws::Client::ClientConfigurationInitValues{false}};
#endif
which disables IMDS during configuration construction.
Some additional work would be required to add back the IMDS region lookup when no region is specified.
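Until something like that lands, the environment variable mentioned at the top can be scoped to the constructor call from Python. This is a minimal sketch. `ec2_metadata_disabled` is a hypothetical helper, not part of pyarrow. It assumes the AWS SDK reads AWS_EC2_METADATA_DISABLED when the client configuration is constructed, and that setting it through `os.environ` is visible to the SDK. Both assumptions are worth verifying on your build.

```python
import os
from contextlib import contextmanager

ENV_NAME = "AWS_EC2_METADATA_DISABLED"


@contextmanager
def ec2_metadata_disabled():
    """Temporarily set AWS_EC2_METADATA_DISABLED=true for this process.

    The change is process-wide while the block is active, so it is not
    safe if other threads create AWS clients that do need IMDS.
    """
    previous = os.environ.get(ENV_NAME)
    os.environ[ENV_NAME] = "true"
    try:
        yield
    finally:
        # Restore whatever was there before, including "unset".
        if previous is None:
            os.environ.pop(ENV_NAME, None)
        else:
            os.environ[ENV_NAME] = previous
```

Usage would be to wrap only the construction, e.g. `with ec2_metadata_disabled(): fs = pyarrow.fs.S3FileSystem(...)` with the same arguments as above. The variable is still global for the duration of the block, so this narrows the workaround rather than removing the underlying problem.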
Component(s)
C++