[C++] S3FileSystem construction causes IDMS lookups for region even when specified #46214

Open
@apmorton

Describe the bug, including details regarding any error messages, version, and platform.

When constructing an S3FileSystem object with a region explicitly specified, Arrow still triggers AWS instance metadata (IMDS) lookups unless they are explicitly (and globally) disabled with the AWS_EC2_METADATA_DISABLED environment variable.

This goes against the documented (and I believe intended?) behavior of the region kwarg:

    AWS region to connect to. If not set, the AWS SDK will attempt to determine the region using heuristics such as environment variables, configuration profile, EC2 metadata, or default to ‘us-east-1’ when SDK version <1.8.

pyarrow.fs.S3FileSystem(
    access_key='key',
    secret_key='secret',
    endpoint_override='https://my.appliance.uri',
    region='region',
)

Using py-spy, I can observe the following stack:

    Aws::Http::CurlHttpClient::MakeRequest (libaws-cpp-sdk-core.so)
    Aws::Internal::AWSHttpResourceClient::GetResourceWithAWSWebServiceResult[abi:cxx11] (libaws-cpp-sdk-core.so)
    Aws::Internal::EC2MetadataClient::GetCurrentRegion[abi:cxx11] (libaws-cpp-sdk-core.so)
    Aws::Client::ClientConfiguration::ClientConfiguration (libaws-cpp-sdk-core.so)
    Aws::S3::S3ClientConfiguration::S3ClientConfiguration (libaws-cpp-sdk-s3.so)
    __gnu_cxx::new_allocator<arrow::fs::S3FileSystem::Impl>::construct<arrow::fs::S3FileSystem::Impl, arrow::fs::S3Options const&, arrow::io::IOContext const&> (libarrow.so.1801.0.0)
    arrow::fs::S3FileSystem::S3FileSystem (libarrow.so.1801.0.0)
    arrow::fs::S3FileSystem::Make (libarrow.so.1801.0.0)
    S3FileSystem___init__ (pyarrow/_s3fs.cpython-311-x86_64-linux-gnu.so)

This is caused by the default construction of S3ClientConfiguration in ClientBuilder.
On our machines (which are not in AWS and have no IMDS running), this takes 6+ seconds.
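
For anyone who wants to check the delay on their own machine, here is a minimal timing sketch. The `timed` helper is a hypothetical name introduced for illustration; it is not part of pyarrow.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def time_s3_construction():
    # pyarrow is imported lazily so the helper above stays stdlib-only.
    import pyarrow.fs as pafs

    _, elapsed = timed(
        pafs.S3FileSystem,
        access_key='key',
        secret_key='secret',
        endpoint_override='https://my.appliance.uri',
        region='region',
    )
    return elapsed
```

Calling `time_s3_construction()` on a host without a metadata service should show whether construction is blocked on the lookup.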

A workaround is something like the following:

#ifdef ARROW_S3_HAS_S3CLIENT_CONFIGURATION
  Aws::S3::S3ClientConfiguration client_config_{Aws::Client::ClientConfigurationInitValues{false}};
#else
  Aws::Client::ClientConfiguration client_config_{Aws::Client::ClientConfigurationInitValues{false}};
#endif

This disables IMDS lookups during configuration construction.

Some additional work would be required to add back the IMDS lookup of the region when it is not otherwise specified.
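
Until a fix lands, users who cannot rebuild Arrow can scope the AWS_EC2_METADATA_DISABLED environment variable mentioned above to the construction call alone. The sketch below assumes the SDK reads the variable when the client configuration is constructed, as the stack above suggests. The `ec2_metadata_disabled` helper is a hypothetical name, not a pyarrow API. Because it mutates process-wide state, it is not safe to use from multiple threads at once.

```python
import os
from contextlib import contextmanager

_KEY = "AWS_EC2_METADATA_DISABLED"


@contextmanager
def ec2_metadata_disabled():
    """Temporarily set AWS_EC2_METADATA_DISABLED=true, then restore it."""
    previous = os.environ.get(_KEY)
    os.environ[_KEY] = "true"
    try:
        yield
    finally:
        # Restore the prior state so later code sees the original environment.
        if previous is None:
            os.environ.pop(_KEY, None)
        else:
            os.environ[_KEY] = previous


def make_filesystem():
    # pyarrow is imported lazily so the helper above stays stdlib-only.
    import pyarrow.fs as pafs

    with ec2_metadata_disabled():
        return pafs.S3FileSystem(
            access_key='key',
            secret_key='secret',
            endpoint_override='https://my.appliance.uri',
            region='region',
        )
```

This keeps the variable from leaking into other AWS clients in the same process, but it is still only a workaround for the construction-time lookup.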

Component(s)

C++
