Skip to content

GH-46767: [C++] Enable EqualOptions::use_atol_ for arrow::Array, arrow::Scalar, arrow::RecordBatch, and arrow::ChuckedArray #46779

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

andishgar
Copy link
Contributor

@andishgar andishgar commented Jun 11, 2025

Rationale for this change

Enable EqualOptions::use_atol for arrow::Array and arrow::Scalar.

What changes are included in this PR?

  • Changed the default value of EqualOptions::use_atol to false.
  • Enabled EqualOptions::use_atol_ for the following methods.
    • arrow::ScalarEquals
    • arrow::ArrayRangeEquals
    • arrow:: ArrayEquals

Are these changes tested?

Yes, I ran the relevant unit tests.

Are there any user-facing changes?

The default value of EqualOptions::use_atol is changed to false

This PR includes breaking changes to public APIs. (If there are any breaking changes to public APIs, please explain which changes are breaking. If not, you can remove this.)

@andishgar andishgar marked this pull request as draft June 11, 2025 11:52
@andishgar andishgar marked this pull request as ready for review June 11, 2025 18:47
@andishgar
Copy link
Contributor Author

In this pull request, I initially set the default value of arrow::EqualOptions::use_atol to true. However, during the course of this work, I realized that the correct default should be false, to align with the default behavior of arrow::ScalarEquals, arrow::ArrayRangeEquals, and arrow::ArrayEquals.

@andishgar andishgar marked this pull request as draft June 16, 2025 13:37
@andishgar andishgar changed the title GH-46767: [C++] Enable EqualOptions::use_atol_ for arrow::Array and arrow::Scalar GH-46767: [C++] Enable EqualOptions::use_atol_ for arrow::Array, arrow::Scalar, arrow::RecordBatch, arrow::ChuckedArray Jun 16, 2025
@andishgar andishgar changed the title GH-46767: [C++] Enable EqualOptions::use_atol_ for arrow::Array, arrow::Scalar, arrow::RecordBatch, arrow::ChuckedArray GH-46767: [C++] Enable EqualOptions::use_atol_ for arrow::Array, arrow::Scalar, arrow::RecordBatch, and arrow::ChuckedArray Jun 16, 2025
@andishgar
Copy link
Contributor Author

@kou I have a question regarding arrow::Table equality. The method arrow::RecordBatch::ApproxEquals does not consider schema or metadata in the comparison. Shouldn't arrow::Table have a similar method ?

@andishgar andishgar marked this pull request as ready for review June 16, 2025 15:16
@kou
Copy link
Member

kou commented Jun 17, 2025

I'm OK with it but I like arrow::Table::Equals(const Table& other, const EqualOptions& opts = EqualOptions::Defaults()) not arrow::Table::ApproxEquals(). We need to add more options to EqualOptions for it.

@andishgar
Copy link
Contributor Author

I'm OK with it but I like arrow::Table::Equals(const Table& other, const EqualOptions& opts = EqualOptions::Defaults()) not arrow::Table::ApproxEquals(). We need to add more options to EqualOptions for it.

@kou, thank you for your response. I'll open an issue to address this.

Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR changes the default behavior of the tolerance-based comparison flag (use_atol) to be opt-in, and propagates that flag through Array, Scalar, RecordBatch, and ChunkedArray equality routines, with updated documentation and tests.

  • Change EqualOptions::use_atol_ default to false and wire it into ArrayEquals, ArrayRangeEquals, ScalarEquals, RecordBatch::Equals, and ChunkedArray::Equals.
  • Update and add unit tests for floating‐point comparisons to exercise the new use_atol flag in scalar, array, record batch, and statistics tests.
  • Revise documentation comments to explain the interplay between exact vs. approximate equality and the use_atol option.

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.

Show a summary per file
File Description
cpp/src/arrow/scalar_test.cc Added TestUseAtol to validate ScalarEquals with and without use_atol
cpp/src/arrow/record_batch_test.cc Updated record‐batch tests to assert Equals with use_atol(true)
cpp/src/arrow/record_batch.h Documented use_atol behavior for RecordBatch::Equals and ApproxEquals
cpp/src/arrow/compare.h Changed default use_atol_ to false and added flags/docs for arrays and scalars
cpp/src/arrow/compare.cc Propagate options.use_atol() into ArrayEquals, ArrayRangeEquals, and ScalarEquals
cpp/src/arrow/chunked_array.h Documented use_atol behavior on ChunkedArray::Equals and ApproxEquals
cpp/src/arrow/array/statistics_test.cc Adjusted statistics tests to use use_atol(true) for tolerance-based equality
cpp/src/arrow/array/array_test.cc Added CheckFloatApproxEqualsWithAtol and test case for floating arrays
Comments suppressed due to low confidence (3)

cpp/src/arrow/record_batch.h:124

  • Split this into two sentences or add a period after ApproxEquals. to separate the clause, e.g., ... ApproxEquals. Schema comparison is included ... for clarity and correct grammar.
  /// using \ref arrow::RecordBatch::ApproxEquals Although, Schema comparison is included,

cpp/src/arrow/chunked_array.h:164

  • Add unit tests for ChunkedArray::Equals and ChunkedArray::ApproxEquals to verify that the new use_atol flag correctly toggles exact vs. approximate comparisons on chunked arrays.
  /// Setting \ref arrow::EqualOptions::use_atol to true is equivalent to

cpp/src/arrow/compare.h:107

  • [nitpick] The use_atol behavior is documented separately for each overload; consider centralizing or referencing a single description to reduce duplicated comment blocks and ease future maintenance.
/// Returns true if the arrays are exactly equal

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@@ -160,12 +160,22 @@ class ARROW_EXPORT ChunkedArray {
///
/// Two chunked arrays can be equal only if they have equal datatypes.
/// However, they may be equal even if they have different chunkings.
///
/// Setting \ref arrow::EqualOptions::use_atol to true is equivalent to
/// using \ref arrow::ChunkedArray::ApproxEquals
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// using \ref arrow::ChunkedArray::ApproxEquals
/// using \ref arrow::ChunkedArray::ApproxEquals.

/// \brief Determine if two chunked arrays are equal.
///
/// Setting \ref arrow::EqualOptions::use_atol to true is equivalent to
/// using \ref arrow::ChunkedArray::ApproxEquals
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// using \ref arrow::ChunkedArray::ApproxEquals
/// using \ref arrow::ChunkedArray::ApproxEquals.

@@ -120,6 +120,10 @@ class ARROW_EXPORT RecordBatch {

/// \brief Determine if two record batches are exactly equal
///
/// Setting \ref arrow::EqualOptions::use_atol to true is equivalent to
/// using \ref arrow::RecordBatch::ApproxEquals Although, Schema comparison is included,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// using \ref arrow::RecordBatch::ApproxEquals Although, Schema comparison is included,
/// using \ref arrow::RecordBatch::ApproxEquals. Although, schema comparison is included,

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting review Awaiting review labels Jun 17, 2025
@andishgar
Copy link
Contributor Author

@kou Thank you for the review. I’ll look into it

@andishgar
Copy link
Contributor Author

andishgar commented Jun 18, 2025

@kou I'm sorry for writing this message under such circumstances. Due to the ongoing war between my country and Israel, I may not be able to finish the PR until the situation stabilizes. My government is attempting to impose internet censor, and I was only able to send this message after several hours of trying to connect through a filter breaker.
I have no problem with reassigning my PRs and issues to someone else.

@kou
Copy link
Member

kou commented Jun 19, 2025

Oh... You don't need to worry about this PR. Please put your safety first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants