
Add support for Delta Lake Source #691


Merged
merged 10 commits into from
Feb 17, 2025

@Ulimo Ulimo commented Feb 16, 2025

This adds support for reading tables from Delta Lake.

The implementation reads Delta Lake tables according to the protocol specification at https://github.com/delta-io/delta/blob/master/PROTOCOL.md

Features supported:

  • Calculate change data from add/remove actions
  • Use cdc files if they exist for change data
  • Deletion vectors
  • Partitioned data
  • Column mapping
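
As a rough illustration of the first two features, the sketch below decides which files to read for a single commit: cdc files win when the commit has any, otherwise add and remove actions are treated as inserts and deletes. The types `LogAction`, `ActionKind`, `RowChange` and `ChangeDataPlanner` are hypothetical and are not taken from this PR. The sketch also assumes that actions with `dataChange = false` are skipped, and it ignores deletion vectors entirely.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified view of Delta log actions (not the types used in this PR).
public enum ActionKind { Add, Remove, Cdc }

public sealed record LogAction(ActionKind Kind, string Path, bool DataChange = true);

// How the rows of a file should be interpreted when emitting change data.
public enum RowChange { Insert, Delete, FromCdcFile }

public static class ChangeDataPlanner
{
    // Returns the files to read for one commit, paired with how to interpret their rows.
    public static IReadOnlyList<(string Path, RowChange Change)> Plan(IEnumerable<LogAction> commit)
    {
        var actions = commit.ToList();

        // If the commit carries cdc files, read only those.
        var cdc = actions.Where(a => a.Kind == ActionKind.Cdc).ToList();
        if (cdc.Count > 0)
        {
            return cdc.Select(a => (a.Path, RowChange.FromCdcFile)).ToList();
        }

        // Otherwise derive changes from add/remove actions.
        // Assumption: actions with DataChange == false only rearrange data and are skipped.
        return actions
            .Where(a => a.DataChange)
            .Select(a => (a.Path, a.Kind == ActionKind.Add ? RowChange.Insert : RowChange.Delete))
            .ToList();
    }
}
```

For a commit with one add and one remove action, `Plan` returns the added file marked `Insert` and the removed file marked `Delete`. A real reader also has to apply deletion vectors and column mapping when it reads the rows.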

This PR contains a breaking API change:

TryGetTableInformation(string tableName, [NotNullWhen(true)] out TableMetadata? tableMetadata)
becomes:
TryGetTableInformation(IReadOnlyList<string> tableName, [NotNullWhen(true)] out TableMetadata? tableMetadata)

The change is required so that the Delta Lake source can join the parts of a table name with '/' as the delimiter, which gives easy access to subfolders.
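
A minimal sketch of how a provider might implement the new signature, joining the name parts with '/' to form a folder path. `FolderTableProvider` and the single-property `TableMetadata` record are hypothetical stand-ins for illustration; the real types in the project will differ.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Diagnostics.CodeAnalysis;

// Hypothetical stand-in for the project's TableMetadata type.
public sealed record TableMetadata(string Location);

// Hypothetical provider that resolves multi-part table names to folder paths.
public sealed class FolderTableProvider
{
    private readonly HashSet<string> _knownTablePaths;

    public FolderTableProvider(IEnumerable<string> knownTablePaths)
    {
        _knownTablePaths = new HashSet<string>(knownTablePaths);
    }

    public bool TryGetTableInformation(
        IReadOnlyList<string> tableName,
        [NotNullWhen(true)] out TableMetadata? tableMetadata)
    {
        // Join the name parts with '/' so each part maps to one folder level.
        var path = string.Join('/', tableName);

        if (_knownTablePaths.Contains(path))
        {
            tableMetadata = new TableMetadata(path);
            return true;
        }

        tableMetadata = null;
        return false;
    }
}
```

With this shape, a lookup for the parts `sales` and `orders` resolves to the path `sales/orders`. Callers that previously passed a single string can wrap it in a one-element list.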


@github-actions github-actions bot left a comment


Benchmark

| Benchmark suite | Current: deed077 | Previous: 22d4209 | Ratio |
|---|---|---|---|
| FlowtideDotNet.Benchmarks.Stream.StreamBenchmark.InnerJoin | 466090440 ns (± 11365598.3825656) | 458826533.3333333 ns (± 10640413.19005047) | 1.02 |
| FlowtideDotNet.Benchmarks.Stream.StreamBenchmark.LeftJoin | 539801570 ns (± 20993231.05526054) | 564994640 ns (± 28550338.329818636) | 0.96 |
| FlowtideDotNet.Benchmarks.Stream.StreamBenchmark.ProjectionAndNormalization | 161675260 ns (± 13623058.20776345) | 169842820 ns (± 7144067.768715523) | 0.95 |
| FlowtideDotNet.Benchmarks.Stream.StreamBenchmark.SumAggregation | 167761000 ns (± 15330186.068668572) | 171581250 ns (± 12415362.812660772) | 0.98 |
| FlowtideDotNet.Benchmarks.Stream.StreamBenchmark.ListAggWithMapAggregation | 1999235820 ns (± 103136689.41111112) | 1982296640 ns (± 123349811.75190067) | 1.01 |

This comment was automatically generated by workflow using github-action-benchmark.

@Ulimo Ulimo merged commit b37321e into main Feb 17, 2025
7 checks passed
@Ulimo Ulimo deleted the delta_lake_source_arrow branch February 17, 2025 15:19