Version 0.13.0
Major changes
New serializer to improve serialization speed
A new custom serializer has been implemented that follows the Apache Arrow serialization format while minimizing extra allocations and memory copies.
The default compression method was also changed from ZLib to Zstd, likewise to improve serialization performance.
Support for pause & resume
A new feature has been added to allow pausing and resuming data streams, making it easier to conduct maintenance or temporarily halt processing without losing state.
For more information, visit https://koralium.github.io/flowtide/docs/deployment/pauseresume.
Integer column changed from 64 bits to dynamic size
The integer column now selects its bit size based on the data it contains, instead of always using 64 bits.
This change reduces memory usage for columns with smaller integer values. Bit size is determined on a per-page basis, so only pages that contain larger values use the larger bit sizes.
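The per-page size selection can be illustrated with a minimal sketch. It is not Flowtide's implementation: the class name `BitWidthSketch`, the method `RequiredBits`, and the choice of signed 8/16/32/64-bit widths are assumptions made for illustration only.

```csharp
using System;

// Hypothetical helper, not part of Flowtide: picks a width for one page of values.
public static class BitWidthSketch
{
    // Returns the smallest signed width (8, 16, 32 or 64 bits)
    // that can hold every value in the given page.
    public static int RequiredBits(ReadOnlySpan<long> page)
    {
        long min = 0;
        long max = 0;
        foreach (var value in page)
        {
            if (value < min) min = value;
            if (value > max) max = value;
        }

        if (min >= sbyte.MinValue && max <= sbyte.MaxValue) return 8;
        if (min >= short.MinValue && max <= short.MaxValue) return 16;
        if (min >= int.MinValue && max <= int.MaxValue) return 32;
        return 64;
    }
}
```

Under these assumptions, `BitWidthSketch.RequiredBits(new long[] { 1, 200, -5 })` returns 16, since 200 does not fit in a signed byte, while a page holding only small values would stay at 8 bits.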
Delta Lake Support
This version adds support for both reading from and writing to the Delta Lake format, which allows easy integration
with data lake storage. To learn more about Delta Lake support, please visit: https://koralium.github.io/flowtide/docs/connectors/deltalake
Custom data source & sink changed to use column based events
Both the custom data source and the custom data sink now use column-based events.
This improves connector performance by eliminating the need to convert data between column-based and row-based formats during streaming.
Minor changes
Elasticsearch connector changed from Nest to Elastic.Clients.Elasticsearch
The Elasticsearch connector has been updated from the deprecated Nest package to Elastic.Clients.Elasticsearch. This change requires stream configurations to be adjusted for the new connection settings.
Additionally, connection settings are now provided via a function, enabling dynamic credential management, such as rolling passwords.
Add support for custom stream listeners
Applications can now listen to stream events like checkpoints, state changes, and failures, allowing for custom exit strategies or monitoring logic.
Example:
.AddCustomOptions(s =>
{
    s.WithExitProcessOnFailure();
});
Cache lookup table for state clients
An internal optimization adds a small lookup table for state client page access, reducing contention on the global LRU cache. This change has shown a 10–12% performance improvement in benchmarks.
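The idea of a small lookup table in front of a shared cache can be sketched as follows. This is an illustration under stated assumptions, not Flowtide's code: the class `PageLookupSketch`, the direct-mapped slot layout, and the `sharedCacheLookup` delegate standing in for the global LRU cache are all hypothetical, and the sketch ignores invalidation when the shared cache evicts a page.

```csharp
using System;

// Hypothetical per-client lookup table; assumes a single thread per client.
public sealed class PageLookupSketch<TPage> where TPage : class
{
    private readonly long[] _keys;
    private readonly TPage[] _pages;
    private readonly Func<long, TPage> _sharedCacheLookup;

    public PageLookupSketch(int slots, Func<long, TPage> sharedCacheLookup)
    {
        _keys = new long[slots];
        Array.Fill(_keys, -1L);          // -1 marks an empty slot
        _pages = new TPage[slots];
        _sharedCacheLookup = sharedCacheLookup;
    }

    public TPage Get(long pageId)
    {
        // Direct-mapped: each page id maps to exactly one slot.
        int slot = (int)((ulong)pageId % (ulong)_keys.Length);
        var cached = _pages[slot];
        if (_keys[slot] == pageId && cached != null)
        {
            return cached;               // hit: the shared cache is not touched
        }

        // Miss: fall back to the shared (contended) cache and remember the result.
        var page = _sharedCacheLookup(pageId);
        _keys[slot] = pageId;
        _pages[slot] = page;
        return page;
    }
}
```

Repeated reads of the same page are then served from the local table, and only misses or slot collisions reach the shared cache, which is where the contention would otherwise occur.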
What's Changed
- Add custom arrow serializer to help improve serialization speeds by @Ulimo in #670
- Add support to pause and resume a stream by @Ulimo in #674
- Add new event listener abstractions and error listeners for killing application by @bpfz in #680
- Add support to create object state from state manager client by @Ulimo in #679
- Change serializers and storage interfaces to use IBufferWriter and ReadOnlyMemory by @Ulimo in #672
- Remove storing state in the checkpoint event by @Ulimo in #681
- Add logo and diagram to readme by @Ulimo in #684
- Add logo and diagram by @Ulimo in #685
- Change int64 column to be dynamic integer column by @Ulimo in #687
- Add batch converter to and from dotnet objects by @Ulimo in #688
- Change test mock source and sink to use column based format, also fix small bugs that occurred from it by @Ulimo in #689
- Update generic data source and sink to use column based data by @Ulimo in #690
- Add types for stream notifications by @bpfz in #683
- Add support for Delta Lake Source by @Ulimo in #691
- Add initial version of the delta lake sink by @Ulimo in #692
- [DeltaLake] Improve performance with deletion vectors by @Ulimo in #693
- Upgrade packages in cosmosdb and delta lake by @Ulimo in #694
- [Elasticsearch] Change from nest to new nuget by @Ulimo in #695
- [DeltaLake] Fix so new columns are read as null when schema is evolved for old data files by @Ulimo in #698
- [DeltaLake] Ignore compacted entries in the delta log by @Ulimo in #700
- [SQL] Upgrade sql parser nuget to 0.6.3 by @Ulimo in #701
- [MongoDB] Upgrade to 3.2.1 driver version by @Ulimo in #696
- [Bugfix] Add that the tree is committed in grouped write operator by @Ulimo in #702
- Add possibility to set max page count on storage by @Ulimo in #704
- Add missing license headers to files by @Ulimo in #705
- Add a cache table for pages in state client to help improve performance by @Ulimo in #707
- Change so compression memory allocation shows up in metrics by @Ulimo in #708
- Preparation of release 0.13.0 by @Ulimo in #709
Full Changelog: v0.12.0...v0.13.0