Python: Adding USearch memory connector #2358
This is exciting! We are also working on C# bindings for USearch to allow broader integration with SK 🤗 cc @dluc
Fix: removing cast to `str` due to patch in USearch
@microsoft-github-policy-service agree
Refactor: method naming
Docs: update to fit changes
Docs: clarification
awesome, thank you @ashvardanian - I'll take a look asap (FYI, there's a quick git conflict to fix when you have a chance)
Hey, @dluc! @AleksandrKent has updated the poetry file. It seems to be the only collision. But it will re-appear as soon as you have any other dependency updates, so we should try merging this sooner. Please let us know if anything has to be polished.
Thank you for this contribution :)
missing file headers
python/semantic_kernel/connectors/memory/usearch/usearch_memory_store.py
Co-authored-by: Abby Harrison <[email protected]>
Co-authored-by: Abby Harrison <[email protected]>
Co-authored-by: Devis Lucato <[email protected]>
### Motivation and Context

This PR integrates [USearch](https://github.com/unum-cloud/usearch) as a memory connector for Semantic Kernel (SK).

### Description
The USearch `Index` does not natively support multiple collections, and it stores only embeddings, not the other attributes of a `MemoryRecord`. The `USearchMemoryStore` class adds these capabilities. It uses the USearch `Index` to store a collection of embeddings under unique IDs, with the original collection names mapped to those IDs. The other `MemoryRecord` attributes are stored in a `pyarrow.Table`, one per collection.

Note the current behavior when a user removes a record, or upserts a new one with an existing ID: the old row is not removed from the `pyarrow.Table`. This is done for performance reasons, but it means the table can keep growing.

By default, `USearchMemoryStore` operates as an in-memory store. To enable persistence, pass a path to the directory for the persisted files when calling `__init__`. For each collection, two files are created: `{collection_name}.usearch` and `{collection_name}.parquet`. Changes are written to disk only when `close_async` is called. Because of the interface provided by the base class `MemoryStoreBase`, this happens implicitly when the store is used as a context manager; `close_async` can also be called explicitly.

Since collection names are used as file names on disk, all names are converted to lowercase.
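To make the bookkeeping above concrete, here is a minimal sketch that uses only the standard library. It is not the `USearchMemoryStore` implementation: a dict stands in for a USearch `Index`, a list of dicts stands in for the `pyarrow.Table`, and `ToyMemoryStore` with its methods `upsert`, `remove`, `search`, `table_size` and `vector_count` are hypothetical names. It assumes one index per collection (consistent with one `.usearch` file per collection) and that each integer key doubles as the row position in the append-only table.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class _Collection:
    # Stand-in for one USearch Index: integer key -> embedding.
    vectors: dict[int, list[float]] = field(default_factory=dict)
    # Original string record id -> integer key of its current vector.
    keys: dict[str, int] = field(default_factory=dict)
    # Append-only stand-in for the per-collection pyarrow.Table.
    rows: list[dict[str, str]] = field(default_factory=list)


def _cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyMemoryStore:
    """Hypothetical sketch of the bookkeeping described above; not the SK class."""

    def __init__(self) -> None:
        self._collections: dict[str, _Collection] = {}

    def _get(self, name: str) -> _Collection:
        # Names are normalized to lowercase, matching the on-disk naming rule.
        return self._collections[name.lower()]

    def create_collection(self, name: str) -> None:
        self._collections.setdefault(name.lower(), _Collection())

    def upsert(self, collection: str, record_id: str, embedding: list[float], text: str) -> None:
        col = self._get(collection)
        old_key = col.keys.get(record_id)
        if old_key is not None:
            del col.vectors[old_key]  # the stale vector goes; its attribute row stays
        key = len(col.rows)  # assumption: the integer key doubles as the row position
        col.rows.append({"id": record_id, "text": text})
        col.vectors[key] = embedding
        col.keys[record_id] = key

    def remove(self, collection: str, record_id: str) -> None:
        col = self._get(collection)
        del col.vectors[col.keys.pop(record_id)]  # the row is left in col.rows

    def search(self, collection: str, query: list[float], limit: int = 1) -> list[tuple[float, dict[str, str]]]:
        col = self._get(collection)
        # Exhaustive cosine scan; a USearch Index would use its own nearest-neighbor search.
        scored = [(_cosine(query, vec), col.rows[key]) for key, vec in col.vectors.items()]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:limit]

    def table_size(self, collection: str) -> int:
        return len(self._get(collection).rows)

    def vector_count(self, collection: str) -> int:
        return len(self._get(collection).vectors)
```

Upserting the same record ID twice leaves `vector_count` equal to the number of live records, while `table_size` keeps growing: that is the performance trade-off the description mentions, since nothing has to be deleted from the table.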
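The persistence rules (in-memory by default, writes only on `close_async`, and an implicit `close_async` when the store is used as an async context manager) can be sketched the same way. `ToyPersistentStore`, the `persist_directory` parameter and the method signatures below are hypothetical, and JSON files stand in for the real `.usearch` and `.parquet` formats. The sketch also assumes that `close_async` drops in-memory state, which is one reading of the memory note in the description.

```python
from __future__ import annotations

import asyncio
import json
import tempfile
from pathlib import Path


class ToyPersistentStore:
    """Hypothetical sketch of the persist-on-close flow; not the SK USearchMemoryStore."""

    def __init__(self, persist_directory: str | Path | None = None) -> None:
        # None keeps everything in memory; a directory enables persistence on close.
        self._dir = Path(persist_directory) if persist_directory is not None else None
        self._vectors: dict[str, dict[str, list[float]]] = {}
        self._attributes: dict[str, dict[str, dict[str, str]]] = {}

    async def __aenter__(self) -> ToyPersistentStore:
        return self

    async def __aexit__(self, *exc_info: object) -> None:
        # Leaving `async with` triggers the flush, mirroring the MemoryStoreBase behavior described above.
        await self.close_async()

    async def create_collection_async(self, name: str) -> None:
        # Lowercase because the name becomes part of a file name.
        self._vectors.setdefault(name.lower(), {})
        self._attributes.setdefault(name.lower(), {})

    async def upsert_async(self, collection: str, record_id: str, embedding: list[float], text: str) -> None:
        name = collection.lower()
        self._vectors[name][record_id] = embedding
        self._attributes[name][record_id] = {"text": text}

    async def close_async(self) -> None:
        if self._dir is not None:
            self._dir.mkdir(parents=True, exist_ok=True)
            for name in self._vectors:
                # Two files per collection; JSON stands in for .usearch and .parquet.
                (self._dir / f"{name}.vectors.json").write_text(json.dumps(self._vectors[name]))
                (self._dir / f"{name}.attributes.json").write_text(json.dumps(self._attributes[name]))
        # Drop in-memory state so the memory can be reclaimed.
        self._vectors.clear()
        self._attributes.clear()


async def demo(directory: Path) -> bool:
    async with ToyPersistentStore(directory) as store:
        await store.create_collection_async("MyDocs")
        await store.upsert_async("MYDOCS", "a", [0.1, 0.2], "hello")
        # Nothing has been written yet: persistence happens only on close.
        written_early = (directory / "mydocs.vectors.json").exists()
    return written_early


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(asyncio.run(demo(Path(tmp))))  # False: files appear only after the block exits
        print(sorted(p.name for p in Path(tmp).iterdir()))
```

Tying the flush to `__aexit__` means callers who use `async with` cannot forget it; code that holds the store without a context manager has to call `close_async` itself.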
To ensure efficient use of memory, you should call `close_async`.

### Contribution Checklist