[ENH]: Return database id in get collections call from sysdb #4686

Contributor

@sanketkedia sanketkedia commented May 30, 2025

Description of changes

Summarize the changes made by this PR.

  • Improvements & Bug fixes
    • Organizing collection data in S3 by prefix requires knowing the database ID on the write path. This PR populates the database ID in sysdb and returns it from the get_collections RPC.
    • The frontend (FE) does not serialize this field or return it to the client.
    • Although the collection model is the same for local and distributed deployments, the field is not currently set in local. That can be added if needed.
  • New functionality
    • ...
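
For illustration only, here is a minimal sketch of how a write path might build an S3 key prefix once the database ID is available. The key layout and the function name `collection_prefix` are assumptions made for this example. The PR itself does not specify the prefix format.

```python
from uuid import UUID


def collection_prefix(tenant: str, database_id: UUID, collection_id: UUID) -> str:
    """Build a hypothetical S3 key prefix for a collection's data.

    The tenant/database/collection layout is an assumed scheme for
    illustration, not the layout used by the actual storage code.
    """
    return f"tenant/{tenant}/database/{database_id}/collection/{collection_id}/"


# Example: every object for this collection would share the same prefix.
prefix = collection_prefix("t1", UUID(int=1), UUID(int=2))
```

Grouping keys under the database ID this way would let a prefix listing or lifecycle rule target one database at a time, which is why the write path needs the ID before it writes anything.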

Test plan

How are these changes tested?

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Documentation Changes

None

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality (readability, modularity, intuitiveness)?

Contributor Author

sanketkedia commented May 30, 2025

@sanketkedia sanketkedia requested a review from HammadB May 30, 2025 19:54
@sanketkedia sanketkedia marked this pull request as ready for review May 30, 2025 19:54
Contributor

propel-code-bot bot commented May 30, 2025

Expose Database ID in SysDB get_collections and Collection Model for Storage Organization

This PR adds the database ID to the Rust, Go, and Protobuf representations of a Collection, and ensures that SysDB includes the database ID when returning collection metadata (e.g., through get_collections). This allows downstream systems (e.g., storage backends, S3 planners) to organize data by database, which S3 prefix-based storage strategies require. The change propagates through the full stack: protobuf definitions, sysdb implementations (SQLite and gRPC), Rust domain types, API types, test reference models, and the relevant conversions and utilities. The implementation and test code address compatibility with older sysdb deployments and mixed-version scenarios.

Key Changes:
• Protobuf (chroma.proto): Added optional database_id field to the Collection message.
• Rust types (collection.rs, api_types.rs): Added a DatabaseUuid newtype field to the Collection struct, added serialization/deserialization logic, and updated conversions to handle collections that lack a database ID.
• SysDB storage implementation (sqlite.rs, sysdb.rs): SQLite lookups and result rows now fetch and propagate database_id; error handling improved on parsing failures.
• Go sysdb implementation: convertCollectionToProto attaches the database ID field when serializing Collection proto.
• APITypes: Added DatabaseIdParseError and appropriate error handling for parsing/serializing collection-level database IDs.
• Unit/integration tests updated to verify parsing, fallback, and end-to-end propagation of database_id.
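
As a rough sketch of the parsing behavior listed above, the following Python mirrors the idea of an optional database_id that is either absent, valid, or malformed. The real change lives in Rust and Go. The Python function `parse_database_id` is a hypothetical stand-in, and the error class only borrows the DatabaseIdParseError name from the summary.

```python
import uuid
from typing import Optional


class DatabaseIdParseError(ValueError):
    """Raised when a database_id is present but is not a valid UUID."""


def parse_database_id(raw: Optional[str]) -> Optional[uuid.UUID]:
    """Parse an optional database_id string.

    None means the field was absent (for example, a response from an
    older sysdb), which is not an error. A present but malformed value
    is reported as DatabaseIdParseError instead of being dropped.
    """
    if raw is None:
        return None
    try:
        return uuid.UUID(raw)
    except ValueError as exc:
        raise DatabaseIdParseError(f"invalid database_id: {raw!r}") from exc
```

Keeping "absent" and "malformed" as separate outcomes means a missing field from an older deployment stays harmless while corrupt data still surfaces as an error.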

Affected Areas:
• SysDB storage and API (both SQLite and gRPC, and Go/Rust backends)
• Collection metadata model in both Rust and Go
• Proto interfaces and cross-service contracts (chroma.proto)
• Test coverage for sysdb and collection parsing

This summary was automatically generated by @propel-code-bot

@sanketkedia sanketkedia force-pushed the 05-30-_enh_return_database_id_in_get_collections_call_from_sysdb branch from 4411701 to f9a9e60 Compare June 10, 2025 15:41
Collaborator

@HammadB HammadB left a comment

Discussed offline: maybe handle backward compatibility by making database id Optional on the collection.
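
One way to picture this suggestion is a collection model whose database ID is optional, so records from an older sysdb still load, and whose client-facing form leaves the field out. The class below is a hypothetical Python sketch. The field names, `from_record`, and `to_client_dict` are assumptions for illustration, not the actual Rust Collection type.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional
from uuid import UUID


@dataclass(frozen=True)
class Collection:
    collection_id: UUID
    name: str
    # Optional so that records produced before the field existed still load.
    database_id: Optional[UUID] = None

    @classmethod
    def from_record(cls, record: Dict[str, Any]) -> "Collection":
        """Build a Collection from a sysdb-style record (assumed dict shape)."""
        raw_db_id = record.get("database_id")
        return cls(
            collection_id=UUID(record["id"]),
            name=record["name"],
            database_id=UUID(raw_db_id) if raw_db_id is not None else None,
        )

    def to_client_dict(self) -> Dict[str, str]:
        """Client-facing form; database_id is deliberately left out."""
        return {"id": str(self.collection_id), "name": self.name}
```

With the field optional, an old record without database_id and a new record with it both produce a valid Collection, and callers that need the ID can check for None explicitly.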

@sanketkedia sanketkedia force-pushed the 05-30-_enh_return_database_id_in_get_collections_call_from_sysdb branch from a7a0d89 to a577316 Compare June 19, 2025 23:14
@sanketkedia sanketkedia merged commit 72985bd into main Jun 20, 2025
112 of 115 checks passed
chroma-droid pushed a commit that referenced this pull request Jun 20, 2025
sanketkedia added a commit that referenced this pull request Jun 20, 2025
This PR cherry-picks the commit 72985bd
onto release/2025-06-13. If there are unresolved conflicts, please
resolve them manually.

Co-authored-by: Sanket Kedia <[email protected]>