[ENH]: garbage collection fixes to get manual E2E test working #4152

Merged
codetheweb merged 1 commit into main from feat-gc-manual-e2e-test-fixes
Apr 7, 2025

Conversation

codetheweb
Contributor

@codetheweb codetheweb commented Apr 4, 2025

Description of changes

  • Removed variables of unbounded size from tracing calls (Jaeger dropped spans during testing because they were too large).
  • Added a concurrency limit for S3 deletion calls and consumed futures as a stream instead of collecting them first. Greatly reduced peak memory usage and fixed an OOM in the garbage collector service when running in Tilt with a memory limit of 512MiB.
  • Fixed the garbage collector component to compute the absolute cutoff time every run instead of computing once on startup. Prior to this change, the absolute cutoff time was constant over the lifetime of the service, which meant versions created after service startup were never GC'ed.
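The cutoff-time fix in the last bullet can be illustrated with a minimal sketch. It is not the code from this PR: `GcConfig`, `StaleGc`, `FreshGc`, and `is_collectable` are hypothetical names, and the current time is passed in explicitly so the behavior is easy to check. `StaleGc` freezes the absolute cutoff at startup, while `FreshGc` recomputes it from the relative window on every run.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical config: only the relative retention window is configured.
struct GcConfig {
    relative_cutoff: Duration,
}

/// Shape of the bug: the absolute cutoff is computed once at construction.
struct StaleGc {
    absolute_cutoff: SystemTime,
}

impl StaleGc {
    fn new(config: &GcConfig, now: SystemTime) -> Self {
        Self {
            absolute_cutoff: now - config.relative_cutoff,
        }
    }

    /// Ignores `now`, so nothing created after startup ever qualifies.
    fn is_collectable(&self, created_at: SystemTime, _now: SystemTime) -> bool {
        created_at < self.absolute_cutoff
    }
}

/// Shape of the fix: keep the relative window and derive the cutoff per run.
struct FreshGc {
    relative_cutoff: Duration,
}

impl FreshGc {
    fn is_collectable(&self, created_at: SystemTime, now: SystemTime) -> bool {
        created_at < now - self.relative_cutoff
    }
}
```

With a one-hour window, a version created one minute after startup is never collectable by `StaleGc`, whereas `FreshGc` treats it as collectable once more than an hour has passed since its creation.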

Test plan

How are these changes tested?

Tested by creating a collection with 10 versions (using SciDocs) and observing that garbage collection was successful.

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs repository?

n/a

github-actions bot commented Apr 4, 2025

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality (Readability, Modularity, Intuitiveness)?

@codetheweb codetheweb force-pushed the feat-better-gc-tracing branch from e600e31 to 23ded28 Compare April 4, 2025 17:29
@codetheweb codetheweb force-pushed the feat-gc-manual-e2e-test-fixes branch from de70cd3 to 5108577 Compare April 4, 2025 17:29
@codetheweb codetheweb force-pushed the feat-better-gc-tracing branch from 23ded28 to 3ee3a6b Compare April 4, 2025 17:56
@codetheweb codetheweb force-pushed the feat-gc-manual-e2e-test-fixes branch from 5108577 to ce1ff5e Compare April 4, 2025 17:56
if !all_files.is_empty() {
    let mut delete_stream = futures::stream::iter(all_files.clone())
        .map(move |file_path| self.delete_with_path(file_path))
        .buffer_unordered(100);
Contributor Author

This greatly reduces peak memory usage when GC'ing a large set of files: it went from a peak of > 512MB (was OOMing) to a peak of < 75MB.

I don't have a principled reason for setting this to 100. It seems to work well for now, though. I would be open to changing this to use the admission-controlled S3 impl; I'm just unsure whether it makes sense to use the same admission controller for uploads/downloads and deletes.
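
For readers unfamiliar with the pattern, here is a rough, standard-library-only sketch of the same idea: pull work lazily and keep at most `limit` deletions in flight. It deliberately swaps in OS threads for the async `buffer_unordered` stream used in the diff so that it needs no external crates. `delete_bounded` and its `delete` callback are hypothetical stand-ins, not the PR's `delete_with_path`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Runs `delete` over `paths` with at most `limit` calls in flight.
/// Returns (number of successful deletions, peak observed concurrency).
/// Paths are pulled one at a time from the iterator, so pending work is
/// never materialized as one large collection of in-progress operations.
fn delete_bounded<I, F>(paths: I, limit: usize, delete: F) -> (usize, usize)
where
    I: Iterator<Item = String> + Send + 'static,
    F: Fn(&str) -> bool + Send + Sync + 'static,
{
    let work = Arc::new(Mutex::new(paths));
    let delete = Arc::new(delete);
    let in_flight = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));

    // One worker per concurrency slot (at least one).
    let workers: Vec<_> = (0..limit.max(1))
        .map(|_| {
            let (work, delete) = (Arc::clone(&work), Arc::clone(&delete));
            let (in_flight, peak) = (Arc::clone(&in_flight), Arc::clone(&peak));
            thread::spawn(move || {
                let mut deleted = 0usize;
                loop {
                    // The lock guard is dropped at the end of this statement,
                    // so the deletion itself runs without holding the lock.
                    let next = work.lock().unwrap().next();
                    let path = match next {
                        Some(path) => path,
                        None => break,
                    };
                    let current = in_flight.fetch_add(1, Ordering::SeqCst) + 1;
                    peak.fetch_max(current, Ordering::SeqCst);
                    if delete(&path) {
                        deleted += 1;
                    }
                    in_flight.fetch_sub(1, Ordering::SeqCst);
                }
                deleted
            })
        })
        .collect();

    let deleted: usize = workers.into_iter().map(|w| w.join().unwrap()).sum();
    (deleted, peak.load(Ordering::SeqCst))
}
```

Calling `delete_bounded(paths, 100, my_delete)` mirrors the limit of 100 chosen above. The limit is a tuning knob: a lower value reduces memory and load on the object store at the cost of throughput.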

@codetheweb codetheweb marked this pull request as ready for review April 4, 2025 18:22
@codetheweb codetheweb requested a review from HammadB April 4, 2025 18:22
@codetheweb codetheweb force-pushed the feat-better-gc-tracing branch from 3ee3a6b to 0dd3668 Compare April 4, 2025 20:22
@codetheweb codetheweb force-pushed the feat-gc-manual-e2e-test-fixes branch from ce1ff5e to 3f8ab80 Compare April 4, 2025 20:22
@codetheweb codetheweb force-pushed the feat-better-gc-tracing branch from 0dd3668 to 2b2e0c5 Compare April 4, 2025 21:58
@codetheweb codetheweb force-pushed the feat-gc-manual-e2e-test-fixes branch from 3f8ab80 to 5efd3ae Compare April 4, 2025 21:58
@codetheweb codetheweb force-pushed the feat-better-gc-tracing branch from 2b2e0c5 to 08d6da5 Compare April 4, 2025 21:59
@codetheweb codetheweb force-pushed the feat-gc-manual-e2e-test-fixes branch from 5efd3ae to cf8aa16 Compare April 4, 2025 22:00
@codetheweb codetheweb force-pushed the feat-better-gc-tracing branch from 08d6da5 to 88d46bb Compare April 4, 2025 23:38
@codetheweb codetheweb force-pushed the feat-gc-manual-e2e-test-fixes branch from cf8aa16 to 628e81f Compare April 4, 2025 23:38
@codetheweb codetheweb force-pushed the feat-better-gc-tracing branch from 88d46bb to 21544ef Compare April 5, 2025 00:29
@codetheweb codetheweb force-pushed the feat-gc-manual-e2e-test-fixes branch from 628e81f to d3850af Compare April 5, 2025 00:30
@codetheweb codetheweb force-pushed the feat-better-gc-tracing branch 2 times, most recently from 3039249 to c56d313 Compare April 7, 2025 16:34
@codetheweb codetheweb force-pushed the feat-gc-manual-e2e-test-fixes branch from d3850af to 8053f73 Compare April 7, 2025 16:34
@codetheweb codetheweb changed the base branch from feat-better-gc-tracing to graphite-base/4152 April 7, 2025 17:17
@codetheweb codetheweb force-pushed the graphite-base/4152 branch from c56d313 to 3c54906 Compare April 7, 2025 17:17
@codetheweb codetheweb force-pushed the feat-gc-manual-e2e-test-fixes branch from 8053f73 to 928480d Compare April 7, 2025 17:17
@graphite-app graphite-app bot changed the base branch from graphite-base/4152 to main April 7, 2025 17:18
@codetheweb codetheweb force-pushed the feat-gc-manual-e2e-test-fixes branch from 928480d to c1d52de Compare April 7, 2025 17:18
@codetheweb codetheweb merged commit 16a5dbc into main Apr 7, 2025
84 checks passed
Contributor Author

Merge activity

  • Apr 7, 3:26 PM EDT: A user merged this pull request with Graphite.

philipkiely-baseten pushed a commit to philipkiely-baseten/chroma that referenced this pull request Apr 8, 2025
…a-core#4152)
