Skip to content

test: create e2e environment; stop testing spacy in unit tests #9212

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Apr 11, 2025

Conversation

anakin87
Copy link
Member

@anakin87 anakin87 commented Apr 10, 2025

Related Issues

Proposed Changes:

  • create a e2e hatch environment, which inherits from test and installs the additional dependencies needed for e2e tests (Spacy in this case)
  • remove Spacy from test environment

How did you test it?

CI: unit, integration and e2e test correctly run.

Notes for the reviewer

CI speed improvements in this case are difficult to measure precisely due to GitHub runner variability, and I want to avoid cherry-picking data.

I verified locally that test and e2e are separate environments, with test not installing Spacy (and not downloading the 400 Mb model).

Even if the improvements are small or difficult to measure, I would still incorporate this change. The dependency resolution for Spacy is not always trivial and has caused CI slowdowns in the past (#9159).

Checklist

  • I have read the contributors guidelines and the code of conduct
  • I have updated the related issue with new insights and changes
  • I added unit tests and updated the docstrings
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: and added ! in case the PR includes breaking changes.
  • I documented my code
  • I ran pre-commit hooks and fixed any issue

@coveralls
Copy link
Collaborator

coveralls commented Apr 10, 2025

Pull Request Test Coverage Report for Build 14400855624

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 10 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.08%) to 90.304%

Files with Coverage Reduction New Missed Lines %
core/pipeline/async_pipeline.py 3 68.46%
components/extractors/named_entity_extractor.py 7 64.52%
Totals Coverage Status
Change from base Build 14399545434: -0.08%
Covered Lines: 10673
Relevant Lines: 11819

💛 - Coveralls

@anakin87 anakin87 changed the title ci: create e2e environment; stop testing spacy in unit tests ci: create e2e environment; stop testing spacy in unit tests Apr 10, 2025
pyproject.toml Outdated
extra-dependencies = [
dependencies = [
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test inherit from default but does not need its dependencies (reno, ruff, ...).
Using dependencies here then allows to use extra-dependencies in the e2e environment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to prevent tool.hatch.envs.test from inheritting from tool.hatch.envs.default and only install the project dependencies because it doesn't need its dependencies (reno, ruff, ...) then we could add here

[tool.hatch.envs.test]
template = "test"

or

[tool.hatch.envs.test]
template = "project"

I believe the former should work based on https://hatch.pypa.io/1.9/config/environment/overview/#self-referential-environments

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I've now made test a self-referential environment:

  • previously, it inherited from default and overrode dependencies
  • now it does not inherit from default; specifies its dependencies

it has a similar effect, but maybe it is clearer

@@ -95,13 +95,6 @@ extra-dependencies = [
"openai-whisper>=20231106", # LocalWhisperTranscriber
"arrow>=1.3.0", # Jinja2TimeExtension

# NamedEntityExtractor
"spacy>=3.8,<3.9",
"spacy-curated-transformers>=0.2,<=0.3",
Copy link
Member Author

@anakin87 anakin87 Apr 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"spacy-curated-transformers>=0.2,<=0.3",: this is no longer needed.

@anakin87 anakin87 marked this pull request as ready for review April 10, 2025 15:27
@anakin87 anakin87 requested a review from a team as a code owner April 10, 2025 15:27
@anakin87 anakin87 requested review from mpangrazzi and julian-risch and removed request for a team and mpangrazzi April 10, 2025 15:27
@anakin87 anakin87 changed the title ci: create e2e environment; stop testing spacy in unit tests test: create e2e environment; stop testing spacy in unit tests Apr 10, 2025
Copy link
Member

@julian-risch julian-risch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks quite good to me already. I have two comments I would like to briefly discuss. believe we can change something so that test and e2 don't install reno, ruff etc if they are really not needed.
And I have a question about files_yaml and how it combines multiple file paths.

pyproject.toml Outdated
extra-dependencies = [
dependencies = [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to prevent tool.hatch.envs.test from inheritting from tool.hatch.envs.default and only install the project dependencies because it doesn't need its dependencies (reno, ruff, ...) then we could add here

[tool.hatch.envs.test]
template = "test"

or

[tool.hatch.envs.test]
template = "project"

I believe the former should work based on https://hatch.pypa.io/1.9/config/environment/overview/#self-referential-environments

Copy link
Member

@julian-risch julian-risch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 👍

@anakin87
Copy link
Member Author

Unfortunately, using a self-referential environment has some consequences on DataDog tracing: https://github.com/deepset-ai/haystack/actions/runs/14399778757/job/40383320064

I tried to debug this for some time, but it's not easy...

Since we haven't changed anything on the Tracer, it's not a big deal (we are not introducing bugs).

For this reason, we decided to NOT use a self-referential environment (but get the same effect by overriding dependencies).

@anakin87 anakin87 enabled auto-merge (squash) April 11, 2025 10:20
@anakin87 anakin87 merged commit 8bf41a8 into main Apr 11, 2025
22 checks passed
@anakin87 anakin87 deleted the e2e-spacy branch April 11, 2025 10:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants