[CPU] Enable Weightless models cache #29304

Open
wants to merge 5 commits into
base: master

Conversation

@nshchego nshchego commented Mar 6, 2025

Details:

  • CPU plugin: minimizes the size of the cached blob by reusing weights from the original bin file.
  • Some APIs were extended to pass the original weights.
  • The IR serializer and deserializer were modified to handle both weight sources, because the CPU plugin uses them to write and read the cache file.
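
To illustrate the idea in the first bullet, here is a minimal sketch of a cache that records where a constant lives in the original weights instead of copying its bytes. `WeightRef`, `make_weight_ref`, and `load_constant` are hypothetical names chosen for this example; they are not the PR's API, and the real cache format may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record written to the cache instead of the constant's bytes.
struct WeightRef {
    std::size_t offset;  // byte offset into the original weights file
    std::size_t size;    // byte length of the constant
};

// "Serialize": remember where the constant lives in the original weights.
inline WeightRef make_weight_ref(std::size_t offset, std::size_t size) {
    return WeightRef{offset, size};
}

// "Deserialize": rebuild the constant's bytes from the original weights.
inline std::vector<std::uint8_t> load_constant(const std::vector<std::uint8_t>& original_weights,
                                               const WeightRef& ref) {
    // The reference must stay inside the original weights buffer.
    assert(ref.offset <= original_weights.size());
    assert(ref.size <= original_weights.size() - ref.offset);
    const auto first = original_weights.begin() + static_cast<std::ptrdiff_t>(ref.offset);
    return std::vector<std::uint8_t>(first, first + static_cast<std::ptrdiff_t>(ref.size));
}
```

With this layout the cached blob grows by a fixed-size record per constant rather than by the constant's payload, which is the size reduction the bullet describes. The trade-off is that the original weights must be available when the cache is read.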

Tickets:

  • 161826

@github-actions github-actions bot added category: inference OpenVINO Runtime library - Inference category: Core OpenVINO Core (aka ngraph) category: CPU OpenVINO CPU plugin category: build OpenVINO cmake script / infra category: transformations OpenVINO Runtime library - Transformations category: samples OpenVINO Runtime Samples category: IR FE OpenVINO IR v10 / v11 FrontEnd labels Mar 6, 2025
@nshchego nshchego force-pushed the cpu/weightless_cache branch from 8582cc3 to ea30e62 Compare March 6, 2025 04:18
@github-actions github-actions bot removed the category: samples OpenVINO Runtime Samples label Mar 6, 2025
@nshchego nshchego force-pushed the cpu/weightless_cache branch 3 times, most recently from ab254a4 to ea0e3f7 Compare March 6, 2025 04:40
@@ -41,6 +41,10 @@ class ModelDeserializer {

void operator>>(std::shared_ptr<ov::Model>& model);

void set_weights_path(std::string& weights_path) {
m_weights_path = weights_path;
Contributor

Please note that the case where the model is compiled via compile_model(ov::Model) should also be supported through hint::model_ptr (see PR #29107; NPU and GPU work is in progress).

std::shared_ptr<char[]> new_buf(new char[actual_size]);
data = new_buf.get();
weights_buf = std::make_shared<ov::SharedBuffer<std::shared_ptr<char[]>>>(data, actual_size, new_buf);
convert_dt(el_type, original_dt, data, m_weights->get_ptr<char>() + offset, el_num);
Contributor

Do we perform constant conversion directly in the IR FE, in a suboptimal way?

Contributor Author

Yes, we need the converted values during node creation; otherwise some nodes may fail 'validate_and_infer_types' and graph compilation fails.
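
For readers following the thread, the read-time conversion being discussed can be sketched roughly as follows: a slice of the original weights is converted element by element into a freshly allocated buffer before the constant node is created. This is only an illustration. `f32_to_bf16` is a hypothetical helper, the f32 to bf16 pair is an assumed example of a precision change, and truncation stands in for whatever the PR's `convert_dt` actually does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(float) == 4, "sketch assumes 32-bit IEEE-754 float");

// Convert el_num float32 values, starting at byte `offset` in `weights`,
// into a new bf16 buffer. bf16 keeps the upper 16 bits of the f32 pattern;
// this sketch truncates instead of rounding.
inline std::vector<std::uint16_t> f32_to_bf16(const char* weights, std::size_t offset, std::size_t el_num) {
    assert(weights != nullptr);
    std::vector<std::uint16_t> out(el_num);
    for (std::size_t i = 0; i < el_num; ++i) {
        std::uint32_t bits = 0;
        // memcpy avoids unaligned or type-punned reads from the raw buffer.
        std::memcpy(&bits, weights + offset + i * sizeof(float), sizeof(bits));
        out[i] = static_cast<std::uint16_t>(bits >> 16);
    }
    return out;
}
```

A scalar loop like this is the kind of "manual conversion" the reviewers call suboptimal; a plugin-level routine could use vectorized kernels for the same job.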

Contributor

@ilya-lavrenov ilya-lavrenov Mar 13, 2025

I don't think converting constants from one type to another is the responsibility of the IR reader.
Shouldn't the original saving logic express such conversion steps as constant subgraphs that are read as is?

Later, the plugin can fold such subgraphs to get constants in the desired precision.

Or, at the least, original_precision should be applied at the plugin level, with faster functions than manual conversions.
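
The alternative proposed here, saving the conversion as a constant subgraph and letting the plugin fold it, might look like the toy sketch below. `Kind`, `Node`, and `fold_convert` are hypothetical and deliberately tiny; they do not use OpenVINO's node classes or its constant-folding pass, and f32 to i32 is just an assumed example pair.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

enum class Kind { ConstantF32, ConstantI32, ConvertToI32 };

struct Node {
    Kind kind;
    std::vector<float> f32;         // payload when kind == ConstantF32
    std::vector<std::int32_t> i32;  // payload when kind == ConstantI32
    std::shared_ptr<Node> input;    // producer when kind == ConvertToI32
};

// Fold Convert(Constant) into a single Constant; return anything else unchanged.
inline std::shared_ptr<Node> fold_convert(const std::shared_ptr<Node>& node) {
    assert(node != nullptr);
    if (node->kind != Kind::ConvertToI32 || !node->input || node->input->kind != Kind::ConstantF32) {
        return node;
    }
    auto folded = std::make_shared<Node>();
    folded->kind = Kind::ConstantI32;
    folded->i32.reserve(node->input->f32.size());
    for (float v : node->input->f32) {
        folded->i32.push_back(static_cast<std::int32_t>(v));  // truncates toward zero
    }
    return folded;
}
```

In this arrangement the reader only rebuilds the graph as stored, and the precision change happens in a later pass owned by the plugin.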

Contributor

Agree with @ilya-lavrenov: the deserializer should just read the XML, and the additional conversion should not be there. The plugin should apply any conversion if required.

Contributor Author

I understand your concern, but forcing precision may lead to precision propagation. That would modify the graph the plugin saved earlier and would require running the transformation pipeline, which makes model caching pointless.

Contributor

I don't mean modifying the graph. I mean using the correct weights and applying the conversion only to the original weights, if required, but not in the (de)serialization part.

@nshchego nshchego force-pushed the cpu/weightless_cache branch 3 times, most recently from 1569b18 to e5800b0 Compare March 12, 2025 13:39
@nshchego nshchego marked this pull request as ready for review March 13, 2025 09:44
@nshchego nshchego requested review from a team as code owners March 13, 2025 09:44
@nshchego nshchego requested review from itikhono and removed request for a team March 13, 2025 09:44
@praasz praasz self-assigned this Mar 13, 2025
@nshchego nshchego force-pushed the cpu/weightless_cache branch from e5800b0 to 96d3c54 Compare March 17, 2025 08:21
@nshchego nshchego requested a review from a team as a code owner March 17, 2025 08:21
@nshchego nshchego force-pushed the cpu/weightless_cache branch 16 times, most recently from 8c52b1e to 2327eb1 Compare April 28, 2025 10:14
@nshchego nshchego force-pushed the cpu/weightless_cache branch from 2327eb1 to fe5ebe7 Compare April 29, 2025 18:09
Contributor

This PR will be closed in a week because of 2 weeks of no activity.

@github-actions github-actions bot added the Stale label May 14, 2025
Contributor

This PR was closed because it has been stalled for 2 weeks with no activity.

@github-actions github-actions bot closed this May 21, 2025
@nshchego nshchego reopened this May 26, 2025
@github-actions github-actions bot removed the Stale label May 27, 2025
@praasz praasz added this to the 2025.3 milestone May 28, 2025
@praasz praasz added the no_stale Do not mark as stale label May 28, 2025
Labels
category: build OpenVINO cmake script / infra category: Core OpenVINO Core (aka ngraph) category: CPP API OpenVINO CPP API bindings category: CPU OpenVINO CPU plugin category: IE Tests OpenVINO Test: plugins and common category: inference OpenVINO Runtime library - Inference category: IR FE OpenVINO IR v10 / v11 FrontEnd category: samples OpenVINO Runtime Samples category: transformations OpenVINO Runtime library - Transformations no_stale Do not mark as stale
4 participants