Layer-Wise Distillation #1272

Merged: @rahul-tuli merged 31 commits from layer-wise-distillation into main on Jan 10, 2023

Conversation

rahul-tuli (Member) commented:

This PR represents the main branch for all layer-wise distillation work
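
For readers new to the technique: layer-wise distillation supervises the student with the teacher's intermediate layer outputs rather than only its final predictions. The snippet below is a minimal, hypothetical sketch of that idea in plain PyTorch, using forward hooks to capture matched layer outputs and an MSE loss between them. It is not the modifier implemented in this PR; the helper names (`register_output_caches`, `layer_wise_distillation_loss`) and the plain-MSE loss are assumptions for illustration only.

```python
# Hypothetical sketch of layer-wise distillation in plain PyTorch.
# NOT the SparseML modifier API; names and loss choice are illustrative.
import torch
import torch.nn.functional as F


def register_output_caches(model, layer_names):
    """Attach forward hooks that record the outputs of the named submodules."""
    cache = {}
    handles = []
    modules = dict(model.named_modules())
    for name in layer_names:
        def _hook(module, inputs, output, name=name):
            cache[name] = output
        handles.append(modules[name].register_forward_hook(_hook))
    return cache, handles


def layer_wise_distillation_loss(
    student, teacher, batch, student_layer_names, teacher_layer_names
):
    """Mean MSE between matched student/teacher intermediate outputs."""
    s_cache, s_handles = register_output_caches(student, student_layer_names)
    t_cache, t_handles = register_output_caches(teacher, teacher_layer_names)
    try:
        student(batch)              # populates s_cache via hooks
        with torch.no_grad():
            teacher(batch)          # populates t_cache without gradients
        losses = [
            F.mse_loss(s_cache[s_name], t_cache[t_name])
            for s_name, t_name in zip(student_layer_names, teacher_layer_names)
        ]
        return torch.stack(losses).mean()
    finally:
        for handle in s_handles + t_handles:
            handle.remove()
```

In practice the per-layer loss would be weighted and combined with the task loss, hooks would be registered once at initialization rather than on every step, and mismatched hidden sizes would need a learned projection (the commit history below mentions changing the projection to a dict).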

@KSGulin (Contributor) left a comment:


Looks great. Just a few minor comments

Co-authored-by: Konstantin Gulin <[email protected]>
@bfineran (Contributor) left a comment:


Looks great @rahul-tuli @corey-nm - a few small comments, and that small change for serialization still needs to be added; then LGTM!

@rahul-tuli force-pushed the layer-wise-distillation branch from 006a097 to 96facf9 on January 5, 2023 at 19:49
@bfineran (Contributor) left a comment:


new state dict logic looks much better - LGTM pending comments

@bfineran previously approved these changes on Jan 10, 2023
@bfineran (Contributor) left a comment:


great work @rahul-tuli @corey-nm

@corey-nm (Contributor) left a comment:


woohoo!

@rahul-tuli merged commit 112753a into main on Jan 10, 2023
@rahul-tuli deleted the layer-wise-distillation branch on January 10, 2023 at 21:58
rahul-tuli added a commit that referenced this pull request Jan 10, 2023
* Initial Commit with Alex's Work

* Update `student_names` -> `student_layer_names`
Update `teacher_names` -> `teacher_layer_names`

* Intermediate commit

* Styling

* Reorg initialize

* More cleanups

* Update docstring

* Moving finalize logic to update

* Tests passing a bit

* Fixing lifecycle tests

* Changing projection to dict

* Cleanup

* Adding quantization hooks test

* Add failing test for optimizer serialization

* Monkey patching optimizer `state_dict` method (see the sketch after this commit list)

* Apply suggestions from code review

Co-authored-by: Konstantin Gulin <[email protected]>

* Update src/sparseml/pytorch/sparsification/distillation/modifier_per_layer.py

* Adding missing docstrings

* Respond to review on modifier/optimizer state_dict

* Add a test for modifier load before forward pass

* Updating comments

* Fix failing test

* Add more asserts based on @bfineran's comments

* * Rename `_DISTILL_PARAM_GROUP_KEY` -> `DISTILL_PARAM_GROUP_KEY`
* Add `DISTILL_PARAM_GROUP_KEY` to `__all__`

* Move state dict patching to a helper function

* Quality

Co-authored-by: Corey Lowman <[email protected]>
Co-authored-by: corey-nm <[email protected]>
Co-authored-by: Konstantin Gulin <[email protected]>
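
The "Monkey patching optimizer `state_dict` method" and "Move state dict patching to a helper function" commits above concern checkpoint serialization: if the modifier adds its own parameter group (for example, for learned projections) to the optimizer, a plain `optimizer.state_dict()` would serialize it and break loading in code that is unaware of distillation. The sketch below is a hypothetical illustration of that patching idea, not the SparseML implementation; the marker value and helper name are assumptions.

```python
# Hypothetical sketch of patching an optimizer's state_dict so that a
# distillation-only param group is hidden from serialized checkpoints.
# Not the SparseML implementation; key value and helper name are assumed.
import torch

DISTILL_PARAM_GROUP_KEY = "__distillation_param_group__"  # assumed marker key


def patch_optimizer_state_dict(optimizer: torch.optim.Optimizer):
    """Wrap optimizer.state_dict so param groups tagged with
    DISTILL_PARAM_GROUP_KEY are dropped from the serialized state,
    keeping checkpoints loadable by distillation-unaware code."""
    original_state_dict = optimizer.state_dict

    def state_dict():
        state = original_state_dict()
        state["param_groups"] = [
            group
            for group in state["param_groups"]
            if not group.get(DISTILL_PARAM_GROUP_KEY, False)
        ]
        return state

    # Shadow the bound method on this instance only.
    optimizer.state_dict = state_dict
    return optimizer
```

Usage would look roughly like `optimizer.add_param_group({"params": projection_params, DISTILL_PARAM_GROUP_KEY: True})` followed by `patch_optimizer_state_dict(optimizer)`. A complete implementation would also drop or remap the per-parameter entries under `state` that belong to the removed group and restore the original method when the modifier is finalized; the sketch omits both for brevity.
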
corey-nm added a commit that referenced this pull request Jan 11, 2023
* Saving all hooks during quantization block fusing (#1280)

* Saving all hooks during quantization block fusing

* Clean up delete get block hooks

* Layer-Wise Distillation (#1272)


Co-authored-by: corey-nm <[email protected]>
Co-authored-by: Corey Lowman <[email protected]>
Co-authored-by: Konstantin Gulin <[email protected]>