Layer-Wise Distillation #1272
Conversation
Update `teacher_names` -> `teacher_layer_names`
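For context, this is what a call site might look like after the rename. This is a minimal sketch only: the class name is inferred from the file name modifier_per_layer.py, the epoch arguments follow SparseML's usual scheduled-modifier conventions, and the layer names are placeholders, so treat everything except the two renamed fields as assumptions.

```python
# Hypothetical usage sketch of the renamed fields; not the confirmed signature.
from sparseml.pytorch.sparsification.distillation.modifier_per_layer import (
    PerLayerDistillationModifier,
)

modifier = PerLayerDistillationModifier(
    start_epoch=0.0,   # assumed scheduled-modifier kwargs
    end_epoch=20.0,
    # renamed in this commit: previously `student_names` / `teacher_names`
    student_layer_names=["encoder.layers.4", "encoder.layers.8"],
    teacher_layer_names=["encoder.layers.4", "encoder.layers.8"],
)
```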
Looks great. Just a few minor comments
Looks great @rahul-tuli @corey-nm - a few small comments, and we still need that small change for serialization; then LGTM!
Force-pushed from 006a097 to 96facf9
new state dict logic looks much better - LGTM pending comments
* Add `DISTILL_PARAM_GROUP_KEY` to `__all__`
great work @rahul-tuli @corey-nm
woohoo!
* Initial Commit with Alex's Work
* Update `student_names` -> `student_layer_names`; update `teacher_names` -> `teacher_layer_names`
* Intermediate commit
* Styling
* Reorg initialize
* More cleanups
* Update docstring
* Moving finalize logic to update
* Tests passing a bit
* Fixing lifecycle tests
* Changing projection to dict
* Cleanup
* Adding quantization hooks test
* Add failing test for optimizer serialization
* Monkey patching optimizer state_dict method
* Apply suggestions from code review (Co-authored-by: Konstantin Gulin <[email protected]>)
* Update src/sparseml/pytorch/sparsification/distillation/modifier_per_layer.py
* Adding missing docstrings
* Respond to review on modifier/optimizer state_dict
* Add a test for modifier load before forward pass
* Updating comments
* Fix failing test
* Add more asserts based on @bfineran's comments
* Rename `_DISTILL_PARAM_GROUP_KEY` -> `DISTILL_PARAM_GROUP_KEY`; add `DISTILL_PARAM_GROUP_KEY` to `__all__`
* Move state dict patching to a helper function
* Quality

Co-authored-by: Corey Lowman <[email protected]>
Co-authored-by: corey-nm <[email protected]>
Co-authored-by: Konstantin Gulin <[email protected]>
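The serialization commits above ("Monkey patching optimizer state_dict method", "Move state dict patching to a helper function") respond to a real pitfall: the modifier adds an extra param group to the optimizer, so a checkpoint saved mid-distillation no longer matches a freshly constructed optimizer that the modifier has not yet touched. Below is a minimal sketch of one plausible shape for such a patch; the constant's value, the helper's name, and the drop-the-group strategy are all assumptions, not the code actually merged here.

```python
import torch

# Illustrative value only; the real constant is exported from
# sparseml.pytorch.sparsification.distillation.modifier_per_layer.
DISTILL_PARAM_GROUP_KEY = "distillation_param_group"


def patch_optimizer_state_dict(optimizer: torch.optim.Optimizer) -> None:
    """Wrap ``optimizer.state_dict`` to omit the distillation param group.

    Hypothetical stand-in for the PR's state-dict-patching helper: the
    saved checkpoint then matches a vanilla optimizer at load time.
    """
    original_state_dict = optimizer.state_dict

    def state_dict():
        state = original_state_dict()
        kept_groups = []
        for group in state["param_groups"]:
            if group.get(DISTILL_PARAM_GROUP_KEY):
                # Drop per-param state owned by the distillation group so
                # only the model's own parameters remain in the checkpoint.
                for pid in group["params"]:
                    state["state"].pop(pid, None)
            else:
                kept_groups.append(group)
        state["param_groups"] = kept_groups
        return state

    # Shadow the bound method on this instance only.
    optimizer.state_dict = state_dict
```

Because the wrapper replaces the bound method on the instance, a plain `torch.save(optimizer.state_dict(), ...)` picks up the filtered view without any caller changes.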
* Saving all hooks during quantization block fusing (#1280)
  * Saving all hooks during quantization block fusing
  * Clean up delete get block hooks
* Layer-Wise Distillation (#1272): same commit list and co-authors as above
This PR represents the main branch for all layer-wise distillation work.
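For readers new to the technique the PR implements: layer-wise distillation registers forward hooks on matched student/teacher layers and penalizes the distance between their intermediate activations. The following is a self-contained toy sketch of that idea in plain PyTorch; it illustrates the mechanism only and is not SparseML's implementation.

```python
import torch
from torch import nn


def layerwise_distillation_loss(
    student: nn.Module,
    teacher: nn.Module,
    student_layer_names: list,
    teacher_layer_names: list,
    batch: torch.Tensor,
) -> torch.Tensor:
    """Toy per-layer distillation loss; hypothetical, not SparseML's API."""
    student_acts, teacher_acts, handles = {}, {}, []

    def capture(store, name):
        # Forward hook that records the layer's output under `name`.
        def hook(module, inputs, output):
            store[name] = output
        return hook

    student_modules = dict(student.named_modules())
    teacher_modules = dict(teacher.named_modules())
    for s_name, t_name in zip(student_layer_names, teacher_layer_names):
        handles.append(
            student_modules[s_name].register_forward_hook(
                capture(student_acts, s_name)
            )
        )
        handles.append(
            teacher_modules[t_name].register_forward_hook(
                capture(teacher_acts, t_name)
            )
        )

    student(batch)
    with torch.no_grad():  # teacher is frozen; no gradients needed
        teacher(batch)
    for handle in handles:  # always detach hooks after the pass
        handle.remove()

    # Sum of per-layer MSE between matched activations.
    return sum(
        nn.functional.mse_loss(student_acts[s], teacher_acts[t])
        for s, t in zip(student_layer_names, teacher_layer_names)
    )
```

In practice the student activation often passes through a learned projection first (compare the "Changing projection to dict" commit above), since student and teacher layer widths can differ.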