This repository was archived by the owner on Jun 3, 2025. It is now read-only.

Enable the easy download of the deployment.tar.gz #379

Merged (14 commits) on Nov 14, 2023

dbogunowicz (Contributor) commented Oct 24, 2023

This PR reverts to the state of #375 so that fetching only deployment.tar.gz works in the simplest way possible.
The crux of the change is a new self.deployment_tar attribute on Model, which is initialized correctly both from a stub and from a local directory.
Unit tests covering this feature are included.
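
The lazy behaviour exercised by the manual tests below (no local path before the download, the tarball path once the file exists locally, the unpacked directory afterwards) can be pictured with a small standard-library sketch. This is not sparsezoo's implementation: `LazyTarDirectory`, `make_demo_tarball`, and the `fetch` callback are hypothetical names introduced only for illustration, and the archive is assumed to contain a single top-level `deployment/` folder.

```python
import os
import tarfile
import tempfile


class LazyTarDirectory:
    """Hypothetical stand-in for a tarball that is fetched and unpacked on demand."""

    def __init__(self, tar_path, fetch=None):
        # tar_path: where deployment.tar.gz lives (or will live) locally
        # fetch: optional callable that creates the file at tar_path on demand
        self._tar_path = tar_path
        self._fetch = fetch
        # A file that is already present locally is picked up without fetching
        self._path = tar_path if os.path.exists(tar_path) else None

    @property
    def path(self):
        if self._path is None:
            if self._fetch is None:
                raise FileNotFoundError(self._tar_path)
            self._fetch(self._tar_path)  # stands in for the download step
            self._path = self._tar_path
        return self._path

    def unpack(self):
        current = self.path
        if os.path.isdir(current):  # already unpacked
            return current
        target = os.path.dirname(current)
        with tarfile.open(current, "r:gz") as archive:
            archive.extractall(target)
        # Assumption: the archive holds one top-level `deployment/` folder
        self._path = os.path.join(target, "deployment")
        return self._path


def make_demo_tarball(tar_path):
    """Create a tiny deployment.tar.gz (hypothetical replacement for a download)."""
    with tempfile.TemporaryDirectory() as staging:
        src = os.path.join(staging, "deployment")
        os.makedirs(src)
        with open(os.path.join(src, "config.json"), "w") as handle:
            handle.write("{}")
        with tarfile.open(tar_path, "w:gz") as archive:
            archive.add(src, arcname="deployment")


root = tempfile.mkdtemp()
tar_path = os.path.join(root, "deployment.tar.gz")

lazy = LazyTarDirectory(tar_path, fetch=make_demo_tarball)
before = lazy._path              # nothing fetched yet
deployment_dir = lazy.unpack()   # fetches, unpacks, and re-points the path

from_local = LazyTarDirectory(tar_path)  # the tarball now exists locally
local_before = from_local._path
local_dir = from_local.unpack()
```

Keeping the fetch step behind a property means that constructing the object stays cheap and the same class can serve both the remote and the local case.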

Manual tests:

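import os
import shutil

from sparsezoo import Model

# start from a clean cache so that the first run has to download the tarball
path_where_models_are = "/home/ubuntu/.cache/sparsezoo/neuralmagic/codegen_mono-350m-bigpython_bigquery_thepile-pruned50_quantized"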
if os.path.exists(path_where_models_are):
    shutil.rmtree(path_where_models_are)
    print(f"The path {path_where_models_are} has been cleaned up")
else:
    print(f"The path {path_where_models_are} does not exist")


print("1: Creating the deployment from stub")
model = Model("zoo:codegen_mono-350m-bigpython_bigquery_thepile-pruned50_quantized")
# local path to the `deployment.tar.gz` should be None (we have not downloaded anything yet)
print("Deployment directory tar path (before download): ", model.deployment_tar._path)
# grab the deployment_dir_path (this downloads the deployment.tar.gz and unpacks it)
deployment_dir_path = model.deployment_directory_path

# all three should point to the unpacked `deployment` directory
assert deployment_dir_path == model.deployment_tar.path == model.deployment.path
print("Deployment directory tar path: ", model.deployment_tar._path)
print("Deployment directory path: ", model.deployment.path)
print(f"deployment_dir_path: {deployment_dir_path}")

# both should list the files of the unpacked `deployment` directory
print("Contents of the deployment tar directory: ", os.listdir(model.deployment_tar.path))
print("Contents of the deployment directory: ", os.listdir(model.deployment.path))

# should only contain `deployment` and `deployment.tar.gz`
print("Contents of the model directory: ", os.listdir(model._path))

print("2: Creating the deployment from local path")
# same stub as above, but `deployment.tar.gz` is already in the local cache
model = Model("zoo:codegen_mono-350m-bigpython_bigquery_thepile-pruned50_quantized")
print("Deployment directory tar path: ", model.deployment_tar.path)
deployment_dir_path = model.deployment_directory_path

assert deployment_dir_path == model.deployment_tar.path
print("Deployment directory tar path (after unzipping): ", model.deployment_tar._path)
print("Deployment directory path: ", model.deployment.path)
print(f"deployment_dir_path: {deployment_dir_path}")
print("Contents of the deployment tar directory: ", os.listdir(model.deployment_tar.path))
print("Contents of the deployment directory: ", os.listdir(model.deployment.path))
print("Contents of the model directory: ", os.listdir(model._path))

Output:

The path /home/ubuntu/.cache/sparsezoo/neuralmagic/codegen_mono-350m-bigpython_bigquery_thepile-pruned50_quantized has been cleaned up
1: Creating the deployment from stub
Deployment directory tar path (before download):  None
Downloading (…)ed/deployment.tar.gz: 100%|██████████| 265M/265M [00:23<00:00, 12.1MB/s]
Deployment directory tar path:  /home/ubuntu/.cache/sparsezoo/neuralmagic/codegen_mono-350m-bigpython_bigquery_thepile-pruned50_quantized/deployment
Deployment directory path:  /home/ubuntu/.cache/sparsezoo/neuralmagic/codegen_mono-350m-bigpython_bigquery_thepile-pruned50_quantized/deployment
deployment_dir_path: /home/ubuntu/.cache/sparsezoo/neuralmagic/codegen_mono-350m-bigpython_bigquery_thepile-pruned50_quantized/deployment
Contents of the deployment tar directory:  ['vocab.json', 'model.onnx', 'config.json', 'merges.txt', 'tokenizer_config.json', 'special_tokens_map.json', 'tokenizer.json']
Contents of the deployment directory:  ['vocab.json', 'model.onnx', 'config.json', 'merges.txt', 'tokenizer_config.json', 'special_tokens_map.json', 'tokenizer.json']
Contents of the model directory:  ['deployment', 'deployment.tar.gz']
2: Creating the deployment from local path
Deployment directory tar path:  /home/ubuntu/.cache/sparsezoo/neuralmagic/codegen_mono-350m-bigpython_bigquery_thepile-pruned50_quantized/deployment.tar.gz
Deployment directory tar path (after unzipping):  /home/ubuntu/.cache/sparsezoo/neuralmagic/codegen_mono-350m-bigpython_bigquery_thepile-pruned50_quantized/deployment
Deployment directory path:  /home/ubuntu/.cache/sparsezoo/neuralmagic/codegen_mono-350m-bigpython_bigquery_thepile-pruned50_quantized/deployment
deployment_dir_path: /home/ubuntu/.cache/sparsezoo/neuralmagic/codegen_mono-350m-bigpython_bigquery_thepile-pruned50_quantized/deployment
Contents of the deployment tar directory:  ['vocab.json', 'model.onnx', 'config.json', 'merges.txt', 'tokenizer_config.json', 'special_tokens_map.json', 'tokenizer.json']
Contents of the deployment directory:  ['vocab.json', 'model.onnx', 'config.json', 'merges.txt', 'tokenizer_config.json', 'special_tokens_map.json', 'tokenizer.json']
Contents of the model directory:  ['deployment', 'deployment.tar.gz']
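
The expectations spelled out in the manual test (the model directory holds only `deployment` and `deployment.tar.gz`, and the listed files are the same on both paths) could also be checked automatically. The following is a minimal standard-library sketch, assuming a flat archive with a single top-level `deployment/` folder; `archived_files` and `layout_problems` are hypothetical helpers and not part of sparsezoo.

```python
import os
import tarfile
import tempfile


def archived_files(tar_path, top_level="deployment"):
    """Return the sorted file paths inside the archive, relative to `top_level`."""
    with tarfile.open(tar_path, "r:gz") as archive:
        return sorted(
            os.path.relpath(member.name, top_level)
            for member in archive.getmembers()
            if member.isfile()
        )


def layout_problems(model_dir):
    """List deviations from the layout shown in the manual test output."""
    problems = []
    entries = sorted(os.listdir(model_dir))
    if entries != ["deployment", "deployment.tar.gz"]:
        problems.append(f"unexpected entries: {entries}")
        return problems
    unpacked = sorted(os.listdir(os.path.join(model_dir, "deployment")))
    packed = archived_files(os.path.join(model_dir, "deployment.tar.gz"))
    if unpacked != packed:
        problems.append(f"archive has {packed}, directory has {unpacked}")
    return problems


# Build a throwaway model directory that mimics the layout from the output above
model_dir = tempfile.mkdtemp()
deployment = os.path.join(model_dir, "deployment")
os.makedirs(deployment)
for name in ("config.json", "model.onnx"):
    with open(os.path.join(deployment, name), "w") as handle:
        handle.write("placeholder")
with tarfile.open(os.path.join(model_dir, "deployment.tar.gz"), "w:gz") as archive:
    archive.add(deployment, arcname="deployment")

clean = layout_problems(model_dir)    # directory and archive agree

# A file that is not in the archive should be reported
with open(os.path.join(deployment, "stray.txt"), "w") as handle:
    handle.write("not in the archive")
drifted = layout_problems(model_dir)
```

Returning a list of problems instead of raising keeps the helper usable both in a unit test (`assert layout_problems(path) == []`) and in an interactive session.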

dbogunowicz force-pushed the feature/damian/simplify branch from e49483f to 3139080 on October 25, 2023 15:46
dbogunowicz changed the title from "[WiP] Fix the download of deployment tarball (simple)" to "Enable the easy download of the deployment.tar.gz" on Oct 26, 2023
bfineran previously approved these changes Nov 2, 2023
rahul-tuli previously approved these changes Nov 2, 2023
bfineran dismissed stale reviews from rahul-tuli and themself via 7b614c8 on November 6, 2023 15:35
dbogunowicz reopened this Nov 14, 2023