Saving and loading tokenizers with torch.save fails #5292

Closed
@mittalsuraj18

Description

🐛 Bug

Information

Model I am using (Bert, XLNet ...): Albert

Language I am using the model on (English, Chinese ...): English

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The task I am working on is:

  • an official GLUE/SQuAD task: (give the name)
  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

  1. Load the Albert base tokenizer using AutoTokenizer.from_pretrained
  2. Save it to a file using torch.save
  3. Delete the ~/.cache/torch/transformers directory
  4. Try to load the tokenizer from the file using torch.load
  5. Loading fails because the cached tokenizer files no longer exist
import transformers
import torch

# Downloads the tokenizer files into ~/.cache/torch/transformers
token = transformers.AutoTokenizer.from_pretrained("albert-base-v2")
# Pickles the tokenizer object, which references the cached files on disk
torch.save({"token": token}, "./token.pt")

Then delete the ~/.cache/torch/ directory and run:

import torch
# Fails: the saved tokenizer points at cached files that no longer exist
torch.load("./token.pt")
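The failure mode can be illustrated without transformers at all. The sketch below is a toy stand-in, not the transformers implementation: a class that (like the sentencepiece-backed Albert tokenizer in this version) keeps only a path to its vocab file. Pickling, which torch.save uses under the hood, serializes that path but not the file contents, so the restored object breaks once the cache is deleted. The class and file names here are hypothetical.

```python
import os
import pickle
import tempfile

class PathBackedTokenizer:
    """Toy stand-in for a tokenizer that stores a path to its vocab file."""
    def __init__(self, vocab_file):
        self.vocab_file = vocab_file  # only the path is pickled, not the contents

    def load_vocab(self):
        with open(self.vocab_file) as f:
            return f.read().split()

# Create a fake cached vocab file, as from_pretrained would.
cache = tempfile.mkdtemp()
vocab_path = os.path.join(cache, "vocab.txt")
with open(vocab_path, "w") as f:
    f.write("hello world")

tok = PathBackedTokenizer(vocab_path)
blob = pickle.dumps({"token": tok})  # torch.save pickles objects the same way

os.remove(vocab_path)  # simulate deleting ~/.cache/torch/transformers

restored = pickle.loads(blob)["token"]  # unpickling the toy object succeeds
try:
    restored.load_vocab()               # but using it fails: the file is gone
except FileNotFoundError:
    print("vocab file missing")
```

In transformers itself the error surfaces even earlier, since the tokenizer reloads its sentencepiece model from the stored path during unpickling, so torch.load fails directly. The robust alternative is tokenizer.save_pretrained("./tok_dir") followed by AutoTokenizer.from_pretrained("./tok_dir"), which writes the vocab files themselves to disk instead of relying on the cache.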

Expected behavior

The tokenizer should load successfully.

Environment info

  • transformers version: 2.11.0
  • Platform: Linux-4.19.104-microsoft-standard-x86_64-with-debian-bullseye-sid
  • Python version: 3.7.6
  • PyTorch version (GPU?): 1.3.1+cpu (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Using GPU in script?: no
  • Using distributed or parallel set-up in script?: no
