Load and Save a Checkpoint in LiBai

Rather than having you call flow.save() and flow.load() directly, LiBai provides a checkpoint module that handles the complex situations that arise when saving and loading a model.

Typically, you don’t need to write any code to load or save weights trained with LiBai when you use LiBai’s DefaultTrainer and LazyConfig. For more details, see Training & Evaluation in Command Line, which describes the weight-loading and resume-training settings available in config.py or on the command line for standard training.

Here we show how to load and save weights for your own custom needs. Suppose you have a model trained with LiBai, and its output directory looks like this:

# your model directory
├── config.yaml
├── last_checkpoint
├── log.txt
├── log.txt.rank1
├── log.txt.rank2
├── log.txt.rank3
├── metrics.json
├── model_0000009
│   ├── graph
│   ├── lr_scheduler
│   └── model
├── model_0000019
│   ├── graph
│   ├── lr_scheduler
│   └── model
├── model_best
│   ├── graph
│   ├── lr_scheduler
│   └── model
└── model_final
    ├── graph
    ├── lr_scheduler
    └── model
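When scripting against a directory like the one above, it can help to work out which checkpoint is the most recent before loading it. The following is a minimal sketch using only the Python standard library. It assumes that last_checkpoint is a plain-text file holding the name of the newest checkpoint directory (the convention used by fvcore-style checkpointers), and it falls back to the highest-numbered model_<iteration> directory otherwise. The helper name resolve_latest_checkpoint is hypothetical and is not part of LiBai.

```python
import re
from pathlib import Path
from typing import Optional


def resolve_latest_checkpoint(model_dir: str) -> Optional[Path]:
    """Return the most recent checkpoint directory under model_dir, or None.

    Hypothetical helper, not a LiBai API.
    """
    root = Path(model_dir)
    if not root.is_dir():
        return None

    # Assumption: last_checkpoint is a text file containing the name of
    # the newest checkpoint directory (e.g. "model_final").
    marker = root / "last_checkpoint"
    if marker.is_file():
        name = marker.read_text().strip()
        # Be defensive: if a longer path was stored, keep the last component.
        candidate = root / Path(name).name
        if name and candidate.is_dir():
            return candidate

    # Fallback: pick the highest-numbered model_<iteration> directory.
    numbered = []
    for child in root.iterdir():
        match = re.fullmatch(r"model_(\d+)", child.name)
        if match and child.is_dir():
            numbered.append((int(match.group(1)), child))
    if numbered:
        return max(numbered, key=lambda item: item[0])[1]
    return None
```

The returned path, converted with str(), could then be passed as the checkpoint path when loading weights with the Checkpointer.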

The following code shows how to load/save model weights:

from libai.utils.checkpoint import Checkpointer
from path.to.your.build_model import build_model

# cfg is the config you trained with, loaded beforehand (e.g. via LazyConfig)
model = build_model(cfg.model)

# load model weights from a checkpoint directory
path_to_model = "output/finetune_qqp/model_final"
Checkpointer(model).load(path_to_model)

# save model weights
checkpointer = Checkpointer(model, save_dir="output/")
checkpointer.save("model_999")  # save to output/model_999

You can also use the checkpointer to save information besides the model weights (e.g. optim, scheduler). See libai.utils.checkpoint for more details.