This is an experimental feature, so the API may change in the future.
Keepsake works with any machine learning framework, but it includes a callback that makes it easier to use with PyTorch Lightning.
KeepsakeCallback behaves like PyTorch Lightning's ModelCheckpoint callback, but in addition to exporting a model at the end of each epoch, it also:
- calls `keepsake.init()` at the start of training to create an experiment, and
- calls `experiment.checkpoint()` after saving the model at `on_validation_end`. If no validation is defined, the checkpoint is saved at `on_epoch_end` instead.

All metrics that have been logged during training with `self.log()` are saved to the Keepsake checkpoint.

Here is a simple example:
```python
import torch
from torch.nn import functional as F
import pytorch_lightning as pl
from torch.utils.data import DataLoader, random_split, Subset
from torchvision.datasets import MNIST
from torchvision import transforms

from keepsake.pl_callback import KeepsakeCallback


class MyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer_1 = torch.nn.Linear(28 * 28, 128)
        self.layer_2 = torch.nn.Linear(128, 10)
        self.batch_size = 8

    def forward(self, x):
        batch_size = x.size()[0]
        x = x.view(batch_size, -1)
        x = F.relu(self.layer_1(x))
        x = self.layer_2(x)
        return F.log_softmax(x, dim=1)

    def prepare_data(self):
        # download only
        MNIST(
            "/tmp/keepsake-test-mnist",
            train=True,
            download=True,
            transform=transforms.ToTensor(),
        )

    def setup(self, stage):
        # transform
        transform = transforms.Compose([transforms.ToTensor()])
        mnist_train = MNIST(
            "/tmp/keepsake-test-mnist", train=True, download=False, transform=transform
        )
        mnist_train = Subset(mnist_train, range(100))

        # train/val split
        mnist_train, mnist_val = random_split(mnist_train, [80, 20])

        # assign to use in dataloaders
        self.train_dataset = mnist_train
        self.val_dataset = mnist_val

    def train_dataloader(self):
        return DataLoader(self.train_dataset, batch_size=self.batch_size)

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = F.nll_loss(logits, y)
        self.log("train_loss", loss, on_step=True, on_epoch=True, logger=False)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)


# hyperparameters recorded with the experiment
dense_size = 784
learning_rate = 0.1

model = MyModel()
trainer = pl.Trainer(
    checkpoint_callback=False,
    callbacks=[
        KeepsakeCallback(
            params={
                "dense_size": dense_size,
                "learning_rate": learning_rate,
            },
            primary_metric=("train_loss", "minimize"),
            period=5,
        )
    ],
    max_epochs=100,
)
trainer.fit(model)
```
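After training, the experiment and its checkpoints can be inspected with Keepsake's Python analysis API. The following is a minimal sketch; it assumes the script above has been run in a project directory containing a `keepsake.yaml` that points at a repository:

```python
import keepsake

# List the experiments stored in the repository configured by keepsake.yaml.
experiments = keepsake.experiments.list()
experiment = experiments[-1]  # pick one of the returned experiments

# best() returns the checkpoint with the best primary metric
# ("train_loss", minimized, in the example above).
best = experiment.best()
print(best.metrics)
```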
The `KeepsakeCallback` class takes the following arguments, all optional:
- `filepath`: The path where the exported model is saved. This path is also saved by `experiment.checkpoint()` at the end of each epoch. If it is `None`, the model is not saved and the callback just gathers code and metrics. Default: `model.hdf5`
- `params`: A dictionary of hyperparameters that will be recorded to the experiment at the start of training.
- `primary_metric`: A tuple in the format `(metric_name, goal)`, where `goal` is either `minimize` or `maximize`. For example, `("mean_absolute_error", "minimize")`.
- `period`: The callback saves the model at the end of every `period` epochs. Default: `1`
- `save_weights_only`: If `True`, only the model's weights are saved; otherwise the full model is saved. Default: `False`
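For example, a callback that exports only the model weights every two epochs might be configured like this (the parameter values are illustrative, not defaults):

```python
import pytorch_lightning as pl
from keepsake.pl_callback import KeepsakeCallback

callback = KeepsakeCallback(
    filepath="model.hdf5",                      # where the exported model is written
    params={"learning_rate": 1e-3},             # recorded when the experiment is created
    primary_metric=("train_loss", "minimize"),  # used to pick the best checkpoint
    period=2,                                   # export at the end of every 2 epochs
    save_weights_only=True,                     # export only the weights, not the full model
)

trainer = pl.Trainer(checkpoint_callback=False, callbacks=[callback], max_epochs=10)
```

As in the example above, passing `checkpoint_callback=False` to the `Trainer` disables Lightning's built-in `ModelCheckpoint`, leaving checkpointing to `KeepsakeCallback`.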