Quickstart
To get started, import and use the pytorch-accelerated Trainer (pytorch_accelerated.trainer.Trainer), as demonstrated in the following snippet, then launch training using the accelerate CLI as described below:
# examples/vision/train_mnist.py
import os

from torch import nn, optim
from torch.utils.data import random_split
from torchvision import transforms
from torchvision.datasets import MNIST

from pytorch_accelerated import Trainer


class MNISTModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.main = nn.Sequential(
            nn.Linear(in_features=784, out_features=128),
            nn.ReLU(),
            nn.Linear(in_features=128, out_features=64),
            nn.ReLU(),
            nn.Linear(in_features=64, out_features=10),
        )

    def forward(self, input):
        # Flatten each 28x28 image into a 784-element vector
        return self.main(input.view(input.shape[0], -1))


def main():
    dataset = MNIST(os.getcwd(), download=True, transform=transforms.ToTensor())
    train_dataset, validation_dataset, test_dataset = random_split(dataset, [50000, 5000, 5000])
    model = MNISTModel()
    optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
    loss_func = nn.CrossEntropyLoss()

    trainer = Trainer(
        model,
        loss_func=loss_func,
        optimizer=optimizer,
    )

    trainer.train(
        train_dataset=train_dataset,
        eval_dataset=validation_dataset,
        num_epochs=8,
        per_device_batch_size=32,
    )

    trainer.evaluate(
        dataset=test_dataset,
        per_device_batch_size=64,
    )


if __name__ == "__main__":
    main()
To launch training using the accelerate CLI on your machine(s), run:
accelerate config --config_file accelerate_config.yaml
and answer the questions asked. This generates a config file that sets the default options when running:
accelerate launch --config_file accelerate_config.yaml train.py [--training-args]
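The [--training-args] placeholder stands for whatever command-line arguments the training script itself accepts; arguments placed after the script name are passed through to the script. As a minimal sketch, assuming hypothetical --num-epochs and --batch-size flags (neither is defined in the example above), the script could parse them with argparse and forward the values to trainer.train():

```python
import argparse


def parse_training_args(argv=None):
    """Parse hypothetical training arguments; the flag names are illustrative only."""
    parser = argparse.ArgumentParser(description="Train the MNIST example")
    parser.add_argument("--num-epochs", type=int, default=8)
    parser.add_argument("--batch-size", type=int, default=32)
    return parser.parse_args(argv)


# Inside main(), the parsed values would replace the hard-coded ones, e.g.:
#   args = parse_training_args()
#   trainer.train(..., num_epochs=args.num_epochs,
#                 per_device_batch_size=args.batch_size)
```

With this in place, a launch command might look like: accelerate launch --config_file accelerate_config.yaml train.py --num-epochs 3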
Note
Using the accelerate CLI is completely optional. Users who would like to maintain more fine-grained control over the launch command can also launch training in the usual way, depending on their infrastructure configuration, using:
python train.py / python -m torch.distributed ...
Running in a Notebook
Accelerate also provides a notebook_launcher() function that can be used to launch distributed training from a notebook, which is especially useful for Colab or Kaggle notebooks.
To train a model using pytorch_accelerated from a notebook, define the Trainer inside a training function and pass that function as an argument to notebook_launcher. To run the example above in a notebook, we would use:
from accelerate import notebook_launcher
notebook_launcher(main, num_processes=num_gpus)  # num_gpus: number of GPUs to train on
More information about training in a notebook can be found here
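As a rough sketch of how this might look in a notebook cell, the helpers below pick a process count and hand a training function to notebook_launcher. The names resolve_num_processes and launch_from_notebook are hypothetical and not part of either library, and falling back to a single process when no GPU is visible is an assumption about the desired behaviour:

```python
def resolve_num_processes():
    """Return the number of visible CUDA devices, or 1 if there are none.

    Hypothetical helper: also returns 1 when torch is not installed.
    """
    try:
        import torch
    except ImportError:
        return 1
    return max(1, torch.cuda.device_count())


def launch_from_notebook(training_function, num_processes=None):
    """Run training_function through Accelerate's notebook_launcher.

    Uses one process per visible GPU unless num_processes is given.
    """
    # Imported lazily so this cell can be defined before accelerate is needed.
    from accelerate import notebook_launcher

    if num_processes is None:
        num_processes = resolve_num_processes()
    notebook_launcher(training_function, num_processes=num_processes)
```

Calling launch_from_notebook(main) would then start training with the main function from the example above. If querying CUDA before the launch causes problems in your environment, pass the GPU count explicitly through num_processes instead.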
Debugging with an IDE
Whilst pytorch_accelerated is primarily designed to be launched using the accelerate CLI, it is sometimes useful to debug a training script in your favourite editor to see exactly what’s going on.
In these cases, we can use the notebook_launcher() function as described above. To debug the example above, after setting some breakpoints, replace the lines:
if __name__ == "__main__":
    main()
with:
notebook_launcher(main, num_processes=num_gpus)
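Rather than editing the script each time, one option is to switch between the two entry points at run time. The sketch below assumes a hypothetical PA_DEBUG environment variable (not something pytorch_accelerated or Accelerate reads) and a hypothetical run_entry_point helper; it goes through notebook_launcher when the variable is set to 1, and otherwise calls the function directly:

```python
import os


def debug_requested(environ=None):
    """Return True when the hypothetical PA_DEBUG variable is set to "1"."""
    environ = os.environ if environ is None else environ
    return environ.get("PA_DEBUG", "0") == "1"


def run_entry_point(main_fn, num_processes=1, environ=None):
    """Call main_fn directly, or through notebook_launcher when debugging."""
    if debug_requested(environ):
        # Imported lazily so accelerate's launcher is only loaded when debugging.
        from accelerate import notebook_launcher

        notebook_launcher(main_fn, num_processes=num_processes)
    else:
        main_fn()
```

The final lines of the script would then call run_entry_point(main) under the if __name__ == "__main__": guard, and the IDE run configuration would set PA_DEBUG=1.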
Next steps
More complex training examples can be seen in the examples folder here.
Alternatively, if you would prefer to read more about the Trainer, you can do so here: Trainer.