Introduction
Training large machine learning models can be slow and computationally expensive. Data parallelism and model parallelism are two strategies for splitting that work across multiple processors or machines: data parallelism replicates the model and gives each device a different slice of the data, while model parallelism splits the model itself across devices.
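To make the data-parallel idea concrete, here is a minimal sketch in plain NumPy (no real multi-GPU framework involved; all names are illustrative). Each simulated "device" receives one shard of the batch, computes a local gradient, and the gradients are then averaged, mimicking an all-reduce step. With equal shard sizes, the averaged gradient matches the full-batch gradient exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))   # full batch: 8 examples, 3 features
y = rng.normal(size=8)
w = np.zeros(3)               # shared model parameters (replicated on every device)

def grad(Xs, ys, w):
    """Gradient of the mean squared error (1/n) * ||Xs @ w - ys||^2."""
    n = len(ys)
    return 2.0 / n * Xs.T @ (Xs @ w - ys)

# Split the batch across 4 simulated devices (equal shard sizes).
shards = zip(np.split(X, 4), np.split(y, 4))
local_grads = [grad(Xs, ys, w) for Xs, ys in shards]

# "All-reduce": average the local gradients so every device applies the same update.
g_avg = np.mean(local_grads, axis=0)

# Sanity check: this equals the single-device full-batch gradient.
g_full = grad(X, y, w)
assert np.allclose(g_avg, g_full)
```

Real frameworks replace the averaging step with a communication primitive (e.g. an all-reduce over the network), but the arithmetic is the same.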
Common situations where parallel training helps include:

- Your dataset is very large and you want to split it across multiple GPUs, each processing a different shard (data parallelism).
- Your model is too big to fit into the memory of a single GPU and must be split across multiple GPUs (model parallelism).
- You want to reduce wall-clock training time by having multiple processors work together.
- You want to scale training to cloud resources efficiently.
- You need to debug how your model behaves when split across devices.
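The model-too-big case above is the domain of model parallelism. A minimal sketch, again in plain NumPy with hypothetical names: a two-layer network is split by layer, so each simulated "device" holds only its own weight matrix, and the activation crosses the device boundary between layers:

```python
import numpy as np

rng = np.random.default_rng(1)

# Device 0 holds layer 1; device 1 holds layer 2.
# Neither device ever stores the full set of parameters.
W1 = rng.normal(size=(3, 5))   # lives on "device 0"
W2 = rng.normal(size=(5, 2))   # lives on "device 1"

def device0_forward(x):
    # Layer 1 + ReLU; the activation is then sent to device 1.
    return np.maximum(x @ W1, 0.0)

def device1_forward(h):
    # Layer 2 runs where its weights live.
    return h @ W2

x = rng.normal(size=(4, 3))    # batch of 4 inputs
h = device0_forward(x)         # this transfer is the inter-device communication
out = device1_forward(h)
assert out.shape == (4, 2)
```

In a real system the handoff of `h` is a device-to-device copy, and pipelining multiple micro-batches through the stages keeps both devices busy.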