---
title: "Universal Checkpointing with DeepSpeed: A Practical Guide"
tags: checkpointing, training, deepspeed
---
# Universal Checkpointing with DeepSpeed: A Practical Guide

DeepSpeed's Universal Checkpointing feature is a powerful tool for saving and loading model checkpoints efficiently and flexibly, enabling seamless continuation of training and fine-tuning across different model architectures, parallelism techniques, and training configurations. Written for beginners and experienced users alike, this tutorial walks you through creating ZeRO checkpoints, converting them into the universal format, and resuming training from the converted checkpoints. This capability is crucial for reusing pre-trained models and moving training seamlessly between different setups.

Follow the three simple steps below:

### Step 1: Create ZeRO Checkpoint
The first step in leveraging DeepSpeed Universal Checkpointing is to create a ZeRO checkpoint. [ZeRO](/tutorials/zero/) (Zero Redundancy Optimizer) is a memory optimization technology in DeepSpeed that allows for efficient training of large models. To create a ZeRO checkpoint, you'll need to:

- Initialize your model with DeepSpeed using the ZeRO optimizer.
- Train your model to the desired state (number of iterations), as in the sketch below.
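
To make Step 1 concrete, here is a minimal sketch, assuming a single-GPU run; the model, dummy loss, and config values are illustrative placeholders rather than code from the original tutorial:

```python
import torch
import deepspeed

# Illustrative DeepSpeed config enabling ZeRO; values are placeholders.
ds_config = {
    "train_batch_size": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {"stage": 2},  # shard optimizer state with ZeRO
}

model = torch.nn.Linear(1024, 1024)  # placeholder model

# deepspeed.initialize wraps the model in an engine that applies the
# ZeRO optimizer-state sharding configured above.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

# Train to the desired iteration (dummy data and loss for illustration).
for step in range(1000):
    batch = torch.randn(8, 1024, device=model_engine.device)
    loss = model_engine(batch).pow(2).mean()
    model_engine.backward(loss)
    model_engine.step()

# Save the ZeRO checkpoint; each rank writes its own shard under this tag.
model_engine.save_checkpoint("checkpoints", tag="global_step1000")
```

The saved directory (`checkpoints/global_step1000` here) holds per-rank shards, which is what the conversion step turns into a single universal checkpoint.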

With the Universal checkpoint ready, you can now resume training on a potentially different parallelism configuration.
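
The conversion (Step 2) and resumption (Step 3) details appear in the full tutorial; as a rough sketch, assuming the `ds_to_universal.py` script bundled with DeepSpeed and the `load_universal` checkpoint config option (both taken from the Megatron-DeepSpeed examples, not from this excerpt), resuming might look like this:

```python
# Step 2 (sketch): convert the ZeRO checkpoint to the universal format,
# e.g. with the conversion script shipped in the DeepSpeed repo:
#   python deepspeed/checkpoint/ds_to_universal.py \
#       --input_folder  checkpoints/global_step1000 \
#       --output_folder checkpoints/global_step1000_universal
import torch
import deepspeed

ds_config = {
    "train_batch_size": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {"stage": 2},
    # Assumed option: tell the engine the checkpoint is in universal format.
    "checkpoint": {"load_universal": True},
}

model = torch.nn.Linear(1024, 1024)  # must match the architecture saved earlier
model_engine, _, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

# Step 3 (sketch): load the converted checkpoint and continue training.
model_engine.load_checkpoint("checkpoints", tag="global_step1000_universal")
```

Because the universal format stores rank-agnostic state, the resuming job can use a different GPU count or parallelism layout than the job that wrote the original ZeRO checkpoint.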
## Conclusion
DeepSpeed Universal Checkpointing simplifies the management of model states, making it easier to save, load, and transfer them across different training sessions and parallelism techniques. By following the steps outlined in this tutorial, you can integrate Universal Checkpointing into your DeepSpeed applications, enhancing your model training and development workflow.

For more detailed examples and advanced configurations, please refer to the [Megatron-DeepSpeed examples](https://github.com/microsoft/Megatron-DeepSpeed/tree/main/examples_deepspeed/universal_checkpointing).

For a technical deep dive into DeepSpeed Universal Checkpointing, please see the [arXiv manuscript](https://arxiv.org/abs/2406.18820) and the [blog post](https://github.com/microsoft/DeepSpeed/blob/master/blogs/deepspeed-ucp/).

Happy training!