Training Models with Archai

This folder contains the necessary files and instructions to train models using Archai.

Installation

Before training any models, you need to install Archai. To do so, follow these steps:

  1. Open your terminal and run the following command:

    pip install --user git+https://github.com/microsoft/archai.git#egg=archai[dev]
    
  2. If you plan to use DeepSpeed and Flash-Attention, run this command instead:

    pip install --user git+https://github.com/microsoft/archai.git#egg=archai[dev,deepspeed,flash-attn]
    

Please note that DeepSpeed is not compatible with Windows.

Alternatively, you can build a Docker image that bundles Archai and all of its dependencies by following the instructions in the Dockerfile.
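
Whichever route you take, a quick sanity check such as the following confirms that the package is visible to your environment (it only relies on the Python standard library to query the installed version):

python -c "from importlib.metadata import version; print(version('archai'))"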

Data Preparation

To prepare the data, use the FastHfDatasetProvider class, which loads and encodes datasets from the Hugging Face Hub and is recommended because it is faster than the standard loaders. For example:

from archai.datasets.nlp.fast_hf_dataset_provider import FastHfDatasetProvider

# Download, tokenize and cache the dataset from the Hugging Face Hub
dataset_provider = FastHfDatasetProvider.from_hub(
    "wikitext",
    dataset_config_name="wikitext-103-raw-v1",
    tokenizer_name="Salesforce/codegen-350M-mono",
    cache_dir="wikitext_cache",
)

# Retrieve the encoded splits with a fixed sequence length
train_dataset = dataset_provider.get_train_dataset(seq_len=2048)
eval_dataset = dataset_provider.get_val_dataset(seq_len=2048)

Once the dataset is encoded, it can be cached and loaded from disk later as follows:

dataset_provider = FastHfDatasetProvider.cache("wikitext_cache")
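
The cached provider can then be used just like the one created from the Hub, so the encoded splits are retrieved without downloading or re-tokenizing anything (seq_len mirrors the example above):

train_dataset = dataset_provider.get_train_dataset(seq_len=2048)
eval_dataset = dataset_provider.get_val_dataset(seq_len=2048)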

However, please note that this method does not apply to the NVIDIA-related training scripts, as they automatically create and encode their datasets.

DeepSpeed

If you are using the DeepSpeed trainer, training is launched through the following script (the --help flag lists every available option):

deepspeed deepspeed/train_codegen.py --help

You can customize the training by modifying the arguments defined in CodeGenFlashConfig, DsTrainingArguments, and ds_config.json. By default, the arguments are set up for a toy training run that illustrates how the pipeline works.
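
The usual DeepSpeed launcher options can also be placed before the script name; for example, --num_gpus pins the run to a given number of devices on a single node (the count below is only illustrative):

deepspeed --num_gpus=4 deepspeed/train_codegen.py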

Additionally, if you have a model that was previously trained with DeepSpeed, you can continue its training or fine-tune it as follows:

deepspeed deepspeed/train_codegen.py --pre_trained_model_path <path_to_checkpoint>

Hugging Face

If you are using the Hugging Face trainer, training is launched through the following script (the --help flag lists every available option):

python -m torch.distributed.run --nproc_per_node=4 hf/train_codegen.py --help

You can customize the training by modifying the arguments defined in CodeGenConfig and TrainingArguments. By default, the arguments are set up for a toy training run that illustrates how the pipeline works.
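
Since the script is launched through torch.distributed.run, its standard distributed flags can be used as well. A sketch of a two-node launch might look like the following, where <master_ip> is a placeholder for the first node's address and 29500 is an arbitrary free port:

# On node 0
python -m torch.distributed.run --nnodes=2 --node_rank=0 --master_addr=<master_ip> --master_port=29500 --nproc_per_node=4 hf/train_codegen.py

# On node 1
python -m torch.distributed.run --nnodes=2 --node_rank=1 --master_addr=<master_ip> --master_port=29500 --nproc_per_node=4 hf/train_codegen.py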

NVIDIA

If you are using the NVIDIA trainer, training is launched through the following script (the --help flag lists every available option):

python -m torch.distributed.run --nproc_per_node=4 nvidia/train_gpt2.py --help

You can customize the training by modifying the arguments defined in GPT2Config and NvidiaTrainingArguments. By default, the arguments are set up for a toy training run that illustrates how the pipeline works.
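
As with the other trainers, --nproc_per_node should match the number of GPUs you want to use. One common way to restrict the run to specific devices is the CUDA_VISIBLE_DEVICES environment variable; the device indices below are only an example:

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.run --nproc_per_node=2 nvidia/train_gpt2.py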