SSH into the machine and run the following commands.

## Set the path

Set the path to the mounted disk, depending on the VM you deployed with the prerequisites.

NC A100 v4 series:

```
Data_path=/mnt/resource_nvme
```

NCsv3-series and NCas_T4_v3-series:

```
Data_path=/mnt/resource_mdisk
```
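If you are unsure which mount is present, a small sketch (a hypothetical helper, not part of the original guide) can pick whichever of the two paths above exists:

```shell
# Hypothetical helper: set Data_path to whichever benchmark disk
# is mounted. The two candidate paths are the ones listed above.
set_data_path() {
  root="${1:-/mnt}"   # parameterized root, mainly for testing
  if [ -d "$root/resource_nvme" ]; then
    Data_path="$root/resource_nvme"
  elif [ -d "$root/resource_mdisk" ]; then
    Data_path="$root/resource_mdisk"
  else
    echo "no benchmark disk found under $root" >&2
    return 1
  fi
}
```

Calling `set_data_path` with no argument checks under /mnt and leaves `Data_path` set for the commands that follow.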

# BERT

## Clone the repository

```
mkdir $Data_path/BERT && cd $Data_path/BERT
git clone https://github.com/NVIDIA/DeepLearningExamples.git
cd $Data_path/BERT/DeepLearningExamples/PyTorch/LanguageModeling/BERT
```

## Set up the environment

1. Get the checkpoints for both models, SQuAD and GLUE:

```
```

## Run inference benchmark GLUE

This benchmark takes less than one minute per batch size. First, open the configuration file:

```
vi scripts/run_glue.sh
```

Modify the following parameters:

```
init_checkpoint=${1:-"/workspace/bert/checkpoints/pytorch_model.bin"}
num_gpu=${7:-"1"}
batch_size=${8:-"1"}
precision=${14:-"fp32"}
mode=${16:-"eval"}
```

Run the benchmark:

```
bash scripts/run_glue.sh
```

Then increase only the batch size in increments and run the previous command again to obtain more data points.
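That sweep can be scripted. The sketch below (a hypothetical helper, assuming the `batch_size` line keeps the exact form shown above) rewrites the default in place with sed:

```shell
# Hypothetical helper: rewrite the batch_size default (positional
# parameter 8, as in the run_glue.sh parameters above) in a script.
set_default_batch_size() {
  script="$1"; bs="$2"
  sed -i "s/^batch_size=.*/batch_size=\${8:-\"$bs\"}/" "$script"
}
```

A sweep then looks like `for bs in 1 2 4 8; do set_default_batch_size scripts/run_glue.sh "$bs"; bash scripts/run_glue.sh; done`.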

## Run inference benchmark SQuAD

Repeat the previous steps for SQuAD. This benchmark takes approximately five minutes per batch size. First, open the configuration file:

```
vi scripts/run_squad.sh
```

Modify the following parameters:

```
init_checkpoint=${1:-"/workspace/bert/checkpoints/bert_large_qa.pt"}
num_gpu=${7:-"1"}
batch_size=${3:-"1"}
precision=${6:-"fp32"}
mode=${12:-"eval"}
```

Run the benchmark:

```
bash scripts/run_squad.sh
```

Finally, increase only the batch size in increments and run the previous command again to obtain more data points.

## Run training benchmarks

Repeat the previous steps for both SQuAD and GLUE after changing the mode to training. For run_squad.sh the line becomes:

```
mode=${12:-"train"}
```

Note that the positional index differs in run_glue.sh, where mode is parameter 16.

# ResNet

## Clone the repository

```
mkdir $Data_path/resnet && cd $Data_path/resnet
git clone https://github.com/NVIDIA/DeepLearningExamples
cd $Data_path/resnet/DeepLearningExamples/PyTorch/Classification/
```

Download the ImageNet data [available online](https://image-net.org/download-images).

## Set up the environment

Start with the training data:

```
mkdir train && mv ILSVRC2012_img_train.tar train/ && cd train
tar -xvf ILSVRC2012_img_train.tar && rm -f ILSVRC2012_img_train.tar
find . -name "*.tar" | while read NAME ; do mkdir -p "${NAME%.tar}"; tar -xvf "${NAME}" -C "${NAME%.tar}"; rm -f "${NAME}"; done
cd ..
```

Continue with the validation data used for inference:

```
mkdir val && mv ILSVRC2012_img_val.tar val/ && cd val && tar -xvf ILSVRC2012_img_val.tar
wget -qO- https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh | bash
cd ../ConvNets
```
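After extraction, a quick sanity check can catch an incomplete download. This sketch (a hypothetical helper; it assumes the standard ILSVRC2012 layout, where train/ and val/ each hold 1,000 class directories) counts the class directories:

```shell
# Hypothetical helper: count the class directories under a dataset
# split. Assumption: standard ILSVRC2012 layout, 1,000 classes each.
check_classes() {
  dir="$1"
  if [ -d "$dir" ]; then
    n=$(find "$dir" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' ')
    echo "$dir: $n class directories"
  else
    echo "$dir: missing" >&2
    return 1
  fi
}
```

From the dataset root, `check_classes train && check_classes val` should report 1,000 directories for each split.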

Create and launch the container:

```
docker build . -t nvidia_resnet50
nvidia-docker run --rm -it -v $Data_path/resnet/DeepLearningExamples:/imagenet --ipc=host nvidia_resnet50
```

## Run the benchmark

Get the pretrained weights from NGC:

```
wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/resnet50_pyt_amp/versions/20.06.0/zip -O resnet50_pyt_am...
unzip resnet50_pyt_amp_20.06.0.zip
```

Finally, update the configuration file for the desired batch size:

```
vi configs.yml
```

Run the inference benchmark:

```
python ./launch.py --model resnet50 --precision TF32 --mode benchmark_inference --platform DGXA100 /imagenet/PyTorch/Classification/ --raport-file benchmark.json --epochs 1 --prof 100
```

Run the training benchmark:

```
python ./launch.py --model resnet50 --precision TF32 --mode benchmark_training --platform DGXA100 /imagenet/PyTorch/Classification/ --raport-file benchmark.json --epochs 1 --prof 100
```

Read the summary to get the values you need. Modify the configuration file before running the benchmark again to get data points for different batch sizes.

# SSD

## Clone the repository

The first command makes /mnt world-writable (with the sticky bit) so the dataset can be stored there:

```
sudo chmod 1777 /mnt
mkdir $Data_path/SSD && cd $Data_path/SSD
git clone https://github.com/NVIDIA/DeepLearningExamples
```

## Set up the environment

Get the datasets:

```
mkdir $Data_path/coco && cd $Data_path/coco
sudo apt install unzip
wget http://images.cocodataset.org/zips/train2017.zip && unzip train2017.zip
wget http://images.cocodataset.org/zips/val2017.zip && unzip val2017.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip && unzip annotations_trainval2017.zip
```

Build and launch the Docker container with this three-step sequence:

```
cd $Data_path/SSD/DeepLearningExamples/PyTorch/Detection/SSD
docker build . -t nvidia_ssd
docker run --rm -it --gpus=all --ipc=host -v $Data_path:/coco nvidia_ssd
```

## Run the benchmarks

The inference benchmark takes less than one minute per batch size; modify only the `--eval-batch-size` value and run again to obtain more data points:

```
python main.py --data /coco/coco --eval-batch-size 1 --mode benchmark-inference
```
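To collect several data points in one pass, a small sketch (a hypothetical helper; the batch sizes are examples) can emit one benchmark command per batch size, ready to review or pipe to `sh`:

```shell
# Hypothetical helper: print one inference-benchmark command per
# batch size. Pipe the output to `sh` to actually run the sweep.
ssd_inference_sweep() {
  for bs in "$@"; do
    echo "python main.py --data /coco/coco --eval-batch-size $bs --mode benchmark-inference"
  done
}
```

For example, `ssd_inference_sweep 1 2 4 8 | sh` runs the benchmark at four batch sizes.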

Then run the training benchmark with the following command. Again, modify only the `--batch-size` value and run again to obtain more data points:

```
python main.py --data /coco/coco --batch-size 2 --mode benchmark-training
```