# HPMLA-CPU-OpenMPI Data Shredding

This Data Shredding recipe shows how to shred and deploy your training data for HPMLA prior to running the training job on Azure VMs via Open MPI.

## Data Shredding Configuration

Rename `configuration-template.json` to `configuration.json`. The configuration should set the following properties (an example follows the list):

* `node_count`: number of VMs in the compute pool.
* `thread_count`: number of threads per VM.
* `training_data_shred_count`: number of shreds to split the training data into. It is advisable to set this number high so that you only shred once and can reuse the shreds across different VM configurations.
* `dataset_local_directory`: local directory where the training data is downloaded and shredded according to `training_data_shred_count`.
* `shredded_dataset_Per_Node`: local directory that holds the final per-node data shreds before they are deployed to Azure blobs.
* `container_name`: name of the container where the shredded data will be stored.
* `trainind_dataset_name`: name for the dataset, used when creating the data blobs.
* `subscription_id`: Azure subscription id.
* `secret_key`: Azure password (or another secret used for authentication).
* `resource_group`: resource group name.
* `storage_account`: storage account name and access key.
* `training_data_container_name`: name of the container where the raw training data is hosted.
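
A minimal sketch of what a filled-in `configuration.json` might look like. All values are placeholders, and the exact structure (for example, whether `storage_account` is a nested object holding both name and key) should be checked against `configuration-template.json`:

```json
{
  "node_count": 4,
  "thread_count": 8,
  "training_data_shred_count": 256,
  "dataset_local_directory": "/tmp/dataset",
  "shredded_dataset_Per_Node": "/tmp/shredded",
  "container_name": "shredded-data",
  "trainind_dataset_name": "mytrainingdata",
  "subscription_id": "<subscription-id>",
  "secret_key": "<secret>",
  "resource_group": "myresourcegroup",
  "storage_account": {
    "name": "mystorageaccount",
    "key": "<storage-account-key>"
  },
  "training_data_container_name": "training-data"
}
```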

You can use your own access mechanism (password, access key, etc.); the above is only one example. However, make sure to update the Python script whenever you change the authentication configuration.
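
As a hedged illustration, the sketch below shows how a script might read these settings and authenticate to blob storage with an account key via the legacy `azure-storage-blob` SDK (2.x and earlier, which exposes `BlockBlobService`). It is not the recipe's actual `DataShredding.py`, and it assumes the hypothetical configuration layout shown above; substitute your own access mechanism as needed.

```python
import json

# Legacy azure-storage-blob SDK (<= 2.x); newer SDKs use BlobServiceClient.
from azure.storage.blob import BlockBlobService

# Load the configuration file described above (layout assumed, not verified).
with open('configuration.json') as f:
    config = json.load(f)

# Authenticate with the storage account name and access key; replace this
# block if you use a different mechanism (SAS token, service principal, etc.).
blob_service = BlockBlobService(
    account_name=config['storage_account']['name'],
    account_key=config['storage_account']['key'])

# Sanity check: list the blobs in the container holding the raw training data.
for blob in blob_service.list_blobs(config['training_data_container_name']):
    print(blob.name)
```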

You must agree to the following licenses prior to use: