Rename MADL recipe to HPMLA
Parent: 124bb429a0
Commit: 6f35a50bf7

@@ -0,0 +1,29 @@
## HPMLA-CPU-OpenMPI Data Shredding
This Data Shredding recipe shows how to shred and deploy your training data
for HPMLA prior to running the training job on Azure VMs via Open MPI.

### Data Shredding Configuration
Rename `configuration-template.json` to `configuration.json`.
The configuration should set the following properties (a minimal example
sketch follows the list):
* `node_count` should be set to the number of VMs in the compute pool.
* `thread_count` should be set to the number of threads per VM.
* `training_data_shred_count` the number of shreds the training data is split into. It is advisable to set this number high so that you only do this step once and can reuse the shreds for different VM configurations.
* `dataset_local_directory` a local directory used to download and shred the training data according to `training_data_shred_count`.
* `shredded_dataset_Per_Node` a local directory that holds the final data shreds before they are deployed to Azure blobs.
* `container_name` the container name where the sliced data will be stored.
* `trainind_dataset_name` the name for the dataset, used when creating the data blobs.
* `subscription_id` the Azure subscription id.
* `secret_key` the Azure password.
* `resource_group` the resource group name.
* `storage_account` the storage account name and access key.
* `training_data_container_name` the container name where the training data is hosted.
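
A minimal `configuration.json` sketch is shown below. The key names follow the list above, but all values are illustrative placeholders, and the exact shape of entries such as `storage_account` should be taken from `configuration-template.json` rather than from this example:

```json
{
    "node_count": 2,
    "thread_count": 4,
    "training_data_shred_count": 64,
    "dataset_local_directory": "/data/hpmla/raw",
    "shredded_dataset_Per_Node": "/data/hpmla/shreds",
    "container_name": "hpmla-shredded-data",
    "trainind_dataset_name": "mytrainingdata",
    "subscription_id": "<azure-subscription-id>",
    "secret_key": "<azure-password>",
    "resource_group": "<resource-group-name>",
    "storage_account": {
        "name": "<storage-account-name>",
        "key": "<storage-account-access-key>"
    },
    "training_data_container_name": "<container-holding-the-raw-training-data>"
}
```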

You can use your own access mechanism (password, access key, etc.); the above
is only one example. However, make sure to update the Python script every time
you make a configuration change.

You must agree to the following licenses prior to use:
* [High Performance ML Algorithms License](https://github.com/saeedmaleki/Distributed-Linear-Learner/blob/master/High%20Performance%20ML%20Algorithms%20-%20Standalone%20(free)%20Use%20Terms%20V2%20(06-06-18).txt)
* [TPN Ubuntu Container](https://github.com/saeedmaleki/Distributed-Linear-Learner/blob/master/TPN_Ubuntu%20Container_16-04-FINAL.txt)
* [Microsoft Third Party Notice](https://github.com/saeedmaleki/Distributed-Linear-Learner/blob/master/MicrosoftThirdPartyNotice.txt)

@@ -1,6 +1,6 @@
# MADL-CPU-OpenMPI
This recipe shows how to run High Performance ML Algorithms Learner on CPUs across
Azure VMs via Open MPI.
# HPMLA-CPU-OpenMPI
This recipe shows how to run High Performance ML Algorithms (HPMLA) on CPUs
across Azure VMs via Open MPI.

## Configuration
Please refer to this [set of sample configuration files](./config) for
@@ -8,30 +8,31 @@ this recipe.

### Pool Configuration
The pool configuration should enable the following properties (a minimal
sketch follows the list):
* `vm_size` should be a CPU-only instance, for example, 'STANDARD_D2_V2'.
* `vm_size` should be a CPU-only instance, for example, `STANDARD_D2_V2`.
* `inter_node_communication_enabled` must be set to `true`
* `max_tasks_per_node` must be set to 1 or omitted
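
As a rough illustration of these settings, a pool configuration fragment could look like the sketch below; the `id` and `vm_count` values are assumptions, and the authoritative layout is in the [sample configuration files](./config):

```yaml
pool_specification:
  id: hpmla-cpu-pool            # assumed pool id
  vm_size: STANDARD_D2_V2       # CPU-only instance
  vm_count:
    dedicated: 2                # should match node_count in the data shredding configuration
  inter_node_communication_enabled: true
  max_tasks_per_node: 1
```
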
### Global Configuration
The global configuration should set the following properties (see the sketch
after this list):
* `docker_images` array must have a reference to a valid MADL
Docker image that can be run with OpenMPI. The image denoted with `0.0.1` tag found in [msmadl/symsgd:0.0.1](https://hub.docker.com/r/msmadl/symsgd/)
* `docker_images` array must have a reference to a valid HPMLA
Docker image that can be run with OpenMPI. The image denoted with the `0.0.1`
tag, found at [msmadl/symsgd:0.0.1](https://hub.docker.com/r/msmadl/symsgd/),
is compatible with Azure Batch Shipyard VMs.
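
A corresponding global configuration fragment might look like the following sketch; the credential link name and the shared data volume name are assumptions here, and the real values live in the [sample config file](./config/config.yaml):

```yaml
batch_shipyard:
  storage_account_settings: mystorageaccount   # assumed link to credentials
global_resources:
  docker_images:
    - msmadl/symsgd:0.0.1
  volumes:
    shared_data_volumes:
      azblob:                                  # assumed volume name, referenced from the jobs configuration
        volume_driver: azureblob
        storage_account_settings: mystorageaccount
        azure_blob_container_name: "<container_name from the data shredding configuration file>"
```
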
### MPI Jobs Configuration (MultiNode)
The jobs configuration should set the following properties within the `tasks`
array, which should have a task definition containing (a consolidated sketch
follows the list):
* `docker_image` should be the name of the Docker image for this container invocation.
For this example, this should be
* `docker_image` should be the name of the Docker image for this container
invocation. For this example, this should be
`msmadl/symsgd:0.0.1`.
Please note that the `docker_images` in the Global Configuration should match
this image name.
* `command` should contain the command to pass to the Docker run invocation.
For this MADL training example with the `msmadl/symsgd:0.0.1` Docker image, the
For this HPMLA training example with the `msmadl/symsgd:0.0.1` Docker image, the
application `command` to run would be:
`"/parasail/run_parasail.sh -w /parasail/supersgd -l 1e-4 -k 32 -m 1e-2 -e 10 -r 10 -f $AZ_BATCH_NODE_SHARED_DIR/azblob/<container_name from the data shredding configuration file> -t 1 -g 1 -d $AZ_BATCH_TASK_WORKING_DIR/models -b $AZ_BATCH_NODE_SHARED_DIR/azblob/<container_name from the data shredding configuration file>"`
* [`run_parasail.sh`](docker/run_parasail.sh) has these parameters
* `-w` the MADL superSGD directory
* `-w` the HPMLA superSGD directory
* `-l` learning rate
* `-k` approximation rank constant
* `-m` model combiner convergence threshold

@@ -42,8 +43,8 @@ application `command` to run would be:

* `-g` log global models every this many epochs
* `-d` log global models to this directory at the host
* `-b` location for the algorithm's binary
* The training data will need to be shredded to match the number of VMs and the thread count per VM, and then deployed to a mounted Azure blob container to which the VM Docker images have read/write access.
A basic Python script that can be used to shred and deploy the training data to a blob container, along with other data shredding files, can be found [here](./DataShredding).
* `shared_data_volumes` should contain the shared data volume with an `azureblob` volume driver as specified in the global configuration file found [here](./config/config.yaml).
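
Putting these task properties together, a jobs configuration sketch could look like the following; the job `id`, the multi-instance settings, and the shared data volume name are assumptions, and the authoritative layout is in the [sample configuration files](./config):

```yaml
job_specifications:
  - id: hpmla-training-job                     # assumed job id
    tasks:
      - docker_image: msmadl/symsgd:0.0.1
        shared_data_volumes:
          - azblob                             # assumed volume name from the global configuration
        multi_instance:
          num_instances: pool_current_dedicated
        command: >-
          /parasail/run_parasail.sh -w /parasail/supersgd -l 1e-4 -k 32 -m 1e-2
          -e 10 -r 10
          -f $AZ_BATCH_NODE_SHARED_DIR/azblob/<container_name from the data shredding configuration file>
          -t 1 -g 1
          -d $AZ_BATCH_TASK_WORKING_DIR/models
          -b $AZ_BATCH_NODE_SHARED_DIR/azblob/<container_name from the data shredding configuration file>
```
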
@@ -57,4 +58,4 @@ Supplementary files can be found [here](./docker).

You must agree to the following licenses prior to use:
* [High Performance ML Algorithms License](https://github.com/saeedmaleki/Distributed-Linear-Learner/blob/master/High%20Performance%20ML%20Algorithms%20-%20Standalone%20(free)%20Use%20Terms%20V2%20(06-06-18).txt)
* [TPN Ubuntu Container](https://github.com/saeedmaleki/Distributed-Linear-Learner/blob/master/TPN_Ubuntu%20Container_16-04-FINAL.txt)
* [Microsoft Third Party Notice](https://github.com/saeedmaleki/Distributed-Linear-Learner/blob/master/MicrosoftThirdPartyNotice.txt)

@@ -1,4 +1,4 @@
#Dockerfile for MADL (Microsoft Distributed Learners)
#Dockerfile for HPMLA (Microsoft High Performance ML Algorithms)

FROM ubuntu:16.04
MAINTAINER Saeed Maleki Todd Mytkowicz Madan Musuvathi Dany rouhana https://github.com/saeedmaleki/Distributed-Linear-Learner

@@ -15,7 +15,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
    openmpi-common \
    libopenmpi-dev && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# configure ssh server and keys
RUN mkdir -p /root/.ssh && \

@@ -27,12 +27,12 @@ RUN mkdir -p /root/.ssh && \
    ssh-keygen -f /root/.ssh/id_rsa -t rsa -N '' && \
    chmod 600 /root/.ssh/config && \
    chmod 700 /root/.ssh && \
    cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys

# set parasail dir
WORKDIR /parasail

# to create your own image, first download the supersgd from the link supplied in the read me file,
# and then put it in the same dir as this file.
COPY supersgd /parasail
COPY run_parasail.sh /parasail
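
Assuming `supersgd` and `run_parasail.sh` have been placed next to this Dockerfile as the comments above describe, the image can be built and tagged with a standard Docker command; the tag below simply mirrors the published image name:

```
docker build -t msmadl/symsgd:0.0.1 .
```
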
@@ -1,26 +0,0 @@
## MADL-CPU-OpenMPI Data Shredding
This Data Shredding recipe shows how to shred and deploy your training data prior to running a training job on Azure VMs via Open MPI.

### Data Shredding Configuration
Rename the configuration-template.json to configuration.json. The configuration should enable the following properties:
* `node_count` should be set to the number of VMs in the compute pool.
* `thread_count` thread's count per VM.
* `training_data_shred_count` It's advisable to set this number high. This way you only do this step once, and use it for different VMs configuration.
* 'dataset_local_directory' A local directory to download and shred the training data according to 'training_data_shred_count'.
* 'shredded_dataset_Per_Node' A local directory to hold the final data shreds before deploying them to Azure blobs.
* 'container_name' container name where the sliced data will be stored.
* 'trainind_dataset_name' name for the dataset. Used when creating the data blobs.
* 'subscription_id' Azure subscription id.
* 'secret_key' Azure password.
* 'resource_group' Resource group name.
* 'storage_account' storage account name and access key.
* 'training_data_container_name' Container name where the training data is hosted.
*''

You can use your own access mechanism (password, access key, etc.). The above is only a one example. Although, make sure to update the python script
every time you make a configuration change.

You must agree to the following licenses prior to use:
* [High Performance ML Algorithms License](https://github.com/saeedmaleki/Distributed-Linear-Learner/blob/master/High%20Performance%20ML%20Algorithms%20-%20Standalone%20(free)%20Use%20Terms%20V2%20(06-06-18).txt)
* [TPN Ubuntu Container](https://github.com/saeedmaleki/Distributed-Linear-Learner/blob/master/TPN_Ubuntu%20Container_16-04-FINAL.txt)
* [Microsoft Third Party Notice](https://github.com/saeedmaleki/Distributed-Linear-Learner/blob/master/MicrosoftThirdPartyNotice.txt)

@@ -105,9 +105,10 @@ This Keras+Theano-GPU recipe contains information on how to containerize
[Theano](http://www.deeplearning.net/software/theano/) backend for use with
N-Series Azure VMs.

#### [MADL-CPU-OpenMPI](./MADL-CPU-OpenMPI)
#### [HPMLA-CPU-OpenMPI](./HPMLA-CPU-OpenMPI)
This recipe contains information on how to containerize the Microsoft High
Performance ML Algorithms Learner for use across multiple compute nodes.
Performance ML Algorithms (HPMLA) for use across multiple compute
nodes.

#### [MXNet-CPU](./MXNet-CPU)
This MXNet-CPU recipe contains information on how to containerize