Move workspace creation to the beginning of the notebook

2019-06-05 18:36:20 -04:00 · 2019-06-05 18:36:20 -04:00 · 9e6c4680fe
--- a/scenarios/sentence_similarity/gensen_aml_deep_dive.ipynb
+++ b/scenarios/sentence_similarity/gensen_aml_deep_dive.ipynb
@ -96,9 +96,7 @@
    "    * 1.3. [Preprocess for GenSen Model](#1.3-Preprocess-for-GenSen-Model)  \n",
    "    * 1.4. [Upload to Azure Blob Storage](#1.4-Upload-to-Azure-Blob-Storage)  \n",
    "2. [Train GenSen Model with Distributed Pytorch with Horovod on AzureML](#2-Train-GenSen-Model-with-Distributed-Pytorch-with-Horovod-on-AzureML)  \n",
-    "    * 2.1. [Initialization](#2.1-Initialization) \n",
-    "        * 2.1.1 [Initialize Workspace](#2.1.1-Initialize-Workspace)  \n",
-    "        * 2.1.2 [Create or Attach Existing AmlCompute](#2.1.2-Create-or-Attach-Existing-AmlCompute)  \n",
+    "    * 2.1 [Create or Attach Existing AmlCompute](#2.1-Create-or-Attach-Existing-AmlCompute)  \n",
    "    * 2.2. [Access to a Project Directory](#2.2-Access-to-a-Project-Directory)  \n",
    "    * 2.3. [Train Model on the Remote Compute](#2.3-Train-Model-on-the-Remote-Compute)  \n",
    "        * 2.3.1 [Prepare Training Script](#2.3.1-Prepare-Training-Script)  \n",
@ -129,7 +127,9 @@
  {
   "cell_type": "code",
   "execution_count": 2,
-   "metadata": {},
+   "metadata": {
+    "scrolled": true
+   },
   "outputs": [
    {
     "name": "stdout",
@ -165,6 +165,60 @@
    "print(\"Pandas version: {}\".format(pd.__version__))"
   ]
  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**Initialize Workspace**\n",
+    "\n",
+    "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the workspace you created in the Prerequisites step; see the [README](README.md) for setup instructions. The cell below calls `azureml_utils.get_or_create_workspace()` with your subscription ID, resource group, workspace name, and region, then prints the workspace details."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Performing interactive authentication. Please follow the instructions on the terminal.\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "WARNING - Note, we have launched a browser for you to login. For old experience with device code, use \"az login --use-device-code\"\n",
+      "WARNING - You have logged in. Now let us find all the subscriptions to which you have access...\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Interactive authentication successfully completed.\n",
+      "Workspace name: MAIDAPTest\n",
+      "Azure region: eastus2\n",
+      "Subscription id: 15ae9cb6-95c1-483d-a0e3-b1a1a3b06324\n",
+      "Resource group: nlprg\n"
+     ]
+    }
+   ],
+   "source": [
+    "ws = azureml_utils.get_or_create_workspace(\n",
+    "    subscription_id=\"<SUBSCRIPTION_ID>\",\n",
+    "    resource_group=\"<RESOURCE_GROUP>\",\n",
+    "    workspace_name=\"<WORKSPACE_NAME>\",\n",
+    "    workspace_region=\"<WORKSPACE_REGION>\"\n",
+    ")\n",
+    "print('Workspace name: ' + ws.name, \n",
+    "      'Azure region: ' + ws.location, \n",
+    "      'Subscription id: ' + ws.subscription_id, \n",
+    "      'Resource group: ' + ws.resource_group, sep='\\n')"
+   ]
+  },
  {
   "cell_type": "markdown",
   "metadata": {},
@ -217,7 +271,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 64,
+   "execution_count": 31,
   "metadata": {},
   "outputs": [],
   "source": [
@ -563,24 +617,32 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 74,
+   "execution_count": 32,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import azureml.data\n",
+    "from azureml.data.azure_storage_datastore import AzureFileDatastore\n",
+    "\n",
+    "data_folder = os.path.join(BASE_DATA_PATH, \"clean\", \"snli_1.0\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
-      "AzureFile maidaptest3334372853 azureml-filestore-792de9d4-7d0a-464c-b40a-58584f23f5ec $AZUREML_DATAREFERENCE_liqungensen ..\\..\\data\\clean\\snli_1.0\n"
+      "AzureFile maidaptest3334372853 azureml-filestore-792de9d4-7d0a-464c-b40a-58584f23f5ec $AZUREML_DATAREFERENCE_liqungensen\n"
     ]
    }
   ],
   "source": [
-    "import azureml.data\n",
-    "from azureml.data.azure_storage_datastore import AzureFileDatastore\n",
-    "\n",
-    "data_folder = os.path.join(BASE_DATA_PATH, \"clean\\snli_1.0\")\n",
    "ds = ws.get_default_datastore()\n",
-    "print(ds.datastore_type, ds.account_name, ds.container_name, ds.as_mount(), data_folder)"
+    "print(ds.datastore_type, ds.account_name, ds.container_name, ds.as_mount())"
   ]
  },
  {
@ -675,69 +737,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "## 2.1 Initialization\n",
-    "In this section, we will initialize a workspace and create a AmlCompute for training."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### 2.1.1 Initialize Workspace\n",
-    "\n",
-    "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. For instructions on how to do this, see [here](README.md). `Workspace.from_config()` creates a workspace object from the details stored in `config.json`."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 3,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Performing interactive authentication. Please follow the instructions on the terminal.\n"
-     ]
-    },
-    {
-     "name": "stderr",
-     "output_type": "stream",
-     "text": [
-      "WARNING - Note, we have launched a browser for you to login. For old experience with device code, use \"az login --use-device-code\"\n",
-      "WARNING - You have logged in. Now let us find all the subscriptions to which you have access...\n"
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Interactive authentication successfully completed.\n",
-      "Workspace name: MAIDAPTest\n",
-      "Azure region: eastus2\n",
-      "Subscription id: 15ae9cb6-95c1-483d-a0e3-b1a1a3b06324\n",
-      "Resource group: nlprg\n"
-     ]
-    }
-   ],
-   "source": [
-    "ws = azureml_utils.get_or_create_workspace(\n",
-    "    subscription_id=\"<SUBSCRIPTION_ID>\",\n",
-    "    resource_group=\"<RESOURCE_GROUP>\",\n",
-    "    workspace_name=\"<WORKSPACE_NAME>\",\n",
-    "    workspace_region=\"<WORKSPACE_REGION>\"\n",
-    ")\n",
-    "print('Workspace name: ' + ws.name, \n",
-    "      'Azure region: ' + ws.location, \n",
-    "      'Subscription id: ' + ws.subscription_id, \n",
-    "      'Resource group: ' + ws.resource_group, sep='\\n')"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### 2.1.2 Create or Attach Existing AmlCompute\n",
+    "## 2.1 Create or Attach Existing AmlCompute\n",
    "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource. Specifically, the below code creates an `STANDARD_NC6` GPU cluster that autoscales from `0` to `4` nodes.\n",
    "\n",
    "**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace, this code will skip the creation process.\n",
--- a/utils_nlp/model/gensen/large_config.json
+++ b/utils_nlp/model/gensen/large_config.json
@ -0,0 +1,45 @@
+{
+  "training": {
+    "optimizer": "adam",
+    "clip_c": 1,
+    "lrate": 0.0001,
+    "batch_size": 48,
+    "n_gpus": 1,
+    "stop_patience": 2
+  },
+  "management": {
+    "monitor_loss": 480,
+    "print_samples": 12800,
+    "checkpoint_freq": 480000,
+    "eval_freq": 9600
+  },
+  "data": {"paths": [
+        {
+            "train_src": "data/processed/snli_1.0_train.txt.s1.tok",
+            "train_trg": "data/processed/snli_1.0_train.txt.s2.tok",
+            "val_src": "data/processed/snli_1.0_dev.txt.s1.tok",
+            "val_trg": "data/processed/snli_1.0_dev.txt.s2.tok",
+            "taskname": "snli"
+        }
+    ],
+        "max_src_length": 90,
+        "max_trg_length": 90,
+        "task": "multi-seq2seq-nli",
+        "save_dir": "data/models/example",
+        "nli_train": "data/processed/snli_1.0_train.txt.clean.noblank",
+        "nli_dev": "data/processed/snli_1.0_dev.txt.clean.noblank",
+        "nli_test": "data/processed/snli_1.0_test.txt.clean.noblank"
+  },
+  "model": {
+    "dim_src": 2048,
+    "dim_trg": 2048,
+    "dim_word_src": 512,
+    "dim_word_trg": 512,
+    "n_words_src": 80000,
+    "n_words_trg": 30000,
+    "n_layers_src": 1,
+    "bidirectional": true,
+    "layernorm": false,
+    "dropout": 0.8
+  }
+}