Complete end to end working version

Mathew Salvaris 2018-03-31 17:47:28 +00:00
Parent d97feee55a
Commit 9fedc493ca
8 changed files with 655 additions and 595 deletions

View File

@@ -363,6 +363,13 @@
"source": [
"dict(zip(labels, top_results))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can move onto [developing the model api for our model](01_DevelopModelDriver.ipynb)"
]
}
],
"metadata": {

View File

@@ -284,23 +284,19 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'[[[\"n02127052 lynx, catamount\", 0.9974517226219177], [\"n02128385 leopard, Panthera pardus\", 0.001507689943537116], [\"n02128757 snow leopard, ounce, Panthera uncia\", 0.0005164744798094034]], \"Computed in 925.01 ms\"]'"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"json.dumps(output)"
]
},
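The driver returns its result as a JSON string like the one in the removed output above: a list of `[label, score]` pairs followed by a timing message. A minimal sketch of decoding it on the client side, assuming `json_output` (a hypothetical name) holds that string:

```python
import json

# json_output is assumed to hold the string produced by json.dumps(output) above
predictions, timing = json.loads(json_output)
for label, score in predictions:
    print('{}: {:.4f}'.format(label, score))  # e.g. n02127052 lynx, catamount: 0.9975
print(timing)                                 # e.g. Computed in 925.01 ms
```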
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can move onto [building our docker image](02_BuildImage.ipynb)"
]
}
],
"metadata": {

View File

@@ -5,9 +5,7 @@
"metadata": {},
"source": [
"# Build Docker Image \n",
"In this notebook we will build the docker container that contains the Resnet 152 model, Flask web application, model driver and all dependencies.\n",
" \n",
"Make sure you are have logged in using docker login. \n"
"In this notebook we will build the docker container that contains the Resnet 152 model, Flask web application, model driver and all dependencies."
]
},
{
@@ -1387,6 +1385,13 @@
"!docker build -t $image_name -f $docker_file_location $application_path"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below we will push the image created to our dockerhub registry. Make sure you have already logged in to the appropriate dockerhub account using the docker login command"
]
},
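The push itself happens in the next cell (source hidden in this diff); as a sketch, assuming `image_name` was set earlier in the notebook, the command is presumably equivalent to:

```python
# Requires a prior `docker login`; pushes the image built above to the registry
!docker push $image_name
```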
{
"cell_type": "code",
"execution_count": 192,
@@ -1447,7 +1452,7 @@
},
"source": [
"### Test locally\n",
"Go to the [Test Locally notebook](TestLocally.ipynb) to test your Docker image"
"Go to the [Test Locally notebook](03_TestLocally.ipynb) to test your Docker image"
]
}
],

View File

@@ -310,14 +310,14 @@
},
{
"cell_type": "code",
"execution_count": 43,
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Average time taken: 37.68 ms\n"
"Average time taken: 38.30 ms\n"
]
}
],
@@ -334,14 +334,14 @@
},
{
"cell_type": "code",
"execution_count": 64,
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"b457161bb223\n"
"b6f6e673f347\n"
]
}
],
@@ -354,7 +354,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We can move onto [deploying our web application on ACS](DeployOnACS.ipynb)"
"We can move onto [deploying our web application on AKS](04_DeployOnAKS.ipynb)"
]
}
],

View File

@@ -51,13 +51,15 @@
},
"outputs": [],
"source": [
"resource_group = \"msaksrg\" # Feel free to modify these\n",
"aks_name = \"msAKSTFCluster\"\n",
"# Please modify the below as you see fit\n",
"resource_group = \"<RESOURCE_GROUP>\" \n",
"aks_name = \"<AKS_CLUSTER_NAME>\"\n",
"location = \"eastus\"\n",
"\n",
"image_name = 'masalvar/tfresnet-gpu' \n",
"selected_subscription = \"'Team Danielle Internal'\" # If you have multiple subscriptions select \n",
" # the subscription you want to use here"
"image_name = '<YOUR_DOCKER_IMAGE>' # 'masalvar/tfresnet-gpu' Feel free to use this Image if you want to \n",
" # skip creating your own container\n",
"selected_subscription = \"'<YOUR SUBSCRIPTION>'\" # If you have multiple subscriptions select \n",
" # the subscription you want to use here"
]
},
{
@@ -79,37 +81,11 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": null,
"metadata": {
"scrolled": true
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[33mTo sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code FNGYXPRU3 to authenticate.\u001b[0m\n",
"CloudName Name State TenantId IsDefault\n",
"----------- ----------------------------- ------- ------------------------------------ -----------\n",
"AzureCloud Boston DS Dev Enabled 72f988bf-86f1-41af-91ab-2d7cd011db47\n",
"AzureCloud Azure Internal - London Enabled 72f988bf-86f1-41af-91ab-2d7cd011db47\n",
"AzureCloud Team Danielle Internal Enabled 72f988bf-86f1-41af-91ab-2d7cd011db47 True\n",
"AzureCloud Visual Studio Enterprise Enabled 72f988bf-86f1-41af-91ab-2d7cd011db47\n",
"AzureCloud Boston Engineering Enabled 72f988bf-86f1-41af-91ab-2d7cd011db47\n",
"AzureCloud ADLTrainingMS Enabled 72f988bf-86f1-41af-91ab-2d7cd011db47\n",
"AzureCloud PhillyExt Enabled 72f988bf-86f1-41af-91ab-2d7cd011db47\n",
"AzureCloud Ads Eng Big Data Subscription Enabled 72f988bf-86f1-41af-91ab-2d7cd011db47\n",
"AzureCloud Data Wrangling Preview Enabled 72f988bf-86f1-41af-91ab-2d7cd011db47\n",
"AzureCloud Data Wrangling development Enabled 72f988bf-86f1-41af-91ab-2d7cd011db47\n",
"AzureCloud AzureML Client PROD Enabled 72f988bf-86f1-41af-91ab-2d7cd011db47\n",
"AzureCloud R portal - Production Enabled 72f988bf-86f1-41af-91ab-2d7cd011db47\n",
"AzureCloud PhillyInt Enabled 72f988bf-86f1-41af-91ab-2d7cd011db47\n",
"AzureCloud Solution Template Testing Enabled 72f988bf-86f1-41af-91ab-2d7cd011db47\n",
"AzureCloud Team Ilan Enabled 72f988bf-86f1-41af-91ab-2d7cd011db47\n",
"AzureCloud Marketing Automation Enabled 72f988bf-86f1-41af-91ab-2d7cd011db47\n"
]
}
],
"outputs": [],
"source": [
"!az login -o table"
]
@@ -127,28 +103,9 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\r\n",
" \"environmentName\": \"AzureCloud\",\r\n",
" \"id\": \"edf507a2-6235-46c5-b560-fd463ba2e771\",\r\n",
" \"isDefault\": true,\r\n",
" \"name\": \"Team Danielle Internal\",\r\n",
" \"state\": \"Enabled\",\r\n",
" \"tenantId\": \"72f988bf-86f1-41af-91ab-2d7cd011db47\",\r\n",
" \"user\": {\r\n",
" \"name\": \"masalvar@microsoft.com\",\r\n",
" \"type\": \"user\"\r\n",
" }\r\n",
"}\r\n"
]
}
],
"outputs": [],
"source": [
"!az account show"
]
@@ -194,7 +151,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 4,
"metadata": {},
"outputs": [
{
@@ -220,7 +177,7 @@
},
{
"cell_type": "code",
"execution_count": 29,
"execution_count": 5,
"metadata": {},
"outputs": [
{
@@ -245,7 +202,7 @@
" }\n",
" ],\n",
" \"dnsPrefix\": \"msAKSTFClu-msaksrg-edf507\",\n",
" \"fqdn\": \"msakstfclu-msaksrg-edf507-26f4c0b4.hcp.eastus.azmk8s.io\",\n",
" \"fqdn\": \"msakstfclu-msaksrg-edf507-1f197d36.hcp.eastus.azmk8s.io\",\n",
" \"id\": \"/subscriptions/edf507a2-6235-46c5-b560-fd463ba2e771/resourcegroups/msaksrg/providers/Microsoft.ContainerService/managedClusters/msAKSTFCluster\",\n",
" \"kubernetesVersion\": \"1.7.9\",\n",
" \"linuxProfile\": {\n",
@@ -291,14 +248,14 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[33mDownloading client to /usr/local/bin/kubectl from https://storage.googleapis.com/kubernetes-release/release/v1.9.4/bin/linux/amd64/kubectl\u001b[0m\n",
"\u001b[33mDownloading client to /usr/local/bin/kubectl from https://storage.googleapis.com/kubernetes-release/release/v1.10.0/bin/linux/amd64/kubectl\u001b[0m\n",
"\u001b[33mPlease ensure that /usr/local/bin is in your search PATH, so the `kubectl` command can be found.\u001b[0m\n"
]
}
@@ -307,9 +264,16 @@
"!sudo az aks install-cli"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below we define our manifest file for our service and load balancer. Note that we have to specify the volume mounts to the drivers that are located on the node."
]
},
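The template itself is built in the next cell (source truncated here). As a rough sketch of the volume wiring it has to describe, the container mounts the NVIDIA driver libraries from the host node. The volume names (`lib`, `bin`, `libcuda`) match the pod events shown later in this notebook; the exact host paths below are assumptions based on typical NVIDIA driver locations, not taken from this commit:

```python
# Sketch only: volume names match the pod events below; host paths are assumptions
volumes = [
    {"name": "lib",     "hostPath": {"path": "/usr/lib/nvidia-384"}},
    {"name": "bin",     "hostPath": {"path": "/usr/lib/nvidia-384/bin"}},
    {"name": "libcuda", "hostPath": {"path": "/usr/lib/x86_64-linux-gnu/libcuda.so.1"}},
]
volume_mounts = [
    {"name": "lib",     "mountPath": "/usr/local/nvidia/lib64"},
    {"name": "bin",     "mountPath": "/usr/local/nvidia/bin"},
    {"name": "libcuda", "mountPath": "/usr/local/nvidia/lib64/libcuda.so.1"},
]
```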
{
"cell_type": "code",
"execution_count": 60,
"execution_count": 7,
"metadata": {
"collapsed": true
},
@@ -417,7 +381,7 @@
},
{
"cell_type": "code",
"execution_count": 61,
"execution_count": 8,
"metadata": {
"collapsed": true
},
@@ -431,29 +395,29 @@
},
{
"cell_type": "code",
"execution_count": 62,
"execution_count": 9,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"write_json_to_file(app_template, 'az-dl.json')"
"write_json_to_file(app_template, 'az-dl.json') # We write the service template to the json file"
]
},
{
"cell_type": "code",
"execution_count": 63,
"execution_count": 10,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"write_json_to_file(service_temp, 'az-dl.json', mode='a')"
"write_json_to_file(service_temp, 'az-dl.json', mode='a') # We add the loadbelanacer template to the json file"
]
},
{
"cell_type": "code",
"execution_count": 64,
"execution_count": 11,
"metadata": {},
"outputs": [
{
@@ -568,7 +532,7 @@
},
{
"cell_type": "code",
"execution_count": 35,
"execution_count": 12,
"metadata": {},
"outputs": [
{
@@ -585,7 +549,7 @@
},
{
"cell_type": "code",
"execution_count": 65,
"execution_count": 13,
"metadata": {},
"outputs": [
{
@@ -593,7 +557,7 @@
"output_type": "stream",
"text": [
"NAME STATUS ROLES AGE VERSION\r\n",
"aks-nodepool1-27496346-0 Ready agent 44m v1.7.9\r\n"
"aks-nodepool1-27496346-0 Ready agent 2m v1.7.9\r\n"
]
}
],
@@ -603,7 +567,7 @@
},
{
"cell_type": "code",
"execution_count": 66,
"execution_count": 14,
"metadata": {},
"outputs": [
{
@@ -611,13 +575,13 @@
"output_type": "stream",
"text": [
"NAMESPACE NAME READY STATUS RESTARTS AGE\r\n",
"kube-system heapster-2574232661-1dx42 2/2 Running 0 41m\r\n",
"kube-system kube-dns-v20-2253765213-3kb0s 3/3 Running 0 42m\r\n",
"kube-system kube-dns-v20-2253765213-p80ng 3/3 Running 0 42m\r\n",
"kube-system kube-proxy-9zd4s 1/1 Running 0 42m\r\n",
"kube-system kube-svc-redirect-c8klv 1/1 Running 0 42m\r\n",
"kube-system kubernetes-dashboard-2898242510-9l409 1/1 Running 0 42m\r\n",
"kube-system tunnelfront-180102643-hn69h 1/1 Running 0 42m\r\n"
"kube-system heapster-2574232661-07lzh 2/2 Running 0 1m\r\n",
"kube-system kube-dns-v20-2253765213-730n6 3/3 Running 0 2m\r\n",
"kube-system kube-dns-v20-2253765213-m9d9q 3/3 Running 0 2m\r\n",
"kube-system kube-proxy-3d25d 1/1 Running 0 2m\r\n",
"kube-system kube-svc-redirect-psp3n 1/1 Running 0 2m\r\n",
"kube-system kubernetes-dashboard-2898242510-7h28r 1/1 Running 0 2m\r\n",
"kube-system tunnelfront-527646831-lj63z 1/1 Running 0 2m\r\n"
]
}
],
@@ -625,16 +589,23 @@
"!kubectl get pods --all-namespaces"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This command will create everything we specified in the az-dl.json manifest file."
]
},
{
"cell_type": "code",
"execution_count": 67,
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"deployment \"azure-dl\" created\n",
"deployment.apps \"azure-dl\" created\n",
"service \"azure-dl\" created\n"
]
}
@@ -643,9 +614,16 @@
"!kubectl create -f az-dl.json"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After a few seconds you should see the pod start running on the cluster."
]
},
{
"cell_type": "code",
"execution_count": 69,
"execution_count": 18,
"metadata": {},
"outputs": [
{
@@ -653,14 +631,14 @@
"output_type": "stream",
"text": [
"NAMESPACE NAME READY STATUS RESTARTS AGE\r\n",
"default azure-dl-3880299103-jsn4n 1/1 Running 0 11m\r\n",
"kube-system heapster-2574232661-1dx42 2/2 Running 0 53m\r\n",
"kube-system kube-dns-v20-2253765213-3kb0s 3/3 Running 0 54m\r\n",
"kube-system kube-dns-v20-2253765213-p80ng 3/3 Running 0 54m\r\n",
"kube-system kube-proxy-9zd4s 1/1 Running 0 54m\r\n",
"kube-system kube-svc-redirect-c8klv 1/1 Running 0 54m\r\n",
"kube-system kubernetes-dashboard-2898242510-9l409 1/1 Running 0 54m\r\n",
"kube-system tunnelfront-180102643-hn69h 1/1 Running 0 54m\r\n"
"default azure-dl-3880299103-v5mb7 1/1 Running 0 4m\r\n",
"kube-system heapster-2574232661-07lzh 2/2 Running 0 5m\r\n",
"kube-system kube-dns-v20-2253765213-730n6 3/3 Running 0 6m\r\n",
"kube-system kube-dns-v20-2253765213-m9d9q 3/3 Running 0 6m\r\n",
"kube-system kube-proxy-3d25d 1/1 Running 0 6m\r\n",
"kube-system kube-svc-redirect-psp3n 1/1 Running 0 6m\r\n",
"kube-system kubernetes-dashboard-2898242510-7h28r 1/1 Running 0 6m\r\n",
"kube-system tunnelfront-527646831-lj63z 1/1 Running 0 6m\r\n"
]
}
],
@@ -668,9 +646,16 @@
"!kubectl get pods --all-namespaces"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If anything goes wrong you can use the commands below to observe the events on the node as well as review the logs."
]
},
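The source of the next cell is hidden in this diff, but its output below is the cluster event log; an equivalent command, as a sketch:

```python
# List recent cluster events (scheduling, volume mounts, image pulls, ...)
!kubectl get events
```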
{
"cell_type": "code",
"execution_count": 70,
"execution_count": 19,
"metadata": {},
"outputs": [
{
@@ -678,63 +663,27 @@
"output_type": "stream",
"text": [
"LAST SEEN FIRST SEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE\r\n",
"56m 1h 7 aks-nodepool1-27496346-0.151e4321d9812de1 Node Normal NodeHasSufficientDisk kubelet, aks-nodepool1-27496346-0 Node aks-nodepool1-27496346-0 status is now: NodeHasSufficientDisk\r\n",
"56m 1h 7 aks-nodepool1-27496346-0.151e4321d9818fef Node Normal NodeHasSufficientMemory kubelet, aks-nodepool1-27496346-0 Node aks-nodepool1-27496346-0 status is now: NodeHasSufficientMemory\r\n",
"56m 1h 7 aks-nodepool1-27496346-0.151e4321d981b123 Node Normal NodeHasNoDiskPressure kubelet, aks-nodepool1-27496346-0 Node aks-nodepool1-27496346-0 status is now: NodeHasNoDiskPressure\r\n",
"5s 1h 62 aks-nodepool1-27496346-0.151e4321d998eade Node Warning FailedNodeAllocatableEnforcement kubelet, aks-nodepool1-27496346-0 Failed to update Node Allocatable Limits \"\": failed to set supported cgroup subsystems for cgroup : Failed to set config for supported subsystems : failed to write 59076296704 to memory.limit_in_bytes: write /var/lib/docker/overlay2/5ee5687fca21ea5e2ffbdbbf82a839179a687ca508be84758a090e41fcb3ecf2/merged/sys/fs/cgroup/memory/memory.limit_in_bytes: invalid argument\r\n",
"54m 54m 1 aks-nodepool1-27496346-0.151e437c603ba0d3 Node Normal RegisteredNode controllermanager Node aks-nodepool1-27496346-0 event: Registered Node aks-nodepool1-27496346-0 in NodeController\r\n",
"54m 54m 1 aks-nodepool1-27496346-0.151e437e41f27985 Node Normal Starting kube-proxy, aks-nodepool1-27496346-0 Starting kube-proxy.\r\n",
"54m 54m 1 aks-nodepool1-27496346-0.151e438133293444 Node Normal NodeReady kubelet, aks-nodepool1-27496346-0 Node aks-nodepool1-27496346-0 status is now: NodeReady\r\n",
"31m 31m 1 azure-dl-2914933029-stvh2.151e44c4846b082f Pod Normal Scheduled default-scheduler Successfully assigned azure-dl-2914933029-stvh2 to aks-nodepool1-27496346-0\r\n",
"31m 31m 1 azure-dl-2914933029-stvh2.151e44c49605c701 Pod Normal SuccessfulMountVolume kubelet, aks-nodepool1-27496346-0 MountVolume.SetUp succeeded for volume \"libcuda\" \r\n",
"31m 31m 1 azure-dl-2914933029-stvh2.151e44c49606ceae Pod Normal SuccessfulMountVolume kubelet, aks-nodepool1-27496346-0 MountVolume.SetUp succeeded for volume \"lib\" \r\n",
"31m 31m 1 azure-dl-2914933029-stvh2.151e44c496093aff Pod Normal SuccessfulMountVolume kubelet, aks-nodepool1-27496346-0 MountVolume.SetUp succeeded for volume \"bin\" \r\n",
"31m 31m 1 azure-dl-2914933029-stvh2.151e44c49675d85e Pod Normal SuccessfulMountVolume kubelet, aks-nodepool1-27496346-0 MountVolume.SetUp succeeded for volume \"default-token-hnhd0\" \r\n",
"29m 31m 2 azure-dl-2914933029-stvh2.151e44c4c33344c2 Pod spec.containers{azure-dl} Normal Pulling kubelet, aks-nodepool1-27496346-0 pulling image \"masalvar/cntkresnet-gpu\"\r\n",
"29m 29m 1 azure-dl-2914933029-stvh2.151e44d7c373b7b1 Pod spec.containers{azure-dl} Warning Failed kubelet, aks-nodepool1-27496346-0 Failed to pull image \"masalvar/cntkresnet-gpu\": rpc error: code = 2 desc = net/http: request canceled\r\n",
"26m 29m 3 azure-dl-2914933029-stvh2.151e44d7c37615f3 Pod Warning FailedSync kubelet, aks-nodepool1-27496346-0 Error syncing pod\r\n",
"29m 29m 1 azure-dl-2914933029-stvh2.151e44d80064f440 Pod spec.containers{azure-dl} Normal BackOff kubelet, aks-nodepool1-27496346-0 Back-off pulling image \"masalvar/cntkresnet-gpu\"\r\n",
"26m 26m 1 azure-dl-2914933029-stvh2.151e4508233ca91d Pod spec.containers{azure-dl} Normal Pulled kubelet, aks-nodepool1-27496346-0 Successfully pulled image \"masalvar/cntkresnet-gpu\"\r\n",
"26m 26m 1 azure-dl-2914933029-stvh2.151e450823842f64 Pod spec.containers{azure-dl} Warning Failed kubelet, aks-nodepool1-27496346-0 Error: Error response from daemon: {\"message\":\"No such container: 620a683f81da9738cee9242ffc83fd7dc71f76493efaac48bb263196ad117836\"}\r\n",
"26m 26m 1 azure-dl-2914933029-tlnmp.151e450845ebce8c Pod Normal Scheduled default-scheduler Successfully assigned azure-dl-2914933029-tlnmp to aks-nodepool1-27496346-0\r\n",
"26m 26m 1 azure-dl-2914933029-tlnmp.151e45085074d34a Pod Normal SuccessfulMountVolume kubelet, aks-nodepool1-27496346-0 MountVolume.SetUp succeeded for volume \"lib\" \r\n",
"26m 26m 1 azure-dl-2914933029-tlnmp.151e450850a6f5c0 Pod Normal SuccessfulMountVolume kubelet, aks-nodepool1-27496346-0 MountVolume.SetUp succeeded for volume \"bin\" \r\n",
"26m 26m 1 azure-dl-2914933029-tlnmp.151e450850b5de92 Pod Normal SuccessfulMountVolume kubelet, aks-nodepool1-27496346-0 MountVolume.SetUp succeeded for volume \"libcuda\" \r\n",
"26m 26m 1 azure-dl-2914933029-tlnmp.151e4508510e2849 Pod Normal SuccessfulMountVolume kubelet, aks-nodepool1-27496346-0 MountVolume.SetUp succeeded for volume \"default-token-hnhd0\" \r\n",
"25m 25m 1 azure-dl-2914933029-tlnmp.151e450fa7fbbb30 Pod spec.containers{azure-dl} Normal Pulling kubelet, aks-nodepool1-27496346-0 pulling image \"masalvar/cntkresnet-gpu\"\r\n",
"25m 25m 1 azure-dl-2914933029-tlnmp.151e45100e809afd Pod spec.containers{azure-dl} Normal Pulled kubelet, aks-nodepool1-27496346-0 Successfully pulled image \"masalvar/cntkresnet-gpu\"\r\n",
"25m 25m 1 azure-dl-2914933029-tlnmp.151e4510e3e2cf7b Pod spec.containers{azure-dl} Normal Created kubelet, aks-nodepool1-27496346-0 Created container\r\n",
"25m 25m 1 azure-dl-2914933029-tlnmp.151e4510eda5194d Pod spec.containers{azure-dl} Normal Started kubelet, aks-nodepool1-27496346-0 Started container\r\n",
"12m 12m 1 azure-dl-2914933029-tlnmp.151e45c888d4874e Pod spec.containers{azure-dl} Normal Killing kubelet, aks-nodepool1-27496346-0 Killing container with id docker://azure-dl:Need to kill Pod\r\n",
"31m 31m 1 azure-dl-2914933029.151e44c4828da01b ReplicaSet Normal SuccessfulCreate replicaset-controller Created pod: azure-dl-2914933029-stvh2\r\n",
"27m 27m 1 azure-dl-2914933029.151e44f0433ccc88 ReplicaSet Normal SuccessfulDelete replicaset-controller Deleted pod: azure-dl-2914933029-stvh2\r\n",
"26m 26m 1 azure-dl-2914933029.151e450845a59cd5 ReplicaSet Normal SuccessfulCreate replicaset-controller Created pod: azure-dl-2914933029-tlnmp\r\n",
"12m 12m 1 azure-dl-2914933029.151e45c61b7fd5a6 ReplicaSet Normal SuccessfulDelete replicaset-controller Deleted pod: azure-dl-2914933029-tlnmp\r\n",
"11m 11m 1 azure-dl-3880299103-jsn4n.151e45d4627d7eaf Pod Normal Scheduled default-scheduler Successfully assigned azure-dl-3880299103-jsn4n to aks-nodepool1-27496346-0\r\n",
"11m 11m 1 azure-dl-3880299103-jsn4n.151e45d469b24e70 Pod Normal SuccessfulMountVolume kubelet, aks-nodepool1-27496346-0 MountVolume.SetUp succeeded for volume \"bin\" \r\n",
"11m 11m 1 azure-dl-3880299103-jsn4n.151e45d469b4f3ca Pod Normal SuccessfulMountVolume kubelet, aks-nodepool1-27496346-0 MountVolume.SetUp succeeded for volume \"lib\" \r\n",
"11m 11m 1 azure-dl-3880299103-jsn4n.151e45d469cd4be1 Pod Normal SuccessfulMountVolume kubelet, aks-nodepool1-27496346-0 MountVolume.SetUp succeeded for volume \"libcuda\" \r\n",
"11m 11m 1 azure-dl-3880299103-jsn4n.151e45d46a16cf32 Pod Normal SuccessfulMountVolume kubelet, aks-nodepool1-27496346-0 MountVolume.SetUp succeeded for volume \"default-token-hnhd0\" \r\n",
"11m 11m 1 azure-dl-3880299103-jsn4n.151e45d4bceb6016 Pod spec.containers{azure-dl} Normal Pulling kubelet, aks-nodepool1-27496346-0 pulling image \"masalvar/tfresnet-gpu\"\r\n",
"9m 9m 1 azure-dl-3880299103-jsn4n.151e45f3d0873cbd Pod spec.containers{azure-dl} Normal Pulled kubelet, aks-nodepool1-27496346-0 Successfully pulled image \"masalvar/tfresnet-gpu\"\r\n",
"9m 9m 1 azure-dl-3880299103-jsn4n.151e45f3de5e2664 Pod spec.containers{azure-dl} Normal Created kubelet, aks-nodepool1-27496346-0 Created container\r\n",
"9m 9m 1 azure-dl-3880299103-jsn4n.151e45f3e7cf4868 Pod spec.containers{azure-dl} Normal Started kubelet, aks-nodepool1-27496346-0 Started container\r\n",
"11m 11m 1 azure-dl-3880299103.151e45d461b5dd14 ReplicaSet Normal SuccessfulCreate replicaset-controller Created pod: azure-dl-3880299103-jsn4n\r\n",
"31m 31m 1 azure-dl.151e44c4809ef854 Deployment Normal ScalingReplicaSet deployment-controller Scaled up replica set azure-dl-2914933029 to 1\r\n",
"31m 31m 1 azure-dl.151e44c4846c4ff9 Service Normal CreatingLoadBalancer service-controller Creating load balancer\r\n",
"28m 28m 1 azure-dl.151e44e49cf53e12 Service Normal CreatedLoadBalancer service-controller Created load balancer\r\n",
"27m 27m 1 azure-dl.151e44f042cb9ca1 Deployment Normal ScalingReplicaSet deployment-controller Scaled down replica set azure-dl-2914933029 to 0\r\n",
"27m 27m 1 azure-dl.151e44f10ce12363 Service Normal DeletingLoadBalancer service-controller Deleting load balancer\r\n",
"26m 26m 1 azure-dl.151e450843800b94 Deployment Normal ScalingReplicaSet deployment-controller Scaled up replica set azure-dl-2914933029 to 1\r\n",
"25m 25m 1 azure-dl.151e450fe1af53b3 Service Normal DeletedLoadBalancer service-controller Deleted load balancer\r\n",
"25m 25m 1 azure-dl.151e450fe1afa42b Service Normal CreatingLoadBalancer service-controller Creating load balancer\r\n",
"22m 22m 1 azure-dl.151e45376d5add13 Service Normal CreatedLoadBalancer service-controller Created load balancer\r\n",
"12m 12m 1 azure-dl.151e45c61a9bc760 Deployment Normal ScalingReplicaSet deployment-controller Scaled down replica set azure-dl-2914933029 to 0\r\n",
"12m 12m 1 azure-dl.151e45c6e4b8a1b9 Service Normal DeletingLoadBalancer service-controller Deleting load balancer\r\n",
"11m 11m 1 azure-dl.151e45d4604dbd4a Deployment Normal ScalingReplicaSet deployment-controller Scaled up replica set azure-dl-3880299103 to 1\r\n",
"9m 9m 1 azure-dl.151e45ef396adac1 Service Normal DeletedLoadBalancer service-controller Deleted load balancer\r\n",
"9m 9m 1 azure-dl.151e45ef396b3881 Service Normal CreatingLoadBalancer service-controller Creating load balancer\r\n",
"7m 7m 1 azure-dl.151e460b74c777ad Service Normal CreatedLoadBalancer service-controller Created load balancer\r\n"
"9m 9m 1 aks-nodepool1-27496346-0.1520fb005972710f Node Normal Starting kubelet, aks-nodepool1-27496346-0 Starting kubelet.\r\n",
"7m 8m 3 aks-nodepool1-27496346-0.1520fb08e765af3d Node Normal NodeHasSufficientDisk kubelet, aks-nodepool1-27496346-0 Node aks-nodepool1-27496346-0 status is now: NodeHasSufficientDisk\r\n",
"7m 8m 3 aks-nodepool1-27496346-0.1520fb08e7663219 Node Normal NodeHasSufficientMemory kubelet, aks-nodepool1-27496346-0 Node aks-nodepool1-27496346-0 status is now: NodeHasSufficientMemory\r\n",
"7m 8m 3 aks-nodepool1-27496346-0.1520fb08e7665b1e Node Normal NodeHasNoDiskPressure kubelet, aks-nodepool1-27496346-0 Node aks-nodepool1-27496346-0 status is now: NodeHasNoDiskPressure\r\n",
"55s 8m 9 aks-nodepool1-27496346-0.1520fb08e780a4eb Node Warning FailedNodeAllocatableEnforcement kubelet, aks-nodepool1-27496346-0 Failed to update Node Allocatable Limits \"\": failed to set supported cgroup subsystems for cgroup : Failed to set config for supported subsystems : failed to write 59076296704 to memory.limit_in_bytes: write /var/lib/docker/overlay2/daad1bc683430e39749de19537b2702c53db1f36ba866537b8f76687375c368f/merged/sys/fs/cgroup/memory/memory.limit_in_bytes: invalid argument\r\n",
"6m 6m 1 aks-nodepool1-27496346-0.1520fb2740f627b6 Node Normal RegisteredNode controllermanager Node aks-nodepool1-27496346-0 event: Registered Node aks-nodepool1-27496346-0 in NodeController\r\n",
"6m 6m 1 aks-nodepool1-27496346-0.1520fb29877c82d1 Node Normal Starting kube-proxy, aks-nodepool1-27496346-0 Starting kube-proxy.\r\n",
"6m 6m 1 aks-nodepool1-27496346-0.1520fb2d38a1c12c Node Normal NodeReady kubelet, aks-nodepool1-27496346-0 Node aks-nodepool1-27496346-0 status is now: NodeReady\r\n",
"4m 4m 1 azure-dl-3880299103-v5mb7.1520fb46d4fdc9fa Pod Normal Scheduled default-scheduler Successfully assigned azure-dl-3880299103-v5mb7 to aks-nodepool1-27496346-0\r\n",
"4m 4m 1 azure-dl-3880299103-v5mb7.1520fb46e1cd117d Pod Normal SuccessfulMountVolume kubelet, aks-nodepool1-27496346-0 MountVolume.SetUp succeeded for volume \"bin\" \r\n",
"4m 4m 1 azure-dl-3880299103-v5mb7.1520fb46e1cf3b05 Pod Normal SuccessfulMountVolume kubelet, aks-nodepool1-27496346-0 MountVolume.SetUp succeeded for volume \"libcuda\" \r\n",
"4m 4m 1 azure-dl-3880299103-v5mb7.1520fb46e1cf86ce Pod Normal SuccessfulMountVolume kubelet, aks-nodepool1-27496346-0 MountVolume.SetUp succeeded for volume \"lib\" \r\n",
"4m 4m 1 azure-dl-3880299103-v5mb7.1520fb46e2516335 Pod Normal SuccessfulMountVolume kubelet, aks-nodepool1-27496346-0 MountVolume.SetUp succeeded for volume \"default-token-thxzk\" \r\n",
"4m 4m 1 azure-dl-3880299103-v5mb7.1520fb47102b3c32 Pod spec.containers{azure-dl} Normal Pulling kubelet, aks-nodepool1-27496346-0 pulling image \"masalvar/tfresnet-gpu\"\r\n",
"1m 1m 1 azure-dl-3880299103-v5mb7.1520fb73ea97742a Pod spec.containers{azure-dl} Normal Pulled kubelet, aks-nodepool1-27496346-0 Successfully pulled image \"masalvar/tfresnet-gpu\"\r\n",
"1m 1m 1 azure-dl-3880299103-v5mb7.1520fb75bccdb1f5 Pod spec.containers{azure-dl} Normal Created kubelet, aks-nodepool1-27496346-0 Created container\r\n",
"1m 1m 1 azure-dl-3880299103-v5mb7.1520fb76711f3dd5 Pod spec.containers{azure-dl} Normal Started kubelet, aks-nodepool1-27496346-0 Started container\r\n",
"4m 4m 1 azure-dl-3880299103.1520fb46d46b36d8 ReplicaSet Normal SuccessfulCreate replicaset-controller Created pod: azure-dl-3880299103-v5mb7\r\n",
"4m 4m 1 azure-dl.1520fb46d294f3d3 Deployment Normal ScalingReplicaSet deployment-controller Scaled up replica set azure-dl-3880299103 to 1\r\n",
"4m 4m 1 azure-dl.1520fb46d8ebdb8a Service Normal CreatingLoadBalancer service-controller Creating load balancer\r\n",
"2m 2m 1 azure-dl.1520fb66b2965ba7 Service Normal CreatedLoadBalancer service-controller Created load balancer\r\n"
]
}
],
@@ -744,44 +693,59 @@
},
{
"cell_type": "code",
"execution_count": 55,
"execution_count": 38,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2018-03-22 14:41:03,137 CRIT Supervisor running as root (no user in config file)\r\n",
"2018-03-22 14:41:03,139 INFO supervisord started with pid 7\r\n",
"2018-03-22 14:41:04,141 INFO spawned: 'program_exit' with pid 17\r\n",
"2018-03-22 14:41:04,143 INFO spawned: 'nginx' with pid 18\r\n",
"2018-03-22 14:41:04,144 INFO spawned: 'gunicorn' with pid 19\r\n",
"2018-03-22 14:41:05,174 INFO success: program_exit entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)\r\n",
"2018-03-22 14:41:09,192 INFO success: nginx entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)\r\n",
"Selected GPU[0] Tesla K80 as the process wide default device.\r\n",
"2018-03-31 10:45:57,344 CRIT Supervisor running as root (no user in config file)\r\n",
"2018-03-31 10:45:57,346 INFO supervisord started with pid 7\r\n",
"2018-03-31 10:45:58,348 INFO spawned: 'program_exit' with pid 15\r\n",
"2018-03-31 10:45:58,349 INFO spawned: 'nginx' with pid 16\r\n",
"2018-03-31 10:45:58,351 INFO spawned: 'gunicorn' with pid 17\r\n",
"2018-03-31 10:45:59,380 INFO success: program_exit entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)\r\n",
"2018-03-31 10:45:59.971916: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA\r\n",
"2018-03-31 10:46:03,977 INFO success: nginx entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)\r\n",
"2018-03-31 10:46:11.453255: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties: \r\n",
"name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235\r\n",
"pciBusID: cff2:00:00.0\r\n",
"totalMemory: 11.17GiB freeMemory: 11.10GiB\r\n",
"2018-03-31 10:46:11.453299: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla K80, pci bus id: cff2:00:00.0, compute capability: 3.7)\r\n",
"INFO:tensorflow:Restoring parameters from resnet_v1_152.ckpt\r\n",
"{\"timestamp\": \"2018-03-31T10:46:17.203847Z\", \"level\": \"INFO\", \"stack_info\": null, \"host\": \"azure-dl-3880299103-v5mb7\", \"message\": \"Restoring parameters from resnet_v1_152.ckpt\", \"logger\": \"tensorflow\", \"msg\": \"Restoring parameters from %s\", \"tags\": [], \"path\": \"/opt/conda/envs/py3.5/lib/python3.5/site-packages/tensorflow/python/platform/tf_logging.py\"}\r\n",
"{\"timestamp\": \"2018-03-31T10:46:19.060001Z\", \"level\": \"INFO\", \"stack_info\": null, \"host\": \"azure-dl-3880299103-v5mb7\", \"message\": \"Model loading time: 19089.38 ms\", \"logger\": \"model_driver\", \"tags\": [], \"path\": \"/code/driver.py\"}\r\n",
"2018-03-31 10:46:19,060 INFO success: gunicorn entered RUNNING state, process has stayed up for > than 20 seconds (startsecs)\r\n",
"Initialising\r\n",
"Model loading time: 13501.91 ms\r\n",
"{\"timestamp\": \"2018-03-22T14:41:18.246347Z\", \"message\": \"Model loading time: 13501.91 ms\", \"host\": \"azure-dl-2914933029-tlnmp\", \"path\": \"/code/driver.py\", \"tags\": [], \"level\": \"INFO\", \"logger\": \"cntk_svc_logger\", \"stack_info\": null}\r\n",
"{\"timestamp\": \"2018-03-22T14:41:18.250653Z\", \"message\": \" * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)\", \"host\": \"azure-dl-2914933029-tlnmp\", \"path\": \"/opt/conda/envs/py3.6/lib/python3.6/site-packages/werkzeug/_internal.py\", \"tags\": [], \"level\": \"INFO\", \"logger\": \"werkzeug\", \"msg\": \" * Running on %s://%s:%d/ %s\", \"stack_info\": null}\r\n",
"2018-03-22 14:41:24,257 INFO success: gunicorn entered RUNNING state, process has stayed up for > than 20 seconds (startsecs)\r\n"
"{\"timestamp\": \"2018-03-31T10:46:19.065300Z\", \"level\": \"INFO\", \"stack_info\": null, \"host\": \"azure-dl-3880299103-v5mb7\", \"message\": \" * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)\", \"logger\": \"werkzeug\", \"msg\": \" * Running on %s://%s:%d/ %s\", \"tags\": [], \"path\": \"/opt/conda/envs/py3.5/lib/python3.5/site-packages/werkzeug/_internal.py\"}\r\n"
]
}
],
"source": [
"!kubectl logs azure-dl-2914933029-tlnmp"
"pod_json = !kubectl get pods -o json\n",
"pod_dict = json.loads(''.join(pod_json))\n",
"!kubectl logs {pod_dict['items'][0]['metadata']['name']}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It can take a few minutes for the service to populate the EXTERNAL-IP field. This will be the IP you use to call the service. You can also specify an IP to use please see the AKS documentation for further details."
]
},
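Rather than re-running the cell below by hand until the IP appears, one can poll for it; a minimal sketch using only the service's JSON status:

```python
import json
import subprocess
import time

# Poll until Kubernetes populates status.loadBalancer.ingress for the service
while True:
    raw = subprocess.check_output(
        ['kubectl', 'get', 'service', 'azure-dl', '-o', 'json'])
    ingress = json.loads(raw.decode('utf-8'))['status']['loadBalancer'].get('ingress')
    if ingress:
        print('External IP:', ingress[0]['ip'])
        break
    time.sleep(10)
```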
{
"cell_type": "code",
"execution_count": 71,
"execution_count": 39,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE\r\n",
"azure-dl LoadBalancer 10.0.155.14 13.82.238.75 80:30532/TCP 11m\r\n"
"NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE\r\n",
"azure-dl LoadBalancer 10.0.204.221 40.71.172.160 80:32567/TCP 11m\r\n"
]
}
],
@@ -789,6 +753,14 @@
"!kubectl get service azure-dl"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we have our deployed service we can move onto [testing it](05_TestWebApp.ipynb) \n",
"Below are the instructions to tear everything down once we are done with the cluster"
]
},
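The tear-down instructions themselves are truncated in this diff; as a sketch, assuming `resource_group` and `aks_name` from the setup cell at the top of this notebook, the equivalent Azure CLI calls would be:

```python
# --yes skips the confirmation prompts; deleting the resource group also removes the cluster
!az aks delete --resource-group $resource_group --name $aks_name --yes --no-wait
!az group delete --name $resource_group --yes --no-wait
```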
{
"cell_type": "markdown",
"metadata": {

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

06_SpeedTestWebApp.ipynb (new file, 439 lines)

File diff suppressed because one or more lines are too long