Merge branch 'master' into merge-helm

@ -1 +1,15 @@

# Binaries for programs and plugins
*.exe
*.dll
*.so
*.dylib

# Test binary, build with `go test -c`
*.test

# Output of the go coverage tool, specifically when used with LiteIDE
*.out

# Project-local glide cache, RE: https://github.com/Masterminds/glide/issues/736
.glide/
.DS_Store

@ -11,7 +11,7 @@ Here is a subset of the pain points that exist in a typical ML workflow.

#### A Typical (Simplified) ML Workflow and its Pain Points
![Typical Workflow](workflow.png)

This workshop is going to focus on improving the training and serving process by leveraging containers and Kubernetes.

Today many data scientists are training their models either on their physical workstation (be it a laptop or a desktop with multiple GPUs) or using a VM (sometimes, but rarely, a couple of them) in the cloud.

@ -69,25 +69,15 @@ We will be creating a deployment in the exercise toward the end of this module,

## Provisioning a Kubernetes cluster on Azure

There are multiple ways to provision a Kubernetes (K8s) cluster on Azure:
* ACS
* AKS
* acs-engine

We are going to use AKS to create a GPU-enabled Kubernetes cluster.
You could also use [acs-engine](https://github.com/Azure/acs-engine) if you prefer; this guide will assume you are using AKS.

We are going to create a Linux-based K8s cluster.
You can either create the cluster using the portal, or using the Azure CLI (`az`).

### A Note on GPUs with Kubernetes

As of this writing, GPUs are available for AKS in the `eastus` and `westeurope` regions. If you want more options, you may want to use acs-engine for more flexibility. You should also be aware of some pitfalls:
* Deploying a GPU cluster takes longer than a CPU cluster (about 10-15 minutes more) because the NVIDIA drivers need to be installed as well.
* You might hit capacity issues if the location you chose does not have enough GPUs available to accommodate you.

**Unless you are already pretty familiar with Docker and Kubernetes, we recommend that you create a cluster with CPU VMs to save some time.**
Only module 3 has an exercise that is specific to GPU VMs; all other modules can be followed on either CPU or GPU clusters, so if you are on a budget, feel free to create a CPU cluster instead.

### With the CLI

@ -105,22 +95,23 @@ With:

#### Creating the cluster
```console
az aks create --agent-vm-size <AGENT_SIZE> --resource-group <RG> --name <NAME> \
  --agent-count <AGENT_COUNT> --kubernetes-version 1.9.6 --location <LOCATION> --generate-ssh-keys
```

> Note: The available Kubernetes versions can change depending on where you are deploying your cluster. You can get more information by running the `az aks get-versions` command.

With:

| Parameter | Description |
| --- | --- |
| AGENT_SIZE | The size of K8s's agent VM. Choose `Standard_NC6` for GPUs or `Standard_D2_v2` if you just want CPUs. |
| RG | Name of the resource group that was created in the previous step. |
| NAME | Name of the AKS resource (can be whatever you want). |
| AGENT_COUNT | The number of agents (virtual machines) that you want in your cluster. 2 or 3 is recommended to play with hyper-parameter tuning and distributed TensorFlow. |
| LOCATION | Same location that was specified for the resource group creation. |

The command should take a few minutes to complete (longer if you chose GPU VMs). Once it is done, the output should be a JSON object indicating, among other things, the `provisioningState`:
```
{
[...]

@ -135,7 +126,7 @@ The `kubeconfig` file is a configuration file that will allow Kubernetes's CLI (

To download the `kubeconfig` file from the cluster we just created, run:

```console
az aks get-credentials --name <NAME> --resource-group <RG>
```

Where `NAME` and `RG` should be the same values as for the cluster creation.

@ -150,11 +141,10 @@ kubectl get nodes

Should yield an output similar to this one:
```
NAME                       STATUS    ROLES     AGE       VERSION
aks-nodepool1-42640332-0   Ready     agent     1h        v1.9.6
aks-nodepool1-42640332-1   Ready     agent     1h        v1.9.6
aks-nodepool1-42640332-2   Ready     agent     1h        v1.9.6
```

If you provisioned GPU VMs, describing one of the nodes should indicate the presence of GPU(s) on the node:

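For example (a sketch; `<node-name>` is one of the node names from the output above):

```console
kubectl describe node <node-name>
```

On a GPU node, the `Capacity` section of the output should list the GPU resource (e.g. `alpha.kubernetes.io/nvidia-gpu: 1`, or `nvidia.com/gpu` on newer clusters).
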
@ -219,7 +209,7 @@ kubectl get job

Should show your new job:

```bash
NAME          DESIRED   SUCCESSFUL   AGE
module2-ex1   1         0            1m
```

@ -267,7 +257,7 @@ kubectl get job

```

```bash
NAME          DESIRED   SUCCESSFUL   AGE
module2-ex1   1         1            3m
```

4-gpus/README.md

@ -1,299 +0,0 @@

# GPUs And Kubernetes

## Prerequisites
* [1 - Docker Basics](../1-docker)
* [2 - Kubernetes Basics and cluster created](../2-kubernetes)

## Summary

In this module you will learn how to:
* Create a Pod that uses a GPU, including:
  * Requesting a GPU
  * Mounting the NVIDIA drivers into the container

## Important Note

If you created a cluster with CPU VMs only, you won't be able to complete the exercises in this module, but it still contains valuable information that you should read through nonetheless.

## How GPU works with Kubernetes

GPU support in K8s is still in its early stage, and as such requires a bit of effort on your part to use.

While you don't need to do anything to access a CPU from inside your container (except optionally specifying a CPU request and limit), getting access to the agent's GPU is a little bit more tricky:
* First, the drivers need to be installed on the agent, otherwise the agent will not report the presence of GPU, and you won't be able to use it (this is already done for you in ACS/AKS/acs-engine).
* Then you need to explicitly ask for 1 or multiple GPU(s) to be mounted into your container, otherwise you will simply not be able to access the GPU, even if the pod is running on a GPU agent.
* Finally, and most importantly, you need to mount the drivers from the agent VM into your container.

In Module 5, we will see how this process can be greatly simplified when using TensorFlow with `TFJob`, but for now, let's do it ourselves.

### Creating a container that can benefit from GPU

As a prerequisite for everything else, it is important to make sure that the container we are going to use actually knows what to do with a GPU.
For example, TensorFlow needs to be installed with GPU support, and CUDA and cuDNN also need to be present.
Thankfully, most deep learning frameworks provide base images that are ready to use with GPU support, so we can use them as base images.

For example, TensorFlow has a lot of different images ready to use at [https://hub.docker.com/r/tensorflow/tensorflow/tags/](https://hub.docker.com/r/tensorflow/tensorflow/tags/) such as:
* `tensorflow/tensorflow:1.4.0-gpu-py3` for GPU
* `tensorflow/tensorflow:1.4.0-py3` for CPU only

CNTK also has pre-built images, with or without GPU, at [https://hub.docker.com/r/microsoft/cntk/tags/](https://hub.docker.com/r/microsoft/cntk/tags/):
* `microsoft/cntk:2.2-gpu-python3.5-cuda8.0-cudnn6.0` for GPU
* `microsoft/cntk:2.2-python3.5` for CPU only

It is also important to note that most deep learning framework images are built on top of the official [nvidia/cuda](https://hub.docker.com/r/nvidia/cuda/) image, which already comes with CUDA and cuDNN preinstalled, so you don't need to worry about installing them.

### Requesting GPU(s)

K8s has a concept of resource `requests` and `limits` allowing you to specify how much CPU, RAM and GPU should be reserved for a specific container.
By default, if no `limits` is specified for CPU or RAM on a container, K8s will schedule it on any node and run the container with unbounded CPU and memory limits.

> *To know more about K8s `requests` and `limits`, see [Managing Compute Resources for Containers](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/).*

However, things are different for GPUs. If no `limit` is defined for GPU, K8s will run the pod on any node (with or without GPU), and will not expose the GPU even if the node has one. So you need to explicitly set the `limit` to the exact number of GPUs that should be assigned to your container.
Also, note that while you can request a fraction of a CPU, you cannot request a fraction of a GPU. One GPU can thus only be assigned to one container at a time.
The name of the GPU resource in K8s is `alpha.kubernetes.io/nvidia-gpu` for versions `1.8` and below, and `nvidia.com/gpu` for versions above `1.9`. Note that currently only NVIDIA GPUs are supported.

To set the `limit` for GPU, you should provide a value to `spec.containers[].resources.limits.alpha.kubernetes.io/nvidia-gpu`; in YAML this would look like:

```yaml
[...]
containers:
- name: tensorflow
  image: tensorflow/tensorflow:latest-gpu
  resources:
    limits:
      alpha.kubernetes.io/nvidia-gpu: 1
[...]
```

### Exposing the node's drivers into the container

Now for the tricky part.
As stated earlier, the NVIDIA drivers need to be exposed (mounted) from the node into the container. This is a bit tricky since the location of the drivers can vary depending on the operating system of the node, as well as on how the drivers were installed.
For ACS/AKS/acs-engine only Ubuntu nodes are supported so far, so the experience should be consistent as long as your cluster was created with one of them.

##### Drivers locations on the node

| Path | Purpose |
|----|----|
|`/usr/lib/nvidia-384` | NVIDIA libraries |
|`/usr/lib/nvidia-384/bin`| NVIDIA binaries |
|`/usr/lib/x86_64-linux-gnu/libcuda.so.1` | CUDA Driver API library |

> Note that the NVIDIA driver's version is `384` at the time of this writing, but the drivers' location will change as the version changes.

For each of the above paths we need to create a corresponding `Volume` and a `VolumeMount` to expose them into our container, as sketched below.

> To understand how to configure `Volumes` and `VolumeMounts`, take a look at [Volumes](https://kubernetes.io/docs/user-guide/walkthrough/#volumes) in the Kubernetes documentation.

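As a minimal sketch (using the driver paths from the table above; the `mountPath` values inside the container are arbitrary choices):

```yaml
[...]
  containers:
  - name: tensorflow
    image: tensorflow/tensorflow:latest-gpu
    volumeMounts: # Where the drivers should appear inside the container
    - name: lib
      mountPath: /usr/local/nvidia/lib64
    - name: libcuda
      mountPath: /usr/lib/x86_64-linux-gnu/libcuda.so.1
  volumes: # Where the drivers are located on the node
  - name: lib
    hostPath:
      path: /usr/lib/nvidia-384
  - name: libcuda
    hostPath:
      path: /usr/lib/x86_64-linux-gnu/libcuda.so.1
[...]
```
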
## Exercises

### 1. NVIDIA-SMI
In this first exercise we are simply going to schedule a `Job` that will run `nvidia-smi`, printing details about our GPU from inside the container, and exit.
You don't need to build a custom image; instead, simply use the official `nvidia/cuda` docker image.

Your K8s YAML template should have the following characteristics:
* It should be a `Job`
* It should be named `module4-ex1`
* It should request 1 GPU
* It should mount the drivers from the node into the container
* It should run the `nvidia-smi` executable

#### Useful Links
* [Microsoft Azure Container Service Engine - Using GPUs with Kubernetes](https://github.com/Azure/acs-engine/blob/master/docs/kubernetes/gpu.md)

#### Validation

Once you have created your Job with `kubectl create -f <template-path>`:

```console
kubectl get pods -a
```
The `-a` argument tells K8s to also report pods that are already completed. Since the container exits as soon as `nvidia-smi` finishes executing, it might already be completed by the time you execute the command.

```bash
NAME                READY     STATUS      RESTARTS   AGE
module4-ex1-p40vx   0/1       Completed   0          20s
```

Let's look at the logs of our pod:

```console
kubectl logs <pod-name>
```
```bash
Wed Nov 29 23:43:03 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.98                 Driver Version: 384.98                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 0000E322:00:00.0 Off |                    0 |
| N/A   39C    P0    70W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```
We can see that `nvidia-smi` has successfully detected a Tesla K80 with driver version `384.98`.

#### Solution

<details>
<summary><strong>Solution (expand to see)</strong></summary>
<p>

```yaml
apiVersion: batch/v1
kind: Job # We want a Job
metadata:
  name: 4-nvidia-smi
spec:
  template:
    metadata:
      name: module4-ex1
    spec:
      restartPolicy: Never
      volumes: # Where the NVIDIA driver libraries and binaries are located on the host (note that libcuda is not needed to run nvidia-smi)
      - name: bin
        hostPath:
          path: /usr/lib/nvidia-384/bin
      - name: lib
        hostPath:
          path: /usr/lib/nvidia-384
      containers:
      - name: nvidia-smi
        image: nvidia/cuda # Which image to run
        command:
        - nvidia-smi
        resources:
          limits:
            alpha.kubernetes.io/nvidia-gpu: 1 # Requesting 1 GPU
        volumeMounts: # Where the NVIDIA driver libraries and binaries should be mounted inside our container
        - name: bin
          mountPath: /usr/local/nvidia/bin
        - name: lib
          mountPath: /usr/local/nvidia/lib64
```
</p>
</details>

### 2. Running TensorFlow with GPU

In modules 1 and 2, we first created a Docker image for our MNIST classifier and then ran a training on Kubernetes.
However, this training only used CPU. Let's make things much faster by accelerating our training with GPU.

You'll find the code and the `Dockerfile` under [`./src`](./src).

For this exercise, your tasks are to:
* Modify our `Dockerfile` to use a base image compatible with GPU, such as `tensorflow/tensorflow:1.4.0-gpu`
* Build and push this new image under a new tag, such as `${DOCKER_USERNAME}/tf-mnist:gpu` (see the sketch after this list)
* Modify the [template we built in module 2](2-kubernetes/training.yaml) to add a GPU `limit` and mount the driver libraries.
* Deploy this new template.
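
For the build-and-push step, a sketch (assuming you are logged in to Docker Hub and `DOCKER_USERNAME` is set to your username):

```console
docker build -t ${DOCKER_USERNAME}/tf-mnist:gpu ./src
docker push ${DOCKER_USERNAME}/tf-mnist:gpu
```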

### Validation

Once you have deployed your template, take a look at the logs of your pod:

```console
kubectl logs <pod-name>
```
And you should see that your GPU is correctly detected and used by TensorFlow (`[...] Found device 0 with properties: name: Tesla K80 [...]`):

```bash
2017-11-30 00:59:54.053227: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2017-11-30 01:00:03.274198: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: b2de:00:00.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2017-11-30 01:00:03.274238: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla K80, pci bus id: b2de:00:00.0, compute capability: 3.7)
2017-11-30 01:00:08.000884: I tensorflow/stream_executor/dso_loader.cc:139] successfully opened CUDA library libcupti.so.8.0 locally
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting /tmp/tensorflow/input_data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting /tmp/tensorflow/input_data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting /tmp/tensorflow/input_data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting /tmp/tensorflow/input_data/t10k-labels-idx1-ubyte.gz
Accuracy at step 0: 0.1245
Accuracy at step 10: 0.6664
Accuracy at step 20: 0.8227
Accuracy at step 30: 0.8657
Accuracy at step 40: 0.8815
Accuracy at step 50: 0.892
Accuracy at step 60: 0.9068
[...]
```

### Solution

<details>
<summary><strong>Solution (expand to see)</strong></summary>
<p>

First we need to modify the `Dockerfile`.
We just need to change the tag of the TensorFlow base image to one that supports GPU:

```dockerfile
FROM tensorflow/tensorflow:1.4.0-gpu
COPY main.py /app/main.py

ENTRYPOINT ["python", "/app/main.py"]
```

Then we can create our Job template:

```yaml
apiVersion: batch/v1
kind: Job # Our training should be a Job since it is supposed to terminate at some point
metadata:
  name: module4-ex2 # Name of our job
spec:
  template: # Template of the Pod that is going to be run by the Job
    metadata:
      name: mnist-pod # Name of the pod
    spec:
      containers: # List of containers that should run inside the pod, in our case there is only one.
      - name: tensorflow
        image: wbuchwalter/tf-mnist:gpu # The image to run, you can replace it with your own.
        args: ["--max_steps", "500"] # Optional arguments to pass to our command. By default the command is defined by ENTRYPOINT in the Dockerfile
        resources:
          limits:
            alpha.kubernetes.io/nvidia-gpu: 1
        volumeMounts: # Where the drivers should be mounted in the container
        - name: lib
          mountPath: /usr/local/nvidia/lib64
        - name: libcuda
          mountPath: /usr/lib/x86_64-linux-gnu/libcuda.so.1
      restartPolicy: OnFailure
      volumes: # Where the drivers are located on the node
      - name: lib
        hostPath:
          path: /usr/lib/nvidia-384
      - name: libcuda
        hostPath:
          path: /usr/lib/x86_64-linux-gnu/libcuda.so.1
```

And deploy it with

```console
kubectl create -f <template-path>
```

</p>
</details>

## Next Step
[5 - TFJob](../5-tfjob/README.md)

@ -1,5 +0,0 @@

# Change this image to one that supports GPU
FROM tensorflow/tensorflow:1.4.0
COPY main.py /app/main.py

ENTRYPOINT ["python", "/app/main.py"]

@ -1,212 +0,0 @@

# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the 'License');
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an 'AS IS' BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""A simple MNIST classifier which displays summaries in TensorBoard.

This is an unimpressive MNIST model, but it is a good example of using
tf.name_scope to make a graph legible in the TensorBoard graph explorer, and of
naming summary tags so that they are grouped meaningfully in TensorBoard.

It demonstrates the functionality of every TensorBoard dashboard.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import argparse
import os
import sys

import tensorflow as tf

from tensorflow.examples.tutorials.mnist import input_data

FLAGS = None


def train():
  # Import data
  mnist = input_data.read_data_sets(FLAGS.data_dir,
                                    one_hot=True,
                                    fake_data=FLAGS.fake_data)

  # Create a multilayer model.

  # Input placeholders
  with tf.name_scope('input'):
    x = tf.placeholder(tf.float32, [None, 784], name='x-input')
    y_ = tf.placeholder(tf.float32, [None, 10], name='y-input')

  with tf.name_scope('input_reshape'):
    image_shaped_input = tf.reshape(x, [-1, 28, 28, 1])
    tf.summary.image('input', image_shaped_input, 10)

  # We can't initialize these variables to 0 - the network will get stuck.
  def weight_variable(shape):
    """Create a weight variable with appropriate initialization."""
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

  def bias_variable(shape):
    """Create a bias variable with appropriate initialization."""
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

  def variable_summaries(var):
    """Attach a lot of summaries to a Tensor (for TensorBoard visualization)."""
    with tf.name_scope('summaries'):
      mean = tf.reduce_mean(var)
      tf.summary.scalar('mean', mean)
      with tf.name_scope('stddev'):
        stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean)))
      tf.summary.scalar('stddev', stddev)
      tf.summary.scalar('max', tf.reduce_max(var))
      tf.summary.scalar('min', tf.reduce_min(var))
      tf.summary.histogram('histogram', var)

  def nn_layer(input_tensor, input_dim, output_dim, layer_name, act=tf.nn.relu):
    """Reusable code for making a simple neural net layer.

    It does a matrix multiply, bias add, and then uses ReLU to nonlinearize.
    It also sets up name scoping so that the resultant graph is easy to read,
    and adds a number of summary ops.
    """
    # Adding a name scope ensures logical grouping of the layers in the graph.
    with tf.name_scope(layer_name):
      # This Variable will hold the state of the weights for the layer
      with tf.name_scope('weights'):
        weights = weight_variable([input_dim, output_dim])
        variable_summaries(weights)
      with tf.name_scope('biases'):
        biases = bias_variable([output_dim])
        variable_summaries(biases)
      with tf.name_scope('Wx_plus_b'):
        preactivate = tf.matmul(input_tensor, weights) + biases
        tf.summary.histogram('pre_activations', preactivate)
      activations = act(preactivate, name='activation')
      tf.summary.histogram('activations', activations)
      return activations

  hidden1 = nn_layer(x, 784, 500, 'layer1')

  with tf.name_scope('dropout'):
    keep_prob = tf.placeholder(tf.float32)
    tf.summary.scalar('dropout_keep_probability', keep_prob)
    dropped = tf.nn.dropout(hidden1, keep_prob)

  # Do not apply softmax activation yet, see below.
  y = nn_layer(dropped, 500, 10, 'layer2', act=tf.identity)

  with tf.name_scope('cross_entropy'):
    # The raw formulation of cross-entropy,
    #
    # tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(tf.softmax(y)),
    #                               reduction_indices=[1]))
    #
    # can be numerically unstable.
    #
    # So here we use tf.nn.softmax_cross_entropy_with_logits on the
    # raw outputs of the nn_layer above, and then average across
    # the batch.
    diff = tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y)
    with tf.name_scope('total'):
      cross_entropy = tf.reduce_mean(diff)
  tf.summary.scalar('cross_entropy', cross_entropy)

  with tf.name_scope('train'):
    train_step = tf.train.AdamOptimizer(FLAGS.learning_rate).minimize(
        cross_entropy)

  with tf.name_scope('accuracy'):
    with tf.name_scope('correct_prediction'):
      correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    with tf.name_scope('accuracy'):
      accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
  tf.summary.scalar('accuracy', accuracy)

  # Merge all the summaries and write them out to
  # /tmp/tensorflow/mnist/logs/mnist_with_summaries (by default)
  merged = tf.summary.merge_all()

  def feed_dict(train):
    """Make a TensorFlow feed_dict: maps data onto Tensor placeholders."""
    if train or FLAGS.fake_data:
      xs, ys = mnist.train.next_batch(100, fake_data=FLAGS.fake_data)
      k = FLAGS.dropout
    else:
      xs, ys = mnist.test.images, mnist.test.labels
      k = 1.0
    return {x: xs, y_: ys, keep_prob: k}

  sess = tf.InteractiveSession()
  train_writer = tf.summary.FileWriter(FLAGS.log_dir + '/train', sess.graph)
  test_writer = tf.summary.FileWriter(FLAGS.log_dir + '/test')
  tf.global_variables_initializer().run()
  # Train the model, and also write summaries.
  # Every 10th step, measure test-set accuracy, and write test summaries
  # All other steps, run train_step on training data, & add training summaries

  for i in range(FLAGS.max_steps):
    if i % 10 == 0:  # Record summaries and test-set accuracy
      summary, acc = sess.run([merged, accuracy], feed_dict=feed_dict(False))
      test_writer.add_summary(summary, i)
      print('Accuracy at step %s: %s' % (i, acc))
    else:  # Record train set summaries, and train
      if i % 100 == 99:  # Record execution stats
        run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
        run_metadata = tf.RunMetadata()
        summary, _ = sess.run([merged, train_step],
                              feed_dict=feed_dict(True),
                              options=run_options,
                              run_metadata=run_metadata)
        train_writer.add_run_metadata(run_metadata, 'step%03d' % i)
        train_writer.add_summary(summary, i)
        print('Adding run metadata for', i)
      else:  # Record a summary
        summary, _ = sess.run([merged, train_step], feed_dict=feed_dict(True))
        train_writer.add_summary(summary, i)
  train_writer.close()
  test_writer.close()


def main(_):
  train()


if __name__ == '__main__':
  parser = argparse.ArgumentParser()
  parser.add_argument('--fake_data', nargs='?', const=True, type=bool,
                      default=False,
                      help='If true, uses fake data for unit testing.')
  parser.add_argument('--max_steps', type=int, default=1000,
                      help='Number of steps to run trainer.')
  parser.add_argument('--learning_rate', type=float, default=0.001,
                      help='Initial learning rate')
  parser.add_argument('--dropout', type=float, default=0.9,
                      help='Keep probability for training dropout.')
  parser.add_argument(
      '--data_dir',
      type=str,
      default=os.path.join(os.getenv('TEST_TMPDIR', '/tmp'),
                           'tensorflow/input_data'),
      help='Directory for storing input data')
  parser.add_argument(
      '--log_dir',
      type=str,
      default=os.path.join(os.getenv('TEST_TMPDIR', '/tmp'),
                           'tensorflow/logs'),
      help='Summaries log directory')
  FLAGS, unparsed = parser.parse_known_args()
  tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)

@ -0,0 +1,97 @@

# Jupyter Notebooks on Kubernetes

## Prerequisites
* [1 - Docker Basics](../1-docker)
* [2 - Kubernetes Basics and cluster created](../2-kubernetes)
* [4 - Kubeflow and tfjob Basics](../4-kubeflow-tfjob)

## Summary

In this module, you will learn how to:
* Run Jupyter Notebooks locally using Docker
* Run JupyterHub on Kubernetes using Kubeflow

## How Jupyter Notebooks work

The [Jupyter Notebook](http://jupyter.org/) is an open source web application that allows users to create and share documents that contain live code, equations, visualizations, and narrative text for rapid prototyping. It is often used for data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and more. To better support exploratory iteration and to accelerate computation of Tensorflow jobs, let's look at how we can include data science tools like Jupyter Notebook with Docker and Kubernetes.

## How JupyterHub works

[JupyterHub](https://jupyterhub.readthedocs.io/en/latest/) is a multi-user hub that spawns, manages, and proxies multiple instances of the single-user Jupyter notebook server. JupyterHub can be used to serve notebooks to a class of students, a corporate data science group, or a scientific research group. Let's look at how we can use JupyterHub to spawn multiple instances of Jupyter Notebook on Kubernetes using Kubeflow.

## Exercises

### Exercise 1: Run Jupyter Notebooks locally using Docker

In this first exercise, we will run Jupyter Notebooks locally using Docker. We will use the official tensorflow docker image as it comes with Jupyter notebook.

```console
docker run -it -p 8888:8888 tensorflow/tensorflow
```

#### Validation

To verify, browse to the url in the output log.

For example: `http://localhost:8888/?token=a3ea3cd914c5b68149e2b4a6d0220eca186fec41563c0413`


### Exercise 2: Run JupyterHub on Kubernetes using Kubeflow

In this exercise, we will run JupyterHub to spawn multiple instances of Jupyter Notebooks on a Kubernetes cluster using Kubeflow.

As a prerequisite, you should already have a Kubernetes cluster running; you can follow [module 2 - Kubernetes](../2-kubernetes) to create your own cluster. You should also already have Kubeflow running in your Kubernetes cluster; you can follow [module 4 - Kubeflow and tfjob Basics](../4-kubeflow-tfjob).

In module 4, you installed the kubeflow-core component, which already includes JupyterHub and a corresponding load balancer service of type `ClusterIP`. To check its status, run the following kubectl command.

```
kubectl get svc -n=${NAMESPACE}

NAME          TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)    AGE
...
tf-hub-0      ClusterIP   None          <none>        8000/TCP   1m
tf-hub-lb     ClusterIP   10.0.40.191   <none>        80/TCP     1m
```

To connect to your JupyterHub locally:

```
PODNAME=`kubectl get pods --namespace=${NAMESPACE} --selector="app=tf-hub" --output=template --template="{{with index .items 0}}{{.metadata.name}}{{end}}"`
kubectl port-forward --namespace=${NAMESPACE} $PODNAME 8000:8000
```

[Optional] To connect to your JupyterHub over a public IP:

To update the default service created for JupyterHub, run the following command to change the service to type LoadBalancer:

```
ks param set kubeflow-core jupyterHubServiceType LoadBalancer
ks apply ${YOUR_KF_ENV}
```
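
Once the change is applied, the external IP should appear on the `tf-hub-lb` service (a sketch, using the service name from the output above; provisioning the IP can take a few minutes):

```
kubectl get svc tf-hub-lb -n=${NAMESPACE}
```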

Create a new Jupyter Notebook instance:
- open http://127.0.0.1:8000 in your browser
- log in using any username and password
- click the "Start My Server" button to spawn a new Jupyter notebook
- from the image dropdown, select a tensorflow image for your notebook
- for CPU and memory, enter values based on your resource requirements, for example: 1 CPU and 2Gi
- to get available GPUs in your cluster, run the following command:
```
kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.alpha\.kubernetes\.io\/nvidia-gpu"
```
- for GPU, enter values in json format `{"alpha.kubernetes.io/nvidia-gpu":"1"}`
- click the "Spawn" button

![jupyterhub](./jupyterhub.png)

The images are quite large, so this process can take a long time.

#### Validation

You can check the status of the pod by running:

```
kubectl -n ${NAMESPACE} describe pods jupyter-${USERNAME}
```

After the pod status changes to `Running`, you can verify that a new Jupyter notebook is running at: http://127.0.0.1:8000/user/{USERNAME}/tree or http://{PUBLIC-IP}/user/{USERNAME}/tree

@ -0,0 +1,30 @@

# TensorFlow Serving

## Introduction

TensorFlow Serving is a flexible, high-performance serving system for machine learning models, designed for production environments. TensorFlow Serving makes it easy to deploy new algorithms and experiments, while keeping the same server architecture and APIs. TensorFlow Serving provides out-of-the-box integration with TensorFlow models, but can be easily extended to serve other types of models and data.

## Getting started

## Installation

```commandline
ks init my-model-server
cd my-model-server
ks registry add kubeflow github.com/kubeflow/kubeflow/tree/master/kubeflow
ks pkg install kubeflow/tf-serving
ks env add cloud
ks env set cloud --namespace ${NAMESPACE}

MODEL_COMPONENT=serveInception
MODEL_NAME=inception
# Replace this with the url to your bucket if using your own model
MODEL_PATH=gs://kubeflow-models/inception
MODEL_SERVER_IMAGE=gcr.io/$(gcloud config get-value project)/model-server:1.0
ks generate tf-serving ${MODEL_COMPONENT} --name=${MODEL_NAME}
ks param set --env=cloud ${MODEL_COMPONENT} modelPath $MODEL_PATH
# If you want to use your custom image.
ks param set --env=cloud ${MODEL_COMPONENT} modelServerImage $MODEL_SERVER_IMAGE
# If you want to have the http endpoint.
ks param set --env=cloud ${MODEL_COMPONENT} deployHttpProxy true
```
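
Once the parameters are set, the component can be deployed to the `cloud` environment with something like (a sketch, assuming the setup above):

```commandline
ks apply cloud -c ${MODEL_COMPONENT}
```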
@ -1,197 +0,0 @@

# Jupyter Notebooks on Kubernetes

## Prerequisites
* [1 - Docker Basics](../1-docker)
* [2 - Kubernetes Basics and cluster created](../2-kubernetes)

## Summary

In this module, you will learn how to:
* Run Jupyter Notebooks locally using Docker
* Run Jupyter Notebooks on Kubernetes

## How Jupyter Notebooks work

The [Jupyter Notebook](http://jupyter.org/) is an open source web application that allows users to create and share documents that contain live code, equations, visualizations, and narrative text for rapid prototyping. It is often used for data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and more. To better support exploratory iteration and to accelerate computation of Tensorflow jobs, let's look at how we can include data science tools like Jupyter Notebook with Docker and Kubernetes.

## Exercises

### Exercise 1: Run Jupyter Notebooks locally using Docker

In this first exercise, we will run Jupyter Notebooks locally using Docker. We will use the official tensorflow docker image as it comes with Jupyter notebook.

```console
docker run -it -p 8888:8888 tensorflow/tensorflow
```

#### Validation

To verify, browse to the url in the output log.

For example: `http://localhost:8888/?token=a3ea3cd914c5b68149e2b4a6d0220eca186fec41563c0413`


### Exercise 2: Run Jupyter Notebooks on Kubernetes

In this exercise, we will run Jupyter Notebooks on a Kubernetes cluster.

As a prerequisite, you should already have a Kubernetes cluster running; you can follow [module 2 - Kubernetes](../2-kubernetes) to create your own cluster.

Similar to running Jupyter Notebooks locally using Docker, we can again use the official tensorflow docker image as it comes with Jupyter notebook. But here we can run many instances of Jupyter Notebooks in the cluster to handle additional load.

To run Jupyter Notebook using Kubernetes, you need to:
* Create a Pod using the tensorflow image
* Expose port 8888 to run Jupyter notebook
* [With GPU] Mount the nvidia libraries from the host VM to a custom directory in the container
* Create a Service to run Jupyter Notebook

#### Solution for Exercise 2

Create a yaml file like the one below.

<details>
<summary><strong>Solution for CPU only (expand to see)</strong></summary>
<p>

```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: jupyter-server
  name: jupyter-server
spec:
  ports:
  - port: 8888
    targetPort: 8888
  selector:
    app: jupyter-server
  type: LoadBalancer
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: jupyter-server
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: jupyter-server
    spec:
      containers:
      - image: tensorflow/tensorflow
        name: jupyter-server
        ports:
        - containerPort: 8888
```

</p>
</details>

<details>
<summary><strong>Solution with GPU (expand to see)</strong></summary>
<p>

```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: jupyter-server
  name: jupyter-server
spec:
  ports:
  - port: 8888
    targetPort: 8888
  selector:
    app: jupyter-server
  type: LoadBalancer
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: jupyter-server
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: jupyter-server
    spec:
      containers:
      - name: jupyter-server
        image: tensorflow/tensorflow:latest-gpu
        ports:
        - containerPort: 8888
        imagePullPolicy: IfNotPresent
        env:
        - name: LD_LIBRARY_PATH
          value: /usr/lib/nvidia:/usr/lib/x86_64-linux-gnu
        resources:
          requests:
            alpha.kubernetes.io/nvidia-gpu: 1
        volumeMounts:
        - mountPath: /usr/local/nvidia/bin
          name: bin
        - mountPath: /usr/lib/nvidia
          name: lib
        - mountPath: /usr/lib/x86_64-linux-gnu/libcuda.so.1
          name: libcuda
      volumes:
      - name: bin
        hostPath:
          path: /usr/lib/nvidia-384/bin
      - name: lib
        hostPath:
          path: /usr/lib/nvidia-384
      - name: libcuda
        hostPath:
          path: /usr/lib/x86_64-linux-gnu/libcuda.so.1
```

</p>
</details>

Save the yaml file, then deploy it to your Kubernetes cluster by running:

```console
kubectl create -f <template-path>
```

#### Validation

After the deployment is created, a pod running tensorflow will be created, along with a new service for the Jupyter notebook. The new service will acquire a new external ip to run Jupyter Notebook on port 8888. This may take a few minutes to complete.

To verify, run the following to view the output log and get the URL and token for the hosted Jupyter notebook:

```console
kubectl logs jupyter-server-xxxxx

# sample output

http://localhost:8888/?token=2e7c875bd4e72137911d33e209c91d01f7a7b44868cf664d

```

Next, to get the public ip for the new service created for Jupyter Notebook, run:

```console
kubectl get svc jupyter-server -o jsonpath={.status.loadBalancer.ingress[0].ip}

xx.xx.xx.xx
```
From a browser, navigate to the Jupyter notebook with the following URL, replacing `PUBLICIP` with the output from the previous step:

```
http://<PUBLICIP>:8888/?token=2e7c875bd4e72137911d33e209c91d01f7a7b44868cf664d
```

LICENSE

@ -1,21 +1,391 @@

Creative Commons Corporation ("Creative Commons") is not a law firm and
does not provide legal services or legal advice. Distribution of
Creative Commons public licenses does not create a lawyer-client or
other relationship. Creative Commons makes its licenses and related
information available on an "as-is" basis. Creative Commons gives no
warranties regarding its licenses, any material licensed under their
terms and conditions, or any related information. Creative Commons
disclaims all liability for damages resulting from their use to the
fullest extent possible.

Using Creative Commons Public Licenses

Creative Commons public licenses provide a standard set of terms and
conditions that creators and other rights holders may use to share
original works of authorship and other material subject to copyright
and certain other rights specified in the public license below. The
following considerations are for informational purposes only, are not
exhaustive, and do not form part of our licenses.

     Considerations for licensors: Our public licenses are
     intended for use by those authorized to give the public
     permission to use material in ways otherwise restricted by
     copyright and certain other rights. Our licenses are
     irrevocable. Licensors should read and understand the terms
     and conditions of the license they choose before applying it.
     Licensors should also secure all rights necessary before
     applying our licenses so that the public can reuse the
     material as expected. Licensors should clearly mark any
     material not subject to the license. This includes other CC-
     licensed material, or material used under an exception or
     limitation to copyright. More considerations for licensors:
     wiki.creativecommons.org/Considerations_for_licensors

     Considerations for the public: By using one of our public
     licenses, a licensor grants the public permission to use the
     licensed material under specified terms and conditions. If
     the licensor's permission is not necessary for any reason--for
     example, because of any applicable exception or limitation to
     copyright--then that use is not regulated by the license. Our
     licenses grant only permissions under copyright and certain
     other rights that a licensor has authority to grant. Use of
     the licensed material may still be restricted for other
     reasons, including because others have copyright or other
     rights in the material. A licensor may make special requests,
     such as asking that all changes be marked or described.
     Although not required by our licenses, you are encouraged to
     respect those requests where reasonable. More_considerations
     for the public:
     wiki.creativecommons.org/Considerations_for_licensees

=======================================================================

Creative Commons Attribution 4.0 International Public License

By exercising the Licensed Rights (defined below), You accept and agree
to be bound by the terms and conditions of this Creative Commons
Attribution 4.0 International Public License ("Public License"). To the
extent this Public License may be interpreted as a contract, You are
granted the Licensed Rights in consideration of Your acceptance of
these terms and conditions, and the Licensor grants You such rights in
consideration of benefits the Licensor receives from making the
Licensed Material available under these terms and conditions.


Section 1 -- Definitions.

  a. Adapted Material means material subject to Copyright and Similar
     Rights that is derived from or based upon the Licensed Material
     and in which the Licensed Material is translated, altered,
     arranged, transformed, or otherwise modified in a manner requiring
     permission under the Copyright and Similar Rights held by the
     Licensor. For purposes of this Public License, where the Licensed
     Material is a musical work, performance, or sound recording,
     Adapted Material is always produced where the Licensed Material is
     synched in timed relation with a moving image.

  b. Adapter's License means the license You apply to Your Copyright
     and Similar Rights in Your contributions to Adapted Material in
     accordance with the terms and conditions of this Public License.

  c. Copyright and Similar Rights means copyright and/or similar rights
     closely related to copyright including, without limitation,
     performance, broadcast, sound recording, and Sui Generis Database
     Rights, without regard to how the rights are labeled or
     categorized. For purposes of this Public License, the rights
     specified in Section 2(b)(1)-(2) are not Copyright and Similar
     Rights.

  d. Effective Technological Measures means those measures that, in the
     absence of proper authority, may not be circumvented under laws
     fulfilling obligations under Article 11 of the WIPO Copyright
     Treaty adopted on December 20, 1996, and/or similar international
     agreements.

  e. Exceptions and Limitations means fair use, fair dealing, and/or
     any other exception or limitation to Copyright and Similar Rights
     that applies to Your use of the Licensed Material.

  f. Licensed Material means the artistic or literary work, database,
     or other material to which the Licensor applied this Public
     License.

  g. Licensed Rights means the rights granted to You subject to the
     terms and conditions of this Public License, which are limited to
     all Copyright and Similar Rights that apply to Your use of the
     Licensed Material and that the Licensor has authority to license.

  h. Licensor means the individual(s) or entity(ies) granting rights
     under this Public License.

  i. Share means to provide material to the public by any means or
     process that requires permission under the Licensed Rights, such
     as reproduction, public display, public performance, distribution,
     dissemination, communication, or importation, and to make material
     available to the public including in ways that members of the
     public may access the material from a place and at a time
     individually chosen by them.

  j. Sui Generis Database Rights means rights other than copyright
     resulting from Directive 96/9/EC of the European Parliament and of
     the Council of 11 March 1996 on the legal protection of databases,
     as amended and/or succeeded, as well as other essentially
     equivalent rights anywhere in the world.

  k. You means the individual or entity exercising the Licensed Rights
     under this Public License. Your has a corresponding meaning.


Section 2 -- Scope.

  a. License grant.

       1. Subject to the terms and conditions of this Public License,
          the Licensor hereby grants You a worldwide, royalty-free,
          non-sublicensable, non-exclusive, irrevocable license to
          exercise the Licensed Rights in the Licensed Material to:

            a. reproduce and Share the Licensed Material, in whole or
               in part; and

            b. produce, reproduce, and Share Adapted Material.

       2. Exceptions and Limitations. For the avoidance of doubt, where
          Exceptions and Limitations apply to Your use, this Public
          License does not apply, and You do not need to comply with
          its terms and conditions.

       3. Term. The term of this Public License is specified in Section
          6(a).

       4. Media and formats; technical modifications allowed. The
          Licensor authorizes You to exercise the Licensed Rights in
          all media and formats whether now known or hereafter created,
          and to make technical modifications necessary to do so. The
          Licensor waives and/or agrees not to assert any right or
          authority to forbid You from making technical modifications
          necessary to exercise the Licensed Rights, including
          technical modifications necessary to circumvent Effective
          Technological Measures. For purposes of this Public License,
          simply making modifications authorized by this Section 2(a)
          (4) never produces Adapted Material.

       5. Downstream recipients.

            a. Offer from the Licensor -- Licensed Material. Every
               recipient of the Licensed Material automatically
               receives an offer from the Licensor to exercise the
               Licensed Rights under the terms and conditions of this
               Public License.

            b. No downstream restrictions. You may not offer or impose
               any additional or different terms or conditions on, or
               apply any Effective Technological Measures to, the
               Licensed Material if doing so restricts exercise of the
               Licensed Rights by any recipient of the Licensed
               Material.

       6. No endorsement. Nothing in this Public License constitutes or
          may be construed as permission to assert or imply that You
          are, or that Your use of the Licensed Material is, connected
          with, or sponsored, endorsed, or granted official status by,
          the Licensor or others designated to receive attribution as
          provided in Section 3(a)(1)(A)(i).

  b. Other rights.

       1. Moral rights, such as the right of integrity, are not
          licensed under this Public License, nor are publicity,
          privacy, and/or other similar personality rights; however, to
          the extent possible, the Licensor waives and/or agrees not to
          assert any such rights held by the Licensor to the limited
          extent necessary to allow You to exercise the Licensed
          Rights, but not otherwise.

       2. Patent and trademark rights are not licensed under this
          Public License.

       3. To the extent possible, the Licensor waives any right to
          collect royalties from You for the exercise of the Licensed
          Rights, whether directly or through a collecting society
          under any voluntary or waivable statutory or compulsory
          licensing scheme. In all other cases the Licensor expressly
          reserves any right to collect such royalties.


Section 3 -- License Conditions.

Your exercise of the Licensed Rights is expressly made subject to the
following conditions.

  a. Attribution.

       1. If You Share the Licensed Material (including in modified
          form), You must:

            a. retain the following if it is supplied by the Licensor
               with the Licensed Material:

                 i. identification of the creator(s) of the Licensed
                    Material and any others designated to receive
                    attribution, in any reasonable manner requested by
                    the Licensor (including by pseudonym if
                    designated);

                ii. a copyright notice;

               iii. a notice that refers to this Public License;

                iv. a notice that refers to the disclaimer of
                    warranties;

                 v. a URI or hyperlink to the Licensed Material to the
                    extent reasonably practicable;

            b. indicate if You modified the Licensed Material and
               retain an indication of any previous modifications; and

            c. indicate the Licensed Material is licensed under this
               Public License, and include the text of, or the URI or
               hyperlink to, this Public License.

       2. You may satisfy the conditions in Section 3(a)(1) in any
          reasonable manner based on the medium, means, and context in
          which You Share the Licensed Material. For example, it may be
          reasonable to satisfy the conditions by providing a URI or
          hyperlink to a resource that includes the required
          information.

       3. If requested by the Licensor, You must remove any of the
          information required by Section 3(a)(1)(A) to the extent
          reasonably practicable.

       4. If You Share Adapted Material You produce, the Adapter's
          License You apply must not prevent recipients of the Adapted
          Material from complying with this Public License.


Section 4 -- Sui Generis Database Rights.

Where the Licensed Rights include Sui Generis Database Rights that
apply to Your use of the Licensed Material:

  a. for the avoidance of doubt, Section 2(a)(1) grants You the right
     to extract, reuse, reproduce, and Share all or a substantial
     portion of the contents of the database;

  b. if You include all or a substantial portion of the database
     contents in a database in which You have Sui Generis Database
     Rights, then the database in which You have Sui Generis Database
     Rights (but not its individual contents) is Adapted Material; and

  c. You must comply with the conditions in Section 3(a) if You Share
     all or a substantial portion of the contents of the database.

For the avoidance of doubt, this Section 4 supplements and does not
replace Your obligations under this Public License where the Licensed
Rights include other Copyright and Similar Rights.


Section 5 -- Disclaimer of Warranties and Limitation of Liability.

  a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE
     EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS
     AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF
     ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS,
     IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION,
     WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR
     PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS,
     ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT
     KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT
     ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU.

  b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE
     TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION,
     NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT,
     INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES,
     COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR
     USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN
     ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR
     DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR
     IN PART, THIS LIMITATION MAY NOT APPLY TO YOU.

  c. The disclaimer of warranties and limitation of liability provided
     above shall be interpreted in a manner that, to the extent
     possible, most closely approximates an absolute disclaimer and
     waiver of all liability.


Section 6 -- Term and Termination.

  a. This Public License applies for the term of the Copyright and
     Similar Rights licensed here. However, if You fail to comply with
     this Public License, then Your rights under this Public License
     terminate automatically.

  b. Where Your right to use the Licensed Material has terminated under
     Section 6(a), it reinstates:
|
||||
1. automatically as of the date the violation is cured, provided
|
||||
it is cured within 30 days of Your discovery of the
|
||||
violation; or
|
||||
|
||||
2. upon express reinstatement by the Licensor.
|
||||
|
||||
For the avoidance of doubt, this Section 6(b) does not affect any
|
||||
right the Licensor may have to seek remedies for Your violations
|
||||
of this Public License.
|
||||
|
||||
c. For the avoidance of doubt, the Licensor may also offer the
|
||||
Licensed Material under separate terms or conditions or stop
|
||||
distributing the Licensed Material at any time; however, doing so
|
||||
will not terminate this Public License.
|
||||
|
||||
d. Sections 1, 5, 6, 7, and 8 survive termination of this Public
|
||||
License.
|
||||
|
||||
|
||||
Section 7 -- Other Terms and Conditions.
|
||||
|
||||
a. The Licensor shall not be bound by any additional or different
|
||||
terms or conditions communicated by You unless expressly agreed.
|
||||
|
||||
b. Any arrangements, understandings, or agreements regarding the
|
||||
Licensed Material not stated herein are separate from and
|
||||
independent of the terms and conditions of this Public License.
|
||||
|
||||
|
||||
Section 8 -- Interpretation.
|
||||
|
||||
a. For the avoidance of doubt, this Public License does not, and
|
||||
shall not be interpreted to, reduce, limit, restrict, or impose
|
||||
conditions on any use of the Licensed Material that could lawfully
|
||||
be made without permission under this Public License.
|
||||
|
||||
b. To the extent possible, if any provision of this Public License is
|
||||
deemed unenforceable, it shall be automatically reformed to the
|
||||
minimum extent necessary to make it enforceable. If the provision
|
||||
cannot be reformed, it shall be severed from this Public License
|
||||
without affecting the enforceability of the remaining terms and
|
||||
conditions.
|
||||
|
||||
c. No term or condition of this Public License will be waived and no
|
||||
failure to comply consented to unless expressly agreed to by the
|
||||
Licensor.
|
||||
|
||||
d. Nothing in this Public License constitutes or may be interpreted
|
||||
as a limitation upon, or waiver of, any privileges and immunities
|
||||
that apply to the Licensor or You, including from the legal
|
||||
processes of any jurisdiction or authority.
|
||||
|
||||
|
||||
=======================================================================
|
||||
|
||||
Creative Commons is not a party to its public
|
||||
licenses. Notwithstanding, Creative Commons may elect to apply one of
|
||||
its public licenses to material it publishes and in those instances
|
||||
will be considered the “Licensor.” The text of the Creative Commons
|
||||
public licenses is dedicated to the public domain under the CC0 Public
|
||||
Domain Dedication. Except for the limited purpose of indicating that
|
||||
material is shared under a Creative Commons public license or as
|
||||
otherwise permitted by the Creative Commons policies published at
|
||||
creativecommons.org/policies, Creative Commons does not authorize the
|
||||
use of the trademark "Creative Commons" or any other trademark or logo
|
||||
of Creative Commons without its prior written consent including,
|
||||
without limitation, in connection with any unauthorized modifications
|
||||
to any of its public licenses or any other arrangements,
|
||||
understandings, or agreements concerning use of licensed material. For
|
||||
the avoidance of doubt, this paragraph does not form part of the
|
||||
public licenses.
|
||||
|
||||
Creative Commons may be contacted at creativecommons.org.

@ -0,0 +1,17 @@
The MIT License (MIT)
Copyright (c) Microsoft Corporation

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and
associated documentation files (the "Software"), to deal in the Software without restriction,
including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial
portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT
NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
README.md
@ -1,4 +1,4 @@
# Train TensorFlow Models at Scale with Kubernetes on Azure
# Labs for Training and Serving TensorFlow Models with Kubernetes and Kubeflow on Azure Container Service (AKS)

<!-- ## [Learning Objectives](./learningObjectives.md)
## [Presentation Content](./presentationContent.md)

@ -6,16 +6,17 @@

## Prerequisites

1. Have a valid Microsoft Azure subscription allowing the creation of an ACS cluster
1. Have a valid Microsoft Azure subscription allowing the creation of an AKS cluster
1. Docker client installed: [Installing Docker](https://www.docker.com/community-edition)
1. Azure-cli (2.0) installed: [Installing the Azure CLI 2.0 | Microsoft Docs](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest)
1. Git cli installed: [Installing Git CLI](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)
1. Kubectl installed: [Installing Kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/)
1. Helm installed: [Installing Helm CLI](https://docs.helm.sh/using_helm/#from-the-binary-releases) (**Note**: On Windows you can extract the `tar` file using a tool like 7Zip.
1. Helm installed: [Installing Helm CLI](https://docs.helm.sh/using_helm/#from-the-binary-releases) (**Note**: On Windows you can extract the `tar` file using a tool like 7Zip.)
1. ksonnet installed: [Installing ksonnet CLI](https://ksonnet.io/#get-started)
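Once these tools are installed, it is worth sanity-checking that each CLI is on your `PATH` before starting the labs. A minimal check is sketched below; the exact flags can differ between releases (`helm version --client` is the Helm 2 form, and `ks version` assumes the ksonnet CLI is installed as `ks`):

```console
docker version
az --version
git --version
kubectl version --client
helm version --client
ks version
```

If any of these commands fails, revisit the corresponding installation link above before moving on.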
Clone this repository somewhere so you can easily access the different source files:
```console
git clone https://github.com/wbuchwalter/tensorflow-k8s-azure
git clone https://github.com/Azure/kubeflow-labs
```

## Content Summary

@ -26,9 +27,41 @@ git clone https://github.com/wbuchwalter/tensorflow-k8s-azure
|1| **[Docker](1-docker)** | Docker and containers 101.|
|2| **[Kubernetes](2-kubernetes)** | Kubernetes important concepts overview.|
|3| **[Helm](3-helm)** | Introduction to Helm |
|4| **[GPUs](4-gpus)** | How to use GPUs with Kubernetes.|
|5| **[TFJob](5-tfjob)** | How to use `tensorflow/k8s` and `TFJob` to deploy a simple TensorFlow training.|
|6| **[Distributed Tensorflow](6-distributed-tensorflow)** | Going distributed with `TFJob`|
|7| **[Hyperparameters Sweep with Helm](7-hyperparam-sweep)** | Using Helm to deploy a large number of training testing different hypothesis, monitoring and comparing them. |
|8| **[Going Further](8-going-further)** | Links and resources to go further: Autoscaling, Distributed Storage. |
|9| **[Jupyter Notebooks](9-jupyter)** | Easily deploy a Jupyter Notebook instance on Kubernetes. |
|4| **[Kubeflow + TFJob](4-kubeflow-tfjob)** | Introduction to Kubeflow. How to use `tensorflow/k8s` and `TFJob` to deploy a simple TensorFlow training.|
|5| **[JupyterHub](5-jupyterhub)** | Learn how to run JupyterHub to create and manage Jupyter notebooks using Kubeflow |
|6| **[Distributed Tensorflow](6-distributed-tensorflow)** | Learn how to deploy and monitor distributed TensorFlow trainings with `TFJob`|
|7| **[Hyperparameters Sweep with Helm](7-hyperparam-sweep)** | Using Helm to deploy a large number of trainings testing different hypotheses, and TensorBoard to monitor and compare the results |
|8| **[Serving](8-serving)** | Using TensorFlow Serving to serve predictions |
|9| **[Going Further](9-going-further)** | Links and resources to go further: Autoscaling, Distributed Storage, etc. |

# Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide
a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions
provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.

# Legal Notices

Microsoft and any contributors grant you a license to the Microsoft documentation and other content
in this repository under the [Creative Commons Attribution 4.0 International Public License](https://creativecommons.org/licenses/by/4.0/legalcode),
see the [LICENSE](LICENSE) file, and grant you a license to any code in the repository under the [MIT License](https://opensource.org/licenses/MIT), see the
[LICENSE-CODE](LICENSE-CODE) file.

Microsoft, Windows, Microsoft Azure and/or other Microsoft products and services referenced in the documentation
may be either trademarks or registered trademarks of Microsoft in the United States and/or other countries.
The licenses for this project do not grant you rights to use any Microsoft names, logos, or trademarks.
Microsoft's general trademark guidelines can be found at http://go.microsoft.com/fwlink/?LinkID=254653.

Privacy information can be found at https://privacy.microsoft.com/en-us/

Microsoft and any contributors reserve all other rights, whether under their respective copyrights, patents,
or trademarks, whether by implication, estoppel or otherwise.