Update Tutorials with new learner APIs.

Yuqing Tang 2017-11-10 15:35:50 -08:00
Parent 6fc501f615
Commit 10cb3328f6
24 changed files with 800 additions and 815 deletions

View file

@ -216,7 +216,7 @@
" # class 1 into the vector \"0 1 0\", ...\n",
" class_ind = [Y==class_number for class_number in range(num_classes)]\n",
" Y = np.asarray(np.hstack(class_ind), dtype=np.float32)\n",
" return X, Y "
" return X, Y"
]
},
{
@ -588,8 +588,7 @@
" if not (loss == \"NA\" or error ==\"NA\"):\n",
" plotdata[\"batchsize\"].append(batchsize)\n",
" plotdata[\"loss\"].append(loss)\n",
" plotdata[\"error\"].append(error)\n",
" "
" plotdata[\"error\"].append(error)"
]
},
{
@ -677,7 +676,7 @@
"test_minibatch_size = 25\n",
"features, labels = generate_random_data_sample(test_minibatch_size, input_dim, num_output_classes)\n",
"\n",
"trainer.test_minibatch({feature : features, label : labels}) "
"trainer.test_minibatch({feature : features, label : labels})"
]
},
{
@ -821,7 +820,7 @@
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
"version": 3.0
},
"file_extension": ".py",
"mimetype": "text/x-python",
@ -832,5 +831,5 @@
}
},
"nbformat": 4,
"nbformat_minor": 1
}
"nbformat_minor": 0
}

View file

@ -562,7 +562,7 @@
"source": [
"# Instantiate the trainer object to drive the model training\n",
"learning_rate = 0.5\n",
"lr_schedule = C.learning_rate_schedule(learning_rate, C.UnitType.minibatch) \n",
"lr_schedule = C.learning_parameter_schedule(learning_rate) \n",
"learner = C.sgd(z.parameters, lr_schedule)\n",
"trainer = C.Trainer(z, (loss, eval_error), [learner])"
]
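This hunk shows the migration pattern applied throughout the commit: C.learning_rate_schedule(value, C.UnitType.minibatch) becomes C.learning_parameter_schedule(value), which is interpreted per minibatch by default. A minimal, self-contained sketch of the new-style trainer setup (the tiny two-feature model is illustrative only, not taken from the tutorial):

import cntk as C

# Illustrative stand-in model so the snippet runs on its own.
features = C.input_variable(2)
labels = C.input_variable(2)
z = C.layers.Dense(2)(features)
loss = C.cross_entropy_with_softmax(z, labels)
eval_error = C.classification_error(z, labels)

# Old API (removed by this commit):
#   lr_schedule = C.learning_rate_schedule(0.5, C.UnitType.minibatch)
# New API: the schedule is interpreted per minibatch by default.
lr_schedule = C.learning_parameter_schedule(0.5)
learner = C.sgd(z.parameters, lr_schedule)
trainer = C.Trainer(z, (loss, eval_error), [learner])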
@ -885,7 +885,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
"version": "3.5.4"
}
},
"nbformat": 4,

View file

@ -4,9 +4,7 @@
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
"collapsed": true
},
"outputs": [],
"source": [
@ -16,8 +14,6 @@
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true,
"nbpresent": {
"id": "29b9bd1d-766f-4422-ad96-de0accc1ce58"
}
@ -38,11 +34,7 @@
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"metadata": {},
"outputs": [
{
"data": {
@ -65,10 +57,7 @@
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"metadata": {},
"source": [
"**Goal**:\n",
"Our goal is to train a classifier that will identify the digits in the MNIST dataset. \n",
@ -83,10 +72,7 @@
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"metadata": {},
"source": [
"## Logistic Regression\n",
"[Logistic Regression](https://en.wikipedia.org/wiki/Logistic_regression) (LR) is a fundamental machine learning technique that uses a linear weighted combination of features and generates probability-based predictions of different classes. \n",
@ -98,10 +84,7 @@
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"metadata": {},
"source": [
"In **Binary Logistic Regression** (see top of figure above), the input features are each scaled by an associated weight and summed together. The sum is passed through a squashing (aka activation) function and generates an output in [0,1]. This output value (which can be thought of as a probability) is then compared with a threshold (such as 0.5) to produce a binary label (0 or 1). This technique supports only classification problems with two output classes, hence the name binary LR. In the binary LR example shown above, the [sigmoid][] function is used as the squashing function.\n",
"\n",
@ -110,10 +93,7 @@
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"metadata": {},
"source": [
"In **Multinomial Linear Regression** (see bottom of figure above), 2 or more output nodes are used, one for each output class to be predicted. Each summation node uses its own set of weights to scale the input features and sum them together. Instead of passing the summed output of the weighted input features through a sigmoid squashing function, the output is often passed through a [softmax][] function (which in addition to squashing, like the sigmoid, the softmax normalizes each nodes' output value using the sum of all unnormalized nodes). (Details in the context of MNIST image to follow)\n",
"\n",
@ -127,8 +107,6 @@
"execution_count": 3,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true,
"nbpresent": {
"id": "138d1a78-02e2-4bd6-a20e-07b83f303563"
}
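The two markdown cells above describe the sigmoid and softmax squashing functions used by binary and multinomial LR. A minimal NumPy sketch of both (illustrative weights and values, not part of the notebook):

import numpy as np

def sigmoid(evidence):
    # squash the weighted sum of features into [0, 1]
    return 1.0 / (1.0 + np.exp(-evidence))

def softmax(evidence):
    # exponentiate (shifted by the max for numerical stability) and
    # normalize by the sum over all output nodes
    e = np.exp(evidence - np.max(evidence))
    return e / e.sum()

x = np.array([0.2, 1.5])                     # two input features
w, b = np.array([0.8, -0.3]), 0.1            # illustrative weights and bias
print(int(sigmoid(np.dot(w, x) + b) > 0.5))  # binary LR: threshold at 0.5
print(softmax(np.array([2.0, 1.0, 0.1])))    # multinomial LR: sums to 1.0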
@ -153,10 +131,7 @@
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"metadata": {},
"source": [
"## Initialization"
]
@ -165,9 +140,7 @@
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
"collapsed": true
},
"outputs": [],
"source": [
@ -178,10 +151,7 @@
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"metadata": {},
"source": [
"## Data reading\n",
"\n",
@ -199,9 +169,7 @@
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
"collapsed": true
},
"outputs": [],
"source": [
@ -220,11 +188,7 @@
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"metadata": {},
"outputs": [
{
"name": "stdout",
@ -255,10 +219,7 @@
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"metadata": {},
"source": [
"## Model Creation\n",
"\n",
@ -272,10 +233,7 @@
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"metadata": {},
"source": [
"The first step is to compute the evidence for an observation. \n",
"\n",
@ -288,10 +246,7 @@
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"metadata": {},
"source": [
"Network input and output: \n",
"- **input** variable (a key CNTK concept): \n",
@ -305,9 +260,7 @@
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
"collapsed": true
},
"outputs": [],
"source": [
@ -317,10 +270,7 @@
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"metadata": {},
"source": [
"## Logistic Regression network setup\n",
"\n",
@ -331,9 +281,7 @@
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
"collapsed": true
},
"outputs": [],
"source": [
@ -345,10 +293,7 @@
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"metadata": {},
"source": [
"`z` will be used to represent the output of a network."
]
@ -357,9 +302,7 @@
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
"collapsed": true
},
"outputs": [],
"source": [
@ -369,10 +312,7 @@
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"metadata": {},
"source": [
"### Learning model parameters\n",
"\n",
@ -381,10 +321,7 @@
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"metadata": {},
"source": [
"## Training\n",
"\n",
@ -395,9 +332,7 @@
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
"collapsed": true
},
"outputs": [],
"source": [
@ -406,10 +341,7 @@
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"metadata": {},
"source": [
"### Evaluation\n",
"\n",
@ -420,9 +352,7 @@
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
"collapsed": true
},
"outputs": [],
"source": [
@ -431,10 +361,7 @@
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"metadata": {},
"source": [
"### Configure training\n",
"\n",
@ -452,25 +379,20 @@
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
"collapsed": true
},
"outputs": [],
"source": [
"# Instantiate the trainer object to drive the model training\n",
"learning_rate = 0.2\n",
"lr_schedule = C.learning_rate_schedule(learning_rate, C.UnitType.minibatch)\n",
"lr_schedule = C.learning_parameter_schedule(learning_rate)\n",
"learner = C.sgd(z.parameters, lr_schedule)\n",
"trainer = C.Trainer(z, (loss, label_error), [learner])"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"metadata": {},
"source": [
"First let us create some helper functions that will be needed to visualize different functions associated with training."
]
@ -479,9 +401,7 @@
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
"collapsed": true
},
"outputs": [],
"source": [
@ -509,10 +429,7 @@
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"metadata": {},
"source": [
"### Run the trainer\n",
"\n",
@ -525,9 +442,7 @@
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
"collapsed": true
},
"outputs": [],
"source": [
@ -541,11 +456,7 @@
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"metadata": {},
"outputs": [
{
"name": "stdout",
@ -604,10 +515,7 @@
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"metadata": {},
"source": [
"Let us plot the errors over the different training minibatches. Note that as we iterate the training loss decreases though we do see some intermediate bumps. \n",
"\n",
@ -617,11 +525,7 @@
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"metadata": {},
"outputs": [
{
"data": {
@ -671,10 +575,7 @@
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"metadata": {},
"source": [
"### Run evaluation / Testing \n",
"\n",
@ -684,11 +585,7 @@
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"metadata": {},
"outputs": [
{
"name": "stdout",
@ -731,20 +628,14 @@
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"metadata": {},
"source": [
"Note, this error is very comparable to our training error indicating that our model has good \"out of sample\" error a.k.a. generalization error. This implies that our model can very effectively deal with previously unseen observations (during the training process). This is key to avoid the phenomenon of overfitting."
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"metadata": {},
"source": [
"We have so far been dealing with aggregate measures of error. Let us now get the probabilities associated with individual data points. For each observation, the `eval` function returns the probability distribution across all the classes. The classifier is trained to recognize digits, hence has 10 classes. First let us route the network output through a `softmax` function. This maps the aggregated activations across the network to probabilities across the 10 classes."
]
@ -753,9 +644,7 @@
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
"collapsed": true
},
"outputs": [],
"source": [
@ -764,10 +653,7 @@
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"metadata": {},
"source": [
"Let us a small minibatch sample from the test data."
]
@ -776,9 +662,7 @@
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
"collapsed": true
},
"outputs": [],
"source": [
@ -799,9 +683,7 @@
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
"collapsed": true
},
"outputs": [],
"source": [
@ -813,11 +695,7 @@
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"metadata": {},
"outputs": [
{
"name": "stdout",
@ -835,10 +713,7 @@
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"metadata": {},
"source": [
"Let us visualize some of the results"
]
@ -846,11 +721,7 @@
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"metadata": {},
"outputs": [
{
"name": "stdout",
@ -883,9 +754,7 @@
{
"cell_type": "markdown",
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
"collapsed": true
},
"source": [
"**Exploration Suggestion**\n",
@ -912,7 +781,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.3"
"version": "3.5.4"
}
},
"nbformat": 4,

View file

@ -374,7 +374,7 @@
"source": [
"# Instantiate the trainer object to drive the model training\n",
"learning_rate = 0.2\n",
"lr_schedule = C.learning_rate_schedule(learning_rate, C.UnitType.minibatch)\n",
"lr_schedule = C.learning_parameter_schedule(learning_rate)\n",
"learner = C.sgd(z.parameters, lr_schedule)\n",
"trainer = C.Trainer(z, (loss, label_error), [learner])"
]
@ -783,7 +783,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
"version": "3.5.4"
}
},
"nbformat": 4,

View file

@ -585,7 +585,7 @@
" \n",
" # Instantiate the trainer object to drive the model training\n",
" learning_rate = 0.2\n",
" lr_schedule = C.learning_rate_schedule(learning_rate, C.UnitType.minibatch)\n",
" lr_schedule = C.learning_parameter_schedule(learning_rate)\n",
" learner = C.sgd(z.parameters, lr_schedule)\n",
" trainer = C.Trainer(z, (loss, label_error), [learner])\n",
" \n",
@ -1071,7 +1071,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
"version": "3.5.4"
}
},
"nbformat": 4,

View file

@ -570,7 +570,7 @@
"z = create_model(input, num_output_classes)\n",
"loss = C.cross_entropy_with_softmax(z, label)\n",
"label_error = C.classification_error(z, label)\n",
"lr_per_minibatch = C.learning_rate_schedule(0.125,C.UnitType.minibatch)\n",
"lr_per_minibatch = C.learning_parameter_schedule(0.125)\n",
"trainer = C.Trainer(z, (loss, label_error), [C.sgd(z.parameters, lr=lr_per_minibatch)])"
]
},
@ -1089,7 +1089,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
"version": "3.5.4"
}
},
"nbformat": 4,

File diff suppressed because one or more lines are too long

View file

@ -359,7 +359,7 @@
"\n",
"# the learning rate\n",
"learning_rate = 0.02\n",
"lr_schedule = C.learning_rate_schedule(learning_rate, C.UnitType.minibatch)\n",
"lr_schedule = C.learning_parameter_schedule(learning_rate)\n",
"\n",
"# loss function\n",
"loss = C.squared_error(z, l)\n",
@ -368,10 +368,10 @@
"error = C.squared_error(z, l)\n",
"\n",
"# use fsadagrad optimizer\n",
"momentum_time_constant = C.momentum_as_time_constant_schedule(BATCH_SIZE / -math.log(0.9)) \n",
"momentum_schedule = C.momentum_schedule(0.9, minibatch_size=BATCH_SIZE)\n",
"learner = C.fsadagrad(z.parameters, \n",
" lr = lr_schedule, \n",
" momentum = momentum_time_constant, \n",
" momentum = momentum_schedule, \n",
" unit_gain = True)\n",
"\n",
"trainer = C.Trainer(z, (loss, error), [learner])"
@ -573,7 +573,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
"version": "3.5.4"
}
},
"nbformat": 4,

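A note on the momentum constants introduced in the file above and reused throughout the commit: the old momentum_as_time_constant_schedule(T) values appear to be converted to explicit momentum values via momentum = exp(-N / T), where N is the reference minibatch size (N = 1 for the per-sample variants). A small sketch that reproduces the constants used in the updated notebooks (the helper name is ours, not a CNTK API):

import math

def momentum_from_time_constant(time_constant, minibatch_size=1):
    # the tutorials use a time constant of 0 to mean "no momentum"
    if time_constant == 0:
        return 0.0
    return math.exp(-float(minibatch_size) / time_constant)

print(momentum_from_time_constant(1100, minibatch_size=72))  # ~0.9366416204111472
print(momentum_from_time_constant(700))                      # ~0.9985724484938566
print(momentum_from_time_constant(720000))                   # ~0.9999986111120757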
File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View file

@ -520,15 +520,15 @@
" minibatch_size = 64\n",
"\n",
" # Set training parameters\n",
" lr_per_minibatch = C.learning_rate_schedule([0.01]*10 + [0.003]*10 + [0.001], \n",
" C.UnitType.minibatch, epoch_size)\n",
" momentum_time_constant = C.momentum_as_time_constant_schedule(-minibatch_size/np.log(0.9))\n",
" lr_per_minibatch = C.learning_parameter_schedule([0.01]*10 + [0.003]*10 + [0.001], \n",
" epoch_size=epoch_size)\n",
" momentums = C.momentum_schedule(0.9, minibatch_size=minibatch_size)\n",
" l2_reg_weight = 0.001\n",
" \n",
" # trainer object\n",
" learner = C.momentum_sgd(z.parameters, \n",
" lr = lr_per_minibatch, \n",
" momentum = momentum_time_constant, \n",
" momentum = momentums, \n",
" l2_regularization_weight=l2_reg_weight)\n",
" progress_printer = C.logging.ProgressPrinter(tag='Training', num_epochs=max_epochs)\n",
" trainer = C.Trainer(z, (ce, pe), [learner], [progress_printer])\n",
@ -1352,7 +1352,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
"version": "3.5.4"
}
},
"nbformat": 4,

View file

@ -477,17 +477,17 @@
" # do other stuff (e.g. checkpointing, adjust learning rate, etc.)\n",
" lr_per_sample = [3e-4]*4+[1.5e-4]\n",
" lr_per_minibatch = [lr * minibatch_size for lr in lr_per_sample]\n",
" lr_schedule = C.learning_rate_schedule(lr_per_minibatch, C.UnitType.minibatch, epoch_size)\n",
" lr_schedule = C.learning_parameter_schedule(lr_per_minibatch, epoch_size=epoch_size)\n",
" \n",
" # Momentum schedule\n",
" momentum_as_time_constant = C.momentum_as_time_constant_schedule(700)\n",
" momentums = C.momentum_schedule(0.9048374180359595, minibatch_size=minibatch_size)\n",
" \n",
" # We use a the Adam optimizer which is known to work well on this dataset\n",
" # Feel free to try other optimizers from \n",
" # https://www.cntk.ai/pythondocs/cntk.learner.html#module-cntk.learner\n",
" learner = C.adam(parameters=model.parameters,\n",
" lr=lr_schedule,\n",
" momentum=momentum_as_time_constant,\n",
" momentum=momentums,\n",
" gradient_clipping_threshold_per_sample=15, \n",
" gradient_clipping_with_truncation=True)\n",
"\n",
@ -1518,7 +1518,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
"version": "3.5.4"
}
},
"nbformat": 4,

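In the file above, the per-sample rates are pre-multiplied by the minibatch size and then declared with learning_parameter_schedule. A hedged sketch of the alternative, minibatch-size-agnostic spelling the new API also allows (the epoch_size and minibatch_size values here are illustrative; the two forms only coincide when minibatches actually contain that many samples):

import cntk as C

lr_per_sample = [3e-4]*4 + [1.5e-4]
epoch_size = 18000       # illustrative
minibatch_size = 70      # illustrative

# As in the notebook: scale to per-minibatch values and declare them as such.
lr_mb = C.learning_parameter_schedule([lr * minibatch_size for lr in lr_per_sample],
                                      epoch_size=epoch_size)
# Per-sample declaration; CNTK scales it to whatever minibatch size is actually fed.
lr_ps = C.learning_parameter_schedule_per_sample(lr_per_sample, epoch_size=epoch_size)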
File diff suppressed because one or more lines are too long

View file

@ -709,8 +709,9 @@
" minibatch_size = 72\n",
" lr = 0.001 if use_attention else 0.005\n",
" learner = C.fsadagrad(model_train.parameters,\n",
" lr = C.learning_rate_schedule([lr]*2+[lr/2]*3+[lr/4], C.UnitType.sample, epoch_size),\n",
" momentum = C.momentum_as_time_constant_schedule(1100),\n",
" #apply the learning rate as if it is a minibatch of size 1\n",
" lr = C.learning_parameter_schedule_per_sample([lr]*2+[lr/2]*3+[lr/4], epoch_size),\n",
" momentum = C.momentum_schedule(0.9366416204111472, minibatch_size=minibatch_size),\n",
" gradient_clipping_threshold_per_sample=2.3,\n",
" gradient_clipping_with_truncation=True)\n",
" trainer = C.Trainer(None, criterion, learner)\n",
@ -1331,7 +1332,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
"version": "3.5.4"
}
},
"nbformat": 4,

View file

@ -367,13 +367,13 @@
"\n",
" G_learner = C.fsadagrad(\n",
" parameters = X_fake.parameters,\n",
" lr = C.learning_rate_schedule(lr, C.UnitType.sample),\n",
" momentum = C.momentum_as_time_constant_schedule(700)\n",
" lr = C.learning_parameter_schedule_per_sample(lr),\n",
" momentum = C.momentum_schedule_per_sample(0.9985724484938566)\n",
" )\n",
" D_learner = C.fsadagrad(\n",
" parameters = D_real.parameters,\n",
" lr = C.learning_rate_schedule(lr, C.UnitType.sample),\n",
" momentum = C.momentum_as_time_constant_schedule(700)\n",
" lr = C.learning_parameter_schedule_per_sample(lr),\n",
" momentum = C.momentum_schedule_per_sample(0.9985724484938566)\n",
" )\n",
"\n",
" # Instantiate the trainers\n",
@ -695,7 +695,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
"version": "3.5.4"
}
},
"nbformat": 4,

View file

@ -418,12 +418,12 @@
"\n",
" G_learner = C.adam(\n",
" parameters = X_fake.parameters,\n",
" lr = C.learning_rate_schedule(lr, C.UnitType.sample),\n",
" lr = C.learning_parameter_schedule_per_sample(lr),\n",
" momentum = C.momentum_schedule(momentum)\n",
" )\n",
" D_learner = C.adam(\n",
" parameters = D_real.parameters,\n",
" lr = C.learning_rate_schedule(lr, C.UnitType.sample),\n",
" lr = C.learning_parameter_schedule_per_sample(lr),\n",
" momentum = C.momentum_schedule(momentum)\n",
" )\n",
"\n",
@ -680,7 +680,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
"version": "3.5.4"
}
},
"nbformat": 4,

File diff suppressed because one or more lines are too long

View file

@ -166,7 +166,7 @@
"err = C.edit_distance_error(z, label, squashInputs=True, tokensToIgnore=[132])\n",
"# Learning rate parameter schedule per sample: \n",
"# Use 0.01 for the first 3 epochs, followed by 0.001 for the remaining\n",
"lr = C.learning_rate_schedule([(3, .01), (1,.001)], C.UnitType.sample)\n",
"lr = C.learning_parameter_schedule_per_sample([(3, .01), (1,.001)])\n",
"mm = C.momentum_schedule([(1000, 0.9), (0, 0.99)], mbsize)\n",
"learner = C.momentum_sgd(z.parameters, lr, mm)\n",
"trainer = C.Trainer(z, (criteria, err), learner)"
@ -282,7 +282,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
"version": "3.5.4"
}
},
"nbformat": 4,

View file

@ -655,7 +655,7 @@
" pe = C.classification_error(tl_model, label_input)\n",
"\n",
" # Instantiate the trainer object\n",
" lr_schedule = C.learning_rate_schedule(learning_params['lr_per_mb'], unit=C.UnitType.minibatch)\n",
" lr_schedule = C.learning_parameter_schedule(learning_params['lr_per_mb'])\n",
" mm_schedule = C.momentum_schedule(learning_params['momentum_per_mb'])\n",
" learner = C.momentum_sgd(tl_model.parameters, lr_schedule, mm_schedule, \n",
" l2_regularization_weight=learning_params['l2_reg_weight'])\n",
@ -1232,7 +1232,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
"version": "3.5.4"
}
},
"nbformat": 4,

View file

@ -394,12 +394,12 @@
" [0.00015625]*20 + \\\n",
" [0.000046875]*10 + \\\n",
" [0.000015625]\n",
" lr_schedule = C.learning_rate_schedule(lr_per_sample, \\\n",
" unit=C.learners.UnitType.sample, \\\n",
" lr_schedule = C.learning_parameter_schedule_per_sample(lr_per_sample, \\\n",
" epoch_size=EPOCH_SIZE)\n",
" mm_time_constant = [0]*20 + [600]*20 + [1200]\n",
" mm_schedule = C.learners.momentum_as_time_constant_schedule(mm_time_constant, \\\n",
" epoch_size=EPOCH_SIZE)\n",
" mms = [0]*20 + [0.9200444146293233]*20 + [0.9591894571091382]\n",
" mm_schedule = C.learners.momentum_schedule(mms, \\\n",
" epoch_size=EPOCH_SIZE, \\\n",
" minibatch_size=MINIBATCH_SIZE)\n",
" l2_reg_weight = 0.0002\n",
"\n",
" model = C.combine(network['query_vector'], network['answer_vector'])\n",
@ -676,7 +676,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
"version": "3.5.4"
}
},
"nbformat": 4,

View file

@ -153,7 +153,9 @@
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from __future__ import print_function\n",
@ -250,7 +252,7 @@
" url = \"https://github.com/Microsoft/CNTK/blob/release/2.2/Examples/SequenceToSequence/CMUDict/Data/%s?raw=true\"%file\n",
" print(\"Starting download:\", file)\n",
" download(url, file)\n",
" print(\"Download completed\")\n"
" print(\"Download completed\")"
]
},
{
@ -346,7 +348,9 @@
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def create_reader(path, randomize, size=C.io.INFINITELY_REPEAT):\n",
@ -379,7 +383,9 @@
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"model_dir = \".\" # we downloaded our data to the local directory above # TODO check me\n",
@ -413,7 +419,9 @@
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Source and target inputs to the model\n",
@ -446,7 +454,9 @@
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Instantiate the sequence to sequence translation model\n",
@ -485,7 +495,9 @@
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def LSTM_layer(input, \n",
@ -536,7 +548,9 @@
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# 1.\n",
@ -583,7 +597,9 @@
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"decoder_input = C.element_select(is_first_label, label_sentence_start_scattered, C.sequence.past_value(label_sequence))"
@ -603,7 +619,9 @@
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"(output_h, output_c) = LSTM_layer(input_sequence, hidden_dim,\n",
@ -629,7 +647,9 @@
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# 1.\n",
@ -681,7 +701,9 @@
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# 1.\n",
@ -822,17 +844,19 @@
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# training parameters\n",
"lr_per_sample = C.learning_rate_schedule(0.007, C.UnitType.sample)\n",
"lr_per_sample = C.learning_parameter_schedule_per_sample(0.007)\n",
"minibatch_size = 72\n",
"momentum_time_constant = C.momentum_as_time_constant_schedule(1100)\n",
"momentum_schedule = C.momentum_schedule(0.9366416204111472, minibatch_size=minibatch_size)\n",
"clipping_threshold_per_sample = 2.3\n",
"gradient_clipping_with_truncation = True\n",
"learner = C.momentum_sgd(model.parameters,\n",
" lr_per_sample, momentum_time_constant,\n",
" lr_per_sample, momentum_schedule,\n",
" gradient_clipping_threshold_per_sample=clipping_threshold_per_sample,\n",
" gradient_clipping_with_truncation=gradient_clipping_with_truncation)\n",
"trainer = C.Trainer(model, (ce, errs), learner)"
@ -848,7 +872,9 @@
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# helper function to find variables by name\n",
@ -932,7 +958,9 @@
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"model = create_model()\n",
@ -965,7 +993,9 @@
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"########################\n",
@ -1169,7 +1199,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
"version": "3.5.4"
}
},
"nbformat": 4,

File diff suppressed because one or more lines are too long

View file

@ -8,7 +8,7 @@ import numpy as np
from cntk.device import cpu, try_set_default_device
from cntk import Trainer
from cntk.layers import Dense, Sequential, For
from cntk.learners import sgd, learning_rate_schedule, UnitType
from cntk.learners import sgd, learning_parameter_schedule
from cntk.ops import input_variable, sigmoid
from cntk.losses import cross_entropy_with_softmax
from cntk.metrics import classification_error
@ -50,7 +50,7 @@ def ffnet():
ce = cross_entropy_with_softmax(netout, label)
pe = classification_error(netout, label)
lr_per_minibatch = learning_rate_schedule(0.5, UnitType.minibatch)
lr_per_minibatch = learning_parameter_schedule(0.5)
# Instantiate the trainer object to drive the model training
learner = sgd(netout.parameters, lr=lr_per_minibatch)
progress_printer = ProgressPrinter(128)

View file

@ -270,7 +270,7 @@ def training_parameter_schedule(schedule, unit=UnitType.minibatch, epoch_size=No
training parameter schedule
See also:
:func:`learning_rate_schedule`
:func:`learning_parameter_schedule`
'''
if unit == UnitType.sample:
@ -353,6 +353,34 @@ def learning_parameter_schedule(schedule, minibatch_size=None, epoch_size=None):
raise ValueError(
'schedule must be either a float or a list, not %s' % type(schedule))
@typemap
def learning_parameter_schedule_per_sample(schedule, epoch_size=None):
'''
Create a learning parameter schedule in which the parameter is specified per sample (as if it
were applied to minibatches of size 1). CNTK will scale the parameter to the actual minibatch size.
Args:
schedule (float or list): if a float, it is the parameter value to be used
for all samples. In case of a list [p_1, p_2, .., p_n], the i-th parameter p_i in the list is used as the
value from the (``epoch_size`` * (i-1) + 1)-th sample to the (``epoch_size`` * i)-th sample. If the list contains
pairs, i.e. [(num_epoch_1, p_1), (num_epoch_2, p_2), .., (num_epoch_n, p_n)], the i-th parameter p_i is used as the
value from the (``epoch_size`` * (num_epoch_0 + num_epoch_1 + ... + num_epoch_(i-1)) + 1)-th sample to the
(``epoch_size`` * (num_epoch_0 + num_epoch_1 + ... + num_epoch_i))-th sample (taking num_epoch_0 = 0).
epoch_size (optional, int): number of samples as a scheduling unit.
Parameters in the schedule change their values every ``epoch_size``
samples. If no ``epoch_size`` is provided, this parameter is substituted
by the size of the full data sweep, in which case the scheduling unit is
the entire data sweep (as indicated by the MinibatchSource) and parameters
change their values on the sweep-by-sweep basis specified by the
``schedule``.
Returns:
learning parameter schedule as if it is applied to minibatches of size 1.
'''
return learning_parameter_schedule(schedule, minibatch_size=1, epoch_size=epoch_size)
@typemap
def learning_rate_schedule(lr, unit, epoch_size=None):
'''
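The new learning_parameter_schedule_per_sample helper added above is a thin wrapper over learning_parameter_schedule with minibatch_size=1. A hypothetical usage sketch (the epoch_size value is illustrative):

import cntk as C

# A constant per-sample rate; CNTK scales it by the actual minibatch size.
lr_a = C.learning_parameter_schedule_per_sample(0.007)
# Equivalent spelling through the general helper.
lr_b = C.learning_parameter_schedule(0.007, minibatch_size=1)
# Pair form: 0.01 per sample for the first 3 epochs, then 0.001,
# with an epoch defined here as 10000 samples.
lr_c = C.learning_parameter_schedule_per_sample([(3, 0.01), (1, 0.001)], epoch_size=10000)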
@ -384,20 +412,20 @@ def learning_rate_schedule(lr, unit, epoch_size=None):
@typemap
def momentum_schedule(momentum, epoch_size=None, minibatch_size = None):
'''
Create a per-minibatch momentum schedule (using the same semantics as
:func:`training_parameter_schedule` with the `unit=UnitType.minibatch`).
Create a momentum schedule (with the same semantics as
:func:`learning_parameter_schedule`) that applies the momentum
decay per N samples, where N is given by the argument `minibatch_size`.
Args:
momentum (float or list): see parameter ``schedule`` in
:func:`training_parameter_schedule`.
epoch_size (int): see parameter ``epoch_size`` in
:func:`training_parameter_schedule`.
minibatch_size (int): an integer to specify the reference minibatch size that schedule are designed for;
CNTK will scale the schedule internally so as to simulate the behavior of the schedule as much as possible
to match the designed effect.
If you want to provide momentum values in a minibatch-size
agnostic way, use :func:`momentum_as_time_constant_schedule`.
minibatch_size (int): the reference minibatch size for which the momentum value is specified;
CNTK scales the momentum internally so that the decay per sample matches the specified schedule even
though the actual minibatch sizes of the fed data can vary. In this way, momentum values are provided
in a minibatch-size agnostic way (equal decay per sample). If `minibatch_size` is `None` (default), the momentum
is applied to each whole minibatch regardless of its actual size (not minibatch-size agnostic).
Examples:
>>> # Use a fixed momentum of 0.99 for all samples
@ -422,6 +450,23 @@ def momentum_schedule(momentum, epoch_size=None, minibatch_size = None):
return learning_parameter_schedule(momentum, minibatch_size, epoch_size)
@typemap
def momentum_schedule_per_sample(momentum, epoch_size=None):
'''
Create a per-sample momentum schedule (the same semantics as
:func:`momentum_schedule` with ``minibatch_size=1``).
Args:
momentum (float or list): see parameter ``schedule`` in
:func:`training_parameter_schedule`.
epoch_size (int): see parameter ``epoch_size`` in
:func:`momentum_schedule`.
Returns:
momentum schedule
'''
return momentum_schedule(momentum, minibatch_size=1, epoch_size=epoch_size)
@typemap
def momentum_as_time_constant_schedule(momentum, epoch_size=None):
'''
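Likewise, momentum_schedule_per_sample (added above) pins the reference minibatch size to 1. A sketch of the three ways a momentum decay can now be declared; the equivalence of the first two assumes the equal-decay-per-sample semantics described in the momentum_schedule docstring:

import cntk as C

# 0.9 decay per 32 samples, declared against a reference minibatch size of 32.
m_ref = C.momentum_schedule(0.9, minibatch_size=32)
# The same decay expressed per sample: 0.9 ** (1/32).
m_per_sample = C.momentum_schedule_per_sample(0.9 ** (1.0 / 32.0))
# 0.9 applied once per actual minibatch, whatever its size (not size-agnostic).
m_per_minibatch = C.momentum_schedule(0.9)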
@ -447,16 +492,13 @@ def momentum_as_time_constant_schedule(momentum, epoch_size=None):
:func:`training_parameter_schedule`.
epoch_size (int): see parameter ``epoch_size`` in
:func:`training_parameter_schedule`.
minibatch_size (int): an integer to specify the reference minibatch size that schedule are designed for;
CNTK will scale the schedule internally so as to simulate the behavior of the schedule as much as possible
to match the designed effect.
CNTK specifies momentum in a minibatch-size agnostic way as the time
constant (in samples) of a unit-gain 1st-order IIR filter. The value
specifies the number of samples after which a gradient has an effect of
1/e=37%.
If you want to specify the momentum per sample (or per minibatch),
If you want to specify the momentum per N samples (or per minibatch),
use :func:`momentum_schedule`.
Examples:
@ -896,12 +938,12 @@ def adagrad(parameters, lr, need_ave_multiplier=True,
@typemap
def fsadagrad(parameters, lr, momentum, unit_gain=default_unit_gain_value(),
variance_momentum=momentum_as_time_constant_schedule(720000),
variance_momentum=momentum_schedule_per_sample(0.9999986111120757),
l1_regularization_weight=0.0, l2_regularization_weight=0.0,
gaussian_noise_injection_std_dev=0.0, gradient_clipping_threshold_per_sample=np.inf,
gradient_clipping_with_truncation=True, use_mean_gradient=None,
minibatch_size=None, epoch_size=None):
'''fsadagrad(parameters, lr, momentum, unit_gain=default_unit_gain_value(), variance_momentum=momentum_as_time_constant_schedule(720000), l1_regularization_weight=0, l2_regularization_weight=0, gaussian_noise_injection_std_dev=0, gradient_clipping_threshold_per_sample=np.inf, gradient_clipping_with_truncation=True)
'''fsadagrad(parameters, lr, momentum, unit_gain=default_unit_gain_value(), variance_momentum=momentum_schedule_per_sample(0.9999986111120757), l1_regularization_weight=0, l2_regularization_weight=0, gaussian_noise_injection_std_dev=0, gradient_clipping_threshold_per_sample=np.inf, gradient_clipping_with_truncation=True)
Creates an FSAdaGrad learner instance to learn the parameters.
Args:
@ -913,8 +955,8 @@ def fsadagrad(parameters, lr, momentum, unit_gain=default_unit_gain_value(),
For additional information, please refer to the :cntkwiki:`this CNTK Wiki article <BrainScript-SGD-Block#converting-learning-rate-and-momentum-parameters-from-other-toolkits>`.
unit_gain: when ``True``, momentum is interpreted as a unit-gain filter. Defaults
to the value returned by :func:`default_unit_gain_value`.
variance_momentum (float, list, output of :func:`momentum_schedule` or :func:`momentum_as_time_constant_schedule`): variance momentum schedule. Defaults
to ``momentum_as_time_constant_schedule(720000)``.
variance_momentum (float, list, output of :func:`momentum_schedule`): variance momentum schedule. Defaults
to ``momentum_schedule_per_sample(0.9999986111120757)``.
l1_regularization_weight (float, optional): the L1 regularization weight per sample,
defaults to 0.0
l2_regularization_weight (float, optional): the L2 regularization weight per sample,
@ -970,12 +1012,12 @@ def fsadagrad(parameters, lr, momentum, unit_gain=default_unit_gain_value(),
@typemap
def adam(parameters, lr, momentum, unit_gain=default_unit_gain_value(),
variance_momentum=momentum_as_time_constant_schedule(720000),
variance_momentum=momentum_schedule_per_sample(0.9999986111120757),
l1_regularization_weight=0.0, l2_regularization_weight=0.0,
gaussian_noise_injection_std_dev=0.0, gradient_clipping_threshold_per_sample=np.inf,
gradient_clipping_with_truncation=True, use_mean_gradient=None, epsilon=1e-8, adamax=False,
minibatch_size=None, epoch_size=None):
'''adam(parameters, lr, momentum, unit_gain=default_unit_gain_value(), variance_momentum=momentum_as_time_constant_schedule(720000), l1_regularization_weight=0, l2_regularization_weight=0, gaussian_noise_injection_std_dev=0, gradient_clipping_threshold_per_sample=np.inf, gradient_clipping_with_truncation=True, epsilon=1e-8, adamax=False)
'''adam(parameters, lr, momentum, unit_gain=default_unit_gain_value(), variance_momentum=momentum_schedule_per_sample(0.9999986111120757), l1_regularization_weight=0, l2_regularization_weight=0, gaussian_noise_injection_std_dev=0, gradient_clipping_threshold_per_sample=np.inf, gradient_clipping_with_truncation=True, epsilon=1e-8, adamax=False)
Creates an Adam learner instance to learn the parameters. See [1] for more
information.
@ -988,8 +1030,8 @@ def adam(parameters, lr, momentum, unit_gain=default_unit_gain_value(),
For additional information, please refer to the :cntkwiki:`this CNTK Wiki article <BrainScript-SGD-Block#converting-learning-rate-and-momentum-parameters-from-other-toolkits>`.
unit_gain: when ``True``, momentum is interpreted as a unit-gain filter. Defaults
to the value returned by :func:`default_unit_gain_value`.
variance_momentum (float, list, output of :func:`momentum_schedule` or :func:`momentum_as_time_constant_schedule`): variance momentum schedule.
Note that this is the beta2 parameter in the Adam paper [1]. Defaults to ``momentum_as_time_constant_schedule(720000)``.
variance_momentum (float, list, output of :func:`momentum_schedule`): variance momentum schedule.
Note that this is the beta2 parameter in the Adam paper [1]. Defaults to ``momentum_schedule_per_sample(0.9999986111120757)``.
l1_regularization_weight (float, optional): the L1 regularization weight per sample,
defaults to 0.0
l2_regularization_weight (float, optional): the L2 regularization weight per sample,