Update Tutorials with new learner APIs.
This commit is contained in:
Parent
6fc501f615
Commit
10cb3328f6
@@ -216,7 +216,7 @@
 " # class 1 into the vector \"0 1 0\", ...\n",
 " class_ind = [Y==class_number for class_number in range(num_classes)]\n",
 " Y = np.asarray(np.hstack(class_ind), dtype=np.float32)\n",
-" return X, Y "
+" return X, Y"
 ]
 },
 {
@@ -588,8 +588,7 @@
 " if not (loss == \"NA\" or error ==\"NA\"):\n",
 " plotdata[\"batchsize\"].append(batchsize)\n",
 " plotdata[\"loss\"].append(loss)\n",
-" plotdata[\"error\"].append(error)\n",
-" "
+" plotdata[\"error\"].append(error)"
 ]
 },
 {
@@ -677,7 +676,7 @@
 "test_minibatch_size = 25\n",
 "features, labels = generate_random_data_sample(test_minibatch_size, input_dim, num_output_classes)\n",
 "\n",
-"trainer.test_minibatch({feature : features, label : labels}) "
+"trainer.test_minibatch({feature : features, label : labels})"
 ]
 },
 {
@@ -821,7 +820,7 @@
 "language_info": {
 "codemirror_mode": {
 "name": "ipython",
-"version": 3
+"version": 3.0
 },
 "file_extension": ".py",
 "mimetype": "text/x-python",
@@ -832,5 +831,5 @@
 }
 },
 "nbformat": 4,
-"nbformat_minor": 1
-}
+"nbformat_minor": 0
+}

@@ -562,7 +562,7 @@
 "source": [
 "# Instantiate the trainer object to drive the model training\n",
 "learning_rate = 0.5\n",
-"lr_schedule = C.learning_rate_schedule(learning_rate, C.UnitType.minibatch) \n",
+"lr_schedule = C.learning_parameter_schedule(learning_rate) \n",
 "learner = C.sgd(z.parameters, lr_schedule)\n",
 "trainer = C.Trainer(z, (loss, eval_error), [learner])"
 ]
@@ -885,7 +885,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.5.2"
+"version": "3.5.4"
 }
 },
 "nbformat": 4,

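Note on the pattern above, repeated in the notebooks below: `C.learning_rate_schedule(value, C.UnitType.minibatch)` becomes `C.learning_parameter_schedule(value)`, which is per-minibatch by default. A minimal sketch of the new call, assuming `z`, `loss`, and `eval_error` are defined as in the notebook cell:

    import cntk as C

    # old: lr_schedule = C.learning_rate_schedule(0.5, C.UnitType.minibatch)
    lr_schedule = C.learning_parameter_schedule(0.5)  # per-minibatch by default
    learner = C.sgd(z.parameters, lr_schedule)
    trainer = C.Trainer(z, (loss, eval_error), [learner])
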
@@ -4,9 +4,7 @@
 "cell_type": "code",
 "execution_count": 1,
 "metadata": {
-"collapsed": true,
-"deletable": true,
-"editable": true
+"collapsed": true
 },
 "outputs": [],
 "source": [
@@ -16,8 +14,6 @@
 {
 "cell_type": "markdown",
 "metadata": {
-"deletable": true,
-"editable": true,
 "nbpresent": {
 "id": "29b9bd1d-766f-4422-ad96-de0accc1ce58"
 }
@@ -38,11 +34,7 @@
 {
 "cell_type": "code",
 "execution_count": 2,
-"metadata": {
-"collapsed": false,
-"deletable": true,
-"editable": true
-},
+"metadata": {},
 "outputs": [
 {
 "data": {
@@ -65,10 +57,7 @@
 },
 {
 "cell_type": "markdown",
-"metadata": {
-"deletable": true,
-"editable": true
-},
+"metadata": {},
 "source": [
 "**Goal**:\n",
 "Our goal is to train a classifier that will identify the digits in the MNIST dataset. \n",
@@ -83,10 +72,7 @@
 },
 {
 "cell_type": "markdown",
-"metadata": {
-"deletable": true,
-"editable": true
-},
+"metadata": {},
 "source": [
 "## Logistic Regression\n",
 "[Logistic Regression](https://en.wikipedia.org/wiki/Logistic_regression) (LR) is a fundamental machine learning technique that uses a linear weighted combination of features and generates probability-based predictions of different classes. \n",
@@ -98,10 +84,7 @@
 },
 {
 "cell_type": "markdown",
-"metadata": {
-"deletable": true,
-"editable": true
-},
+"metadata": {},
 "source": [
 "In **Binary Logistic Regression** (see top of figure above), the input features are each scaled by an associated weight and summed together. The sum is passed through a squashing (aka activation) function and generates an output in [0,1]. This output value (which can be thought of as a probability) is then compared with a threshold (such as 0.5) to produce a binary label (0 or 1). This technique supports only classification problems with two output classes, hence the name binary LR. In the binary LR example shown above, the [sigmoid][] function is used as the squashing function.\n",
 "\n",
@@ -110,10 +93,7 @@
 },
 {
 "cell_type": "markdown",
-"metadata": {
-"deletable": true,
-"editable": true
-},
+"metadata": {},
 "source": [
 "In **Multinomial Linear Regression** (see bottom of figure above), 2 or more output nodes are used, one for each output class to be predicted. Each summation node uses its own set of weights to scale the input features and sum them together. Instead of passing the summed output of the weighted input features through a sigmoid squashing function, the output is often passed through a [softmax][] function (which in addition to squashing, like the sigmoid, the softmax normalizes each nodes' output value using the sum of all unnormalized nodes). (Details in the context of MNIST image to follow)\n",
 "\n",
@@ -127,8 +107,6 @@
 "execution_count": 3,
 "metadata": {
 "collapsed": true,
-"deletable": true,
-"editable": true,
 "nbpresent": {
 "id": "138d1a78-02e2-4bd6-a20e-07b83f303563"
 }
@@ -153,10 +131,7 @@
 },
 {
 "cell_type": "markdown",
-"metadata": {
-"deletable": true,
-"editable": true
-},
+"metadata": {},
 "source": [
 "## Initialization"
 ]
@@ -165,9 +140,7 @@
 "cell_type": "code",
 "execution_count": 5,
 "metadata": {
-"collapsed": true,
-"deletable": true,
-"editable": true
+"collapsed": true
 },
 "outputs": [],
 "source": [
@@ -178,10 +151,7 @@
 },
 {
 "cell_type": "markdown",
-"metadata": {
-"deletable": true,
-"editable": true
-},
+"metadata": {},
 "source": [
 "## Data reading\n",
 "\n",
@@ -199,9 +169,7 @@
 "cell_type": "code",
 "execution_count": 6,
 "metadata": {
-"collapsed": true,
-"deletable": true,
-"editable": true
+"collapsed": true
 },
 "outputs": [],
 "source": [
@@ -220,11 +188,7 @@
 {
 "cell_type": "code",
 "execution_count": 7,
-"metadata": {
-"collapsed": false,
-"deletable": true,
-"editable": true
-},
+"metadata": {},
 "outputs": [
 {
 "name": "stdout",
@@ -255,10 +219,7 @@
 },
 {
 "cell_type": "markdown",
-"metadata": {
-"deletable": true,
-"editable": true
-},
+"metadata": {},
 "source": [
 "## Model Creation\n",
 "\n",
@@ -272,10 +233,7 @@
 },
 {
 "cell_type": "markdown",
-"metadata": {
-"deletable": true,
-"editable": true
-},
+"metadata": {},
 "source": [
 "The first step is to compute the evidence for an observation. \n",
 "\n",
@@ -288,10 +246,7 @@
 },
 {
 "cell_type": "markdown",
-"metadata": {
-"deletable": true,
-"editable": true
-},
+"metadata": {},
 "source": [
 "Network input and output: \n",
 "- **input** variable (a key CNTK concept): \n",
@@ -305,9 +260,7 @@
 "cell_type": "code",
 "execution_count": 8,
 "metadata": {
-"collapsed": true,
-"deletable": true,
-"editable": true
+"collapsed": true
 },
 "outputs": [],
 "source": [
@@ -317,10 +270,7 @@
 },
 {
 "cell_type": "markdown",
-"metadata": {
-"deletable": true,
-"editable": true
-},
+"metadata": {},
 "source": [
 "## Logistic Regression network setup\n",
 "\n",
@@ -331,9 +281,7 @@
 "cell_type": "code",
 "execution_count": 9,
 "metadata": {
-"collapsed": true,
-"deletable": true,
-"editable": true
+"collapsed": true
 },
 "outputs": [],
 "source": [
@@ -345,10 +293,7 @@
 },
 {
 "cell_type": "markdown",
-"metadata": {
-"deletable": true,
-"editable": true
-},
+"metadata": {},
 "source": [
 "`z` will be used to represent the output of a network."
 ]
@@ -357,9 +302,7 @@
 "cell_type": "code",
 "execution_count": 10,
 "metadata": {
-"collapsed": true,
-"deletable": true,
-"editable": true
+"collapsed": true
 },
 "outputs": [],
 "source": [
@@ -369,10 +312,7 @@
 },
 {
 "cell_type": "markdown",
-"metadata": {
-"deletable": true,
-"editable": true
-},
+"metadata": {},
 "source": [
 "### Learning model parameters\n",
 "\n",
@@ -381,10 +321,7 @@
 },
 {
 "cell_type": "markdown",
-"metadata": {
-"deletable": true,
-"editable": true
-},
+"metadata": {},
 "source": [
 "## Training\n",
 "\n",
@@ -395,9 +332,7 @@
 "cell_type": "code",
 "execution_count": 11,
 "metadata": {
-"collapsed": true,
-"deletable": true,
-"editable": true
+"collapsed": true
 },
 "outputs": [],
 "source": [
@@ -406,10 +341,7 @@
 },
 {
 "cell_type": "markdown",
-"metadata": {
-"deletable": true,
-"editable": true
-},
+"metadata": {},
 "source": [
 "### Evaluation\n",
 "\n",
@@ -420,9 +352,7 @@
 "cell_type": "code",
 "execution_count": 12,
 "metadata": {
-"collapsed": true,
-"deletable": true,
-"editable": true
+"collapsed": true
 },
 "outputs": [],
 "source": [
@@ -431,10 +361,7 @@
 },
 {
 "cell_type": "markdown",
-"metadata": {
-"deletable": true,
-"editable": true
-},
+"metadata": {},
 "source": [
 "### Configure training\n",
 "\n",
@@ -452,25 +379,20 @@
 "cell_type": "code",
 "execution_count": 13,
 "metadata": {
-"collapsed": true,
-"deletable": true,
-"editable": true
+"collapsed": true
 },
 "outputs": [],
 "source": [
 "# Instantiate the trainer object to drive the model training\n",
 "learning_rate = 0.2\n",
-"lr_schedule = C.learning_rate_schedule(learning_rate, C.UnitType.minibatch)\n",
+"lr_schedule = C.learning_parameter_schedule(learning_rate)\n",
 "learner = C.sgd(z.parameters, lr_schedule)\n",
 "trainer = C.Trainer(z, (loss, label_error), [learner])"
 ]
 },
 {
 "cell_type": "markdown",
-"metadata": {
-"deletable": true,
-"editable": true
-},
+"metadata": {},
 "source": [
 "First let us create some helper functions that will be needed to visualize different functions associated with training."
 ]
@@ -479,9 +401,7 @@
 "cell_type": "code",
 "execution_count": 14,
 "metadata": {
-"collapsed": true,
-"deletable": true,
-"editable": true
+"collapsed": true
 },
 "outputs": [],
 "source": [
@@ -509,10 +429,7 @@
 },
 {
 "cell_type": "markdown",
-"metadata": {
-"deletable": true,
-"editable": true
-},
+"metadata": {},
 "source": [
 "### Run the trainer\n",
 "\n",
@@ -525,9 +442,7 @@
 "cell_type": "code",
 "execution_count": 15,
 "metadata": {
-"collapsed": true,
-"deletable": true,
-"editable": true
+"collapsed": true
 },
 "outputs": [],
 "source": [
@@ -541,11 +456,7 @@
 {
 "cell_type": "code",
 "execution_count": 16,
-"metadata": {
-"collapsed": false,
-"deletable": true,
-"editable": true
-},
+"metadata": {},
 "outputs": [
 {
 "name": "stdout",
@@ -604,10 +515,7 @@
 },
 {
 "cell_type": "markdown",
-"metadata": {
-"deletable": true,
-"editable": true
-},
+"metadata": {},
 "source": [
 "Let us plot the errors over the different training minibatches. Note that as we iterate the training loss decreases though we do see some intermediate bumps. \n",
 "\n",
@@ -617,11 +525,7 @@
 {
 "cell_type": "code",
 "execution_count": 17,
-"metadata": {
-"collapsed": false,
-"deletable": true,
-"editable": true
-},
+"metadata": {},
 "outputs": [
 {
 "data": {
@@ -671,10 +575,7 @@
 },
 {
 "cell_type": "markdown",
-"metadata": {
-"deletable": true,
-"editable": true
-},
+"metadata": {},
 "source": [
 "### Run evaluation / Testing \n",
 "\n",
@@ -684,11 +585,7 @@
 {
 "cell_type": "code",
 "execution_count": 18,
-"metadata": {
-"collapsed": false,
-"deletable": true,
-"editable": true
-},
+"metadata": {},
 "outputs": [
 {
 "name": "stdout",
@@ -731,20 +628,14 @@
 },
 {
 "cell_type": "markdown",
-"metadata": {
-"deletable": true,
-"editable": true
-},
+"metadata": {},
 "source": [
 "Note, this error is very comparable to our training error indicating that our model has good \"out of sample\" error a.k.a. generalization error. This implies that our model can very effectively deal with previously unseen observations (during the training process). This is key to avoid the phenomenon of overfitting."
 ]
 },
 {
 "cell_type": "markdown",
-"metadata": {
-"deletable": true,
-"editable": true
-},
+"metadata": {},
 "source": [
 "We have so far been dealing with aggregate measures of error. Let us now get the probabilities associated with individual data points. For each observation, the `eval` function returns the probability distribution across all the classes. The classifier is trained to recognize digits, hence has 10 classes. First let us route the network output through a `softmax` function. This maps the aggregated activations across the network to probabilities across the 10 classes."
 ]
@@ -753,9 +644,7 @@
 "cell_type": "code",
 "execution_count": 19,
 "metadata": {
-"collapsed": true,
-"deletable": true,
-"editable": true
+"collapsed": true
 },
 "outputs": [],
 "source": [
@@ -764,10 +653,7 @@
 },
 {
 "cell_type": "markdown",
-"metadata": {
-"deletable": true,
-"editable": true
-},
+"metadata": {},
 "source": [
 "Let us a small minibatch sample from the test data."
 ]
@@ -776,9 +662,7 @@
 "cell_type": "code",
 "execution_count": 20,
 "metadata": {
-"collapsed": true,
-"deletable": true,
-"editable": true
+"collapsed": true
 },
 "outputs": [],
 "source": [
@@ -799,9 +683,7 @@
 "cell_type": "code",
 "execution_count": 21,
 "metadata": {
-"collapsed": true,
-"deletable": true,
-"editable": true
+"collapsed": true
 },
 "outputs": [],
 "source": [
@@ -813,11 +695,7 @@
 {
 "cell_type": "code",
 "execution_count": 22,
-"metadata": {
-"collapsed": false,
-"deletable": true,
-"editable": true
-},
+"metadata": {},
 "outputs": [
 {
 "name": "stdout",
@@ -835,10 +713,7 @@
 },
 {
 "cell_type": "markdown",
-"metadata": {
-"deletable": true,
-"editable": true
-},
+"metadata": {},
 "source": [
 "Let us visualize some of the results"
 ]
@@ -846,11 +721,7 @@
 {
 "cell_type": "code",
 "execution_count": 23,
-"metadata": {
-"collapsed": false,
-"deletable": true,
-"editable": true
-},
+"metadata": {},
 "outputs": [
 {
 "name": "stdout",
@@ -883,9 +754,7 @@
 {
 "cell_type": "markdown",
 "metadata": {
-"collapsed": true,
-"deletable": true,
-"editable": true
+"collapsed": true
 },
 "source": [
 "**Exploration Suggestion**\n",
@@ -912,7 +781,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.5.3"
+"version": "3.5.4"
 }
 },
 "nbformat": 4,

@@ -374,7 +374,7 @@
 "source": [
 "# Instantiate the trainer object to drive the model training\n",
 "learning_rate = 0.2\n",
-"lr_schedule = C.learning_rate_schedule(learning_rate, C.UnitType.minibatch)\n",
+"lr_schedule = C.learning_parameter_schedule(learning_rate)\n",
 "learner = C.sgd(z.parameters, lr_schedule)\n",
 "trainer = C.Trainer(z, (loss, label_error), [learner])"
 ]
@@ -783,7 +783,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.5.2"
+"version": "3.5.4"
 }
 },
 "nbformat": 4,

@@ -585,7 +585,7 @@
 " \n",
 " # Instantiate the trainer object to drive the model training\n",
 " learning_rate = 0.2\n",
-" lr_schedule = C.learning_rate_schedule(learning_rate, C.UnitType.minibatch)\n",
+" lr_schedule = C.learning_parameter_schedule(learning_rate)\n",
 " learner = C.sgd(z.parameters, lr_schedule)\n",
 " trainer = C.Trainer(z, (loss, label_error), [learner])\n",
 " \n",
@@ -1071,7 +1071,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.5.2"
+"version": "3.5.4"
 }
 },
 "nbformat": 4,

@@ -570,7 +570,7 @@
 "z = create_model(input, num_output_classes)\n",
 "loss = C.cross_entropy_with_softmax(z, label)\n",
 "label_error = C.classification_error(z, label)\n",
-"lr_per_minibatch = C.learning_rate_schedule(0.125,C.UnitType.minibatch)\n",
+"lr_per_minibatch = C.learning_parameter_schedule(0.125)\n",
 "trainer = C.Trainer(z, (loss, label_error), [C.sgd(z.parameters, lr=lr_per_minibatch)])"
 ]
 },
@@ -1089,7 +1089,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.5.2"
+"version": "3.5.4"
 }
 },
 "nbformat": 4,

Diffs for one file are hidden because one or more lines are too long.

@@ -359,7 +359,7 @@
 "\n",
 "# the learning rate\n",
 "learning_rate = 0.02\n",
-"lr_schedule = C.learning_rate_schedule(learning_rate, C.UnitType.minibatch)\n",
+"lr_schedule = C.learning_parameter_schedule(learning_rate)\n",
 "\n",
 "# loss function\n",
 "loss = C.squared_error(z, l)\n",
@@ -368,10 +368,10 @@
 "error = C.squared_error(z, l)\n",
 "\n",
 "# use fsadagrad optimizer\n",
-"momentum_time_constant = C.momentum_as_time_constant_schedule(BATCH_SIZE / -math.log(0.9)) \n",
+"momentum_schedule = C.momentum_schedule(0.9, minibatch_size=BATCH_SIZE)\n",
 "learner = C.fsadagrad(z.parameters, \n",
 " lr = lr_schedule, \n",
-" momentum = momentum_time_constant, \n",
+" momentum = momentum_schedule, \n",
 " unit_gain = True)\n",
 "\n",
 "trainer = C.Trainer(z, (loss, error), [learner])"
@@ -573,7 +573,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.5.2"
+"version": "3.5.4"
 }
 },
 "nbformat": 4,

Diffs for two files are hidden because one or more lines are too long.

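The momentum rewrite in the hunk above follows a fixed conversion: a time constant tau in `momentum_as_time_constant_schedule(tau)` equals a momentum of exp(-N/tau) applied per N samples in `momentum_schedule(m, minibatch_size=N)`. Here tau = BATCH_SIZE / -math.log(0.9), i.e. a momentum of exactly 0.9 per BATCH_SIZE samples. A small sketch checking the constants that appear throughout this commit:

    import math

    def tc_to_momentum(tau, n=1):
        # momentum applied per n samples for a time constant of tau samples
        return math.exp(-n / tau)

    print(tc_to_momentum(700))       # 0.9985724484938566 (GAN notebooks, per sample)
    print(tc_to_momentum(1100, 72))  # 0.9366416204111472 (seq2seq, minibatch_size=72)
    print(tc_to_momentum(720000))    # 0.9999986111120757 (fsadagrad/adam default)
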
@@ -520,15 +520,15 @@
 " minibatch_size = 64\n",
 "\n",
 " # Set training parameters\n",
-" lr_per_minibatch = C.learning_rate_schedule([0.01]*10 + [0.003]*10 + [0.001], \n",
-" C.UnitType.minibatch, epoch_size)\n",
-" momentum_time_constant = C.momentum_as_time_constant_schedule(-minibatch_size/np.log(0.9))\n",
+" lr_per_minibatch = C.learning_parameter_schedule([0.01]*10 + [0.003]*10 + [0.001], \n",
+" epoch_size)\n",
+" momentums = C.momentum_schedule(0.9, minibatch_size)\n",
 " l2_reg_weight = 0.001\n",
 " \n",
 " # trainer object\n",
 " learner = C.momentum_sgd(z.parameters, \n",
 " lr = lr_per_minibatch, \n",
-" momentum = momentum_time_constant, \n",
+" momentum = momentums, \n",
 " l2_regularization_weight=l2_reg_weight)\n",
 " progress_printer = C.logging.ProgressPrinter(tag='Training', num_epochs=max_epochs)\n",
 " trainer = C.Trainer(z, (ce, pe), [learner], [progress_printer])\n",
@@ -1352,7 +1352,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.5.2"
+"version": "3.5.4"
 }
 },
 "nbformat": 4,

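The list form of a schedule is unchanged by the migration: `[0.01]*10 + [0.003]*10 + [0.001]` still means 0.01 for the first ten epochs of `epoch_size` samples, 0.003 for the next ten, and 0.001 thereafter. A sketch of the new calls, assuming `epoch_size` is defined as in the notebook; note that `momentum_schedule`'s signature (per the docstring later in this diff) is `(momentum, epoch_size=None, minibatch_size=None)`, so the reference size is safest passed by keyword:

    import cntk as C

    # 0.01 for epochs 1-10, 0.003 for epochs 11-20, 0.001 afterwards
    lr_per_minibatch = C.learning_parameter_schedule(
        [0.01]*10 + [0.003]*10 + [0.001], epoch_size)
    # momentum of 0.9 applied per reference minibatch of 64 samples
    momentums = C.momentum_schedule(0.9, minibatch_size=64)
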
@@ -477,17 +477,17 @@
 " # do other stuff (e.g. checkpointing, adjust learning rate, etc.)\n",
 " lr_per_sample = [3e-4]*4+[1.5e-4]\n",
 " lr_per_minibatch = [lr * minibatch_size for lr in lr_per_sample]\n",
-" lr_schedule = C.learning_rate_schedule(lr_per_minibatch, C.UnitType.minibatch, epoch_size)\n",
+" lr_schedule = C.learning_parameter_schedule(lr_per_minibatch, epoch_size=epoch_size)\n",
 " \n",
 " # Momentum schedule\n",
-" momentum_as_time_constant = C.momentum_as_time_constant_schedule(700)\n",
+" momentums = C.momentum_schedule(0.9048374180359595, minibatch_size=minibatch_size)\n",
 " \n",
 " # We use a the Adam optimizer which is known to work well on this dataset\n",
 " # Feel free to try other optimizers from \n",
 " # https://www.cntk.ai/pythondocs/cntk.learner.html#module-cntk.learner\n",
 " learner = C.adam(parameters=model.parameters,\n",
 " lr=lr_schedule,\n",
-" momentum=momentum_as_time_constant,\n",
+" momentum=momentums,\n",
 " gradient_clipping_threshold_per_sample=15, \n",
 " gradient_clipping_with_truncation=True)\n",
 "\n",
@@ -1518,7 +1518,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.5.2"
+"version": "3.5.4"
 }
 },
 "nbformat": 4,

Diffs for one file are hidden because one or more lines are too long.

@@ -709,8 +709,9 @@
 " minibatch_size = 72\n",
 " lr = 0.001 if use_attention else 0.005\n",
 " learner = C.fsadagrad(model_train.parameters,\n",
-" lr = C.learning_rate_schedule([lr]*2+[lr/2]*3+[lr/4], C.UnitType.sample, epoch_size),\n",
-" momentum = C.momentum_as_time_constant_schedule(1100),\n",
+" #apply the learning rate as if it is a minibatch of size 1\n",
+" lr = C.learning_parameter_schedule_per_sample([lr]*2+[lr/2]*3+[lr/4], epoch_size),\n",
+" momentum = C.momentum_schedule(0.9366416204111472, minibatch_size=minibatch_size),\n",
 " gradient_clipping_threshold_per_sample=2.3,\n",
 " gradient_clipping_with_truncation=True)\n",
 " trainer = C.Trainer(None, criterion, learner)\n",
@@ -1331,7 +1332,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.5.2"
+"version": "3.5.4"
 }
 },
 "nbformat": 4,

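As the in-code comment added above says, `learning_parameter_schedule_per_sample` applies the rate as if every minibatch had size 1, and CNTK rescales it for the actual minibatch size. Per the helper's definition later in this diff, it is shorthand for `minibatch_size=1`; a sketch:

    import cntk as C

    # equivalent ways to say "0.001 per sample"
    a = C.learning_parameter_schedule_per_sample(0.001)
    b = C.learning_parameter_schedule(0.001, minibatch_size=1)
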
@@ -367,13 +367,13 @@
 "\n",
 " G_learner = C.fsadagrad(\n",
 " parameters = X_fake.parameters,\n",
-" lr = C.learning_rate_schedule(lr, C.UnitType.sample),\n",
-" momentum = C.momentum_as_time_constant_schedule(700)\n",
+" lr = C.learning_parameter_schedule_per_sample(lr),\n",
+" momentum = C.momentum_schedule_per_sample(0.9985724484938566)\n",
 " )\n",
 " D_learner = C.fsadagrad(\n",
 " parameters = D_real.parameters,\n",
-" lr = C.learning_rate_schedule(lr, C.UnitType.sample),\n",
-" momentum = C.momentum_as_time_constant_schedule(700)\n",
+" lr = C.learning_parameter_schedule_per_sample(lr),\n",
+" momentum = C.momentum_schedule_per_sample(0.9985724484938566)\n",
 " )\n",
 "\n",
 " # Instantiate the trainers\n",
@@ -695,7 +695,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.5.2"
+"version": "3.5.4"
 }
 },
 "nbformat": 4,

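The constant above is the per-sample reading of the old time constant: exp(-1/700) = 0.9985724484938566, so `momentum_schedule_per_sample(0.9985724484938566)` preserves the decay of `momentum_as_time_constant_schedule(700)`. A sketch of the correspondence:

    import math
    import cntk as C

    tau = 700
    momentum = C.momentum_schedule_per_sample(math.exp(-1.0 / tau))
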
@@ -418,12 +418,12 @@
 "\n",
 " G_learner = C.adam(\n",
 " parameters = X_fake.parameters,\n",
-" lr = C.learning_rate_schedule(lr, C.UnitType.sample),\n",
+" lr = C.learning_parameter_schedule_per_sample(lr),\n",
 " momentum = C.momentum_schedule(momentum)\n",
 " )\n",
 " D_learner = C.adam(\n",
 " parameters = D_real.parameters,\n",
-" lr = C.learning_rate_schedule(lr, C.UnitType.sample),\n",
+" lr = C.learning_parameter_schedule_per_sample(lr),\n",
 " momentum = C.momentum_schedule(momentum)\n",
 " )\n",
 "\n",
@@ -680,7 +680,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.5.2"
+"version": "3.5.4"
 }
 },
 "nbformat": 4,

Diffs for one file are hidden because one or more lines are too long.

@@ -166,7 +166,7 @@
 "err = C.edit_distance_error(z, label, squashInputs=True, tokensToIgnore=[132])\n",
 "# Learning rate parameter schedule per sample: \n",
 "# Use 0.01 for the first 3 epochs, followed by 0.001 for the remaining\n",
-"lr = C.learning_rate_schedule([(3, .01), (1,.001)], C.UnitType.sample)\n",
+"lr = C.learning_parameter_schedule_per_sample([(3, .01), (1,.001)])\n",
 "mm = C.momentum_schedule([(1000, 0.9), (0, 0.99)], mbsize)\n",
 "learner = C.momentum_sgd(z.parameters, lr, mm)\n",
 "trainer = C.Trainer(z, (criteria, err), learner)"
@@ -282,7 +282,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.5.2"
+"version": "3.5.4"
 }
 },
 "nbformat": 4,

@@ -655,7 +655,7 @@
 " pe = C.classification_error(tl_model, label_input)\n",
 "\n",
 " # Instantiate the trainer object\n",
-" lr_schedule = C.learning_rate_schedule(learning_params['lr_per_mb'], unit=C.UnitType.minibatch)\n",
+" lr_schedule = C.learning_parameter_schedule(learning_params['lr_per_mb'])\n",
 " mm_schedule = C.momentum_schedule(learning_params['momentum_per_mb'])\n",
 " learner = C.momentum_sgd(tl_model.parameters, lr_schedule, mm_schedule, \n",
 " l2_regularization_weight=learning_params['l2_reg_weight'])\n",
@@ -1232,7 +1232,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.5.2"
+"version": "3.5.4"
 }
 },
 "nbformat": 4,

@@ -394,12 +394,12 @@
 " [0.00015625]*20 + \\\n",
 " [0.000046875]*10 + \\\n",
 " [0.000015625]\n",
-" lr_schedule = C.learning_rate_schedule(lr_per_sample, \\\n",
-" unit=C.learners.UnitType.sample, \\\n",
+" lr_schedule = C.learning_parameter_schedule_per_sample(lr_per_sample, \\\n",
 " epoch_size=EPOCH_SIZE)\n",
-" mm_time_constant = [0]*20 + [600]*20 + [1200]\n",
-" mm_schedule = C.learners.momentum_as_time_constant_schedule(mm_time_constant, \\\n",
-" epoch_size=EPOCH_SIZE)\n",
+" mms = [0]*20 + [0.9200444146293233]*20 + [0.9591894571091382]\n",
+" mm_schedule = C.learners.momentum_schedule(mms, \\\n",
+" epoch_size=EPOCH_SIZE, \\\n",
+" minibatch_size=MINIBATCH_SIZE)\n",
 " l2_reg_weight = 0.0002\n",
 "\n",
 " model = C.combine(network['query_vector'], network['answer_vector'])\n",
@@ -676,7 +676,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.5.2"
+"version": "3.5.4"
 }
 },
 "nbformat": 4,

|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from __future__ import print_function\n",
|
||||
|
@ -250,7 +252,7 @@
|
|||
" url = \"https://github.com/Microsoft/CNTK/blob/release/2.2/Examples/SequenceToSequence/CMUDict/Data/%s?raw=true\"%file\n",
|
||||
" print(\"Starting download:\", file)\n",
|
||||
" download(url, file)\n",
|
||||
" print(\"Download completed\")\n"
|
||||
" print(\"Download completed\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
@ -346,7 +348,9 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def create_reader(path, randomize, size=C.io.INFINITELY_REPEAT):\n",
|
||||
|
@ -379,7 +383,9 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"model_dir = \".\" # we downloaded our data to the local directory above # TODO check me\n",
|
||||
|
@ -413,7 +419,9 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Source and target inputs to the model\n",
|
||||
|
@ -446,7 +454,9 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Instantiate the sequence to sequence translation model\n",
|
||||
|
@ -485,7 +495,9 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"metadata": {},
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def LSTM_layer(input, \n",
|
||||
|
@ -536,7 +548,9 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"metadata": {},
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# 1.\n",
|
||||
|
@ -583,7 +597,9 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"metadata": {},
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"decoder_input = C.element_select(is_first_label, label_sentence_start_scattered, C.sequence.past_value(label_sequence))"
|
||||
|
@ -603,7 +619,9 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"metadata": {},
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"(output_h, output_c) = LSTM_layer(input_sequence, hidden_dim,\n",
|
||||
|
@ -629,7 +647,9 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"metadata": {},
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# 1.\n",
|
||||
|
@ -681,7 +701,9 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"metadata": {},
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# 1.\n",
|
||||
|
@ -822,17 +844,19 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 22,
|
||||
"metadata": {},
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# training parameters\n",
|
||||
"lr_per_sample = C.learning_rate_schedule(0.007, C.UnitType.sample)\n",
|
||||
"lr_per_sample = C.learning_parameter_schedule_per_sample(0.007)\n",
|
||||
"minibatch_size = 72\n",
|
||||
"momentum_time_constant = C.momentum_as_time_constant_schedule(1100)\n",
|
||||
"momentum_schedule = C.momentum_schedule(0.9366416204111472, minibatch_size=minibatch_size)\n",
|
||||
"clipping_threshold_per_sample = 2.3\n",
|
||||
"gradient_clipping_with_truncation = True\n",
|
||||
"learner = C.momentum_sgd(model.parameters,\n",
|
||||
" lr_per_sample, momentum_time_constant,\n",
|
||||
" lr_per_sample, momentum_schedule,\n",
|
||||
" gradient_clipping_threshold_per_sample=clipping_threshold_per_sample,\n",
|
||||
" gradient_clipping_with_truncation=gradient_clipping_with_truncation)\n",
|
||||
"trainer = C.Trainer(model, (ce, errs), learner)"
|
||||
|
@ -848,7 +872,9 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 23,
|
||||
"metadata": {},
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# helper function to find variables by name\n",
|
||||
|
@ -932,7 +958,9 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 26,
|
||||
"metadata": {},
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"model = create_model()\n",
|
||||
|
@ -965,7 +993,9 @@
|
|||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 27,
|
||||
"metadata": {},
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"########################\n",
|
||||
|
@ -1169,7 +1199,7 @@
|
|||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.5.2"
|
||||
"version": "3.5.4"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
|
|
Diffs for one file are hidden because one or more lines are too long.

@@ -8,7 +8,7 @@ import numpy as np
 from cntk.device import cpu, try_set_default_device
 from cntk import Trainer
 from cntk.layers import Dense, Sequential, For
-from cntk.learners import sgd, learning_rate_schedule, UnitType
+from cntk.learners import sgd, learning_parameter_schedule
 from cntk.ops import input_variable, sigmoid
 from cntk.losses import cross_entropy_with_softmax
 from cntk.metrics import classification_error
@@ -50,7 +50,7 @@ def ffnet():
 ce = cross_entropy_with_softmax(netout, label)
 pe = classification_error(netout, label)
 
-lr_per_minibatch = learning_rate_schedule(0.5, UnitType.minibatch)
+lr_per_minibatch = learning_parameter_schedule(0.5)
 # Instantiate the trainer object to drive the model training
 learner = sgd(netout.parameters, lr=lr_per_minibatch)
 progress_printer = ProgressPrinter(128)

@@ -270,7 +270,7 @@ def training_parameter_schedule(schedule, unit=UnitType.minibatch, epoch_size=No
 training parameter schedule
 
 See also:
-:func:`learning_rate_schedule`
+:func:`learning_parameter_schedule`
 '''
 
 if unit == UnitType.sample:
@@ -353,6 +353,34 @@ def learning_parameter_schedule(schedule, minibatch_size=None, epoch_size=None):
 raise ValueError(
 'schedule must be either a float or a list, not %s' % type(schedule))
 
 
+@typemap
+def learning_parameter_schedule_per_sample(schedule, epoch_size=None):
+'''
+Create a learning parameter schedule as if the parameter is applied to minibatches of size 1. CNTK
+will scale the parameters accordingly with respect to the actual minibatch size.
+
+Args:
+schedule (float or list): if float, is the parameter schedule to be used
+for all samples. In case of list [p_1, p_2, .., p_n], the i-th parameter p_i in the list is used as the
+value from the (``epoch_size`` * (i-1) + 1)-th sample to the (``epoch_size`` * i)-th sample. If list contains
+pair, i.e. [(num_epoch_1, p_1), (num_epoch_n, p_2), .., (num_epoch_n, p_n)], the i-th parameter is used as a
+value from the (``epoch_size`` * (num_epoch_0 + ... + num_epoch_2 + ... + num_epoch_(i-1) + 1)-th sample to the
+(``epoch_size`` * num_epoch_i)-th sample (taking num_epoch_0 = 0 as a special initialization).
+epoch_size (optional, int): number of samples as a scheduling unit.
+Parameters in the schedule change their values every ``epoch_size``
+samples. If no ``epoch_size`` is provided, this parameter is substituted
+by the size of the full data sweep, in which case the scheduling unit is
+the entire data sweep (as indicated by the MinibatchSource) and parameters
+change their values on the sweep-by-sweep basis specified by the
+``schedule``.
+
+Returns:
+learning parameter schedule as if it is applied to minibatches of size 1.
+'''
+return learning_parameter_schedule(schedule, minibatch_size=1, epoch_size=epoch_size)
+
+
 @typemap
 def learning_rate_schedule(lr, unit, epoch_size=None):
 '''

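A short usage sketch of the helper added above, mirroring the pair form used by the speech tutorial earlier in this commit:

    import cntk as C

    # 0.01 per sample for the first 3 epochs, then 0.001 for the rest
    lr = C.learning_parameter_schedule_per_sample([(3, 0.01), (1, 0.001)])
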
@@ -384,20 +412,20 @@ def learning_rate_schedule(lr, unit, epoch_size=None):
 @typemap
 def momentum_schedule(momentum, epoch_size=None, minibatch_size = None):
 '''
-Create a per-minibatch momentum schedule (using the same semantics as
-:func:`training_parameter_schedule` with the `unit=UnitType.minibatch`).
+Create a momentum schedule (using the same semantics as
+:func:`learning_parameter_schedule`) which applies the momentum
+decay every N samples where N is specified by the argument `minibatch_size`.
 
 Args:
 momentum (float or list): see parameter ``schedule`` in
 :func:`training_parameter_schedule`.
 epoch_size (int): see parameter ``epoch_size`` in
 :func:`training_parameter_schedule`.
-minibatch_size (int): an integer to specify the reference minibatch size that schedule are designed for;
-CNTK will scale the schedule internally so as to simulate the behavior of the schedule as much as possible
-to match the designed effect.
-
-If you want to provide momentum values in a minibatch-size
-agnostic way, use :func:`momentum_as_time_constant_schedule`.
+minibatch_size (int): an integer to specify the reference minibatch size;
+CNTK will scale the momentum internally so as to simulate the momentum decay of the specified minibatch
+size while the actual minibatch sizes of the fed data can vary. In this way, momentum values can be provided
+in a minibatch-size agnostic way (equal decay per sample). If minibatch_size is `None` (default), the momentum
+is applied to the whole minibatch regardless of the actual minibatch sizes (not in a minibatch-size agnostic way).
 
 Examples:
 >>> # Use a fixed momentum of 0.99 for all samples
@@ -422,6 +450,23 @@ def momentum_schedule(momentum, epoch_size=None, minibatch_size = None):
 return learning_parameter_schedule(momentum, minibatch_size, epoch_size)
 
 
+@typemap
+def momentum_schedule_per_sample(momentum, epoch_size=None):
+'''
+Create a per-sample momentum schedule (using the same semantics as
+:func:`momentum_schedule` but specializing in per sample momentum schedule).
+
+Args:
+momentum (float or list): see parameter ``schedule`` in
+:func:`training_parameter_schedule`.
+epoch_size (int): see parameter ``epoch_size`` in
+:func:`momentum_schedule`.
+Returns:
+momentum schedule
+'''
+return momentum_schedule(momentum, minibatch_size=1, epoch_size=epoch_size)
+
+
 @typemap
 def momentum_as_time_constant_schedule(momentum, epoch_size=None):
 '''

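`momentum_schedule_per_sample` above is likewise just `momentum_schedule` pinned to a reference minibatch of one sample; a sketch:

    import cntk as C

    mm = C.momentum_schedule_per_sample(0.9985724484938566)
    # identical schedule:
    mm2 = C.momentum_schedule(0.9985724484938566, minibatch_size=1)
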
@@ -447,16 +492,13 @@ def momentum_as_time_constant_schedule(momentum, epoch_size=None):
 :func:`training_parameter_schedule`.
 epoch_size (int): see parameter ``epoch_size`` in
 :func:`training_parameter_schedule`.
-minibatch_size (int): an integer to specify the reference minibatch size that schedule are designed for;
-CNTK will scale the schedule internally so as to simulate the behavior of the schedule as much as possible
-to match the designed effect.
 
 CNTK specifies momentum in a minibatch-size agnostic way as the time
 constant (in samples) of a unit-gain 1st-order IIR filter. The value
 specifies the number of samples after which a gradient has an effect of
 1/e=37%.
 
-If you want to specify the momentum per sample (or per minibatch),
+If you want to specify the momentum per N samples (or per minibatch),
 use :func:`momentum_schedule`.
 
 Examples:
@@ -896,12 +938,12 @@ def adagrad(parameters, lr, need_ave_multiplier=True,
 
 @typemap
 def fsadagrad(parameters, lr, momentum, unit_gain=default_unit_gain_value(),
-variance_momentum=momentum_as_time_constant_schedule(720000),
+variance_momentum=momentum_schedule_per_sample(0.9999986111120757),
 l1_regularization_weight=0.0, l2_regularization_weight=0.0,
 gaussian_noise_injection_std_dev=0.0, gradient_clipping_threshold_per_sample=np.inf,
 gradient_clipping_with_truncation=True, use_mean_gradient=None,
 minibatch_size=None, epoch_size=None):
-'''fsadagrad(parameters, lr, momentum, unit_gain=default_unit_gain_value(), variance_momentum=momentum_as_time_constant_schedule(720000), l1_regularization_weight=0, l2_regularization_weight=0, gaussian_noise_injection_std_dev=0, gradient_clipping_threshold_per_sample=np.inf, gradient_clipping_with_truncation=True)
+'''fsadagrad(parameters, lr, momentum, unit_gain=default_unit_gain_value(), variance_momentum=momentum_schedule_per_sample(0.9999986111120757), l1_regularization_weight=0, l2_regularization_weight=0, gaussian_noise_injection_std_dev=0, gradient_clipping_threshold_per_sample=np.inf, gradient_clipping_with_truncation=True)
 Creates an FSAdaGrad learner instance to learn the parameters.
 
 Args:
@@ -913,8 +955,8 @@ def fsadagrad(parameters, lr, momentum, unit_gain=default_unit_gain_value(),
 For additional information, please refer to the :cntkwiki:`this CNTK Wiki article <BrainScript-SGD-Block#converting-learning-rate-and-momentum-parameters-from-other-toolkits>`.
 unit_gain: when ``True``, momentum is interpreted as a unit-gain filter. Defaults
 to the value returned by :func:`default_unit_gain_value`.
-variance_momentum (float, list, output of :func:`momentum_schedule` or :func:`momentum_as_time_constant_schedule`): variance momentum schedule. Defaults
-to ``momentum_as_time_constant_schedule(720000)``.
+variance_momentum (float, list, output of :func:`momentum_schedule`): variance momentum schedule. Defaults
+to ``momentum_schedule_per_sample(0.9999986111120757)``.
 l1_regularization_weight (float, optional): the L1 regularization weight per sample,
 defaults to 0.0
 l2_regularization_weight (float, optional): the L2 regularization weight per sample,
@@ -970,12 +1012,12 @@ def fsadagrad(parameters, lr, momentum, unit_gain=default_unit_gain_value(),
 
 @typemap
 def adam(parameters, lr, momentum, unit_gain=default_unit_gain_value(),
-variance_momentum=momentum_as_time_constant_schedule(720000),
+variance_momentum=momentum_schedule_per_sample(0.9999986111120757),
 l1_regularization_weight=0.0, l2_regularization_weight=0.0,
 gaussian_noise_injection_std_dev=0.0, gradient_clipping_threshold_per_sample=np.inf,
 gradient_clipping_with_truncation=True, use_mean_gradient=None, epsilon=1e-8, adamax=False,
 minibatch_size=None, epoch_size=None):
-'''adam(parameters, lr, momentum, unit_gain=default_unit_gain_value(), variance_momentum=momentum_as_time_constant_schedule(720000), l1_regularization_weight=0, l2_regularization_weight=0, gaussian_noise_injection_std_dev=0, gradient_clipping_threshold_per_sample=np.inf, gradient_clipping_with_truncation=True, epsilon=1e-8, adamax=False)
+'''adam(parameters, lr, momentum, unit_gain=default_unit_gain_value(), variance_momentum=momentum_schedule_per_sample(0.9999986111120757), l1_regularization_weight=0, l2_regularization_weight=0, gaussian_noise_injection_std_dev=0, gradient_clipping_threshold_per_sample=np.inf, gradient_clipping_with_truncation=True, epsilon=1e-8, adamax=False)
 Creates an Adam learner instance to learn the parameters. See [1] for more
 information.
 
@@ -988,8 +1030,8 @@ def adam(parameters, lr, momentum, unit_gain=default_unit_gain_value(),
 For additional information, please refer to the :cntkwiki:`this CNTK Wiki article <BrainScript-SGD-Block#converting-learning-rate-and-momentum-parameters-from-other-toolkits>`.
 unit_gain: when ``True``, momentum is interpreted as a unit-gain filter. Defaults
 to the value returned by :func:`default_unit_gain_value`.
-variance_momentum (float, list, output of :func:`momentum_schedule` or :func:`momentum_as_time_constant_schedule`): variance momentum schedule.
-Note that this is the beta2 parameter in the Adam paper [1]. Defaults to ``momentum_as_time_constant_schedule(720000)``.
+variance_momentum (float, list, output of :func:`momentum_schedule`): variance momentum schedule.
+Note that this is the beta2 parameter in the Adam paper [1]. Defaults to ``momentum_schedule_per_sample(0.9999986111120757)``.
 l1_regularization_weight (float, optional): the L1 regularization weight per sample,
 defaults to 0.0
 l2_regularization_weight (float, optional): the L2 regularization weight per sample,
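The new `variance_momentum` default is behavior-preserving: exp(-1/720000) = 0.9999986111120757, so `momentum_schedule_per_sample(0.9999986111120757)` matches the old `momentum_as_time_constant_schedule(720000)`. A hedged sketch of an adam call with the new-style schedules spelled out (`model` is a placeholder for a real network):

    import cntk as C

    learner = C.adam(model.parameters,
                     lr=C.learning_parameter_schedule_per_sample(0.001),
                     momentum=C.momentum_schedule(0.9),
                     variance_momentum=C.momentum_schedule_per_sample(0.9999986111120757))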