senteval local and azureml 📓

2019-05-06 15:26:47 -04:00 · 2019-05-06 15:26:47 -04:00 · 23d9635230
--- a/environment.yml
+++ b/environment.yml
@ -1,37 +0,0 @@
-# 
-# To create the conda environment:
-# $ conda env create -f environment.yml
-# 
-# To update the conda environment:
-# $ conda env update -f environment.yml
-# 
-# To register the conda environment in Jupyter:
-# $ conda activate nlp
-# $ python -m ipykernel install --user --name nlp
-# 
-name: nlp
-channels:
- defaults
- conda-forge
-dependencies:
- python==3.6.8
- ipykernel>=4.6.1
- jupyter>=1.0.0
- pytest>=3.6.4
- cudatoolkit>=10.0
- pip:
-  - black>=18.6b4
-  - papermill>=0.15.0
-  - ipywebrtc
-  - pre-commit>=1.14.4
-  - azureml-dataprep
-  - azureml-sdk>=1.0.33
-  - https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.1.0/en_core_web_sm-2.1.0.tar.gz
-  - gensim>=3.7.0
-  - nltk>=3.4
-  - scikit-learn>=0.20
-  - pytorch-pretrained-bert>=0.6
-  - horovod>=0.16.1
-  - torch>=1.0
-  - urllib3>=1.24
-
--- a/scenarios/sentence_similarity/senteval_azureml.ipynb
+++ b/scenarios/sentence_similarity/senteval_azureml.ipynb
@ -0,0 +1,390 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# SentEval with AzureML\n",
+    "SentEval is a widely used benchmarking tool for evaluating general-purpose sentence embeddings. It provides a simple interface for evaluating your embeddings on up to 17 supported downstream tasks, such as sentiment classification, natural language inference, and semantic similarity.\n",
+    "\n",
+    "This notebook shows how to run SentEval with the AzureML SDK. Running SentEval locally is easy, but it is not always efficient, depending on the model. For example, benchmarking a model that runs on a GPU can quickly become expensive even when you start from pretrained weights, because loading the embeddings and vocabulary for inference takes a nontrivial amount of time. In this example we run SentEval for GenSen, where\n",
+    "- the model weights are on an AzureML datastore\n",
+    "- the pretrained embeddings are on an AzureML datastore\n",
+    "- the data for the SentEval transfer tasks are on an AzureML datastore\n",
+    "- evaluation runs on a GPU compute target in the AzureML Workspace, which is created automatically if it does not already exist"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Global Settings"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "\n",
+    "AZUREML_VERBOSE = False\n",
+    "\n",
+    "src_dir = \"./senteval-pytorch-gensen\"\n",
+    "os.makedirs(src_dir, exist_ok=True)\n",
+    "\n",
+    "PATH_TO_GENSEN = (\n",
+    "    \"../../../gensen\"\n",
+    ")  # Set this path to where you have cloned the gensen source code\n",
+    "PATH_TO_SENTEVAL = (\n",
+    "    \"../../../SentEval\"\n",
+    ")  # Set this path to where you have cloned the senteval source code\n",
+    "\n",
+    "cluster_name = \"eval-gpu\"\n",
+    "ds_root = \"senteval_pytorch_gensen\"  # Root path for the datastore"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Define the AzureML Workspace"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "import azureml.core\n",
+    "from azureml.core.workspace import Workspace\n",
+    "\n",
+    "ws = Workspace.from_config()\n",
+    "if AZUREML_VERBOSE:\n",
+    "    print(\"Workspace name: {}\".format(ws.name))\n",
+    "    print(\"Resource group: {}\".format(ws.resource_group))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Retrieve the GPU-enabled compute target, or create a new one if it doesn't already exist."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from azureml.core.compute import ComputeTarget, AmlCompute\n",
+    "from azureml.core.compute_target import ComputeTargetException\n",
+    "\n",
+    "try:\n",
+    "    compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
+    "    print(\"Found compute target: {}\".format(cluster_name))\n",
+    "except ComputeTargetException:\n",
+    "    print(\"Creating new compute target: {}\".format(cluster_name))\n",
+    "    compute_config = AmlCompute.provisioning_configuration(\n",
+    "        vm_size=\"STANDARD_NC6\", max_nodes=4\n",
+    "    )\n",
+    "    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
+    "    compute_target.wait_for_completion(show_output=True)\n",
+    "\n",
+    "if AZUREML_VERBOSE:\n",
+    "    print(compute_target.get_status().serialize())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Define the datastore. Here we use the workspace's default datastore and then upload our external dependencies to it.\n",
+    "\n",
+    "If your data is already in the cloud, you can register a resource on any supported Azure storage account as a datastore. (Currently, the Azure storage services that can be registered as datastores are Azure Blob Container, Azure File Share, Azure Data Lake, Azure Data Lake Gen2, Azure SQL Database, Azure PostgreSQL, and Databricks File System. Learn more about the Datastore module [here](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.datastore?view=azure-ml-py).)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "from azureml.core import Datastore\n",
+    "\n",
+    "ds = ws.get_default_datastore()\n",
+    "if AZUREML_VERBOSE:\n",
+    "    print(\"Default datastore: {}\".format(ds.name))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "# Upload the gensen dependency\n",
+    "ds.upload(\n",
+    "    src_dir=PATH_TO_GENSEN,\n",
+    "    target_path=os.path.join(ds_root, \"gensen_lib\"),\n",
+    "    overwrite=False,\n",
+    "    show_progress=AZUREML_VERBOSE,\n",
+    ")\n",
+    "\n",
+    "# Upload the senteval dependency\n",
+    "ds.upload(\n",
+    "    src_dir=PATH_TO_SENTEVAL,\n",
+    "    target_path=os.path.join(ds_root, \"senteval_lib\"),\n",
+    "    overwrite=False,\n",
+    "    show_progress=AZUREML_VERBOSE,\n",
+    ")\n",
+    "\n",
+    "# Upload the utils_nlp/eval/senteval.py dependency (this defines the azureml-compatible wrapper for senteval)\n",
+    "ds.upload_files(\n",
+    "    files=[\"../../utils_nlp/eval/senteval.py\"],\n",
+    "    target_path=os.path.join(ds_root, \"utils_nlp/eval\"),\n",
+    "    overwrite=False,\n",
+    "    show_progress=AZUREML_VERBOSE,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Note that after the upload is complete, you can safely delete the dependencies from your local machine to free up disk space."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Create the evaluation script"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%%writefile $src_dir/evaluate.py\n",
+    "import os\n",
+    "import sys\n",
+    "import argparse\n",
+    "import torch\n",
+    "import pandas as pd\n",
+    "\n",
+    "if __name__ == \"__main__\":\n",
+    "    parser = argparse.ArgumentParser()\n",
+    "    parser.add_argument(\"--ds_gensen\", type=str, dest=\"ds_gensen\")\n",
+    "    parser.add_argument(\"--ds_senteval\", type=str, dest=\"ds_senteval\")\n",
+    "    parser.add_argument(\"--ds_utils\", type=str, dest=\"ds_utils\")\n",
+    "    args = parser.parse_args()\n",
+    "    \n",
+    "    # Import the dependencies\n",
+    "    sys.path.insert(0, args.ds_gensen)\n",
+    "    from gensen import GenSen, GenSenSingle\n",
+    "    sys.path.insert(0, args.ds_utils)\n",
+    "    from eval.senteval import SentEvalRunner\n",
+    "\n",
+    "    # Define the model\n",
+    "    model_params = {}\n",
+    "    model_params[\"folder_path\"] = os.path.join(args.ds_gensen, \"data/models\")\n",
+    "    model_params[\"prefix_1\"] = \"nli_large_bothskip_parse\"\n",
+    "    model_params[\"prefix_2\"] = \"nli_large_bothskip\"\n",
+    "    model_params[\"pretrain\"] = os.path.join(\n",
+    "        args.ds_gensen, \"data/embedding/glove.840B.300d.h5\"\n",
+    "    )\n",
+    "    model_params[\"cuda\"] = torch.cuda.is_available()\n",
+    "\n",
+    "    gensen_1 = GenSenSingle(\n",
+    "        model_folder=model_params[\"folder_path\"],\n",
+    "        filename_prefix=model_params[\"prefix_1\"],\n",
+    "        pretrained_emb=model_params[\"pretrain\"],\n",
+    "        cuda=model_params[\"cuda\"],\n",
+    "    )\n",
+    "    gensen_2 = GenSenSingle(\n",
+    "        model_folder=model_params[\"folder_path\"],\n",
+    "        filename_prefix=model_params[\"prefix_2\"],\n",
+    "        pretrained_emb=model_params[\"pretrain\"],\n",
+    "        cuda=model_params[\"cuda\"],\n",
+    "    )\n",
+    "    gensen = GenSen(gensen_1, gensen_2)\n",
+    "\n",
+    "    # Define the SentEval Runner, an AzureML-compatible wrapper class for SentEval\n",
+    "    ser = SentEvalRunner(path_to_senteval=args.ds_senteval, use_azureml=True)\n",
+    "    ser.set_transfer_data_path(relative_path=\"data\")\n",
+    "    ser.set_transfer_tasks(\n",
+    "        [\"STSBenchmark\", \"STS12\", \"STS13\", \"STS14\", \"STS15\", \"STS16\"]\n",
+    "    )\n",
+    "    ser.set_model(gensen)\n",
+    "    ser.set_params_senteval()  # accepts defaults\n",
+    "\n",
+    "    # Define the batcher and prepare functions for SentEval\n",
+    "    def prepare(params, samples):\n",
+    "        vocab = set()\n",
+    "        for sample in samples:\n",
+    "            if params.current_task != \"TREC\":\n",
+    "                sample = \" \".join(sample).lower().split()\n",
+    "            else:\n",
+    "                sample = \" \".join(sample).split()\n",
+    "            for word in sample:\n",
+    "                if word not in vocab:\n",
+    "                    vocab.add(word)\n",
+    "\n",
+    "        vocab.add(\"<s>\")\n",
+    "        vocab.add(\"<pad>\")\n",
+    "        vocab.add(\"<unk>\")\n",
+    "        vocab.add(\"</s>\")\n",
+    "        # Optional vocab expansion\n",
+    "        # params[\"model\"].vocab_expansion(vocab)\n",
+    "\n",
+    "    def batcher(params, batch):\n",
+    "        # batch contains list of words\n",
+    "        max_tasks = [\"MR\", \"CR\", \"SUBJ\", \"MPQA\", \"ImageCaptionRetrieval\"]\n",
+    "        if params.current_task in max_tasks:\n",
+    "            strategy = \"max\"\n",
+    "        else:\n",
+    "            strategy = \"last\"\n",
+    "\n",
+    "        sentences = [\" \".join(s).lower() for s in batch]\n",
+    "        _, embeddings = params[\"model\"].get_representation(\n",
+    "            sentences, pool=strategy, return_numpy=True\n",
+    "        )\n",
+    "        return embeddings\n",
+    "\n",
+    "    # Run SentEval\n",
+    "    results = ser.run(batcher, prepare)\n",
+    "\n",
+    "    # Print results as table\n",
+    "    eval_metrics = ser.print_mean(\n",
+    "        results,\n",
+    "        selected_metrics=[\"pearson\", \"spearman\"],\n",
+    "    )\n",
+    "    print(eval_metrics)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Create a PyTorch Estimator to submit the evaluation script to the compute target"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from azureml.train.dnn import PyTorch\n",
+    "\n",
+    "# evaluate.py runs the whole evaluation in one process, so a single node is used;\n",
+    "# launching it with MPI on several nodes would repeat the same evaluation on each node.\n",
+    "est = PyTorch(\n",
+    "    source_directory=src_dir,\n",
+    "    script_params={\n",
+    "        \"--ds_gensen\": ds.path(\"{}/gensen_lib\".format(ds_root)).as_mount(),\n",
+    "        \"--ds_senteval\": ds.path(\"{}/senteval_lib\".format(ds_root)).as_mount(),\n",
+    "        \"--ds_utils\": ds.path(\"{}/utils_nlp\".format(ds_root)).as_mount(),\n",
+    "    },\n",
+    "    compute_target=compute_target,\n",
+    "    entry_script=\"evaluate.py\",\n",
+    "    node_count=1,\n",
+    "    use_gpu=True,\n",
+    "    framework_version=\"1.0\",\n",
+    "    conda_packages=[\"h5py\", \"nltk\"],\n",
+    "    pip_packages=[\"pandas\"],\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Run Evaluation"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from azureml.core import Experiment\n",
+    "\n",
+    "experiment = Experiment(ws, name=\"senteval-pytorch-gensen\")\n",
+    "run = experiment.submit(est)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Visualize the run via a Jupyter widget."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from azureml.widgets import RunDetails\n",
+    "\n",
+    "RunDetails(run).show()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Alternatively, block until the script has completed."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "#run.wait_for_completion(show_output=AZUREML_VERBOSE)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.8"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
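In the AzureML notebook above, `utils_nlp/eval/senteval.py` is uploaded to `<ds_root>/utils_nlp/eval`, the `<ds_root>/utils_nlp` folder is mounted, and `evaluate.py` prepends that mount to `sys.path` before running `from eval.senteval import SentEvalRunner`. The following is a minimal local sketch of that import step, assuming a temporary directory as a stand-in for the mount and a hypothetical stub in place of the real `SentEvalRunner`. Since only `senteval.py` is uploaded, `eval` has no `__init__.py` and is resolved as a Python 3 implicit namespace package.

```python
import importlib
import os
import sys
import tempfile

# A temporary directory stands in for the mounted "<ds_root>/utils_nlp" datastore path
with tempfile.TemporaryDirectory() as mount_root:
    eval_dir = os.path.join(mount_root, "eval")
    os.makedirs(eval_dir)
    # Only senteval.py is uploaded, so "eval" has no __init__.py and is
    # imported as an implicit namespace package
    with open(os.path.join(eval_dir, "senteval.py"), "w") as f:
        f.write("class SentEvalRunner:\n    pass\n")  # hypothetical stub, not the real wrapper

    # Same step as evaluate.py: put the mount at the front of the module search path
    sys.path.insert(0, mount_root)
    try:
        importlib.invalidate_caches()  # the files were created after interpreter start
        module = importlib.import_module("eval.senteval")
        runner_cls = module.SentEvalRunner
    finally:
        sys.path.remove(mount_root)

print(runner_cls.__name__)  # SentEvalRunner
```

One consequence of namespace-package resolution: a regular package or module named `eval` anywhere on `sys.path` in the run environment would take precedence over the mounted folder, even though the mount is inserted first.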
--- a/scenarios/sentence_similarity/senteval_local.ipynb
+++ b/scenarios/sentence_similarity/senteval_local.ipynb
@ -0,0 +1,237 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# SentEval on a Local Machine\n",
+    "\n",
+    "SentEval is a widely used benchmarking tool for evaluating general-purpose sentence embeddings. It provides a simple interface for evaluating your embeddings on up to 17 supported downstream tasks, such as sentiment classification, natural language inference, and semantic similarity.\n",
+    "\n",
+    "Running SentEval locally is simple: clone the [repository](https://github.com/facebookresearch/SentEval), follow its setup instructions to download the data for the transfer tasks, and implement two model-specific functions, `prepare(params, samples)` and `batcher(params, batch)`. The authors provide guidance on how to do this in the [examples](https://github.com/facebookresearch/SentEval/tree/master/examples) directory of their repository. In this notebook we evaluate the GenSen model on the STS downstream tasks (STSBenchmark and STS12 through STS16)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 00 Global Settings"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "import sys\n",
+    "import json\n",
+    "import torch\n",
+    "import pandas as pd\n",
+    "\n",
+    "sys.path.append(\"../../\")\n",
+    "from utils_nlp.eval.senteval import SentEvalRunner\n",
+    "\n",
+    "print(\"System version: {}\".format(sys.version))\n",
+    "print(\"Torch version: {}\".format(torch.__version__))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 01 SentEval Settings"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "PATH_TO_SENTEVAL = (\n",
+    "    \"../../../../SentEval\"\n",
+    ")  # Set this path to where you have cloned the senteval source code\n",
+    "sys.path.insert(0, PATH_TO_SENTEVAL)\n",
+    "import senteval\n",
+    "\n",
+    "transfer_tasks = [\"STSBenchmark\", \"STS12\", \"STS13\", \"STS14\", \"STS15\", \"STS16\"]\n",
+    "\n",
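+    "# For reference: SentEvalRunner.set_params_senteval() in section 04 uses these same values as its defaults\n",
+    "params_senteval = {\n",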
+    "    \"task_path\": os.path.join(PATH_TO_SENTEVAL, \"data\"),\n",
+    "    \"usepytorch\": True,\n",
+    "    \"kfold\": 10,\n",
+    "}\n",
+    "params_senteval[\"classifier\"] = {\n",
+    "    \"nhid\": 0,\n",
+    "    \"optim\": \"adam\",\n",
+    "    \"batch_size\": 64,\n",
+    "    \"tenacity\": 5,\n",
+    "    \"epoch_size\": 4,\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 02 GenSen Settings"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "PATH_TO_GENSEN = (\n",
+    "    \"../../../../gensen\"\n",
+    ")  # Set this path to where you have cloned the gensen source code\n",
+    "sys.path.append(PATH_TO_GENSEN)\n",
+    "from gensen import GenSen, GenSenSingle\n",
+    "\n",
+    "model_params = {}\n",
+    "model_params[\"folder_path\"] = os.path.join(PATH_TO_GENSEN, \"data/models\")\n",
+    "model_params[\"prefix_1\"] = \"nli_large_bothskip_parse\"\n",
+    "model_params[\"prefix_2\"] = \"nli_large_bothskip\"\n",
+    "model_params[\"pretrain\"] = os.path.join(\n",
+    "    PATH_TO_GENSEN, \"data/embedding/glove.840B.300d.h5\"\n",
+    ")\n",
+    "model_params[\"cuda\"] = torch.cuda.is_available()\n",
+    "\n",
+    "print(\"model params: {}\".format(json.dumps(model_params, indent=4)))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 03 SentEval Functions"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As specified in the SentEval [repo](https://github.com/facebookresearch/SentEval#how-to-use-senteval), we implement two functions:\n",
+    "\n",
+    "- **prepare**: sees the whole dataset of each task and can thus construct the word vocabulary, the dictionary of word vectors, etc.\n",
+    "- **batcher**: transforms a batch of text sentences into sentence embeddings"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def prepare(params, samples):\n",
+    "    vocab = set()\n",
+    "    for sample in samples:\n",
+    "        if params.current_task != \"TREC\":\n",
+    "            sample = \" \".join(sample).lower().split()\n",
+    "        else:\n",
+    "            sample = \" \".join(sample).split()\n",
+    "        for word in sample:\n",
+    "            if word not in vocab:\n",
+    "                vocab.add(word)\n",
+    "\n",
+    "    vocab.add(\"<s>\")\n",
+    "    vocab.add(\"<pad>\")\n",
+    "    vocab.add(\"<unk>\")\n",
+    "    vocab.add(\"</s>\")\n",
+    "    # Optional vocab expansion\n",
+    "    # params[\"model\"].vocab_expansion(vocab)\n",
+    "\n",
+    "\n",
+    "def batcher(params, batch):\n",
+    "    # batch contains list of words\n",
+    "    max_tasks = [\"MR\", \"CR\", \"SUBJ\", \"MPQA\", \"ImageCaptionRetrieval\"]\n",
+    "    if params.current_task in max_tasks:\n",
+    "        strategy = \"max\"\n",
+    "    else:\n",
+    "        strategy = \"last\"\n",
+    "\n",
+    "    sentences = [\" \".join(s).lower() for s in batch]\n",
+    "    _, embeddings = params[\"model\"].get_representation(\n",
+    "        sentences, pool=strategy, return_numpy=True\n",
+    "    )\n",
+    "    return embeddings"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 04 Run SentEval on GenSen"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "gensen_1 = GenSenSingle(\n",
+    "    model_folder=model_params[\"folder_path\"],\n",
+    "    filename_prefix=model_params[\"prefix_1\"],\n",
+    "    pretrained_emb=model_params[\"pretrain\"],\n",
+    "    cuda=model_params[\"cuda\"],\n",
+    ")\n",
+    "gensen_2 = GenSenSingle(\n",
+    "    model_folder=model_params[\"folder_path\"],\n",
+    "    filename_prefix=model_params[\"prefix_2\"],\n",
+    "    pretrained_emb=model_params[\"pretrain\"],\n",
+    "    cuda=model_params[\"cuda\"],\n",
+    ")\n",
+    "gensen = GenSen(gensen_1, gensen_2)\n",
+    "\n",
+    "ser = SentEvalRunner(path_to_senteval=PATH_TO_SENTEVAL, use_azureml=False)\n",
+    "ser.set_transfer_data_path(relative_path=\"data\")\n",
+    "ser.set_transfer_tasks(transfer_tasks)\n",
+    "ser.set_model(gensen)\n",
+    "ser.set_params_senteval()\n",
+    "results = ser.run(batcher, prepare)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Print selected metrics from the model's results on the transfer tasks as a table."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "eval_metrics = ser.print_mean(results, selected_metrics=[\"pearson\", \"spearman\"])\n",
+    "print(eval_metrics)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.8"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
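`SentEvalRunner.set_transfer_data_path` joins its argument onto `path_to_senteval`, so the argument is expected to be relative to the SentEval root, for example `"data"`. Passing a path that already starts with `PATH_TO_SENTEVAL` repeats the prefix. The sketch below reproduces that joining rule with a small helper, `transfer_data_path` (hypothetical, written for this illustration), using `posixpath` instead of `os.path` so the printed paths are the same on every OS.

```python
import posixpath

# Relative path used in the local notebook; posixpath keeps the output OS-independent
PATH_TO_SENTEVAL = "../../../../SentEval"


def transfer_data_path(path_to_senteval, relative_path):
    """Mirror SentEvalRunner.set_transfer_data_path: join the argument onto the SentEval root."""
    return posixpath.join(path_to_senteval, relative_path)


correct = transfer_data_path(PATH_TO_SENTEVAL, "data")
doubled = transfer_data_path(PATH_TO_SENTEVAL, posixpath.join(PATH_TO_SENTEVAL, "data"))

print(correct)  # ../../../../SentEval/data
print(doubled)  # ../../../../SentEval/../../../../SentEval/data
print(posixpath.normpath(doubled))  # ../../../../../../../SentEval/data
```

The doubled path normalizes to a directory seven levels up, which is why the runner should receive only the part below the SentEval root.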
--- a/utils_nlp/eval/senteval.py
+++ b/utils_nlp/eval/senteval.py
@ -0,0 +1,128 @@
+import os
+import sys
+import pandas as pd
+
+
+class SentEvalRunner:
+    def __init__(self, path_to_senteval=".", use_azureml=False):
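+        """AzureML-compatible wrapper class that interfaces with the original implementation of SentEval
+
+        Args:
+            path_to_senteval (str, optional): Path to the SentEval source code. Defaults to ".".
+            use_azureml (bool, optional): Whether the code runs on AzureML. If True, path_to_senteval is
+                converted to a path relative to the current working directory before SentEval is
+                imported. Defaults to False.
+        """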
+        self.path_to_senteval = path_to_senteval
+        self.use_azureml = use_azureml
+
+    def set_transfer_data_path(self, relative_path):
+        """Set the path to the datasets for the SentEval transfer tasks
+
+        Args:
+            relative_path (str): Path to the data, relative to path_to_senteval (e.g. "data")
+        """
+        self.transfer_data_path = os.path.join(
+            self.path_to_senteval, relative_path
+        )
+
+    def set_transfer_tasks(self, task_list):
+        """Set the transfer tasks to use for evaluation
+        
+        Args:
+            task_list (list(str)): List of downstream transfer tasks
+        """
+        self.transfer_tasks = task_list
+
+    def set_model(self, model):
+        """Set the model to evaluate"""
+        self.model = model
+
+    def set_params_senteval(
+        self,
+        use_pytorch=True,
+        kfold=10,
+        nhid=0,
+        optim="adam",
+        batch_size=64,
+        tenacity=5,
+        epoch_size=4,
+    ):
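+        """Define the required parameters for SentEval (model, task_path, usepytorch, kfold).
+
+        Classifier parameters (nhid, optim, batch_size, tenacity, epoch_size) are added only if at
+        least one transfer task trains a classifier. Call set_transfer_data_path, set_transfer_tasks,
+        and set_model before this method.
+        """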
+        self.params_senteval = {
+            "model": self.model,
+            "task_path": self.transfer_data_path,
+            "usepytorch": use_pytorch,
+            "kfold": kfold,
+        }
+        classifying_tasks = {
+            "MR",
+            "CR",
+            "SUBJ",
+            "MPQA",
+            "SST2",
+            "SST5",
+            "TREC",
+            "SICKEntailment",
+            "SNLI",
+            "MRPC",
+        }
+        if any(t in classifying_tasks for t in self.transfer_tasks):
+            self.params_senteval["classifier"] = {
+                "nhid": nhid,
+                "optim": optim,
+                "batch_size": batch_size,
+                "tenacity": tenacity,
+                "epoch_size": epoch_size,
+            }
+
+    def run(self, batcher_func, prepare_func):
+        """Run the SentEval engine on the model on the transfer tasks
+        
+        Args:
+            batcher_func (function): Function required by SentEval that transforms a batch of text sentences into 
+                                     sentence embeddings
+            prepare_func (function): Function that sees the whole dataset of each task and can thus construct the word 
+                                     vocabulary, the dictionary of word vectors, etc
+        
+        Returns:
+            dict: Dictionary of results
+        """
+        if self.use_azureml:
+            # On AzureML, import SentEval from a path relative to the working directory
+            senteval_path = os.path.relpath(self.path_to_senteval, os.getcwd())
+        else:
+            senteval_path = self.path_to_senteval
+        sys.path.insert(0, senteval_path)
+        import senteval
+
+        se = senteval.engine.SE(
+            self.params_senteval, batcher_func, prepare_func
+        )
+
+        return se.eval(self.transfer_tasks)
+
+    def print_mean(self, results, selected_metrics=None, round_decimals=3):
+        """Return the means of selected metrics of the transfer tasks as a table
+
+        Args:
+            results (dict): Results from the SentEval evaluation engine
+            selected_metrics (list(str), optional): List of metric names; defaults to an empty list
+            round_decimals (int, optional): Number of decimal digits to round to; defaults to 3
+
+        Returns:
+            pandas.DataFrame: One row per transfer task and one column per selected metric
+        """
+        if selected_metrics is None:
+            selected_metrics = []
+        data = []
+        for task in self.transfer_tasks:
+            if "all" in results[task]:
+                row = [
+                    results[task]["all"][metric]["mean"]
+                    for metric in selected_metrics
+                ]
+            else:
+                row = [results[task][metric] for metric in selected_metrics]
+            data.append(row)
+        table = pd.DataFrame(
+            data=data, columns=selected_metrics, index=self.transfer_tasks
+        )
+        return table.round(round_decimals)
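`print_mean` distinguishes two shapes of per-task results: entries with an `"all"` key, where each metric is read from its `"mean"` field, and entries whose metrics sit at the top level. The sketch below applies the same lookup without pandas; `mean_table` is a hypothetical helper, and the `results` values are made up purely to exercise both branches, not real GenSen scores.

```python
def mean_table(results, tasks, selected_metrics, round_decimals=3):
    """Return {task: [metric values]} using the same lookup rule as print_mean."""
    table = {}
    for task in tasks:
        if "all" in results[task]:
            # Aggregated tasks: read the "mean" of each metric under "all"
            row = [results[task]["all"][m]["mean"] for m in selected_metrics]
        else:
            # Flat tasks: the metric value is stored directly under the task
            row = [results[task][m] for m in selected_metrics]
        table[task] = [round(v, round_decimals) for v in row]
    return table


# Hypothetical results with made-up values, for illustration only
results = {
    "STSBenchmark": {"pearson": 0.78123, "spearman": 0.77456},
    "STS12": {
        "all": {
            "pearson": {"mean": 0.61234, "wmean": 0.6},
            "spearman": {"mean": 0.60987, "wmean": 0.6},
        }
    },
}
table = mean_table(results, ["STSBenchmark", "STS12"], ["pearson", "spearman"])
for task, row in table.items():
    print(task, row)
```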