Merge pull request #85 from microsoft/casey-senteval

SentEval examples (local and with azureml support)
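
Note for reviewers: `get_or_create_workspace` in `utils_nlp/azureml/azureml_utils.py` tries several ways of obtaining a workspace in order and keeps the first one that succeeds. The sketch below isolates that for/else fallback pattern so it can be read without the AzureML SDK. The names `first_successful`, `load_from_service`, `load_from_config`, and `create_new` are hypothetical stand-ins invented for illustration; they are not part of this PR or of the AzureML SDK.

```python
def first_successful(options):
    """Call each (function, kwargs) pair in order and return the first result.

    Mirrors the loop in get_or_create_workspace: the `else` branch of the
    `for` loop runs only when no attempt reached `break`.
    """
    for function, kwargs in options:
        try:
            result = function(**kwargs)
            break
        except Exception:
            # Any failure simply moves on to the next option.
            continue
    else:
        raise ValueError("All fallback options failed")
    return result


# Hypothetical stand-ins for Workspace(...), Workspace.from_config(...) and
# Workspace.create(...). They are NOT AzureML SDK calls.
def load_from_service(name):
    raise RuntimeError("workspace {} not found".format(name))


def load_from_config(path):
    raise FileNotFoundError("no config.json under {}".format(path))


def create_new(name, location):
    return {"name": name, "location": location}


options = [
    (load_from_service, dict(name="demo-ws")),
    (load_from_config, dict(path=".")),
    (create_new, dict(name="demo-ws", location="eastus")),
]
workspace = first_successful(options)
print(workspace)
```

One consequence of this design is that the reason each earlier option failed is discarded, so a caller only sees the final `ValueError` when every option fails. Logging the caught exception inside the loop would be one possible way to keep that information.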
2019-06-11 14:39:51 -04:00 · 2019-06-11 14:39:51 -04:00 · 98a7071294
--- a/scenarios/sentence_similarity/senteval_azureml.ipynb
+++ b/scenarios/sentence_similarity/senteval_azureml.ipynb
@@ -0,0 +1,404 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# SentEval with AzureML\n",
+    "[SentEval](https://github.com/facebookresearch/SentEval) is a widely used benchmarking tool for evaluating general-purpose sentence embeddings. It provides a simple interface for evaluating your embeddings on up to 17 supported downstream tasks, such as sentiment classification, natural language inference, and semantic similarity.\n",
+    "\n",
+    "This notebook shows how to use SentEval with the AzureML SDK. Running SentEval locally is easy, but it is not always efficient, depending on the model. For example, benchmarking a model that runs on GPU can quickly become expensive, even when you start from pretrained weights, because loading the embeddings and vocabulary for inference can take a nontrivial amount of time. In this example we show how to run SentEval for [GenSen](https://github.com/Maluuba/gensen), where\n",
+    "- the model weights are on AzureML Datastore. To download the pre-trained GenSen model, run `bash download_models.sh` from the gensen/data/models directory.\n",
+    "- the embeddings are on AzureML Datastore. To download the pre-trained embeddings, run `bash glove2h5.sh` from the gensen/data/embedding directory.\n",
+    "- the data for the SentEval transfer tasks are on AzureML Datastore. To download these datasets, run `bash get_transfer_data.bash` from the SentEval/data/downstream directory.\n",
+    "- evaluation runs on the AzureML Workspace GPU Compute Target (no extra provisioning/config needed)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Global Settings"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "import sys\n",
+    "\n",
+    "import azureml.core\n",
+    "from azureml.core.workspace import Workspace\n",
+    "\n",
+    "from azureml.core.compute import ComputeTarget, AmlCompute\n",
+    "from azureml.core.compute_target import ComputeTargetException\n",
+    "\n",
+    "from azureml.core import Datastore\n",
+    "import azureml.data\n",
+    "from azureml.data.azure_storage_datastore import AzureFileDatastore\n",
+    "\n",
+    "from azureml.train.dnn import PyTorch\n",
+    "from azureml.core.runconfig import MpiConfiguration\n",
+    "from azureml.core import Experiment\n",
+    "from azureml.widgets import RunDetails\n",
+    "\n",
+    "sys.path.append(\"../../\")\n",
+    "from utils_nlp.azureml.azureml_utils import get_or_create_workspace\n",
+    "\n",
+    "AZUREML_VERBOSE = False\n",
+    "\n",
+    "PATH_TO_GENSEN = (\n",
+    "    \"../../../gensen\"\n",
+    ")  # Set this path to where you have cloned the gensen source code\n",
+    "PATH_TO_SENTEVAL = (\n",
+    "    \"../../../SentEval\"\n",
+    ")  # Set this path to where you have cloned the senteval source code\n",
+    "\n",
+    "cluster_name = \"eval-gpu\"  # Name of AzureML Compute Target cluster\n",
+    "ds_root = \"senteval_pytorch_gensen\"  # Name of root directory for the datastore"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Define the AzureML Workspace"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "ws = get_or_create_workspace(\n",
+    "    subscription_id=\"<SUBSCRIPTION_ID>\",\n",
+    "    resource_group=\"<RESOURCE_GROUP>\",\n",
+    "    workspace_name=\"<WORKSPACE_NAME>\",\n",
+    "    workspace_region=\"<WORKSPACE_REGION>\",\n",
+    ")\n",
+    "\n",
+    "if AZUREML_VERBOSE:\n",
+    "    print(\"Workspace name: {}\".format(ws.name))\n",
+    "    print(\"Resource group: {}\".format(ws.resource_group))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Attach the GPU-enabled compute target, or create a new one if it doesn't already exist."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "try:\n",
+    "    compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
+    "    print(\"Found compute target: {}\".format(cluster_name))\n",
+    "except ComputeTargetException:\n",
+    "    print(\"Creating new compute target: {}\".format(cluster_name))\n",
+    "    compute_config = AmlCompute.provisioning_configuration(\n",
+    "        vm_size=\"STANDARD_NC6\", max_nodes=4\n",
+    "    )\n",
+    "    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
+    "    compute_target.wait_for_completion(show_output=True)\n",
+    "\n",
+    "if AZUREML_VERBOSE:\n",
+    "    print(compute_target.get_status().serialize())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Define the datastore. Here we will use the default datastore and then upload our external dependencies. \n",
+    "\n",
+    "If your data is already in the cloud, you can register your resource on any Azure storage account as the datastore. (Currently, the Azure storage services that can be registered as datastores are Azure Blob Container, Azure File Share, Azure Data Lake, Azure Data Lake Gen2, Azure SQL Database, Azure PostgreSQL, and Databricks File System. Learn more about the Datastore module [here](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.datastore?view=azure-ml-py).)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "ds = ws.get_default_datastore()\n",
+    "if AZUREML_VERBOSE:\n",
+    "    print(\"Default datastore: {}\".format(ds.name))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "# Upload the gensen dependency\n",
+    "ds.upload(\n",
+    "    src_dir=os.path.join(PATH_TO_GENSEN),\n",
+    "    target_path=os.path.join(ds_root, \"gensen_lib\"),\n",
+    "    overwrite=False,\n",
+    "    show_progress=AZUREML_VERBOSE,\n",
+    ")\n",
+    "\n",
+    "# Upload the senteval dependency\n",
+    "ds.upload(\n",
+    "    src_dir=os.path.join(PATH_TO_SENTEVAL),\n",
+    "    target_path=os.path.join(ds_root, \"senteval_lib\"),\n",
+    "    overwrite=False,\n",
+    "    show_progress=AZUREML_VERBOSE,\n",
+    ")\n",
+    "\n",
+    "# Upload the utils_nlp/eval/senteval.py dependency (this defines the azureml-compatible wrapper for senteval)\n",
+    "ds.upload_files(\n",
+    "    files=[\"../../utils_nlp/eval/senteval.py\"],\n",
+    "    target_path=os.path.join(ds_root, \"utils_nlp/eval\"),\n",
+    "    overwrite=False,\n",
+    "    show_progress=AZUREML_VERBOSE,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Note that after the upload is complete, you can safely delete the dependencies from your local machine to free up disk space."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Create the evaluation script"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "src_dir = \"./senteval-pytorch-gensen\"\n",
+    "os.makedirs(src_dir, exist_ok=True)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%%writefile $src_dir/evaluate.py\n",
+    "import os\n",
+    "import sys\n",
+    "import argparse\n",
+    "import torch\n",
+    "import pandas as pd\n",
+    "\n",
+    "if __name__ == \"__main__\":\n",
+    "    parser = argparse.ArgumentParser()\n",
+    "    parser.add_argument(\"--ds_gensen\", type=str, dest=\"ds_gensen\")\n",
+    "    parser.add_argument(\"--ds_senteval\", type=str, dest=\"ds_senteval\")\n",
+    "    parser.add_argument(\"--ds_utils\", type=str, dest=\"ds_utils\")\n",
+    "    args = parser.parse_args()\n",
+    "    \n",
+    "    # Import the dependencies\n",
+    "    sys.path.insert(0, args.ds_gensen)\n",
+    "    from gensen import GenSen, GenSenSingle\n",
+    "    sys.path.insert(0, args.ds_utils)\n",
+    "    from eval.senteval import SentEvalRunner\n",
+    "\n",
+    "    # Define the model\n",
+    "    model_params = {}\n",
+    "    model_params[\"folder_path\"] = os.path.join(args.ds_gensen, \"data/models\")\n",
+    "    model_params[\"prefix_1\"] = \"nli_large_bothskip_parse\"\n",
+    "    model_params[\"prefix_2\"] = \"nli_large_bothskip\"\n",
+    "    model_params[\"pretrain\"] = os.path.join(\n",
+    "        args.ds_gensen, \"data/embedding/glove.840B.300d.h5\"\n",
+    "    )\n",
+    "    model_params[\"cuda\"] = torch.cuda.is_available()\n",
+    "\n",
+    "    gensen_1 = GenSenSingle(\n",
+    "        model_folder=model_params[\"folder_path\"],\n",
+    "        filename_prefix=model_params[\"prefix_1\"],\n",
+    "        pretrained_emb=model_params[\"pretrain\"],\n",
+    "        cuda=model_params[\"cuda\"],\n",
+    "    )\n",
+    "    gensen_2 = GenSenSingle(\n",
+    "        model_folder=model_params[\"folder_path\"],\n",
+    "        filename_prefix=model_params[\"prefix_2\"],\n",
+    "        pretrained_emb=model_params[\"pretrain\"],\n",
+    "        cuda=model_params[\"cuda\"],\n",
+    "    )\n",
+    "    gensen = GenSen(gensen_1, gensen_2)\n",
+    "\n",
+    "    # Define the SentEval Runner, an AzureML-compatible wrapper class for SentEval\n",
+    "    ser = SentEvalRunner(path_to_senteval=args.ds_senteval, use_azureml=True)\n",
+    "    ser.set_transfer_data_path(relative_path=\"data\")\n",
+    "    ser.set_transfer_tasks(\n",
+    "        [\"STSBenchmark\", \"STS12\", \"STS13\", \"STS14\", \"STS15\", \"STS16\"]\n",
+    "    )\n",
+    "    ser.set_model(gensen)\n",
+    "    ser.set_params_senteval()  # accepts defaults\n",
+    "\n",
+    "    # Define the batcher and prepare functions for SentEval\n",
+    "    def prepare(params, samples):\n",
+    "        vocab = set()\n",
+    "        for sample in samples:\n",
+    "            if params.current_task != \"TREC\":\n",
+    "                sample = \" \".join(sample).lower().split()\n",
+    "            else:\n",
+    "                sample = \" \".join(sample).split()\n",
+    "            for word in sample:\n",
+    "                if word not in vocab:\n",
+    "                    vocab.add(word)\n",
+    "\n",
+    "        vocab.add(\"<s>\")\n",
+    "        vocab.add(\"<pad>\")\n",
+    "        vocab.add(\"<unk>\")\n",
+    "        vocab.add(\"</s>\")\n",
+    "        # Optional vocab expansion\n",
+    "        # params[\"model\"].vocab_expansion(vocab)\n",
+    "\n",
+    "    def batcher(params, batch):\n",
+    "        # batch contains list of words\n",
+    "        max_tasks = [\"MR\", \"CR\", \"SUBJ\", \"MPQA\", \"ImageCaptionRetrieval\"]\n",
+    "        if params.current_task in max_tasks:\n",
+    "            strategy = \"max\"\n",
+    "        else:\n",
+    "            strategy = \"last\"\n",
+    "\n",
+    "        sentences = [\" \".join(s).lower() for s in batch]\n",
+    "        _, embeddings = params[\"model\"].get_representation(\n",
+    "            sentences, pool=strategy, return_numpy=True\n",
+    "        )\n",
+    "        return embeddings\n",
+    "\n",
+    "    # Run SentEval\n",
+    "    results = ser.run(batcher, prepare)\n",
+    "\n",
+    "    # Print results as table\n",
+    "    eval_metrics = ser.print_mean(\n",
+    "        results,\n",
+    "        selected_metrics=[\"pearson\", \"spearman\"],\n",
+    "    )\n",
+    "    print(eval_metrics.head(eval_metrics.shape[0]))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Create a PyTorch Estimator to submit the evaluation script to the compute target"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "est = PyTorch(\n",
+    "    source_directory=src_dir,\n",
+    "    script_params={\n",
+    "        \"--ds_gensen\": ds.path(\"{}/gensen_lib\".format(ds_root)).as_mount(),\n",
+    "        \"--ds_senteval\": ds.path(\"{}/senteval_lib\".format(ds_root)).as_mount(),\n",
+    "        \"--ds_utils\": ds.path(\"{}/utils_nlp\".format(ds_root)).as_mount(),\n",
+    "    },\n",
+    "    compute_target=compute_target,\n",
+    "    entry_script=\"evaluate.py\",\n",
+    "    node_count=4,\n",
+    "    process_count_per_node=1,\n",
+    "    distributed_training=MpiConfiguration(),\n",
+    "    use_gpu=True,\n",
+    "    framework_version=\"1.0\",\n",
+    "    conda_packages=[\"h5py\", \"nltk\"],\n",
+    "    pip_packages=[\"pandas\"],\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Run Evaluation"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "experiment = Experiment(ws, name=\"senteval-pytorch-gensen\")\n",
+    "run = experiment.submit(est)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Visualize the run via a Jupyter widget."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "RunDetails(run).show()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Alternatively, block until the script has completed."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "#run.wait_for_completion(show_output=AZUREML_VERBOSE)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python (nlp_cpu)",
+   "language": "python",
+   "name": "nlp_cpu"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.8"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
--- a/scenarios/sentence_similarity/senteval_local.ipynb
+++ b/scenarios/sentence_similarity/senteval_local.ipynb
@@ -0,0 +1,336 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# SentEval on Local\n",
+    "\n",
+    "SentEval is a widely used benchmarking tool for evaluating general-purpose sentence embeddings. It provides a simple interface for evaluating your embeddings on up to 17 supported downstream tasks, such as sentiment classification, natural language inference, and semantic similarity.\n",
+    "\n",
+    "Running SentEval locally is simple. Clone the [repository](https://github.com/facebookresearch/SentEval), follow its setup instructions to get the data for the transfer tasks, and implement two model-specific functions, `prepare(params, samples)` and `batcher(params, batch)`. The authors provide some guidance on how to do this in the [examples](https://github.com/facebookresearch/SentEval/tree/master/examples) directory of their repository. In this notebook we show an example of evaluating the GenSen model on the available STS downstream tasks."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 00 Global Settings"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "System version: 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) \n",
+      "[GCC 7.3.0]\n",
+      "Torch version: 1.0.1\n"
+     ]
+    }
+   ],
+   "source": [
+    "import os\n",
+    "import sys\n",
+    "import json\n",
+    "import torch\n",
+    "import pandas as pd\n",
+    "\n",
+    "sys.path.append(\"../../\")\n",
+    "from utils_nlp.eval.senteval import SentEvalRunner\n",
+    "\n",
+    "print(\"System version: {}\".format(sys.version))\n",
+    "print(\"Torch version: {}\".format(torch.__version__))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 01 SentEval Settings"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "PATH_TO_SENTEVAL = (\n",
+    "    \"../../../SentEval\"\n",
+    ")  # Set this path to where you have cloned the senteval source code\n",
+    "sys.path.insert(0, PATH_TO_SENTEVAL)\n",
+    "import senteval\n",
+    "\n",
+    "transfer_tasks = [\"STSBenchmark\", \"STS12\", \"STS13\", \"STS14\", \"STS15\", \"STS16\"]\n",
+    "\n",
+    "params_senteval = {\n",
+    "    \"task_path\": os.path.join(PATH_TO_SENTEVAL, \"data\"),\n",
+    "    \"usepytorch\": True,\n",
+    "    \"kfold\": 10,\n",
+    "}\n",
+    "params_senteval[\"classifier\"] = {\n",
+    "    \"nhid\": 0,\n",
+    "    \"optim\": \"adam\",\n",
+    "    \"batch_size\": 64,\n",
+    "    \"tenacity\": 5,\n",
+    "    \"epoch_size\": 4,\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 02 GenSen Settings"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "model params: {\n",
+      "    \"folder_path\": \"../../../gensen/data/models\",\n",
+      "    \"prefix_1\": \"nli_large_bothskip_parse\",\n",
+      "    \"prefix_2\": \"nli_large_bothskip\",\n",
+      "    \"pretrain\": \"../../../gensen/data/embedding/glove.840B.300d.h5\",\n",
+      "    \"cuda\": true\n",
+      "}\n"
+     ]
+    }
+   ],
+   "source": [
+    "PATH_TO_GENSEN = (\n",
+    "    \"../../../gensen\"\n",
+    ")  # Set this path to where you have cloned the gensen source code\n",
+    "sys.path.append(PATH_TO_GENSEN)\n",
+    "from gensen import GenSen, GenSenSingle\n",
+    "\n",
+    "model_params = {}\n",
+    "model_params[\"folder_path\"] = os.path.join(PATH_TO_GENSEN, \"data/models\")\n",
+    "model_params[\"prefix_1\"] = \"nli_large_bothskip_parse\"\n",
+    "model_params[\"prefix_2\"] = \"nli_large_bothskip\"\n",
+    "model_params[\"pretrain\"] = os.path.join(\n",
+    "    PATH_TO_GENSEN, \"data/embedding/glove.840B.300d.h5\"\n",
+    ")\n",
+    "model_params[\"cuda\"] = torch.cuda.is_available()\n",
+    "\n",
+    "print(\"model params: {}\".format(json.dumps(model_params, indent=4)))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 03 SentEval Functions"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As specified in the SentEval [repo](https://github.com/facebookresearch/SentEval#how-to-use-senteval), we implement two functions:\n",
+    "\n",
+    "<b>prepare</b> (sees the whole dataset of each task and can thus construct the word vocabulary, the dictionary of word vectors, etc.)  \n",
+    "<b>batcher</b> (transforms a batch of text sentences into sentence embeddings)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def prepare(params, samples):\n",
+    "    vocab = set()\n",
+    "    for sample in samples:\n",
+    "        if params.current_task != \"TREC\":\n",
+    "            sample = \" \".join(sample).lower().split()\n",
+    "        else:\n",
+    "            sample = \" \".join(sample).split()\n",
+    "        for word in sample:\n",
+    "            if word not in vocab:\n",
+    "                vocab.add(word)\n",
+    "\n",
+    "    vocab.add(\"<s>\")\n",
+    "    vocab.add(\"<pad>\")\n",
+    "    vocab.add(\"<unk>\")\n",
+    "    vocab.add(\"</s>\")\n",
+    "    # Optional vocab expansion\n",
+    "    # params[\"model\"].vocab_expansion(vocab)\n",
+    "\n",
+    "\n",
+    "def batcher(params, batch):\n",
+    "    # batch contains list of words\n",
+    "    max_tasks = [\"MR\", \"CR\", \"SUBJ\", \"MPQA\", \"ImageCaptionRetrieval\"]\n",
+    "    if params.current_task in max_tasks:\n",
+    "        strategy = \"max\"\n",
+    "    else:\n",
+    "        strategy = \"last\"\n",
+    "\n",
+    "    sentences = [\" \".join(s).lower() for s in batch]\n",
+    "    _, embeddings = params[\"model\"].get_representation(\n",
+    "        sentences, pool=strategy, return_numpy=True\n",
+    "    )\n",
+    "    return embeddings"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 04 Run SentEval on GenSen"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "gensen_1 = GenSenSingle(\n",
+    "    model_folder=model_params[\"folder_path\"],\n",
+    "    filename_prefix=model_params[\"prefix_1\"],\n",
+    "    pretrained_emb=model_params[\"pretrain\"],\n",
+    "    cuda=model_params[\"cuda\"],\n",
+    ")\n",
+    "gensen_2 = GenSenSingle(\n",
+    "    model_folder=model_params[\"folder_path\"],\n",
+    "    filename_prefix=model_params[\"prefix_2\"],\n",
+    "    pretrained_emb=model_params[\"pretrain\"],\n",
+    "    cuda=model_params[\"cuda\"],\n",
+    ")\n",
+    "gensen = GenSen(gensen_1, gensen_2)\n",
+    "\n",
+    "ser = SentEvalRunner(path_to_senteval=PATH_TO_SENTEVAL, use_azureml=False)\n",
+    "ser.set_transfer_data_path(\"data\")\n",
+    "ser.set_transfer_tasks(transfer_tasks)\n",
+    "ser.set_model(gensen)\n",
+    "ser.set_params_senteval()\n",
+    "results = ser.run(batcher, prepare)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Print selected metrics from the model's results on the transfer tasks as a table."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>pearson</th>\n",
+       "      <th>spearman</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>STSBenchmark</th>\n",
+       "      <td>0.782</td>\n",
+       "      <td>0.786</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>STS12</th>\n",
+       "      <td>0.608</td>\n",
+       "      <td>0.609</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>STS13</th>\n",
+       "      <td>0.540</td>\n",
+       "      <td>0.551</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>STS14</th>\n",
+       "      <td>0.651</td>\n",
+       "      <td>0.636</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>STS15</th>\n",
+       "      <td>0.736</td>\n",
+       "      <td>0.738</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>STS16</th>\n",
+       "      <td>0.668</td>\n",
+       "      <td>0.672</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "              pearson  spearman\n",
+       "STSBenchmark    0.782     0.786\n",
+       "STS12           0.608     0.609\n",
+       "STS13           0.540     0.551\n",
+       "STS14           0.651     0.636\n",
+       "STS15           0.736     0.738\n",
+       "STS16           0.668     0.672"
+      ]
+     },
+     "execution_count": 6,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "eval_metrics = ser.print_mean(results, selected_metrics=[\"pearson\", \"spearman\"])\n",
+    "eval_metrics.head(eval_metrics.shape[0])"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.8"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
--- a/tests/unit/test_word_embeddings.py
+++ b/tests/unit/test_word_embeddings.py
@@ -36,6 +36,7 @@ def test_load_pretrained_vectors_word2vec():

    shutil.rmtree(os.path.join(os.getcwd(), dir_path))

+    assert isinstance(load_word2vec(dir_path), Word2VecKeyedVectors)

 def test_load_pretrained_vectors_glove():
    dir_path = "temp_data/"
--- a/utils_nlp/azureml/azureml_utils.py
+++ b/utils_nlp/azureml/azureml_utils.py
@@ -0,0 +1,75 @@
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License.
+
+import os
+
+from azureml.core import Workspace
+
+
+def get_or_create_workspace(
+    config_path=None,
+    subscription_id=None,
+    resource_group=None,
+    workspace_name=None,
+    workspace_region=None,
+):
+    """Get or create an AzureML Workspace and save its config to the specified path for later use.
+
+    Args:
+        config_path (str): optional directory to look for / store config.json file (defaults to current directory)
+        subscription_id (str): subscription id
+        resource_group (str): resource group
+        workspace_name (str): workspace name
+        workspace_region (str): region
+
+    Returns:
+        Workspace
+    """
+
+    # use environment variables if needed
+    if subscription_id is None:
+        subscription_id = os.getenv("SUBSCRIPTION_ID")
+    if resource_group is None:
+        resource_group = os.getenv("RESOURCE_GROUP")
+    if workspace_name is None:
+        workspace_name = os.getenv("WORKSPACE_NAME")
+    if workspace_region is None:
+        workspace_region = os.getenv("WORKSPACE_REGION")
+
+    # define fallback options in order to try
+    options = [
+        (
+            Workspace,
+            dict(
+                subscription_id=subscription_id,
+                resource_group=resource_group,
+                workspace_name=workspace_name,
+            ),
+        ),
+        (Workspace.from_config, dict(path=config_path)),
+        (
+            Workspace.create,
+            dict(
+                subscription_id=subscription_id,
+                resource_group=resource_group,
+                name=workspace_name,
+                location=workspace_region,
+                create_resource_group=True,
+                exist_ok=True,
+            ),
+        ),
+    ]
+
+    for function, kwargs in options:
+        try:
+            ws = function(**kwargs)
+            break
+        except Exception:
+            continue
+    else:
+        raise ValueError(
+            "Failed to get or create AzureML Workspace with the configuration information provided"
+        )
+
+    ws.write_config(path=config_path)
+    return ws
--- a/utils_nlp/eval/senteval.py
+++ b/utils_nlp/eval/senteval.py
@@ -0,0 +1,128 @@
+import os
+import sys
+import pandas as pd
+
+
+class SentEvalRunner:
+    def __init__(self, path_to_senteval=".", use_azureml=False):
+        """AzureML-compatible wrapper class that interfaces with the original implementation of SentEval
+        
+        Args:
+            path_to_senteval (str, optional): Path to the SentEval source code.
+            use_azureml (bool, optional): Whether the runner executes on AzureML. Defaults to False.
+        """
+        self.path_to_senteval = path_to_senteval
+        self.use_azureml = use_azureml
+
+    def set_transfer_data_path(self, relative_path):
+        """Set the datapath that contains the datasets for the SentEval transfer tasks
+        
+        Args:
+            relative_path (str): Relative datapath
+        """
+        self.transfer_data_path = os.path.join(
+            self.path_to_senteval, relative_path
+        )
+
+    def set_transfer_tasks(self, task_list):
+        """Set the transfer tasks to use for evaluation
+        
+        Args:
+            task_list (list(str)): List of downstream transfer tasks
+        """
+        self.transfer_tasks = task_list
+
+    def set_model(self, model):
+        """Set the model to evaluate"""
+        self.model = model
+
+    def set_params_senteval(
+        self,
+        use_pytorch=True,
+        kfold=10,
+        nhid=0,
+        optim="adam",
+        batch_size=64,
+        tenacity=5,
+        epoch_size=4,
+    ):
+        """
+        Define the required parameters for SentEval (model, task_path, usepytorch, kfold).
+        Also gives the option to directly set parameters for a classifier if necessary.
+        """
+        self.params_senteval = {
+            "model": self.model,
+            "task_path": self.transfer_data_path,
+            "usepytorch": use_pytorch,
+            "kfold": kfold,
+        }
+        classifying_tasks = {
+            "MR",
+            "CR",
+            "SUBJ",
+            "MPQA",
+            "SST2",
+            "SST5",
+            "TREC",
+            "SICKEntailment",
+            "SNLI",
+            "MRPC",
+        }
+        if any(t in classifying_tasks for t in self.transfer_tasks):
+            self.params_senteval["classifier"] = {
+                "nhid": nhid,
+                "optim": optim,
+                "batch_size": batch_size,
+                "tenacity": tenacity,
+                "epoch_size": epoch_size,
+            }
+
+    def run(self, batcher_func, prepare_func):
+        """Run the SentEval engine on the model on the transfer tasks
+        
+        Args:
+            batcher_func (function): Function required by SentEval that transforms a batch of text sentences into 
+                                     sentence embeddings
+            prepare_func (function): Function that sees the whole dataset of each task and can thus construct the word 
+                                     vocabulary, the dictionary of word vectors, etc
+        
+        Returns:
+            dict: Dictionary of results
+        """
+        if self.use_azureml:
+            sys.path.insert(
+                0, os.path.relpath(self.path_to_senteval, os.getcwd())
+            )
+            import senteval
+        else:
+            sys.path.insert(0, self.path_to_senteval)
+            import senteval
+
+        se = senteval.engine.SE(
+            self.params_senteval, batcher_func, prepare_func
+        )
+
+        return se.eval(self.transfer_tasks)
+
+    def print_mean(self, results, selected_metrics=[], round_decimals=3):
+        """Return the means of selected metrics of the transfer tasks as a table (the caller prints or displays it)
+        
+        Args:
+            results (dict): Results from the SentEval evaluation engine
+            selected_metrics (list(str), optional): List of metric names
+            round_decimals (int, optional): Number of decimal digits to round to; defaults to 3
+        """
+        data = []
+        for task in self.transfer_tasks:
+            if "all" in results[task]:
+                row = [
+                    results[task]["all"][metric]["mean"]
+                    for metric in selected_metrics
+                ]
+            else:
+                row = [results[task][metric] for metric in selected_metrics]
+            data.append(row)
+        table = pd.DataFrame(
+            data=data, columns=selected_metrics, index=self.transfer_tasks
+        )
+        return table.round(round_decimals)
--- a/utils_nlp/pretrained_embeddings/fasttext.py
+++ b/utils_nlp/pretrained_embeddings/fasttext.py
@@ -31,7 +31,6 @@ def _extract_fasttext_vectors(zip_path, dest_path="."):
    os.remove(zip_path)
    return dest_path

-
 def _download_fasttext_vectors(download_dir, file_name="wiki.simple.zip"):
    """ Downloads pre-trained word vectors for English, trained on Wikipedia using
    fastText. You can directly download the vectors from here:
--- a/utils_nlp/pretrained_embeddings/glove.py
+++ b/utils_nlp/pretrained_embeddings/glove.py
@@ -87,7 +87,6 @@ def load_pretrained_vectors(dir_path, file_name="glove.840B.300d.txt", limit=Non

    Returns:
        gensim.models.keyedvectors.Word2VecKeyedVectors: Loaded word2vectors
-
    """

    file_path = _maybe_download_and_extract(dir_path, file_name)
--- a/utils_nlp/pretrained_embeddings/word2vec.py
+++ b/utils_nlp/pretrained_embeddings/word2vec.py
@@ -16,7 +16,6 @@ def _extract_word2vec_vectors(zip_path, dest_filepath):
    Args:
        zip_path: Path to the downloaded compressed file.
        dest_filepath: Final destination file path to the extracted zip file.
-
    """

    if os.path.exists(zip_path):