NimbusML-Samples/samples/2.1 [Text] Sentiment Analys...

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Sentiment Analysis 1 - Data Loading with Pandas\n",
    "\n",
    "In this example, we develop a binary classifier using the manually generated Twitter data to detect the sentiment of each tweet. For example, \"This is awesome!\" will be a positive one and \"I am sad\" will be negative. The input data is the text and we use NimbusML NGramFeaturizer to extract numeric features and input them to a AveragedPerceptron classifier."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Loading Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import os\n",
    "from IPython.display import display\n",
    "from nimbusml.datasets import get_dataset\n",
    "from nimbusml.feature_extraction.text import NGramFeaturizer\n",
    "from nimbusml.feature_extraction.text.extractor import Ngram\n",
    "from nimbusml.linear_model import AveragedPerceptronBinaryClassifier\n",
    "from nimbusml.decomposition import PcaTransformer\n",
    "from nimbusml import Pipeline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Train data file path: train-twitter.gen-sample.tsv\n",
      "Test data file path: test-twitter.gen-sample.tsv\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Sentiment</th>\n",
       "      <th>Text</th>\n",
       "      <th>Label</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Negative</td>\n",
       "      <td>Oh you are hurting me</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Positive</td>\n",
       "      <td>So long</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Positive</td>\n",
       "      <td>Ths sofa is comfortable</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Negative</td>\n",
       "      <td>The place suck.    No?</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Positive</td>\n",
       "      <td>@fakeid &amp;quot;Chillin&amp;quot; I love it!!</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Sentiment                                     Text  Label\n",
       "0  Negative                    Oh you are hurting me      0\n",
       "1  Positive                                  So long      1\n",
       "2  Positive                  Ths sofa is comfortable      1\n",
       "3  Negative                   The place suck.    No?      0\n",
       "4  Positive  @fakeid &quot;Chillin&quot; I love it!!      1"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Load data from package\n",
    "trainDataFile = get_dataset('gen_twittertrain').as_filepath()\n",
    "testDataFile = get_dataset('gen_twittertest').as_filepath()\n",
    "print(\"Train data file path: \" + str(os.path.basename(trainDataFile)))\n",
    "print(\"Test data file path: \" + str(os.path.basename(testDataFile)))\n",
    "\n",
    "trainData = pd.read_csv(trainDataFile, sep = \"\\t\")\n",
    "testData = pd.read_csv(testDataFile, sep = \"\\t\")\n",
    "\n",
    "trainData.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We use the \"Text\" column as the input feature and the \"Sentiment\" column as the label column (after converting to numeric)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Preprocessing"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The NGramFeaturizer transform produces a bag of counts of sequences of consecutive words, called n-grams, from a given corpus of text. The word counts are then normalized using term frequency-inverse document frequency (TF-IDF) method. \n",
    "\n",
    "In NimbusML, the user can specify the input column names for each operator to be executed on. If not, all the columns from the previous operator or the origin dataset will be used. In Tutorial 2.2, the column syntax of nimbusml will be discussed in more details.\n",
    "\n",
    "For text featurizer, since the output has multiple columns, for visualization, the names for those will become \"output_col_name.[word sequence] \" to represent the count for word sequence [word sequence] after normalization. In this example, we train the model with only one column, column \"Text\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "featurizer = NGramFeaturizer(word_feature_extractor=Ngram(weighting = 'TfIdf'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then we can call .fit_transform() to train the featurizer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(71, 1007)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Text.Char.&lt;␂&gt;|o|h</th>\n",
       "      <th>Text.Char.o|h|&lt;␠&gt;</th>\n",
       "      <th>Text.Char.h|&lt;␠&gt;|y</th>\n",
       "      <th>Text.Char.&lt;␠&gt;|y|o</th>\n",
       "      <th>Text.Char.y|o|u</th>\n",
       "      <th>Text.Char.o|u|&lt;␠&gt;</th>\n",
       "      <th>Text.Char.u|&lt;␠&gt;|a</th>\n",
       "      <th>Text.Char.&lt;␠&gt;|a|r</th>\n",
       "      <th>Text.Char.a|r|e</th>\n",
       "      <th>Text.Char.r|e|&lt;␠&gt;</th>\n",
       "      <th>...</th>\n",
       "      <th>Text.Word.upset</th>\n",
       "      <th>Text.Word.vocation</th>\n",
       "      <th>Text.Word.flight</th>\n",
       "      <th>Text.Word.late</th>\n",
       "      <th>Text.Word.again</th>\n",
       "      <th>Text.Word.commute</th>\n",
       "      <th>Text.Word.died</th>\n",
       "      <th>Text.Word.cancer</th>\n",
       "      <th>Text.Word.oh,</th>\n",
       "      <th>Text.Word.finally</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.218218</td>\n",
       "      <td>0.218218</td>\n",
       "      <td>0.218218</td>\n",
       "      <td>0.218218</td>\n",
       "      <td>0.218218</td>\n",
       "      <td>0.218218</td>\n",
       "      <td>0.218218</td>\n",
       "      <td>0.218218</td>\n",
       "      <td>0.218218</td>\n",
       "      <td>0.218218</td>\n",
       "      <td>...</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>...</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>...</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>...</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>...</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 1007 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   Text.Char.<␂>|o|h  Text.Char.o|h|<␠>  Text.Char.h|<␠>|y  Text.Char.<␠>|y|o  \\\n",
       "0           0.218218           0.218218           0.218218           0.218218   \n",
       "1           0.000000           0.000000           0.000000           0.000000   \n",
       "2           0.000000           0.000000           0.000000           0.000000   \n",
       "3           0.000000           0.000000           0.000000           0.000000   \n",
       "4           0.000000           0.000000           0.000000           0.000000   \n",
       "\n",
       "   Text.Char.y|o|u  Text.Char.o|u|<␠>  Text.Char.u|<␠>|a  Text.Char.<␠>|a|r  \\\n",
       "0         0.218218           0.218218           0.218218           0.218218   \n",
       "1         0.000000           0.000000           0.000000           0.000000   \n",
       "2         0.000000           0.000000           0.000000           0.000000   \n",
       "3         0.000000           0.000000           0.000000           0.000000   \n",
       "4         0.000000           0.000000           0.000000           0.000000   \n",
       "\n",
       "   Text.Char.a|r|e  Text.Char.r|e|<␠>  ...  Text.Word.upset  \\\n",
       "0         0.218218           0.218218  ...              0.0   \n",
       "1         0.000000           0.000000  ...              0.0   \n",
       "2         0.000000           0.000000  ...              0.0   \n",
       "3         0.000000           0.000000  ...              0.0   \n",
       "4         0.000000           0.000000  ...              0.0   \n",
       "\n",
       "   Text.Word.vocation  Text.Word.flight  Text.Word.late  Text.Word.again  \\\n",
       "0                 0.0               0.0             0.0              0.0   \n",
       "1                 0.0               0.0             0.0              0.0   \n",
       "2                 0.0               0.0             0.0              0.0   \n",
       "3                 0.0               0.0             0.0              0.0   \n",
       "4                 0.0               0.0             0.0              0.0   \n",
       "\n",
       "   Text.Word.commute  Text.Word.died  Text.Word.cancer  Text.Word.oh,  \\\n",
       "0                0.0             0.0               0.0            0.0   \n",
       "1                0.0             0.0               0.0            0.0   \n",
       "2                0.0             0.0               0.0            0.0   \n",
       "3                0.0             0.0               0.0            0.0   \n",
       "4                0.0             0.0               0.0            0.0   \n",
       "\n",
       "   Text.Word.finally  \n",
       "0                0.0  \n",
       "1                0.0  \n",
       "2                0.0  \n",
       "3                0.0  \n",
       "4                0.0  \n",
       "\n",
       "[5 rows x 1007 columns]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "text_transformed = featurizer.fit_transform(trainData[\"Text\"].to_frame()) # Using one column as input\n",
    "print(text_transformed.shape)\n",
    "text_transformed.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can see that, all the columns are the generated features from the original \"Text\" column. Based on those features, we can train a binary classifier."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Binary Classifier"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The user can use the transformed data as the input to the binary classifier using .fit(X,Y). \n",
    "\n",
    "                                            ag = AveragedPerceptronBinaryClassifier()\n",
    "                                            ag.fit(text_transformed, 1 * (trainData[\"Sentiment\"] == \"Positive\"))\n",
    "                                            \n",
    "The user can also use NimbusML pipeline to train the featurizer and the learner together."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Automatically adding a MinMax normalization transform, use 'norm=Warn' or 'norm=No' to turn this behavior off.\n",
      "Training calibrator.\n",
      "Elapsed time: 00:00:00.7693293\n",
      "Training time: 1.0\n"
     ]
    }
   ],
   "source": [
    "t0 = time.time()\n",
    "\n",
    "ppl = Pipeline([\n",
    "                NGramFeaturizer(word_feature_extractor=Ngram(weighting = 'Tf')), \n",
    "                PcaTransformer(rank = 100),\n",
    "                AveragedPerceptronBinaryClassifier(l2_regularization=0.4,\n",
    "                                                   number_of_iterations=5),\n",
    "               ])\n",
    "\n",
    "ppl.fit(trainData[\"Text\"], trainData[\"Label\"]) #will replace with series if supported\n",
    "\n",
    "print(\"Training time: \"  + str(round(time.time() - t0, 2)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using the NimbusML pipeline, we can call ppl.test(test_X,test_Y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Performance metrics: \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>AUC</th>\n",
       "      <th>Accuracy</th>\n",
       "      <th>Positive precision</th>\n",
       "      <th>Positive recall</th>\n",
       "      <th>Negative precision</th>\n",
       "      <th>Negative recall</th>\n",
       "      <th>Log-loss</th>\n",
       "      <th>Log-loss reduction</th>\n",
       "      <th>Test-set entropy (prior Log-Loss/instance)</th>\n",
       "      <th>F1 Score</th>\n",
       "      <th>AUPRC</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.621295</td>\n",
       "      <td>0.662791</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.662791</td>\n",
       "      <td>1</td>\n",
       "      <td>1.006758</td>\n",
       "      <td>-0.091782</td>\n",
       "      <td>0.922123</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.529486</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        AUC  Accuracy  Positive precision  Positive recall  \\\n",
       "0  0.621295  0.662791                   0                0   \n",
       "\n",
       "   Negative precision  Negative recall  Log-loss  Log-loss reduction  \\\n",
       "0            0.662791                1  1.006758           -0.091782   \n",
       "\n",
       "   Test-set entropy (prior Log-Loss/instance)  F1 Score     AUPRC  \n",
       "0                                    0.922123       NaN  0.529486  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Individual scores: \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PredictedLabel</th>\n",
       "      <th>Score</th>\n",
       "      <th>Probability</th>\n",
       "      <th>OriginText</th>\n",
       "      <th>Sentiment</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>-0.128473</td>\n",
       "      <td>0.135174</td>\n",
       "      <td>@faketwitterid I am sad</td>\n",
       "      <td>Negative</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>-0.125328</td>\n",
       "      <td>0.149867</td>\n",
       "      <td>@wakeup_you  It is a very simple twit I created</td>\n",
       "      <td>Negative</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0</td>\n",
       "      <td>-0.114177</td>\n",
       "      <td>0.212648</td>\n",
       "      <td>@anotherfakeid I would love to see the latest ...</td>\n",
       "      <td>Positive</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0</td>\n",
       "      <td>-0.164016</td>\n",
       "      <td>0.038579</td>\n",
       "      <td>Oh my ladygaga! I haven't played tennis for 2 ...</td>\n",
       "      <td>Negative</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>-0.091164</td>\n",
       "      <td>0.394444</td>\n",
       "      <td>I am heading on a road trip and taking a few d...</td>\n",
       "      <td>Positive</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   PredictedLabel     Score  Probability  \\\n",
       "0               0 -0.128473     0.135174   \n",
       "1               0 -0.125328     0.149867   \n",
       "2               0 -0.114177     0.212648   \n",
       "3               0 -0.164016     0.038579   \n",
       "4               0 -0.091164     0.394444   \n",
       "\n",
       "                                          OriginText Sentiment  \n",
       "0                            @faketwitterid I am sad  Negative  \n",
       "1    @wakeup_you  It is a very simple twit I created  Negative  \n",
       "2  @anotherfakeid I would love to see the latest ...  Positive  \n",
       "3  Oh my ladygaga! I haven't played tennis for 2 ...  Negative  \n",
       "4  I am heading on a road trip and taking a few d...  Positive  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Total runtime: 4.03\n"
     ]
    }
   ],
   "source": [
    "metrics, scores = ppl.test(testData[\"Text\"], testData[\"Label\"], output_scores = True) #replace with series \n",
    "print(\"Performance metrics: \")\n",
    "display(metrics)\n",
    "print(\"Individual scores: \")\n",
    "\n",
    "# Append origin text to the score\n",
    "scores[\"OriginText\"] = testData[\"Text\"]\n",
    "scores[\"Sentiment\"] = testData[\"Sentiment\"]\n",
    "\n",
    "display(scores[0:5])\n",
    "print(\"Total runtime: \"  + str(round(time.time() - t0, 2)))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}