glue/notebooks/Eval - Speech Transcription...

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Evaluation of Custom Speech Transcription\n",
    "This notebook evaluates the Speech-to-Text transcriptions generated by [GLUE](https://github.com/microsoft/glue)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import required packages\n",
    "import sys\n",
    "import pandas as pd\n",
    "import configparser\n",
    "\n",
    "# Notebook specific functions\n",
    "from matplotlib import cm, pyplot as plt \n",
    "\n",
    "# Custom functions\n",
    "sys.path.append(\"../src\")\n",
    "import evaluate as ev\n",
    "\n",
    "# Notebook configs\n",
    "%matplotlib inline\n",
    "%load_ext autoreload\n",
    "%autoreload 2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Input Data\n",
    "Below, you import the transcription file that GLUE generates in `--do_transcribe` mode. <br>\n",
    "The evaluation is equivalent to the one produced by `--do_evaluation` mode, which is only printed to the console. <br>\n",
    "Here, you get a consistent view of the results.\n",
    "\n",
    "Make sure the file has the structure below. Files generated by GLUE already have it:\n",
    "- Comma-separated (.csv)\n",
    "- UTF-8 encoded\n",
    "- Columns \"audio\" for the audio file name, \"text\" for the reference transcript and \"rec\" for the recognition result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "              audio                                               text  \\\n",
       "0    BookFlight.wav        I would like to book a flight to Frankfurt.   \n",
       "1  CancelFlight.wav        I want to cancel my journey to Kuala Lumpur   \n",
       "2  ChangeFlight.wav     I would like to change my flight to Singapore.   \n",
       "3      BookSeat.wav  I would like to book a seat on my flight to St...   \n",
       "\n",
       "                                  rec  \n",
       "0  Aber leicht über Flight Frankfurt.  \n",
       "1                                Pur.  \n",
       "2   I would like to change my flight?  \n",
       "3                                      "
      ],
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>audio</th>\n      <th>text</th>\n      <th>rec</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>BookFlight.wav</td>\n      <td>I would like to book a flight to Frankfurt.</td>\n      <td>Aber leicht über Flight Frankfurt.</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>CancelFlight.wav</td>\n      <td>I want to cancel my journey to Kuala Lumpur</td>\n      <td>Pur.</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>ChangeFlight.wav</td>\n      <td>I would like to change my flight to Singapore.</td>\n      <td>I would like to change my flight?</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>BookSeat.wav</td>\n      <td>I would like to book a seat on my flight to St...</td>\n      <td></td>\n    </tr>\n  </tbody>\n</table>\n</div>"
     },
     "metadata": {},
     "execution_count": 2
    }
   ],
   "source": [
    "# Import transcription file\n",
    "res = pd.read_csv(\"../assets/examples/output_files/example_transcriptions_full.csv\", sep=\",\", encoding='utf-8')[['audio', 'text', 'rec']]\n",
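    "# Replace missing transcripts with empty strings (assignment avoids chained in-place fillna, which newer pandas versions warn about)\n",
    "res[['text', 'rec']] = res[['text', 'rec']].fillna(\"\")\n",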
    "res.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Evaluate\n",
    "Evaluation of transcription results by comparing them with reference transcripts.\n",
    "- Calculates metrics such as [Word Error Rate (WER)](https://en.wikipedia.org/wiki/Word_error_rate), Sentence Error Rate (SER), Word Recognition Rate (WRR).\n",
    "- Implementation based on [github.com/belambert/asr-evaluation](https://github.com/belambert/asr-evaluation).\n",
    "- See some hints on [how to improve your Custom Speech accuracy](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-custom-speech-evaluate-data).\n",
    "\n",
    "Generally, we recommend not taking the WER too seriously. Rather, see it as a tool for detecting recurring patterns or issues in the speech model. Especially in combination with LUIS, end-to-end testing is more relevant.\n",
    "\n",
    "Print Verbosity:\n",
    "- 0 -> Only summary metrics\n",
    "- 1 -> Only errors\n",
    "- 2 -> All\n",
    "\n",
    "Optional variable: `query_keyword`.   \n",
    "This can be used to search for certain words in the reference text."
   ]
  },
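  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the metrics above concrete, the next cell is a minimal sketch of how a WER can be computed from a word-level edit distance. It is not the implementation in the imported `evaluate` module: the helper names `normalize`, `word_errors` and `simple_wer` are hypothetical, and the normalization (lowercasing, stripping ASCII punctuation) is an assumption. The example sentences are the first three rows of the table above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Minimal WER sketch (assumption: NOT the implementation used by the evaluate module)\n",
    "import string\n",
    "\n",
    "def normalize(sentence):\n",
    "    # Assumed normalization: lowercase, drop ASCII punctuation, split on whitespace\n",
    "    return sentence.lower().translate(str.maketrans('', '', string.punctuation)).split()\n",
    "\n",
    "def word_errors(ref_words, rec_words):\n",
    "    # Word-level Levenshtein distance, keeping only the previous row of the table\n",
    "    prev = list(range(len(rec_words) + 1))\n",
    "    for i, ref_word in enumerate(ref_words, start=1):\n",
    "        curr = [i]\n",
    "        for j, rec_word in enumerate(rec_words, start=1):\n",
    "            cost = 0 if ref_word == rec_word else 1\n",
    "            curr.append(min(prev[j] + 1,          # deletion of a reference word\n",
    "                            curr[j - 1] + 1,      # insertion of a recognized word\n",
    "                            prev[j - 1] + cost))  # match or substitution\n",
    "        prev = curr\n",
    "    return prev[-1]\n",
    "\n",
    "def simple_wer(refs, recs):\n",
    "    # Total word errors divided by the total number of reference words\n",
    "    errors = sum(word_errors(normalize(ref), normalize(rec)) for ref, rec in zip(refs, recs))\n",
    "    words = sum(len(normalize(ref)) for ref in refs)\n",
    "    return errors / words if words else 0.0\n",
    "\n",
    "# First three example rows from the table above\n",
    "refs = ['I would like to book a flight to Frankfurt.',\n",
    "        'I want to cancel my journey to Kuala Lumpur',\n",
    "        'I would like to change my flight to Singapore.']\n",
    "recs = ['Aber leicht über Flight Frankfurt.',\n",
    "        'Pur.',\n",
    "        'I would like to change my flight?']\n",
    "\n",
    "simple_wer(refs, recs)"
   ]
  },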
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "eva = ev.EvaluateTranscription()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "REF: \u001b[31mI\u001b[0m \u001b[31mWOULD\u001b[0m \u001b[31mLIKE\u001b[0m \u001b[31mTO  \u001b[0m \u001b[31mBOOK  \u001b[0m \u001b[31mA   \u001b[0m flight \u001b[31mTO\u001b[0m frankfurt\nREC: \u001b[31m*\u001b[0m \u001b[31m*****\u001b[0m \u001b[31m****\u001b[0m \u001b[31mABER\u001b[0m \u001b[31mLEICHT\u001b[0m \u001b[31mÜBER\u001b[0m flight \u001b[31m**\u001b[0m frankfurt\nSENTENCE 2  BookFlight.wav\nCorrect          =  22.2%    2   (     9)\nErrors           =  77.8%    7   (     9)\nREF: \u001b[31mI\u001b[0m \u001b[31mWANT\u001b[0m \u001b[31mTO\u001b[0m \u001b[31mCANCEL\u001b[0m \u001b[31mMY\u001b[0m \u001b[31mJOURNEY\u001b[0m \u001b[31mTO\u001b[0m \u001b[31mKUALA\u001b[0m \u001b[31mLUMPUR\u001b[0m\nREC: \u001b[31m*\u001b[0m \u001b[31m****\u001b[0m \u001b[31m**\u001b[0m \u001b[31m******\u001b[0m \u001b[31m**\u001b[0m \u001b[31m*******\u001b[0m \u001b[31m**\u001b[0m \u001b[31m*****\u001b[0m \u001b[31mPUR   \u001b[0m\nSENTENCE 3  CancelFlight.wav\nCorrect          =   0.0%    0   (     9)\nErrors           = 100.0%    9   (     9)\nREF: i would like to change my flight \u001b[31mTO\u001b[0m \u001b[31mSINGAPORE\u001b[0m\nREC: i would like to change my flight \u001b[31m**\u001b[0m \u001b[31m*********\u001b[0m\nSENTENCE 4  ChangeFlight.wav\nCorrect          =  77.8%    7   (     9)\nErrors           =  22.2%    2   (     9)\nREF: \u001b[31mI\u001b[0m \u001b[31mWOULD\u001b[0m \u001b[31mLIKE\u001b[0m \u001b[31mTO\u001b[0m \u001b[31mBOOK\u001b[0m \u001b[31mA\u001b[0m \u001b[31mSEAT\u001b[0m \u001b[31mON\u001b[0m \u001b[31mMY\u001b[0m \u001b[31mFLIGHT\u001b[0m \u001b[31mTO\u001b[0m \u001b[31mSTUTTGART\u001b[0m\nREC: \u001b[31m*\u001b[0m \u001b[31m*****\u001b[0m \u001b[31m****\u001b[0m \u001b[31m**\u001b[0m \u001b[31m****\u001b[0m \u001b[31m*\u001b[0m \u001b[31m****\u001b[0m \u001b[31m**\u001b[0m \u001b[31m**\u001b[0m \u001b[31m******\u001b[0m \u001b[31m**\u001b[0m \u001b[31m*********\u001b[0m\nSENTENCE 5  BookSeat.wav\nCorrect          =   0.0%    0   (    12)\nErrors           = 100.0%   12   (    12)\n\nSentence count: 4\nWER: 76.923% (30 / 39)\nWRR: 23.077% (9 / 39)\nSER: 100.000% (4 / 4)\n"
     ]
    },
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "(0.7692307692307693, 0.23076923076923078, 1.0)"
      ]
     },
     "metadata": {},
     "execution_count": 4
    }
   ],
   "source": [
    "eva.calculate_metrics(res.text.values, res.rec.values, label=res.audio.values, print_verbosiy=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "\n***DELETIONS:\nto                            6\ni                             3\nwould                         2\nlike                          2\nmy                            2\nwant                          1\ncancel                        1\njourney                       1\nkuala                         1\nsingapore                     1\nbook                          1\na                             1\nseat                          1\non                            1\nflight                        1\nstuttgart                     1\n\n***SUBSTITUTIONS:\nto                   -> aber                            1\nbook                 -> leicht                          1\na                    -> über                            1\nlumpur               -> pur                             1\n"
     ]
    }
   ],
   "source": [
    "eva.print_errors(min_count=1)"
   ]
  },
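  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a worked check, the summary metrics can be tied back to the error listing above. This assumes the usual definitions, with $S$ substitutions, $D$ deletions, $I$ insertions, $C$ correctly recognized words and $N$ reference words. The listing shows 26 deletions and 4 substitutions and no insertions, and the per-sentence output reports $2 + 0 + 7 + 0 = 9$ correct words out of 39:\n",
    "\n",
    "$$\\mathrm{WER} = \\frac{S + D + I}{N} = \\frac{4 + 26 + 0}{39} = \\frac{30}{39} \\approx 76.9\\%$$\n",
    "\n",
    "$$\\mathrm{WRR} = \\frac{C}{N} = \\frac{9}{39} \\approx 23.1\\%$$\n",
    "\n",
    "$$\\mathrm{SER} = \\frac{\\text{sentences with at least one error}}{\\text{sentences}} = \\frac{4}{4} = 100\\%$$\n",
    "\n",
    "Because there are no insertions in this example, WER and WRR add up to 100%. With insertions, the WER can exceed $1 - \\mathrm{WRR}$."
   ]
  },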
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "nlp",
   "language": "python",
   "name": "nlp"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6-final"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}