glue/notebooks/Eval - Speech Transcription...

192 строки
9.3 KiB
Plaintext

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Evaluation of Custom Speech Transcription\n",
"This notebook serves to evaluate your Speech-to-Tex transcriptions generated by [GLUE](https://github.com/microsoft/glue)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# Import required packages\n",
"import sys\n",
"import pandas as pd\n",
"import configparser\n",
"\n",
"# Notebook specific functions\n",
"from matplotlib import cm, pyplot as plt \n",
"\n",
"# Custom functions\n",
"sys.path.append(\"../src\")\n",
"import evaluate as ev\n",
"\n",
"# Notebook configs\n",
"%matplotlib inline\n",
"%load_ext autoreload\n",
"%autoreload 2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Input Data\n",
"Below, you will import the transcription file generated by GLUE in the `--do_transcribe` mode. <br>\n",
"The evaluation will be equivalent to the one generated by the `--do_evaluation` mode, which is only printed in the console output. <br>\n",
"Here, you will have a consistent view on the results. \n",
"\n",
"Make sure it has the structure below. If you used GLUE, it will have it either way:\n",
"- Comma-separated (.csv)\n",
"- UTF-8 encoded\n",
"- Columns \"text\" for reference transcript and \"rec\" for recognition"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" audio text \\\n",
"0 BookFlight.wav I would like to book a flight to Frankfurt. \n",
"1 CancelFlight.wav I want to cancel my journey to Kuala Lumpur \n",
"2 ChangeFlight.wav I would like to change my flight to Singapore. \n",
"3 BookSeat.wav I would like to book a seat on my flight to St... \n",
"\n",
" rec \n",
"0 Aber leicht über Flight Frankfurt. \n",
"1 Pur. \n",
"2 I would like to change my flight? \n",
"3 "
],
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>audio</th>\n <th>text</th>\n <th>rec</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>BookFlight.wav</td>\n <td>I would like to book a flight to Frankfurt.</td>\n <td>Aber leicht über Flight Frankfurt.</td>\n </tr>\n <tr>\n <th>1</th>\n <td>CancelFlight.wav</td>\n <td>I want to cancel my journey to Kuala Lumpur</td>\n <td>Pur.</td>\n </tr>\n <tr>\n <th>2</th>\n <td>ChangeFlight.wav</td>\n <td>I would like to change my flight to Singapore.</td>\n <td>I would like to change my flight?</td>\n </tr>\n <tr>\n <th>3</th>\n <td>BookSeat.wav</td>\n <td>I would like to book a seat on my flight to St...</td>\n <td></td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {},
"execution_count": 2
}
],
"source": [
"# Import transcription file\n",
"res = pd.read_csv(\"../assets/examples/output_files/example_transcriptions_full.csv\", sep=\",\", encoding='utf-8')[['audio', 'text', 'rec']]\n",
"res.text.fillna(\"\", inplace=True)\n",
"res.rec.fillna(\"\", inplace=True)\n",
"res.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Evaluate\n",
"Evaluation of transcription results by comparing them with reference transcripts.\n",
"- Calculates metrics such as [Word Error Rate (WER)](https://en.wikipedia.org/wiki/Word_error_rate), Sentence Error Rate (SER), Word Recognition Rate (WRR).\n",
"- Implementation based on [github.com/belambert/asr-evaluation](https://github.com/belambert/asr-evaluation).\n",
"- See some hints on [how to improve your Custom Speech accuracy](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-custom-speech-evaluate-data).\n",
"\n",
"Generally, we recommend not to take the WER too serious, rather see it as a tool to detect recurring patterns or issues in the speech model. Especially in combination with LUIS, an end-to-end testing is more relevant.\n",
"\n",
"Print Verbosity:\n",
"- 0 -> Only summary metrics\n",
"- 1 -> Only errors\n",
"- 2 -> All\n",
"\n",
"Optional variable: query_keyword. \n",
"This can be used to search for certain words in the reference text."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"eva = ev.EvaluateTranscription()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"scrolled": false
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"REF: \u001b[31mI\u001b[0m \u001b[31mWOULD\u001b[0m \u001b[31mLIKE\u001b[0m \u001b[31mTO \u001b[0m \u001b[31mBOOK \u001b[0m \u001b[31mA \u001b[0m flight \u001b[31mTO\u001b[0m frankfurt\nREC: \u001b[31m*\u001b[0m \u001b[31m*****\u001b[0m \u001b[31m****\u001b[0m \u001b[31mABER\u001b[0m \u001b[31mLEICHT\u001b[0m \u001b[31mÜBER\u001b[0m flight \u001b[31m**\u001b[0m frankfurt\nSENTENCE 2 BookFlight.wav\nCorrect = 22.2% 2 ( 9)\nErrors = 77.8% 7 ( 9)\nREF: \u001b[31mI\u001b[0m \u001b[31mWANT\u001b[0m \u001b[31mTO\u001b[0m \u001b[31mCANCEL\u001b[0m \u001b[31mMY\u001b[0m \u001b[31mJOURNEY\u001b[0m \u001b[31mTO\u001b[0m \u001b[31mKUALA\u001b[0m \u001b[31mLUMPUR\u001b[0m\nREC: \u001b[31m*\u001b[0m \u001b[31m****\u001b[0m \u001b[31m**\u001b[0m \u001b[31m******\u001b[0m \u001b[31m**\u001b[0m \u001b[31m*******\u001b[0m \u001b[31m**\u001b[0m \u001b[31m*****\u001b[0m \u001b[31mPUR \u001b[0m\nSENTENCE 3 CancelFlight.wav\nCorrect = 0.0% 0 ( 9)\nErrors = 100.0% 9 ( 9)\nREF: i would like to change my flight \u001b[31mTO\u001b[0m \u001b[31mSINGAPORE\u001b[0m\nREC: i would like to change my flight \u001b[31m**\u001b[0m \u001b[31m*********\u001b[0m\nSENTENCE 4 ChangeFlight.wav\nCorrect = 77.8% 7 ( 9)\nErrors = 22.2% 2 ( 9)\nREF: \u001b[31mI\u001b[0m \u001b[31mWOULD\u001b[0m \u001b[31mLIKE\u001b[0m \u001b[31mTO\u001b[0m \u001b[31mBOOK\u001b[0m \u001b[31mA\u001b[0m \u001b[31mSEAT\u001b[0m \u001b[31mON\u001b[0m \u001b[31mMY\u001b[0m \u001b[31mFLIGHT\u001b[0m \u001b[31mTO\u001b[0m \u001b[31mSTUTTGART\u001b[0m\nREC: \u001b[31m*\u001b[0m \u001b[31m*****\u001b[0m \u001b[31m****\u001b[0m \u001b[31m**\u001b[0m \u001b[31m****\u001b[0m \u001b[31m*\u001b[0m \u001b[31m****\u001b[0m \u001b[31m**\u001b[0m \u001b[31m**\u001b[0m \u001b[31m******\u001b[0m \u001b[31m**\u001b[0m \u001b[31m*********\u001b[0m\nSENTENCE 5 BookSeat.wav\nCorrect = 0.0% 0 ( 12)\nErrors = 100.0% 12 ( 12)\n\nSentence count: 4\nWER: 76.923% (30 / 39)\nWRR: 23.077% (9 / 39)\nSER: 100.000% (4 / 4)\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(0.7692307692307693, 0.23076923076923078, 1.0)"
]
},
"metadata": {},
"execution_count": 4
}
],
"source": [
"eva.calculate_metrics(res.text.values, res.rec.values, label=res.audio.values, print_verbosiy=1)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"scrolled": false
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"\n***DELETIONS:\nto 6\ni 3\nwould 2\nlike 2\nmy 2\nwant 1\ncancel 1\njourney 1\nkuala 1\nsingapore 1\nbook 1\na 1\nseat 1\non 1\nflight 1\nstuttgart 1\n\n***SUBSTITUTIONS:\nto -> aber 1\nbook -> leicht 1\na -> über 1\nlumpur -> pur 1\n"
]
}
],
"source": [
"eva.print_errors(min_count=1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "nlp",
"language": "python",
"name": "nlp"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6-final"
}
},
"nbformat": 4,
"nbformat_minor": 2
}