116 строки
6.2 KiB
Markdown
116 строки
6.2 KiB
Markdown
---
|
||
type: post
|
||
title: "Tip 320 - How to get started with Neural Text to Speech in Azure"
|
||
excerpt: "Learn how to get started with Neural Text to Speech in Azure"
|
||
tags: [AI + Machine Learning]
|
||
share: true
|
||
date: 2021-06-16 12:00:00
|
||
---
|
||
|
||
::: tip
|
||
|
||
:fire: Download the FREE Azure Developer Guide eBook [here](http://aka.ms/azuredevebook?WT.mc_id=docs-azuredevtips-azureappsdev).
|
||
|
||
:bulb: Learn more : [What is Neural Text-to-speech?](https://docs.microsoft.com/azure/cognitive-services/speech-service/text-to-speech?WT.mc_id=docs-azuredevtips-azureappsdev).
|
||
|
||
:bulb: Checkout [Azure AI resources for developers](https://azure.microsoft.com/en-us/overview/ai-platform/dev-resources/?WT.mc_id=docs-azuredevtips-azureappsdev).
|
||
|
||
:tv: Watch the video : [How to get started with Neural Text to Speech in Azure](https://youtu.be/dl0amatX5zs?WT.mc_id=youtube-azuredevtips-azureappsdev).
|
||
|
||
:::
|
||
|
||
### How to get started with Neural Text to Speech in Azure
|
||
|
||
#### Synthesize text to speech with in Azure
|
||
[Azure Neural Text-to-Speech](https://azure.microsoft.com/services/cognitive-services/text-to-speech/?WT.mc_id=azure-azuredevtips-azureappsdev) is part of the [Azure Cognitive Services family](https://azure.microsoft.com/services/cognitive-services/?WT.mc_id=azure-azuredevtips-azureappsdev). You can use it to synthesize text into natural human speech. You can translate text into speech in more than [54 languages and locales](https://docs.microsoft.com/azure/cognitive-services/speech-service/language-support#text-to-speech?WT.mc_id=docs-azuredevtips-azureappsdev) with a growing list of 129 neural voices.
|
||
|
||
In this post, we'll create a simple application that can turn text into speech.
|
||
|
||
#### Prerequisites
|
||
If you want to follow along, you'll need the following:
|
||
* An Azure subscription (If you don't have an Azure subscription, create a [free account](https://azure.microsoft.com/free/?WT.mc_id=azure-azuredevtips-azureappsdev) before you begin)
|
||
* The latest version of [Visual Studio](https://visualstudio.microsoft.com/downloads/?WT.mc_id=microsoft-azuredevtips-azureappsdev) or [Visual Studio Code](https://code.visualstudio.com/?WT.mc_id=other-azuredevtips-azureappsdev). This post uses Visual Studio, and you can also use VS Code to accomplish the same result.
|
||
|
||
#### Use the Text-To-Speech service in Azure
|
||
To start turning text into speech, we need to create an Azure Cognitive Service Speech service. We'll do that in the Azure portal.
|
||
|
||
1. Click the **Create a resource** button (the plus-sign in the top left corner)
|
||
2. Search for **speech**, select the "Speech" result and click **Create**
|
||
1. This brings you to the **Create Speech** blade
|
||
2. Fill in a **Name** for the speech service
|
||
3. Pick a **Location**
|
||
4. Select a **Pricing Tier**. The free tier is fine for this demo
|
||
5. Finally, select a **Resource Group**
|
||
6. Click **Create** to create the speech service
|
||
|
||
<img :src="$withBase('/files/105createspeech.png')" width="75%">
|
||
|
||
(Create Speech service blade in the Azure portal)
|
||
|
||
When the speech service is created, navigate to it to see its access keys. You'll need **one of the access keys** and the **location** to use the service in an application.
|
||
|
||
<img :src="$withBase('/files/105keys.png')">
|
||
|
||
(Speech service access keys in the Azure portal)
|
||
|
||
Let's use the speech service in an application that we'll create with Visual Studio.
|
||
|
||
1. Open Visual Studio
|
||
2. Create a new Console application by navigating to **File > New > Project** and selecting **Console Application**
|
||
3. The first thing that we need to do, is to reference a NuGet package to work with the Speech service. Right-click the project file and select **Manage NuGet Packages**
|
||
4. Find the package **Microsoft.CognitiveServices.Speech** and install it
|
||
|
||
<img :src="$withBase('/files/105nuget.png')" width="75%">
|
||
|
||
(Speech NuGet package in Visual Studio)
|
||
|
||
5. Next, create the code in the **Program.cs** file. The file should look like this:
|
||
|
||
```
|
||
using Microsoft.CognitiveServices.Speech;
|
||
using Microsoft.CognitiveServices.Speech.Audio;
|
||
using System.Threading.Tasks;
|
||
|
||
namespace TextToSpeech
|
||
{
|
||
class Program
|
||
{
|
||
static async Task Main()
|
||
{
|
||
await SynthesizeAudioAsync();
|
||
}
|
||
|
||
static async Task SynthesizeAudioAsync()
|
||
{
|
||
var config = SpeechConfig.FromSubscription("servicekey", "servicelocation");
|
||
|
||
using var synthesizer = new SpeechSynthesizer(config, audioConfig);
|
||
|
||
await synthesizer.SpeakTextAsync("Synthesizing directly to speaker output.");
|
||
}
|
||
}
|
||
}
|
||
|
||
```
|
||
In the **SynthesizeAudioAsync** method, the first thing that happens is that a **SpeechConfig** is created using the access key and the service location. Leaving the config variable as it is, it is passed to a new **SpeechSynthesizer**, which is used to turn text into speech
|
||
|
||
6. Run the code to see if it works. It should output the text synthesizing directly to speaker output from your default audio output device
|
||
7. By default, the service detects the language of the text and uses the default synthesizer voice for it. In this case, it detects the language as en-US and will use the default voice for output. You can change the language that it uses to analyze the text, although the default detection works very good. You can also **change the output voice** by using the **config** variable and passing that to the **SpeechSynthesizer**, like in the example below:
|
||
|
||
```
|
||
var config = SpeechConfig.FromSubscription("servicekey", "servicelocation");
|
||
config.SpeechSynthesisVoiceName = "en-GB-RyanNeural";
|
||
|
||
using var synthesizer = new SpeechSynthesizer(config, audioConfig);
|
||
```
|
||
8. Instead of outputting the voice to the default audio output, you can also output the audio to a memory stream or to a file. For instance, to a .wav file, like in the example below:
|
||
|
||
```
|
||
var config = SpeechConfig.FromSubscription("servicekey", "servicelocation");
|
||
AudioConfig audioConfig = AudioConfig.FromWavFileOutput("c:/audio.wav");
|
||
|
||
using var synthesizer = new SpeechSynthesizer(config, audioConfig);
|
||
```
|
||
|
||
#### Conclusion
|
||
The [Azure Neural Text-to-Speech](https://azure.microsoft.com/services/cognitive-services/text-to-speech/?WT.mc_id=azure-azuredevtips-azureappsdev) service enables you to convert text to lifelike speech which is close to human-parity. Go and check it out! |