AzureTipsAndTricks/blog/tip320.md

116 строки
6.2 KiB
Markdown
Исходник Постоянная ссылка Обычный вид История

---
type: post
title: "Tip 320 - How to get started with Neural Text to Speech in Azure"
excerpt: "Learn how to get started with Neural Text to Speech in Azure"
tags: [AI + Machine Learning]
share: true
date: 2021-06-16 12:00:00
---
::: tip
:fire: Download the FREE Azure Developer Guide eBook [here](http://aka.ms/azuredevebook?WT.mc_id=docs-azuredevtips-azureappsdev).
:bulb: Learn more : [What is Neural Text-to-speech?](https://docs.microsoft.com/azure/cognitive-services/speech-service/text-to-speech?WT.mc_id=docs-azuredevtips-azureappsdev).
:bulb: Checkout [Azure AI resources for developers](https://azure.microsoft.com/en-us/overview/ai-platform/dev-resources/?WT.mc_id=docs-azuredevtips-azureappsdev).
:tv: Watch the video : [How to get started with Neural Text to Speech in Azure](https://youtu.be/dl0amatX5zs?WT.mc_id=youtube-azuredevtips-azureappsdev).
:::
### How to get started with Neural Text to Speech in Azure
#### Synthesize text to speech with in Azure
[Azure Neural Text-to-Speech](https://azure.microsoft.com/services/cognitive-services/text-to-speech/?WT.mc_id=azure-azuredevtips-azureappsdev) is part of the [Azure Cognitive Services family](https://azure.microsoft.com/services/cognitive-services/?WT.mc_id=azure-azuredevtips-azureappsdev). You can use it to synthesize text into natural human speech. You can translate text into speech in more than [54 languages and locales](https://docs.microsoft.com/azure/cognitive-services/speech-service/language-support#text-to-speech?WT.mc_id=docs-azuredevtips-azureappsdev) with a growing list of 129 neural voices.
In this post, we'll create a simple application that can turn text into speech.
#### Prerequisites
If you want to follow along, you'll need the following:
* An Azure subscription (If you don't have an Azure subscription, create a [free account](https://azure.microsoft.com/free/?WT.mc_id=azure-azuredevtips-azureappsdev) before you begin)
* The latest version of [Visual Studio](https://visualstudio.microsoft.com/downloads/?WT.mc_id=microsoft-azuredevtips-azureappsdev) or [Visual Studio Code](https://code.visualstudio.com/?WT.mc_id=other-azuredevtips-azureappsdev). This post uses Visual Studio, and you can also use VS Code to accomplish the same result.
#### Use the Text-To-Speech service in Azure
To start turning text into speech, we need to create an Azure Cognitive Service Speech service. We'll do that in the Azure portal.
1. Click the **Create a resource** button (the plus-sign in the top left corner)
2. Search for **speech**, select the "Speech" result and click **Create**
1. This brings you to the **Create Speech** blade
2. Fill in a **Name** for the speech service
3. Pick a **Location**
4. Select a **Pricing Tier**. The free tier is fine for this demo
5. Finally, select a **Resource Group**
6. Click **Create** to create the speech service
<img :src="$withBase('/files/105createspeech.png')" width="75%">
(Create Speech service blade in the Azure portal)
When the speech service is created, navigate to it to see its access keys. You'll need **one of the access keys** and the **location** to use the service in an application.
<img :src="$withBase('/files/105keys.png')">
(Speech service access keys in the Azure portal)
Let's use the speech service in an application that we'll create with Visual Studio.
1. Open Visual Studio
2. Create a new Console application by navigating to **File > New > Project** and selecting **Console Application**
3. The first thing that we need to do, is to reference a NuGet package to work with the Speech service. Right-click the project file and select **Manage NuGet Packages**
4. Find the package **Microsoft.CognitiveServices.Speech** and install it
<img :src="$withBase('/files/105nuget.png')" width="75%">
(Speech NuGet package in Visual Studio)
5. Next, create the code in the **Program.cs** file. The file should look like this:
```
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using System.Threading.Tasks;
namespace TextToSpeech
{
class Program
{
static async Task Main()
{
await SynthesizeAudioAsync();
}
static async Task SynthesizeAudioAsync()
{
var config = SpeechConfig.FromSubscription("servicekey", "servicelocation");
using var synthesizer = new SpeechSynthesizer(config, audioConfig);
await synthesizer.SpeakTextAsync("Synthesizing directly to speaker output.");
}
}
}
```
In the **SynthesizeAudioAsync** method, the first thing that happens is that a **SpeechConfig** is created using the access key and the service location. Leaving the config variable as it is, it is passed to a new **SpeechSynthesizer**, which is used to turn text into speech
6. Run the code to see if it works. It should output the text synthesizing directly to speaker output from your default audio output device
7. By default, the service detects the language of the text and uses the default synthesizer voice for it. In this case, it detects the language as en-US and will use the default voice for output. You can change the language that it uses to analyze the text, although the default detection works very good. You can also **change the output voice** by using the **config** variable and passing that to the **SpeechSynthesizer**, like in the example below:
```
var config = SpeechConfig.FromSubscription("servicekey", "servicelocation");
config.SpeechSynthesisVoiceName = "en-GB-RyanNeural";
using var synthesizer = new SpeechSynthesizer(config, audioConfig);
```
8. Instead of outputting the voice to the default audio output, you can also output the audio to a memory stream or to a file. For instance, to a .wav file, like in the example below:
```
var config = SpeechConfig.FromSubscription("servicekey", "servicelocation");
AudioConfig audioConfig = AudioConfig.FromWavFileOutput("c:/audio.wav");
using var synthesizer = new SpeechSynthesizer(config, audioConfig);
```
#### Conclusion
The [Azure Neural Text-to-Speech](https://azure.microsoft.com/services/cognitive-services/text-to-speech/?WT.mc_id=azure-azuredevtips-azureappsdev) service enables you to convert text to lifelike speech which is close to human-parity. Go and check it out!