type: post
title: Tip 320 - How to get started with Neural Text to Speech in Azure
excerpt: Learn how to get started with Neural Text to Speech in Azure
tags: AI + Machine Learning
share: true
date: 2021-06-16 12:00:00

::: tip

🔥 Download the FREE Azure Developer Guide eBook here.

💡 Learn more: What is Neural Text-to-Speech?

💡 Check out Azure AI resources for developers.

📺 Watch the video: How to get started with Neural Text to Speech in Azure.

:::

How to get started with Neural Text to Speech in Azure

Synthesize text to speech in Azure

Azure Neural Text-to-Speech is part of the Azure Cognitive Services family. You can use it to synthesize text into natural-sounding human speech. It converts text into speech in more than 54 languages and locales, with a growing list of 129 neural voices.

In this post, we'll create a simple application that can turn text into speech.

Prerequisites

If you want to follow along, you'll need the following:

  • An Azure subscription. If you don't have one, create a free account before you begin.
  • The latest version of Visual Studio or Visual Studio Code. This post uses Visual Studio, but you can accomplish the same result with VS Code.

Use the Text-To-Speech service in Azure

To start turning text into speech, we first need to create a Speech service, which is part of Azure Cognitive Services. We'll do that in the Azure portal.

  1. Click the Create a resource button (the plus sign in the top-left corner)
  2. Search for speech, select the "Speech" result and click Create
    1. This brings you to the Create Speech blade
    2. Fill in a Name for the speech service
    3. Pick a Location
    4. Select a Pricing Tier. The free tier is fine for this demo
    5. Finally, select a Resource Group
    6. Click Create to create the speech service

(Create Speech service blade in the Azure portal)

When the speech service is created, navigate to it to see its access keys. You'll need one of the access keys and the location to use the service in an application.

(Speech service access keys in the Azure portal)
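Hardcoding the access key in source code is fine for a quick demo, but it is easy to commit by accident. One option, sketched below, is to read the key and the location from environment variables. The variable names SPEECH_KEY and SPEECH_REGION and the SpeechSettings helper are assumptions made for this example; they are not defined by the Speech service or its SDK.

```csharp
using System;

namespace TextToSpeech
{
    // Hypothetical helper: reads the Speech access key and location
    // from environment variables instead of hardcoding them.
    static class SpeechSettings
    {
        public static (string Key, string Region) FromEnvironment()
        {
            // Assumed variable names; pick whatever fits your setup.
            var key = Environment.GetEnvironmentVariable("SPEECH_KEY");
            var region = Environment.GetEnvironmentVariable("SPEECH_REGION");

            if (string.IsNullOrWhiteSpace(key) || string.IsNullOrWhiteSpace(region))
            {
                throw new InvalidOperationException(
                    "Set the SPEECH_KEY and SPEECH_REGION environment variables first.");
            }

            return (key, region);
        }
    }
}
```

In the console application created later in this post, the two values could then be passed to SpeechConfig.FromSubscription in place of the "servicekey" and "servicelocation" placeholders.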

Let's use the speech service in an application that we'll create with Visual Studio.

  1. Open Visual Studio
  2. Create a new Console application by navigating to File > New > Project and selecting Console Application
  3. First, we need to reference a NuGet package to work with the Speech service. Right-click the project and select Manage NuGet Packages
  4. Find the package Microsoft.CognitiveServices.Speech and install it

(Speech NuGet package in Visual Studio)

  1. Next, create the code in the Program.cs file. The file should look like this:
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using System.Threading.Tasks;

namespace TextToSpeech
{
    class Program
    {
        static async Task Main()
        {
            await SynthesizeAudioAsync();
        }

        static async Task SynthesizeAudioAsync()
        {
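            // Replace the placeholders with one of your access keys and the service location.
            var config = SpeechConfig.FromSubscription("servicekey", "servicelocation");

            // Send the synthesized audio to the default speaker.
            using var audioConfig = AudioConfig.FromDefaultSpeakerOutput();
            using var synthesizer = new SpeechSynthesizer(config, audioConfig);

            // Synthesize the text and play it.
            await synthesizer.SpeakTextAsync("Synthesizing directly to speaker output.");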
        }
    }
}

In the SynthesizeAudioAsync method, a SpeechConfig is first created from the access key and the service location. The config is then passed, unchanged, to a new SpeechSynthesizer together with an AudioConfig that selects the audio output. The synthesizer is what turns the text into speech.
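SpeakTextAsync returns a result object that the listing above ignores, so a wrong key or location would simply produce silence. A minimal sketch of inspecting that result, assuming the same placeholder key and location and the default speaker as output:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

namespace TextToSpeech
{
    class Program
    {
        static async Task Main()
        {
            // Placeholders: use your own access key and service location.
            var config = SpeechConfig.FromSubscription("servicekey", "servicelocation");

            using var audioConfig = AudioConfig.FromDefaultSpeakerOutput();
            using var synthesizer = new SpeechSynthesizer(config, audioConfig);

            using var result = await synthesizer.SpeakTextAsync("Synthesizing directly to speaker output.");

            if (result.Reason == ResultReason.SynthesizingAudioCompleted)
            {
                Console.WriteLine("Speech synthesis completed.");
            }
            else if (result.Reason == ResultReason.Canceled)
            {
                // Canceled covers failures such as an invalid key or location.
                var cancellation = SpeechSynthesisCancellationDetails.FromResult(result);
                Console.WriteLine($"Synthesis canceled: {cancellation.Reason}");

                if (cancellation.Reason == CancellationReason.Error)
                {
                    Console.WriteLine($"Error details: {cancellation.ErrorDetails}");
                }
            }
        }
    }
}
```

Printing the error details is usually enough to tell a bad key apart from a mistyped location while you are getting started.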

  1. Run the code to see if it works. You should hear the sentence "Synthesizing directly to speaker output." on your default audio output device
  2. By default, the service detects the language of the text and uses the default synthesizer voice for it. In this case, it detects the language as en-US and uses the default voice for output. You can change the language that it uses to analyze the text, although the default detection works very well. You can also change the output voice by setting SpeechSynthesisVoiceName on the config variable before passing it to the SpeechSynthesizer, like in the example below:
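        var config = SpeechConfig.FromSubscription("servicekey", "servicelocation");
        // Use a specific neural voice instead of the default one.
        config.SpeechSynthesisVoiceName = "en-GB-RyanNeural";

        using var audioConfig = AudioConfig.FromDefaultSpeakerOutput();
        using var synthesizer = new SpeechSynthesizer(config, audioConfig);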
  1. Instead of outputting the voice to the default audio output, you can also output the audio to a memory stream or to a file. For instance, to a .wav file, like in the example below:
        var config = SpeechConfig.FromSubscription("servicekey", "servicelocation");
        AudioConfig audioConfig = AudioConfig.FromWavFileOutput("c:/audio.wav");

        using var synthesizer = new SpeechSynthesizer(config, audioConfig);
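The memory stream option mentioned above has no listing of its own, so here is a hedged sketch. It assumes the same placeholder key and location, and relies on passing null instead of an AudioConfig so that the audio is returned in the result rather than played.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

namespace TextToSpeech
{
    class Program
    {
        static async Task Main()
        {
            // Placeholders: use your own access key and service location.
            var config = SpeechConfig.FromSubscription("servicekey", "servicelocation");

            // Passing null instead of an AudioConfig keeps the audio in memory
            // rather than playing it on the default speaker.
            using var synthesizer = new SpeechSynthesizer(config, null);

            using var result = await synthesizer.SpeakTextAsync("Synthesizing to a memory stream.");

            if (result.Reason != ResultReason.SynthesizingAudioCompleted)
            {
                Console.WriteLine($"Synthesis did not complete: {result.Reason}");
                return;
            }

            // AudioData holds the synthesized audio as a byte array.
            using var stream = new MemoryStream(result.AudioData);
            Console.WriteLine($"Received {stream.Length} bytes of audio.");
        }
    }
}
```

From here the stream can be handed to whatever consumes the audio next, for example an HTTP response or a storage upload.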

Conclusion

The Azure Neural Text-to-Speech service enables you to convert text into lifelike speech that is close to human parity. Go and check it out!