Polly

Amazon Polly converts text into lifelike speech.

Polly performs no translation: it takes text in a given language and produces speech in that same language.

Polly operates in two modes:

  • Standard text to speech (TTS): This mode uses a concatenative architecture. It takes phonemes and concatenates them to build patterns of speech.

  • Neural text to speech: This mode takes phonemes, generates spectrograms, and passes those spectrograms through a vocoder, which generates the output audio. It is more complex and much more computationally heavy, but it produces much more natural, human-sounding speech.

Polly’s output can be in different formats:

  • MP3

  • Ogg Vorbis

  • PCM: useful if you want to integrate with various AWS products.
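
As a rough illustration of how the mode and the output format are chosen per request, here is a minimal sketch that builds the arguments for Polly's SynthesizeSpeech operation. The Engine values ("standard", "neural") and OutputFormat values ("mp3", "ogg_vorbis", "pcm") are the ones the API accepts for the options listed above. The helper names, the file extensions and the default voice "Joanna" are assumptions made for this example only.

```python
# Sketch only: build_request and output_filename are hypothetical helpers.
# The sets below cover just the two modes and three formats named in these notes.
ENGINES = {"standard", "neural"}
FORMAT_EXTENSIONS = {"mp3": ".mp3", "ogg_vorbis": ".ogg", "pcm": ".pcm"}


def build_request(text, engine="neural", output_format="mp3", voice_id="Joanna"):
    """Return keyword arguments suitable for a SynthesizeSpeech call."""
    if engine not in ENGINES:
        raise ValueError(f"unsupported engine: {engine!r}")
    if output_format not in FORMAT_EXTENSIONS:
        raise ValueError(f"unsupported output format: {output_format!r}")
    return {
        "Text": text,
        "Engine": engine,
        "OutputFormat": output_format,
        "VoiceId": voice_id,  # assumed voice; pick one that supports the engine
    }


def output_filename(stem, output_format):
    """Pick a file name whose extension matches the chosen format."""
    return stem + FORMAT_EXTENSIONS[output_format]
```

The resulting dictionary can be unpacked into an SDK call, for example client.synthesize_speech(**build_request("Hello")).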

Polly supports the Speech Synthesis Markup Language (SSML), which lets you add context within the text to control how Polly generates speech:

  • Emphasize various parts of sentences

  • Pronounce things in certain ways

  • Whisper

  • Other effects, such as pauses and changes to speaking rate or pitch
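
The SSML controls above can be sketched as a few small string builders. This is an illustration, not an official helper library: the function names are invented for the example, while the tags themselves (speak, emphasis, say-as, amazon:effect) are ones Polly documents. Some tags, including emphasis and the whispered effect, may only be available with standard voices, so check the supported-tags list for the engine you use.

```python
from xml.sax.saxutils import escape


def plain(text):
    """Escape characters such as & and < so they are valid inside SSML."""
    return escape(text)


def emphasize(text, level="strong"):
    """Emphasize part of a sentence."""
    return f'<emphasis level="{level}">{escape(text)}</emphasis>'


def spell_out(text):
    """Pronounce the text letter by letter."""
    return f'<say-as interpret-as="spell-out">{escape(text)}</say-as>'


def whisper(text):
    """Whisper the text (an Amazon-specific SSML extension)."""
    return f'<amazon:effect name="whispered">{escape(text)}</amazon:effect>'


def speak(*parts):
    """Wrap already-built parts in the root <speak> element."""
    return "<speak>" + "".join(parts) + "</speak>"


ssml = speak(
    plain("Tom & Jerry is "),
    emphasize("really"),
    plain(" good. "),
    whisper("Keep it secret."),
)
```

To have Polly interpret the string as markup rather than reading the tags aloud, the request has to set TextType to "ssml".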

Polly can be integrated with other services:

  • A WordPress plugin lets articles on WordPress blogs be read aloud.

  • It can be integrated with other AWS services where you need speech to be generated based on text.

  • You can integrate Polly with your own applications using the APIs.
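
For the last point, a minimal sketch of calling Polly from your own application with the AWS SDK for Python (boto3) could look like this. The function name, the default voice "Joanna" and the fixed MP3 output are assumptions for the example. The client parameter is there so the function can be tried with a stand-in object. A real call needs boto3 installed plus AWS credentials and a region.

```python
def synthesize_to_file(text, path, client=None, voice_id="Joanna"):
    """Send text to Polly and write the returned audio to path.

    Returns the number of bytes written.
    """
    if client is None:
        # Imported lazily so the function can be exercised without the SDK.
        import boto3
        client = boto3.client("polly")

    response = client.synthesize_speech(
        Text=text,
        VoiceId=voice_id,    # assumed voice for this sketch
        OutputFormat="mp3",
    )
    # AudioStream is a file-like streaming body holding the synthesized audio.
    audio = response["AudioStream"].read()
    with open(path, "wb") as out:
        out.write(audio)
    return len(audio)
```

A call such as synthesize_to_file("Hello from Polly", "hello.mp3") would then leave an MP3 file on disk.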