FAQ: AWS-Polly

18 November 2020

AWS-Polly

Polly is a service that turns text into lifelike speech.

It supports Speech Synthesis Markup Language (SSML) tags like prosody so users can adjust the speech rate, pitch or volume.
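As a rough illustration of the prosody tag, the sketch below builds an SSML string in Python. The helper name `prosody_ssml` and the sample sentence are hypothetical; only the `<speak>` and `<prosody>` tags and their `rate`, `pitch` and `volume` attributes come from SSML itself.

```python
from xml.sax.saxutils import escape, quoteattr


def prosody_ssml(text, rate="medium", pitch="medium", volume="medium"):
    """Wrap plain text in an SSML <prosody> tag (hypothetical helper)."""
    attrs = (
        f"rate={quoteattr(rate)} "
        f"pitch={quoteattr(pitch)} "
        f"volume={quoteattr(volume)}"
    )
    # escape() protects characters such as & and < that would break the XML.
    return f"<speak><prosody {attrs}>{escape(text)}</prosody></speak>"


ssml = prosody_ssml("Doors closing & train departing.", rate="slow", volume="loud")
print(ssml)
```

A string like this would be sent to Polly with the text type set to SSML rather than plain text. Not every voice honors every attribute, so it is worth checking the result per voice.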

It is a secure service that delivers benefits at high scale and at low latency.

Users can cache and replay Amazon Polly’s generated speech at no additional cost.

Users can use Polly to power their application with high-quality spoken output.

Users can synthesize speech for certain Neural voices using the Newscaster style to make them sound like a TV or radio newscaster.

Users can detect when specific words or sentences in the text are being spoken to the user based on the metadata included in the audio stream.

It generates Speech Marks using the following four elements: Sentence, Word, Viseme and SSML.
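Speech marks arrive as line-delimited JSON, one object per line. The sketch below parses such a stream and extracts word timings. The sample values are made up for illustration and are not real Polly output, and the function names are hypothetical.

```python
import json

# Illustrative sample in the line-delimited JSON shape that speech marks use.
# The values are invented for this sketch, not captured from Polly.
SAMPLE_MARKS = """\
{"time":0,"type":"sentence","start":0,"end":23,"value":"Mary had a little lamb."}
{"time":6,"type":"word","start":0,"end":4,"value":"Mary"}
{"time":125,"type":"viseme","value":"p"}
{"time":373,"type":"word","start":5,"end":8,"value":"had"}
"""


def parse_speech_marks(raw):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]


def word_timings(marks):
    """Return (milliseconds from audio start, word) pairs for word marks only."""
    return [(m["time"], m["value"]) for m in marks if m["type"] == "word"]


marks = parse_speech_marks(SAMPLE_MARKS)
print(word_timings(marks))
```

An application could use the `time` field to highlight each word as it is spoken, or the viseme marks to drive lip movement on an avatar.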

It can be used in announcement systems in public transportation and industrial control systems for notifications and emergency announcements.

Applications such as quiz games, animations, avatars or narration generation are common use cases for a cloud-based text-to-speech solution like Polly.

Cloud-based text-to-speech (Polly) is platform independent, so it minimizes development time and effort.

It supports all the programming languages included in the AWS SDK (Java, Node.js, .NET, PHP, Python, Ruby, Go and C++) and AWS Mobile SDK (iOS/Android).

It supports an HTTP API so users can implement their own access layer.

It supports MP3, Ogg Vorbis and raw PCM audio stream formats.
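Raw PCM has no header, so most players will not open it directly. The sketch below wraps PCM bytes in a WAV container using Python's standard `wave` module. It assumes 16-bit, mono, little-endian samples at 16 kHz; the helper name `pcm_to_wav` is hypothetical and the silent buffer is stand-in data.

```python
import io
import wave


def pcm_to_wav(pcm_bytes, sample_rate=16000):
    """Wrap headerless PCM in a WAV container (assumes 16-bit mono samples)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)
    return buf.getvalue()


# One second of silence as stand-in data; real input would be the PCM stream.
silence = b"\x00\x00" * 16000
wav_bytes = pcm_to_wav(silence)
print(len(wav_bytes))
```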

It is a HIPAA Eligible Service covered under the AWS Business Associate Addendum (AWS BAA).

It makes it easy to request an additional stream of metadata with information about when particular sentences, words and sounds are being pronounced.

Its pay-per-use model means there are no setup costs. Users can start small and scale up as their application grows.

It provides simple API operations that users can easily integrate with their existing applications.

It has a Neural TTS (NTTS) system that can produce even higher quality voices than its standard voices. The NTTS system produces the most natural and human-like text-to-speech voices possible.

Neural voices aren't available in all AWS Regions, nor do they support all Polly features.

It provides API operations that users can use to store lexicons in an AWS Region.

Lexicons give additional control over how Polly pronounces words uncommon to the selected language.
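Lexicons are written in the W3C Pronunciation Lexicon Specification (PLS) XML format. The sketch below generates a small alias lexicon; the helper name `build_alias_lexicon` and the sample entry are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"


def build_alias_lexicon(aliases, lang="en-US"):
    """Build a PLS document that maps each written form to a spoken alias."""
    lexemes = "".join(
        f"  <lexeme><grapheme>{escape(g)}</grapheme><alias>{escape(a)}</alias></lexeme>\n"
        for g, a in aliases.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<lexicon version="1.0" xmlns="{PLS_NS}" alphabet="ipa" xml:lang={quoteattr(lang)}>\n'
        f"{lexemes}</lexicon>\n"
    )


lexicon_xml = build_alias_lexicon({"W3C": "World Wide Web Consortium"})
ET.fromstring(lexicon_xml.encode("utf-8"))  # raises if the XML is not well-formed
print(lexicon_xml)
```

The resulting string would be uploaded under a lexicon name and then referenced by that name when synthesizing speech.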

The SynthesizeSpeech operation produces audio in near-real time, with relatively little latency in most cases.
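A minimal sketch of calling SynthesizeSpeech from Python follows. It assumes a client object shaped like the one returned by `boto3.client("polly")`. The `StubPolly` class is a hypothetical stand-in so the example runs without AWS credentials, and the voice name is only an example.

```python
import io


def synthesize(polly, text, voice_id="Joanna", output_format="mp3"):
    """Call SynthesizeSpeech on the given client and return the audio bytes."""
    response = polly.synthesize_speech(
        Text=text, VoiceId=voice_id, OutputFormat=output_format
    )
    # The audio comes back as a stream that is read once.
    return response["AudioStream"].read()


class StubPolly:
    """Offline stand-in (hypothetical); replace with boto3.client("polly")."""

    def synthesize_speech(self, **kwargs):
        self.last_request = kwargs
        return {"AudioStream": io.BytesIO(b"fake-audio"), "ContentType": "audio/mpeg"}


stub = StubPolly()
audio = synthesize(stub, "Hello from Polly.")
print(len(audio), stub.last_request["OutputFormat"])
```

Passing the client in as a parameter keeps the function easy to test and leaves Region and credential handling to the caller.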

Polly's Asynchronous Synthesis feature overcomes the challenge of processing a larger text document by changing the way the document is both synthesized and returned.
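With asynchronous synthesis, the caller starts a task, Polly writes the audio to an S3 bucket, and the caller checks the task status until it finishes. The sketch below shows that start-then-poll pattern, assuming a client shaped like `boto3.client("polly")`. The `StubPolly` class, the bucket name and the output URI are hypothetical placeholders so the example runs offline.

```python
import time


def run_async_synthesis(polly, text, bucket, voice_id="Joanna", poll_seconds=5.0):
    """Start a synthesis task, poll until it settles, and return the output URI."""
    task = polly.start_speech_synthesis_task(
        Text=text, VoiceId=voice_id, OutputFormat="mp3", OutputS3BucketName=bucket
    )["SynthesisTask"]
    while task["TaskStatus"] in ("scheduled", "inProgress"):
        time.sleep(poll_seconds)
        task = polly.get_speech_synthesis_task(TaskId=task["TaskId"])["SynthesisTask"]
    if task["TaskStatus"] != "completed":
        raise RuntimeError(task.get("TaskStatusReason", "synthesis task failed"))
    return task["OutputUri"]


class StubPolly:
    """Offline stand-in (hypothetical) that completes on the second poll."""

    def __init__(self):
        self.polls = 0

    def start_speech_synthesis_task(self, **kwargs):
        return {"SynthesisTask": {"TaskId": "task-1", "TaskStatus": "scheduled"}}

    def get_speech_synthesis_task(self, TaskId):
        self.polls += 1
        status = "completed" if self.polls >= 2 else "inProgress"
        return {
            "SynthesisTask": {
                "TaskId": TaskId,
                "TaskStatus": status,
                "OutputUri": "https://example.invalid/task-1.mp3",
            }
        }


stub = StubPolly()
uri = run_async_synthesis(stub, "A long document to narrate.", "example-bucket", poll_seconds=0)
print(uri, stub.polls)
```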

With the Polly plugin for WordPress, users can provide visitors to their WordPress website audio recordings of their content.
