TTS AD is the abbreviation for text-to-speech audio description. You may have heard of text-to-speech and audio description as separate things, but TTS AD is a relatively new concept that is starting to turn heads in the world of accessible media. Speed is the name of the game with TTS AD, as media professionals look for cost- and time-effective ways to make their programming accessible to blind and low-vision audiences.
So let's take a closer look at what TTS AD is, how it is made, and what role it will play in the future of accessible media.
Text-to-speech (TTS) converts written text into a synthesised spoken voice.
It is sometimes referred to as "read aloud" technology.
TTS is an increasingly popular accessibility feature on computers and phones for reading out digital text.
You can find TTS in many places today, such as internet browsers, smart assistants, e-book readers and word processors.
Audio description for film and television is a form of narration that uses verbal descriptions to provide information on visual aspects of a media production.
In other words, a pre-recorded voice-over track describes what is happening on screen in a video, TV show or film.
The audio description should be neatly intertwined with the production's original dialogue and soundtrack.
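To make "neatly intertwined" more concrete, here is a minimal sketch of one way to check where descriptions could go. It finds the pauses between dialogue lines and roughly estimates whether a description fits in a pause. The `Interval` type, the helper names, the 1-second minimum gap and the assumed speaking rate of 2.5 words per second are illustrative assumptions, not values taken from any particular tool.

```python
from dataclasses import dataclass

# Illustrative assumption: an average synthetic speaking rate.
# Real TTS voices differ, and their rate is often adjustable.
WORDS_PER_SECOND = 2.5


@dataclass
class Interval:
    start: float  # seconds
    end: float    # seconds


def find_gaps(dialogue: list[Interval], duration: float,
              min_gap: float = 1.0) -> list[Interval]:
    """Return pauses between dialogue lines lasting at least min_gap seconds."""
    gaps = []
    cursor = 0.0  # end of the latest dialogue line seen so far
    for line in sorted(dialogue, key=lambda iv: iv.start):
        if line.start - cursor >= min_gap:
            gaps.append(Interval(cursor, line.start))
        cursor = max(cursor, line.end)  # also handles overlapping lines
    if duration - cursor >= min_gap:
        gaps.append(Interval(cursor, duration))
    return gaps


def fits(description: str, gap: Interval,
         wps: float = WORDS_PER_SECOND) -> bool:
    """Roughly estimate whether a description can be spoken within a gap."""
    estimated_seconds = len(description.split()) / wps
    return estimated_seconds <= gap.end - gap.start


dialogue = [Interval(0.0, 4.0), Interval(9.0, 12.5), Interval(13.0, 20.0)]
gaps = find_gaps(dialogue, duration=25.0)
print(gaps)  # [Interval(start=4.0, end=9.0), Interval(start=20.0, end=25.0)]
print(fits("A woman in a red coat hurries across the rain-soaked street.", gaps[0]))  # True
```

The 0.5-second pause between 12.5 s and 13.0 s is skipped because it is shorter than `min_gap`. A word count is only a crude proxy for spoken duration; a real tool could instead measure the length of the synthesised audio.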
TTS AD is audio description that is read aloud by a synthetic voice using text-to-speech.
Audio description is primarily intended for blind or visually impaired viewers, so key visual elements are described to help their understanding of the video.
TTS AD is gaining in popularity due to time and cost factors.
The conventional way of producing audio description is complex and expensive, so production companies often don't have the budget for it.
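The conventional audio description production process involves several stages: writing the description script, recording it with a voice artist, and mixing and mastering the narration with the programme's original audio.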
Complications often arise at the recording and mixing stages, because production depends on the availability of staff and recordings frequently need correcting. Delays and re-recordings bring rising costs and postponed deadlines.
When costs go over budget, media companies and content creators simply don't provide audio description for their productions.
This is one of the main reasons why there is a severe lack of audio-described programming.
Some countries have quotas in place, yet only 4 to 11% of programming in the EU is provided with audio description.
Moral and legal aspects aside, an improved, more cost-effective workflow should encourage media service providers to factor audio description into their budgets.
This is where TTS AD comes in.
TTS reads out the script, so there is no need for recording with a voice artist.
This removes roadblocks associated with scheduling voice talent and sound engineers.
If changes are needed, corrections can be easily made to the script, and the new TTS voice output is ready right away.
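A minimal sketch of why script corrections are quick with TTS: if each cue's audio is cached by its text, editing one cue means re-synthesising only that cue. `synthesize` is a hypothetical placeholder for a real TTS engine call, and `CueRenderer` and the voice label `"en-AU"` are illustrative names. This is not how Frazier or any specific product is implemented.

```python
import hashlib


def synthesize(text: str, voice: str) -> bytes:
    """Hypothetical stand-in for a real TTS engine call.

    Returns placeholder bytes so the sketch runs without any TTS library.
    """
    return f"[{voice}] {text}".encode("utf-8")


class CueRenderer:
    """Caches synthesised audio per cue so unchanged cues are never re-rendered."""

    def __init__(self, voice: str):
        self.voice = voice
        self._cache: dict[str, bytes] = {}
        self.synth_calls = 0  # counts how often the (slow) engine is called

    def _key(self, text: str) -> str:
        # The voice is part of the key, so changing voices forces re-synthesis.
        return hashlib.sha256(f"{self.voice}|{text}".encode("utf-8")).hexdigest()

    def render(self, script: list[str]) -> list[bytes]:
        audio = []
        for text in script:
            key = self._key(text)
            if key not in self._cache:
                self._cache[key] = synthesize(text, self.voice)
                self.synth_calls += 1
            audio.append(self._cache[key])
        return audio


renderer = CueRenderer(voice="en-AU")
renderer.render(["A door creaks open.", "She smiles.", "Rain streaks the window."])
# The describer corrects one cue; only that cue is synthesised again.
renderer.render(["A door creaks open.", "She frowns.", "Rain streaks the window."])
print(renderer.synth_calls)  # 4
```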
How much of the rest of the workflow is automated depends on the features included in the TTS AD software.
VIDEO TO VOICE has developed Frazier, an audio description production suite that automates the mixing and mastering steps.
This way, the mixed audio-described video meets official loudness standards and is ready to broadcast.
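Meeting a broadcast loudness standard usually comes down to measuring the programme's integrated loudness (as defined in ITU-R BS.1770) and applying a gain correction. The sketch below assumes the measurement has already been taken by a loudness meter and only computes and applies the correction. The targets shown are EBU R 128 (-23 LUFS) and ATSC A/85 (-24 LKFS); the function names are illustrative. It does not check true-peak limits (EBU R 128 allows a maximum of -1 dBTP), so a production workflow would typically add a limiter or peak check as well.

```python
# Broadcast loudness targets (integrated loudness, measured per ITU-R BS.1770).
EBU_R128_TARGET_LUFS = -23.0   # EBU R 128 (Europe)
ATSC_A85_TARGET_LKFS = -24.0   # ATSC A/85 (United States)


def normalization_gain(measured: float,
                       target: float = EBU_R128_TARGET_LUFS) -> tuple[float, float]:
    """Return (gain in dB, linear amplitude factor) to move measured loudness to target.

    Assumes `measured` comes from a BS.1770 loudness meter (not implemented here).
    """
    gain_db = target - measured
    return gain_db, 10 ** (gain_db / 20)


def apply_gain(samples: list[float], factor: float) -> list[float]:
    """Scale every sample by the same linear factor (no peak limiting)."""
    return [s * factor for s in samples]


gain_db, factor = normalization_gain(-17.0)
print(gain_db, round(factor, 3))  # -6.0 0.501
```

Because a uniform gain shifts the whole programme by the same number of decibels, a mix measured at -17 LUFS needs a -6 dB correction to land on the EBU R 128 target.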
The audio describer can work on every stage of the process in the browser-based audio description editor – everything from script writing to delivering the broadcast-ready mixed video.
There is no need for expensive recording studios or mixing desks.
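Without a mixing desk, the balance between description and programme sound has to be set in software. A common technique in audio description mixing is "ducking": lowering the programme audio while a description plays so the narration stays intelligible. The sketch below builds a per-sample gain envelope with short linear fades and then sums the two tracks. The 10 dB reduction, the 0.25-second ramp and the function names are illustrative assumptions, not Frazier's actual settings.

```python
def ducking_envelope(n_samples: int, sample_rate: int,
                     regions: list[tuple[float, float]],
                     duck_db: float = -10.0,
                     ramp_s: float = 0.25) -> list[float]:
    """Per-sample gain for the programme audio.

    Gain is 1.0 by default, drops to duck_db inside each (start, end) description
    region given in seconds, and fades linearly over ramp_s before and after it.
    """
    duck = 10 ** (duck_db / 20)  # dB -> linear amplitude
    ramp = max(1, int(ramp_s * sample_rate))
    env = [1.0] * n_samples
    for start_s, end_s in regions:
        start, end = int(start_s * sample_rate), int(end_s * sample_rate)
        for i in range(max(0, start - ramp), min(n_samples, end + ramp)):
            if i < start:    # fade down before the description
                g = duck + (1 - duck) * (start - i) / ramp
            elif i < end:    # fully ducked while it plays
                g = duck
            else:            # fade back up afterwards
                g = duck + (1 - duck) * (i - end) / ramp
            env[i] = min(env[i], g)  # overlapping regions keep the lower gain
    return env


def mix(programme: list[float], description: list[float],
        envelope: list[float]) -> list[float]:
    """Sum the ducked programme audio with the description track."""
    return [p * g + d for p, g, d in zip(programme, envelope, description)]
```

Plain Python lists keep the sketch dependency-free; for real audio at 48 kHz, an array library such as NumPy would be the more practical choice.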
A separate article provides a more in-depth analysis of audio description software.
Reactions from blind and low-vision audiences have been generally positive, although respondents acknowledge that human voices are preferable.
In 2012, a Polish study showed that 95% of respondents regarded TTS AD as a viable interim solution.
A 2015 study by the Autonomous University of Barcelona supported these findings: 94% of its blind or partially sighted participants found TTS AD to be a suitable solution.
Since these studies were conducted, synthetic voice quality has continued to improve.
Production houses are encouraged to test samples on their audiences before deciding if TTS AD is the right choice.
Frazier includes human-like synthetic voices in a range of languages and accents, such as Australian English.
You can listen to voice samples in different languages on the VIDEO TO VOICE production page.
With more visual media being produced than ever before, we need workable solutions for ensuring as much content as possible is made accessible.
The number of productions without audio description is continually growing, especially when considering the thousands of hours of footage being uploaded onto YouTube and other video-sharing platforms every day.
The only way to bridge the accessibility gap is to automate time-consuming and fiddly processes, as mentioned above.
The focus should not be on discussing whether human audio description is preferable to synthetic voices. Instead, we should be looking at the bigger picture: TTS AD provides a viable option for production companies that would otherwise not be able to audio describe their content.