The traditional audio description workflow can be expensive and time-consuming. That's why audio description professionals are turning to technology to cut costs. One method is to have audio description transcripts read aloud by synthetic voices, but how does the technology fare in practice?
An audio description is a spoken narration that helps people with visual or intellectual disabilities follow a film or TV show.
Gregory Frazier did the first concrete work on audio description in the mid-1970s.
In the following decade, audio description services started to emerge with a primary focus on cinema.
The conventional way of making audio descriptions requires different personnel for each stage in the process.
We go into greater detail in our article exploring why audio description production is a headache for production companies.
Here's a quick overview of our findings:
To address all these problems, production companies are turning to synthetic voices.
Speech synthesis is the artificial production of human speech.
Based on machine learning, the technique is used for:
The latter two points are what we are particularly interested in.
Text to speech converts written text into a synthesized spoken voice.
Accessibility supports social inclusion for people with disabilities, ensuring everyone is treated the same.
Technology plays a big part in promoting accessibility.
For example, many disabled people use specialized text-to-speech software such as screen readers when browsing the web or using their devices.
Screen readers read aloud the text on a website and provide important functions such as navigating by headings and reading out alt text for images.
Following the logic above, audio description transcripts can also be read aloud as a synthesized voice using text-to-speech technology.
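As a rough illustration of that idea, the sketch below turns a few timed description lines into SSML (Speech Synthesis Markup Language, a W3C standard that many text-to-speech engines accept as input). The `Cue` class, the `cues_to_ssml` helper and the sample text are hypothetical and only show the general shape of the step; they are not taken from any particular product.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass
class Cue:
    """One audio description line (hypothetical structure for this sketch)."""
    start_s: float  # when the narration should begin, in seconds
    text: str       # what the synthetic voice should say


def cues_to_ssml(cues: list[Cue], lang: str = "en-GB") -> str:
    """Build one SSML document with a labelled sentence per cue, in time order."""
    parts = [
        '<speak version="1.1" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang="{lang}">'
    ]
    for index, cue in enumerate(sorted(cues, key=lambda c: c.start_s)):
        # <mark> is a standard SSML element; here it just labels the start time.
        parts.append(f'  <mark name="cue{index}_{cue.start_s:.2f}s"/>')
        # escape() protects characters such as & and < inside the transcript.
        parts.append(f"  <s>{escape(cue.text)}</s>")
    parts.append("</speak>")
    return "\n".join(parts)


if __name__ == "__main__":
    demo = [
        Cue(12.5, "Anna & Ben enter the hall."),
        Cue(3.0, "A dark street at night."),
    ]
    print(cues_to_ssml(demo))
```

The resulting string would then be handed to whichever speech engine is in use; how the audio is placed at the right point in the video is left out of this sketch.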
Studies show that text-to-speech audio description is generally regarded as an acceptable solution by viewers with visual impairments.
The best approach is to test audio descriptions created with synthetic voices on your audience and decide from there.
For example, German broadcaster MDR tested text-to-speech audio descriptions on blind consumers at the Louis Braille Festival in 2019 before adopting the technology.
No, synthetic voices don't need to sound robotic.
While a screen reader such as JAWS has its functional merits, its voice is not suitable for audio description.
Thankfully, the quality and range of voices have come a long way in recent years, giving the narration a natural feel.
With advancements in artificial intelligence, the technology is also improving all the time.
Take a listen for yourself – the female voice-over in this video is a text-to-speech audio description:
With text to speech, you can deliver the audio description without a voice artist or recording studio.
If the transcript is corrected later, you don't need to rehire voice talent or rebook studio space.
Using synthetic voices also saves time.
In the traditional workflow, production companies are sometimes left waiting on voice actor and studio availability.
Yet with synthesized speech, the audio description transcript can be read out as soon as the text is ready.
And this makes budgeting a lot easier, right?
Unpredictable variable costs become easy-to-manage fixed costs.
No, not at all!
Synthesized speech isn't there to compete with professional voice artists.
Instead, synthetic voices play a supporting role in audio description creation.
The issue is that not enough productions are made with an audio description due to budget constraints.
Time is another sticking point – most companies need the audio description for their productions right away, yet for cinema it usually takes weeks for the audio description to be ready.
So it's not a case of replacing voice artists – it's about making audio description affordable and quick to produce for projects where it would otherwise be economically unviable.
In turn, the number of productions provided with an audio description should increase, boosting the availability of accessible content as a whole.
After extensive testing, early adopters have switched to text-to-speech audio descriptions for productions of various lengths.
Text-to-speech audio descriptions are perfect for smaller budgets and certain formats, particularly those intended for the web.
VIDEO TO VOICE developed Frazier, a Production Suite for creating text-to-speech audio descriptions.
The transcript can be written directly into the tool; the audio description is then generated using text to speech in seconds.
The user can choose from hundreds of voices in over 40 languages for the audio description.
Here is a short narration of a pole vault attempt delivered using text-to-speech audio description:
You don't have to!
Frazier is browser-based, so there's no need to download anything.
You'll be given log-in details to access the platform online, and can start creating your own text-to-speech audio descriptions right away.
Frazier includes a neural machine translation feature.
This automatically translates your audio description into another language.
After the translation has been generated, a post-editor can make any adjustments to the text and select the voice in the new language.
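The translate-then-review flow described above can be sketched roughly as follows. Everything here is hypothetical: the `Description` class, the `translate` and `post_edit` helpers, the voice names and the stand-in translator are invented for illustration and do not reflect how Frazier is implemented.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class Description:
    """An audio description text plus its language and chosen voice (hypothetical)."""
    text: str
    lang: str
    voice: Optional[str] = None
    reviewed: bool = False


def translate(
    source: Description,
    target_lang: str,
    machine_translate: Callable[[str, str, str], str],
) -> Description:
    """Create an unreviewed draft in the target language.

    machine_translate(text, source_lang, target_lang) stands in for
    whatever translation service is actually used.
    """
    translated = machine_translate(source.text, source.lang, target_lang)
    # The voice is reset on purpose: a voice for the new language must be chosen.
    return Description(text=translated, lang=target_lang)


def post_edit(
    draft: Description,
    voice: str,
    corrected_text: Optional[str] = None,
) -> Description:
    """Apply the post-editor's corrections (if any) and voice choice."""
    text = corrected_text if corrected_text is not None else draft.text
    return replace(draft, text=text, voice=voice, reviewed=True)


if __name__ == "__main__":
    # A stand-in translator so the sketch runs without any external service.
    def fake_translate(text: str, src: str, dst: str) -> str:
        return {"A dark street.": "Eine dunkle Gasse."}[text]

    original = Description("A dark street.", "en", voice="en-voice-1", reviewed=True)
    draft = translate(original, "de", fake_translate)
    final = post_edit(draft, voice="de-voice-1")
    print(final)
```

Keeping the draft marked as unreviewed until a person has signed it off mirrors the point made above: the machine translation is a starting point, and the post-editor has the final say on both wording and voice.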
Using synthetic voices is a lot less expensive than using voice actors.
In fact, it is the most affordable option available.
Through Frazier, you have access to the best synthetic voices from leading providers.
The software also takes care of the final mix and mastering.
That's value for money.
You can book a demo to see if text-to-speech audio descriptions are the right fit for you and your audience.
We've covered quite a lot, so let's quickly sum up our findings.
Without synthetic voices, audio description is a non-starter for many productions. Yet as the studies and examples above show, synthesized speech is a viable and affordable solution for audio-describing content. Browser-based and easy to use, Frazier provides the perfect platform for integrating audio descriptions into your videos through text to speech.