Audio description is an essential component for ensuring videos are accessible to all your viewers. However, the traditional audio description workflow is complex and costly, making it difficult to stick to quotas and budgets. This is why media companies should look to new technology to streamline processes and lower their production costs.
An audio description is a pre-recorded voice-over track that describes what is happening in a film, video or TV show in the gaps between dialogue.
Without this narration, blind or visually impaired viewers aren't able to fully enjoy and understand a video in the same way as fully sighted people do.
Even so, only a small fraction of media productions are audio described.
For a comprehensive overview of the problems, take a look at our detailed analysis of the traditional audio description workflow.
Here is a quick summary of the conclusions we came to:
The final point is particularly problematic for media companies, as they need to deal with dreaded variable costs.
The short answer is digitalisation and automation.
This doesn't mean audio description writers and voice artists should be replaced by computers – far from it.
Instead, automation is a way to close the huge gap in audio description availability, especially for low-budget productions and the web format.
Automation can also help media companies meet and surpass national quotas.
It's simple: The current number of human audio describers cannot cope with the number of productions being made, let alone the 720,000 hours of footage being posted on YouTube every day.
Even if there were enough audio description professionals in the world, tight budgets and pressing deadlines make the pricey traditional workflow unfeasible.
The largest costs surrounding audio description production usually lie at the recording stage with voice artist, sound engineer, and studio fees to consider.
These costs can be circumvented using text-to-speech technology.
Text-to-speech converts written text into a synthesised voice; in other words, it produces human speech artificially.
That way, the audio description transcript from the writer can be read out without the need for a voice artist or recording studio.
Rest assured: speech synthesis technology has come a long way in recent years.
The quality and range of voices have improved, giving the narration a natural feel.
The male voice-over in this video is a text-to-speech audio description:
Let's take a look at another variable expense that can be hard for media companies to calculate: sound engineering.
Here are several processes that can be automated:
Ducking temporarily lowers one audio signal whenever a second signal is present.
That means the background music or sounds are lowered when the audio description is being read out.
Nowadays, the ducking process can be completed in seconds with a few clicks of the mouse.
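To make the idea concrete, here is a minimal sketch of ducking on two lists of audio samples. It is only an illustration: the function name `duck`, the `threshold` and `reduction` values, and the sample data are all assumptions, and a production tool would typically smooth the gain change over time rather than switch it per sample.

```python
def duck(background, description, threshold=0.02, reduction=0.3):
    """Return a copy of `background` attenuated wherever `description` is audible.

    All parameter names and default values are hypothetical, chosen for illustration.
    """
    ducked = []
    for bg_sample, ad_sample in zip(background, description):
        # Lower the background only while the description track carries signal.
        gain = reduction if abs(ad_sample) > threshold else 1.0
        ducked.append(bg_sample * gain)
    return ducked


# Made-up sample values: the description is only audible in the middle.
background = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
description = [0.0, 0.0, 0.4, -0.3, 0.0, 0.0]
ducked = duck(background, description)
```

In this example, only the two middle background samples are lowered; the rest pass through unchanged.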
Audio mixing is the process of optimising and combining multiple sounds into one or more channels.
With the latest technology, the audio track is automatically analysed and processed to create a professional mix.
Settings can also be preconfigured to ensure the audio description conforms to official standards.
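As a rough illustration of automated mixing with a preconfigured setting, the sketch below sums several tracks and scales the result down if it exceeds a peak ceiling. The function name `mix` and the `peak_ceiling` value are assumptions. Broadcast standards generally specify loudness targets rather than a simple peak limit, so treat this as a simplified stand-in rather than a standards-compliant mixer.

```python
def mix(tracks, peak_ceiling=0.9):
    """Sum tracks sample by sample, then scale down if the peak is too high.

    `peak_ceiling` is a hypothetical preconfigured setting, not an official value.
    """
    length = max(len(track) for track in tracks)
    # Shorter tracks simply contribute nothing past their end.
    mixed = [sum(track[i] for track in tracks if i < len(track)) for i in range(length)]
    peak = max((abs(sample) for sample in mixed), default=0.0)
    if peak > peak_ceiling:
        scale = peak_ceiling / peak
        mixed = [sample * scale for sample in mixed]
    return mixed


# Made-up sample values for a programme track and a description track.
programme = [0.5, 0.5]
narration = [0.7, 0.1]
mixed = mix([programme, narration])
```

Scaling the whole mix by one factor keeps the balance between the tracks intact while bringing the loudest sample down to the ceiling.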
Delivering the audio description in the right format is usually a pain in the traditional workflow.
By automating the process, an export can be created by selecting pre-configured settings that meet broadcasting standards.
Digitalising and automating the production process also helps audio description writers.
Finding the right wording isn't always easy, which is where text-to-speech technology can be a great help.
The writer can listen to a preview of what they have written without having to record and play anything back.
This leads on to the next time-saving benefit...
In the traditional workflow, the writer needs to repeatedly rewatch the video to determine gaps in dialogue for their audio description.
With voice activity detection, the video is analysed to find where there is dialogue, and the results are displayed in the timeline; music and other irrelevant sounds aren't considered.
The writer can see where the gaps are and simply place their audio description between the speech.
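Once the speech segments are known, finding the usable gaps is a simple calculation. The sketch below assumes the detection step has already produced a list of `(start, end)` times in seconds. The function name `find_gaps`, the `min_gap` threshold and the example timings are hypothetical.

```python
def find_gaps(speech_segments, video_duration, min_gap=1.0):
    """Return (start, end) gaps between speech that last at least `min_gap` seconds."""
    gaps = []
    cursor = 0.0  # end of the latest speech seen so far
    for start, end in sorted(speech_segments):
        if start - cursor >= min_gap:
            gaps.append((cursor, start))
        # max() handles overlapping or nested speech segments.
        cursor = max(cursor, end)
    # Check the stretch between the last speech segment and the end of the video.
    if video_duration - cursor >= min_gap:
        gaps.append((cursor, video_duration))
    return gaps


# Made-up speech timings for a 30-second clip.
speech = [(2.0, 5.5), (6.0, 9.0), (15.0, 20.0)]
gaps = find_gaps(speech, video_duration=30.0)
```

The half-second pause between the first two speech segments is skipped because it is shorter than `min_gap`, which leaves three gaps where a description could be placed.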
In the example below, the red areas show where there's speech and the blue areas show where it's possible to insert the audio description.
In the traditional audio description workflow, there can be a lot of back-and-forth between the writer and the client.
This is particularly the case when corrections are required.
With the automated features mentioned above, the writer is less likely to make mistakes or need direction from the client.
The idea is to provide a cost-effective, reliable solution for productions whose budgets can't cover audio descriptions produced the traditional way.
With unpredictable human factors out of the way, it is also easier to stick to tight production deadlines.
This makes automation through text-to-speech particularly useful for the increasingly popular web series format.
Generally speaking, media companies can get more audio descriptions produced within their budget and meet national quotas more comfortably.
This increases audio description availability, meaning the blind and visually impaired have access to more content.
What's not to like?
Automation and text-to-speech technology speed up the production process and lower costs by:
All of these benefits make text-to-speech audio description perfect for low-budget productions and the web format.
Speaking of budget...
The biggest problem facing media companies is variable costs.
With so many varying human factors, total production costs can be difficult to establish.
Digitalising and automating the production process removes this uncertainty, as turnaround times can be precisely calculated.
As a result, variable costs become reliable fixed costs.
Media companies can now budget accordingly, meet their quotas, and plan ahead for the future.
At VIDEO TO VOICE, we have developed solutions that incorporate text-to-speech technology to create high-quality audio descriptions efficiently and cost-effectively.