How To Lower Audio Description Production Time & Costs

Audio description is an essential component for ensuring videos are accessible to all your viewers. However, the traditional audio description workflow is complex and costly, making it difficult to stick to quotas and budgets. This is why media companies should look to new technology to streamline processes and lower their production costs.


Remind me, why is audio description so important again?

An audio description is a pre-recorded voice over track that describes what is happening in a film, video or TV show between dialogue.

Without this narration, blind or visually impaired viewers aren't able to fully enjoy and understand a video in the same way as fully sighted people do.

Denying people with visual impairments access to media can also contravene legal requirements, such as the Audiovisual Media Services Directive, and runs counter to the guidelines published by the Web Accessibility Initiative.

Even so, only a small fraction of media productions are audio described.

Audio description: Microphone in foreground in front of a blurred computer screen

Got it. So what's the problem with providing audio description?

For a comprehensive overview of the problems, take a look at our detailed analysis of the traditional audio description workflow.

Here is a quick summary of the conclusions we came to:

  • The workflow is complex.
  • It is difficult to say how long an audio description will take to produce.
  • Production costs are unpredictable and can rise quickly. 

The final point is particularly problematic for media companies, as they need to deal with dreaded variable costs.

Okay, let's cut to the chase: What's the solution?

The short answer is digitalisation and automation.

This doesn't mean audio description writers and voice artists should be replaced by computers – far from it.

Instead, automation provides the answer for closing the huge gap in audio description availability, especially for low-budget productions and the web format.

Automation can also help media companies meet and surpass national quotas.

It's simple: The current number of human audio describers cannot cope with the number of productions being made, let alone the 720,000 hours of footage being posted on YouTube every day.

Even if there were enough audio description professionals in the world, tight budgets and pressing deadlines make the pricey traditional workflow unfeasible.

Man holding phone with YouTube play symbol on screen

So where does automation come into the audio description production process?

The largest costs in audio description production usually arise at the recording stage, where voice artist, sound engineer, and studio fees all add up.

These costs can be circumvented using text-to-speech technology.

What is text-to-speech technology?

Text-to-speech technology converts written text into a synthesised voice, i.e. it produces human speech artificially.

That way, the audio description transcript from the writer can be read out without the need for a voice artist or recording studio.

Interesting. But I don't want my audio description to sound robotic...

Rest assured: speech synthesis technology has come a long way in recent years.

The quality and range of voices have improved, giving the narration a natural feel.

The male voice over in this video is a text-to-speech audio description:

Great! What else can be automated?

Let's take a look at another variable expense that can be hard for media companies to calculate: sound engineering.

Here are several processes that can be automated:

Ducking

Ducking temporarily lowers one audio signal whenever a second signal is present.

That means the background music or sounds are lowered when the audio description is being read out.

Nowadays, the ducking process can be completed in seconds with a few clicks of the mouse.
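To illustrate the principle, here is a minimal Python sketch of ducking. It is not how any particular product works: the function names, the -12 dB reduction, and the 0.25-second fade are all assumptions chosen for the example. The background is represented as a plain list of samples, and the audio description cues as (start, end) times in seconds.

```python
from typing import List, Tuple

Cue = Tuple[float, float]  # (start, end) of one description cue, in seconds


def duck_gain(t: float, cues: List[Cue],
              duck_db: float = -12.0, fade: float = 0.25) -> float:
    """Return the linear gain for the background track at time t.

    duck_db and fade are illustrative defaults, not broadcast settings.
    """
    floor = 10 ** (duck_db / 20)  # convert decibels to a linear factor
    gain = 1.0
    for start, end in cues:
        if start <= t <= end:
            return floor  # fully ducked while the description plays
        if start - fade < t < start:
            frac = (start - t) / fade  # fading down towards the cue
        elif end < t < end + fade:
            frac = (t - end) / fade    # fading back up after the cue
        else:
            continue
        gain = min(gain, floor + (1.0 - floor) * frac)
    return gain


def duck(background: List[float], sample_rate: int, cues: List[Cue],
         duck_db: float = -12.0, fade: float = 0.25) -> List[float]:
    """Apply the ducking gain to every background sample."""
    return [
        sample * duck_gain(i / sample_rate, cues, duck_db, fade)
        for i, sample in enumerate(background)
    ]
```

Calling `duck(samples, 48000, [(12.0, 15.5)])` would lower the background between seconds 12 and 15.5, with a short linear fade on either side so the change is not abrupt.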

Mixing

Audio mixing is the process of optimising and combining multiple sounds into one or more channels.

With the latest technology, the audio track is automatically analysed and processed to create a professional mix.

Settings can also be preconfigured to ensure the audio description conforms to official standards.

Delivery

Delivering the audio description in the right format is usually a pain in the traditional workflow.

By automating the process, an export can be created by selecting pre-configured settings that meet broadcasting standards.

recording studio desk

So time and costs are saved with recording and post production. What about the writing stage?

Digitalising and automating the production process also helps audio description writers.

Live preview

Finding the right wording isn't always easy, which is where text-to-speech technology can be a great help.

The writer can listen to a preview of what they have written without having to record and play anything back.

This leads on to the next time-saving benefit...

Voice activity detection

In the traditional workflow, the writer needs to repeatedly rewatch the video to determine gaps in dialogue for their audio description.

With voice activity detection, the software analyses the video to find where there is dialogue and displays this in the timeline; music and other irrelevant sounds are ignored.

The writer can see where the gaps are and simply place their audio description between the speech.

In the example below, the red areas show where there's speech and the blue areas show where it's possible to insert the audio description.

Voice Activity Detection
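Once the speech segments are known, finding the places where a description can go is simple interval arithmetic. The Python sketch below assumes the detector has already produced a list of (start, end) speech times in seconds; it does not perform the detection itself. The names `find_gaps` and `fits`, the one-second minimum gap, and the speaking rate of 160 words per minute are hypothetical values chosen for illustration.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds


def find_gaps(speech: List[Interval], duration: float,
              min_gap: float = 1.0) -> List[Interval]:
    """Return the stretches without dialogue that last at least min_gap seconds."""
    gaps = []
    cursor = 0.0  # end of the dialogue seen so far
    for start, end in sorted(speech):
        if start - cursor >= min_gap:
            gaps.append((cursor, start))
        cursor = max(cursor, end)  # also handles overlapping segments
    if duration - cursor >= min_gap:
        gaps.append((cursor, duration))
    return gaps


def fits(text: str, gap: Interval, words_per_minute: float = 160.0) -> bool:
    """Roughly estimate whether a description can be spoken within a gap.

    The speaking rate is an assumed average, not a measured value.
    """
    needed = len(text.split()) / words_per_minute * 60.0
    return needed <= gap[1] - gap[0]


speech = [(2.0, 5.0), (5.5, 9.0), (4.0, 6.0)]
gaps = find_gaps(speech, duration=15.0)
# gaps == [(0.0, 2.0), (9.0, 15.0)]
```

In this example the half-second pause between the first two speech segments is ignored because it is covered by overlapping dialogue, leaving two usable gaps: one before the dialogue starts and one after it ends.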

Less project management, less stress

In the traditional audio description workflow, there can be a lot of back-and-forth between the writer and the client.

This is particularly the case when corrections are required.

With the automated features mentioned above, the writer is less likely to make mistakes and need direction from the client.

Let technology do the work. But you said earlier that humans wouldn't be replaced with computers...

That's right!

The idea is to provide a cost-effective, reliable solution for productions whose budgets can't cover audio description made the traditional way.

With unpredictable human factors out of the way, it is also easier to stick to tight production deadlines.

This makes automation through text to speech particularly useful for the increasingly popular web series format.

Generally speaking, media companies can get more audio descriptions produced within their budget and meet national quotas more comfortably.

This increases audio description availability, meaning the blind and visually impaired have access to more content.

What's not to like?

Smiling woman looking at camera giving two thumbs up

That's a lot to take in. Can you try to sum it up nicely?

Sure.

Automation and text-to-speech technology speed up the production process and lower costs by:

  • helping the writer create the audio description transcript more efficiently
  • eliminating the need for a voice over artist to read out the transcript
  • eliminating recording studio and sound engineer fees
  • simplifying final delivery of the audio description in the right format

All of these benefits make text-to-speech audio description perfect for low-budget productions and the web format.

Speaking of budget...

Fixed costs for full control over your budget 

The biggest problem facing media companies is variable costs.

With so many varying human factors, total production costs can be difficult to establish.

Digitalising and automating the production process removes this uncertainty, as turnaround times can be precisely calculated.

As a result, variable costs become reliable fixed costs.

Media companies can now budget accordingly, meet their quotas, and plan ahead for the future. 

Budget, pen and calculator on table

This all sounds great in theory. But does it work in practice?

Absolutely. 

At VIDEO TO VOICE, we have developed solutions that incorporate text-to-speech technology to create high-quality audio descriptions efficiently and cost-effectively.

