How Text To Speech Technology Meets Your Audio Description Needs

VIDEO TO VOICE is making it easier to produce accessible videos in a cost-effective way. The German firm's products include authoring and production tools that use text to speech technology to create audio descriptions.

Here's how the software company's ground-breaking solutions lower costs for production companies and ultimately expand the availability of accessible media…

What is most people's experience with accessible media?

We are watching more videos than ever nowadays. Or should that be reading videos?

Whether it’s scrolling through social media or catching up on our favourite Netflix series, many of us follow the subtitles, particularly when we need to keep the sound off.

Why do we need subtitles?

We’re so accustomed to seeing subtitles that it is easy to forget they were introduced for reasons beyond simple convenience or understanding foreign-language films.

The technology was originally promoted to ensure audiovisual media is accessible to deaf and hearing-impaired audiences.

Are subtitles widely used?

It depends on where you live.

For example, all English programming in the United States now requires closed captions, while 80% of the UK’s television content is subtitled.

A big win for the millions of people with hearing loss who understand English, but what about those who have visual impairments?

What do blind and low-vision people need?

In order to follow on-screen action, low-vision viewers require a voice-over that narrates what is happening during natural pauses in the dialogue.

This is known as audio description.
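
To make the idea of "natural pauses" concrete, here is a minimal Python sketch that finds gaps between lines of dialogue. It assumes the dialogue timings are already available as start and end times in seconds (for example, taken from a subtitle file). The function name find_pauses and the two-second minimum are illustrative assumptions, not part of any particular product.

```python
from typing import List, Tuple

Cue = Tuple[float, float]  # (start_seconds, end_seconds) of one line of dialogue


def find_pauses(dialogue: List[Cue], total_length: float,
                min_pause: float = 2.0) -> List[Cue]:
    """Return the gaps in the dialogue that last at least min_pause seconds.

    min_pause is an assumed threshold chosen for illustration only.
    """
    pauses = []
    cursor = 0.0  # point in time up to which dialogue has been accounted for
    for start, end in sorted(dialogue):
        if start - cursor >= min_pause:
            pauses.append((cursor, start))
        cursor = max(cursor, end)  # max() copes with overlapping lines
    # Check for a pause between the last line and the end of the video.
    if total_length - cursor >= min_pause:
        pauses.append((cursor, total_length))
    return pauses


dialogue = [(0.0, 4.0), (5.0, 9.5), (14.0, 20.0)]
print(find_pauses(dialogue, total_length=25.0))
# prints [(9.5, 14.0), (20.0, 25.0)]
```

Each pair in the result is a window in which a description could be spoken without talking over the dialogue.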

Is audio description widely used?

Unfortunately not.

While audio description is a highly useful service, only 4% to 11% of productions in Europe are broadcast with one.

Why is there a lack of content with audio description?

The conventional audio description production process is expensive.

The process requires the involvement of scriptwriters, project managers, voice talent, and sound engineers.

Human labour aside, recording studio hiring fees add to the costs.

So if a production company has to stick to a limited budget, providing an audio description is usually seen as a low priority.

So costs are the main factor. Are there any other reasons?

Yes. The traditional audio description process can also be time-consuming.

From writing the script to final export, all production stages require detailed planning and coordination.

In particular, audio editing can be extremely time-intensive when dealing with longer content – UK communications regulator Ofcom estimates that a two-hour film can take up to 60 hours to prepare.

Yet audio description is an important service. What legal parameters are in place?

Some countries such as Canada and the United Kingdom have introduced audio description quotas.

Yet in most cases, quotas are currently too low, hovering around 10%.

Until recently, EU regulations had been doing little to improve accessibility for blind and low-vision people.

Soft legal wording meant there was no motivation for media service providers in member states to provide an audio description for their productions.

However, changes that came into effect on 19 September 2020 strengthened the relevant article wording, requiring production companies to increase audio description output.

So what's being done to increase audio description availability?

VIDEO TO VOICE, a Berlin-based tech company, wants to boost the number of productions made with audio descriptions.

The firm's long-term goal is to make audio description as widely available as subtitles.

Okay, tell me more about VIDEO TO VOICE.

In 2017, VIDEO TO VOICE started designing user-friendly solutions for making audio descriptions.

The tech firm's services include a modern authoring tool and a virtual recording studio for producing broadcast-ready audio descriptions.

All products are provided on a single browser-based platform and use text-to-speech technology.

What is text to speech?

Text to speech technology converts written text into a synthesised spoken voice.

How is text to speech used in VIDEO TO VOICE's software?

At the authoring stage, text to speech provides the user with a live preview of what their script sounds like.

In production, text to speech is used to deliver the audio description, instead of using a voice-over artist.
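
One thing a spoken preview helps a writer judge is whether a line of description fits the pause it is meant for. The rough Python sketch below estimates this from a word count. The speaking rate of 160 words per minute and both function names are assumptions made for illustration. A real synthesised voice would need to be timed directly, since rates vary by voice and language.

```python
def estimated_seconds(text: str, words_per_minute: float = 160.0) -> float:
    """Estimate how long a line takes to speak, from its word count.

    words_per_minute is an assumed average rate, not a measured value.
    """
    words = len(text.split())
    return words * 60.0 / words_per_minute


def fits_pause(text: str, pause_seconds: float,
               words_per_minute: float = 160.0) -> bool:
    """Return True if the estimated spoken length fits within the pause."""
    return estimated_seconds(text, words_per_minute) <= pause_seconds


line = "She opens the door and steps into the rain."
print(estimated_seconds(line))  # 9 words at 160 wpm: prints 3.375
print(fits_pause(line, 4.5))    # prints True
print(fits_pause(line, 3.0))    # prints False
```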

Is the quality any good?

In recent years, voice quality has vastly improved thanks to advancements in artificial intelligence.

The voices used in VIDEO TO VOICE's software are mostly indistinguishable from human voices.

How many voices are available?

VIDEO TO VOICE's products provide an unrivalled selection of 544 voices across numerous languages.

This ensures the audio description creates the right mood for the production and fits the audience’s needs.

How do VIDEO TO VOICE's tools lower costs?

In the traditional workflow, audio description production is difficult to budget for due to variable costs.

By using VIDEO TO VOICE's solutions, production companies work instead with fixed costs.

The price is calculated beforehand depending on the project's length, i.e. the number of minutes of footage to be audio-described.
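
As a simple illustration of length-based pricing, the Python sketch below multiplies the minutes of footage by a per-minute rate. The rate used here is a made-up placeholder, and rounding up to whole started minutes is an assumption. Neither reflects VIDEO TO VOICE's actual pricing.

```python
import math


def fixed_price(footage_seconds: float, rate_per_minute: float) -> float:
    """Return a price known up front: started minutes times a per-minute rate.

    Rounding up to whole minutes is an assumption made for this example.
    """
    minutes = math.ceil(footage_seconds / 60)
    return minutes * rate_per_minute


# A 90-minute film at a placeholder rate of 10.0 per minute.
print(fixed_price(90 * 60, rate_per_minute=10.0))  # prints 900.0
```

Because the only input is the running time, the cost can be worked out before any production work begins.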

Audio description writers and production companies can customise their pricing plans in line with the amount of audio description work they usually do.

Our previous article explains in greater detail where costs can be lowered.

So unpredictable variable costs become manageable fixed costs. Can money be saved anywhere else?

Yes. Production companies also no longer need to consider recording studio and voice talent fees.

Corrections to the script can simply be made in the tool, meaning re-recordings are no longer necessary.

Eliminating the endless back-and-forth in the feedback cycle is just one way production becomes less costly and more expedient.

How else does the software speed up production?

At the authoring stage, writers no longer lose time dealing with timecode, export or delivery problems.

Users can easily monitor and manage all stages of the workflow in one place.

Removing time-consuming steps such as voice recording and audio post-production results in shorter turnarounds.

VIDEO TO VOICE's software also automates the audio mixing and mastering stages.

Put simply, an audio description can be made in minutes instead of days.

What if the audio description needs to be provided in another language?

VIDEO TO VOICE uses artificial intelligence to simplify translation and localisation processes.

DeepL machine translation technology is integrated into the software so that the audio description can be provided in other languages.

What are VIDEO TO VOICE's credentials?

Throughout the software's development, VIDEO TO VOICE has collaborated with audio description professionals and academic institutions such as the University of Hildesheim and the Zurich University of Applied Sciences.

The firm's award-winning tools are well-regarded in the professional media scene.

Clients include national broadcasters such as MDR in Germany, and the software has also been used to create audio descriptions for Netflix and online media libraries.

And let's not forget the bigger picture…

It is easy to forget that people with visual impairments enjoy watching TV and going to the movies, just like anyone else.

As things stand, most productions are inaccessible to them due to the lack of an audio description.

With 30 million low-vision people living in Europe alone, VIDEO TO VOICE's tools open up the gateway to an untapped consumer market that could pay huge dividends.

The company's goal of bringing audio description's availability in line with that of subtitling is certainly an ambitious one.

Media professionals now have the perfect platform for boosting media accessibility with innovative text to speech audio description technology.