How Neural Machine Translation Makes Video Localization Easier Than Ever

In addition to audio description, VIDEO TO VOICE is actively involved in other fields surrounding accessibility for digital media. To break down language barriers, VIDEO TO VOICE's software includes a neural machine translation service. With lower costs and fewer delays, this option is a popular choice for clients wanting to create video content in multiple languages.

Jokes about online translation tools are so last decade. When the first real-time machine translation services emerged in the early 2000s, they were ridiculed for producing questionable results. At best amusing and at worst wildly offensive, translations created this way were often too literal and strayed from the intended meaning of the original text.

But as technology has improved in recent years, there have been noticeably fewer "translation fail" memes doing the rounds on social media. The latest developments in machine translation involve neural networks. These are machine learning models, loosely inspired by the human brain, that learn how to translate from large collections of existing translations.

Intrigued? Read on to find out everything you need to know about the language service industry, where providers are having trouble, and how neural machine translation can solve their problems. 

Why is translation important?

It's simple – we all want access to content and information that is naturally communicated in our own language.

From a business standpoint, effective communication helps companies improve their brand image, boost customer satisfaction, and increase productivity of staff.

As well as keeping existing customers happy, translated content can help companies expand into other markets. This is backed up by findings in the 2020 Common Sense Advisory Report "Can't Read, Won't Buy":

  • 80% of people prefer to make a purchase in their own language
  • 85% of people decide to make the purchase when information is in their own language
  • 72% of people will only buy products sold in their preferred language
  • 74% of people are more likely to make a repeat purchase if after-sales is in their own language
  • 73% of people want product reviews in their own language

What is the current state of play in the language services industry?

The global language services industry was valued at $49.6 billion in 2019. The market is forecast to reach $77 billion by 2024, spurred on by increasing demand for video content in particular.

By 2022, 82% of internet traffic will be video content, a 15-fold increase in the number of videos from 2017. Over 4 in 5 businesses now use video as part of their marketing campaigns.  

With more video content comes the need for more translation and localization. But as production increases, an ever smaller fraction of videos is being localized.

Why isn't enough content being translated or localized?

Language service providers are not equipped to keep up with the workload.

This is mainly due to time and cost factors.

The traditional workflow usually follows this structure:

  1. the client commissions a language service provider to do the translation
  2. the language service provider assigns a translator to do the translation
  3. the finished translation is reviewed by a second translator
  4. the language service provider delivers the translation to the client

The translation itself is usually the most time-consuming step. If there are tight deadlines, potential clients may choose to forgo the translation altogether.

Money is another important consideration. 

With average translator rates varying between $0.07 and $0.15 per word, potential clients may not have the budget to get the translation done.
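To put these rates into perspective, here is a minimal sketch of the arithmetic. The 1,500-word script length is an assumption chosen purely for illustration, and the function name is hypothetical. Only the per-word rates come from the figures quoted above.

```python
# Per-word rates quoted above, in US cents to avoid floating-point rounding.
RATE_LOW_CENTS = 7    # $0.07 per word
RATE_HIGH_CENTS = 15  # $0.15 per word


def translation_cost_range(word_count):
    """Return the (low, high) human translation cost in US dollars."""
    low = word_count * RATE_LOW_CENTS / 100
    high = word_count * RATE_HIGH_CENTS / 100
    return low, high


# Hypothetical 1,500-word video script; the length is an assumption.
low, high = translation_cost_range(1500)
print(f"${low:,.2f} to ${high:,.2f} per target language")
```

The estimate covers the translation step only and applies to each target language separately, so a video offered in several languages multiplies the cost accordingly.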

If audio needs to be translated, the client also has to consider voice artist and studio costs for the new recording.

This is why language service providers need to embrace machine translation to meet demand.

What is machine translation?

Machine translation automatically converts a source text from one language into a target text in another language.

Why is machine translation needed?

Machine translation saves the language service provider time and money, making it more likely that potential clients will choose to translate their content.

With machine translation, the target text is ready in seconds. This also eliminates the costs involved at the actual translation stage.

The process is simple. After the translation is generated, a post-editor looks over the text. 

What does the post-editor do?

Post-editing is the stage where human linguists make any necessary changes to the translation for the finished product.

Are there any quality issues with machine translation?

The earliest machine translation technologies were developed using a rule-based system. This means the tool would translate sentences word for word based on dictionaries and set grammar rules.

Unfortunately, the rule-based approach often failed to take into account the context and overall meaning of the text. The resulting errors are often lampooned in "translation fail" memes doing the rounds on clickbait sites.
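The weakness of word-for-word translation can be shown with a toy example. This is a deliberately simplified sketch, not a reconstruction of any real rule-based system: the four-entry dictionary is invented for illustration, and the grammar rules that real systems added on top are left out entirely. The French idiom "Il pleut des cordes" means that it is raining heavily.

```python
# Tiny French-to-English dictionary, invented for this sketch.
DICTIONARY = {
    "il": "it",
    "pleut": "rains",
    "des": "some",
    "cordes": "ropes",
}


def translate_word_for_word(sentence):
    """Look up each word on its own, ignoring context and idioms."""
    words = sentence.lower().split()
    # Words missing from the dictionary are passed through unchanged.
    return " ".join(DICTIONARY.get(word, word) for word in words)


literal = translate_word_for_word("Il pleut des cordes")
print(literal)  # it rains some ropes
```

Every word is translated "correctly", yet the sentence as a whole misses the meaning, which is exactly the kind of result that ended up in the memes.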

The rule-based system has since given way to more refined models such as statistical machine translation and neural machine translation.

What are the statistical and neural machine translation models?

Both statistical machine translation (SMT) and neural machine translation (NMT) are AI-based and use corpora.

Corpora are texts that have been written and translated by professionals in different languages, which can be compared side by side. 

For example, EU Parliament documents need to be translated into each member state's language. Therefore, these parallel data sources are a good example of the sorts of texts that can be used in corpora.

How does it all work?

SMT matches up equivalent words and expressions in texts, while NMT learns from them using neural networks.
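To illustrate how equivalent words can be matched up from parallel texts alone, here is a minimal sketch of IBM Model 1, a classic word-alignment model from the SMT literature. It is a teaching example under stated assumptions: the three-sentence English-German corpus is invented, the function names are hypothetical, and production SMT systems used far larger corpora and additional models on top.

```python
from collections import defaultdict

# Toy English-German parallel corpus, invented for this sketch.
CORPUS = [
    ("the house", "das haus"),
    ("the book", "das buch"),
    ("a book", "ein buch"),
]


def train_ibm_model_1(corpus, iterations=20):
    """Estimate t[(target_word, source_word)] with expectation-maximisation."""
    pairs = [(src.split(), tgt.split()) for src, tgt in corpus]
    # Start from uniform scores: every pairing is equally plausible.
    t = defaultdict(lambda: 1.0)
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for src_words, tgt_words in pairs:
            for tgt in tgt_words:
                # Share one unit of credit for this target word across
                # the source words that appear in the same sentence pair.
                norm = sum(t[(tgt, src)] for src in src_words)
                for src in src_words:
                    share = t[(tgt, src)] / norm
                    count[(tgt, src)] += share
                    total[src] += share
        # Re-estimate the translation probabilities from the shared credit.
        t = defaultdict(float, {
            (tgt, src): c / total[src] for (tgt, src), c in count.items()
        })
    return t


def best_translation(t, source_word):
    """Pick the target word with the highest probability for source_word."""
    candidates = {tgt: p for (tgt, src), p in t.items() if src == source_word}
    return max(candidates, key=candidates.get)


table = train_ibm_model_1(CORPUS)
for word in ["the", "house", "book", "a"]:
    print(word, "->", best_translation(table, word))
```

No dictionary is supplied here. The pairings emerge because "das" keeps appearing wherever "the" does, while "haus" only ever appears alongside "house".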

Neural networks are a type of machine learning model loosely inspired by the way neurons in the brain signal one another.

NMT accounts for the context in which a word is used, instead of just translating each word on its own. 
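One mechanism widely used in current NMT models to bring in context is attention, where the representation of a word is blended with the representations of the words around it. The sketch below illustrates the idea with scaled dot-product attention. It makes no claim about how any provider named in this article implements its system, and the two-dimensional word vectors are invented for the example. Real models learn vectors with hundreds of dimensions.

```python
import math

# Hypothetical 2-dimensional word vectors, invented for this sketch.
EMBEDDINGS = {
    "bank": [1.0, 1.0],
    "river": [2.0, 0.0],
    "account": [0.0, 2.0],
}


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def contextual_vector(word, sentence):
    """Blend the vectors of all words in the sentence, weighted by how
    strongly each one matches the query word (scaled dot-product attention)."""
    query = EMBEDDINGS[word]
    keys = [EMBEDDINGS[w] for w in sentence]
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    return [
        sum(w * key[i] for w, key in zip(weights, keys))
        for i in range(len(query))
    ]


river_bank = contextual_vector("bank", ["river", "bank"])
money_bank = contextual_vector("bank", ["bank", "account"])
print(river_bank, money_bank)
```

The same word "bank" ends up with a different vector in each sentence, pulled towards "river" in one and towards "account" in the other. A translation model can use that difference to choose between, say, a riverbank and a financial institution in the target language.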

For example, the technology can recognise whether a text uses a formal register or slang. In systems that support it, corrections made by users are also fed back so that future translations improve.

As a result, these sophisticated systems achieve high-quality results and have since become the industry standard.

Who are the major players in machine translation?

The following providers currently dominate the market:

Google Translate:

  • launched in 2006
  • around 100 billion words translated per day
  • auto-detection to automatically translate web pages 

Amazon Translate:

  • launched in 2017
  • customisable options for website localization
  • 2 million-character limit per month in free version

Microsoft Translator:

  • launched in 2009
  • 58% of customers based in the United States
  • 2 million-character limit per month in free version

DeepL:

Which languages are covered?

DeepL recently expanded its services to support 23 languages with the promise of more on the way.

Amazon Translate supports 54 languages and dialects, resulting in 2,804 language pairs.

Microsoft Translator has 90 languages and dialects available on the platform.

In terms of quantity, Google Translate comes out on top with 109 languages to choose from.

How do the largest machine translation providers compare in terms of quality?

It is difficult to determine which provider delivers the best results.

One key factor is the quality of the training data available for a particular language pair. Training data is a large dataset that is used to teach machine learning models.

In-house translators at phrase.com conducted a blind test to evaluate target texts generated by machine translation providers. Each pair had a different first preference:

Machine translation performance

Language pair        1st preference   2nd preference   3rd preference   4th preference
English to French    Microsoft        DeepL            Google           Amazon
English to German    DeepL            Microsoft        Amazon           Google
English to Russian   Google           Amazon           Microsoft        DeepL


As the results show, it is important to shop around and read evaluations to find the best provider for your language pair.

So VIDEO TO VOICE is working with neural machine translation?

Neural machine translation technology has been integrated into VIDEO TO VOICE's production tools.

This option is particularly popular with clients who need to produce multilingual versions of their videos.

For example, Swiss companies often provide content in the four national languages: German, French, Italian, and Romansh.

On VIDEO TO VOICE's platform, users can generate a version of the video in another language in seconds. A second user can then be added to the project to perform any post-editing on the translated text.

Once the text is approved, human-like synthetic voices read out the translation. The new audio is then mixed with the original soundtrack and mastered to professional broadcasting standards.

Summary

The neural approach has transformed machine translation's reputation. With demand for video growing at an exponential rate, neural machine translation, combined with post-editing, is a reliable solution for localizing content quickly and at low cost. Language service providers now need tools such as VIDEO TO VOICE's platform to keep up and increase output.

We work with leading experts from academic institutions in our software's development: ZHAW and the University of Hildesheim.