Invisible Languages

Can machine translation help to unlock the voice of millions of vulnerable Levantine Arabic speakers?

Currently the voices of millions in need go unheard: there are not enough human translators, and these languages are not an attractive market for for-profit companies. Our team explores the possibility of using machine translation to enhance communication between Levantine Arabic speakers and aid organizations such as the World Food Programme (WFP). If successful, it can not only increase the impact of crisis responders but also give a sense of connectedness and dignity to those in need.

Illustration of the language barrier in the humanitarian context. Source: Nick Lowndes / The Economist.

The challenge

It is estimated that about 130 million people in 40 countries are in need of humanitarian assistance, and that they speak about 3000 languages. Millions of those affected by crises, e.g. extreme hunger, natural disaster, or conflict, do not speak a language understood by responders. There are simply not enough trained human translators to meet all of these needs, which often require prompt action.

Imagine if the person in need and the responder could communicate without speaking a common language. It would not only tremendously increase the impact of the responder, but also give a sense of connectedness and dignity to the person in need, as speech is the most fundamental form of human communication.

There are three major reasons why current commercial machine translation engines, such as Google Translate or Bing Translator, cannot adequately address the challenge:

  • Commercial, closed-source engines do not meet the strict privacy and data security requirements of the humanitarian sector. For example, when you use Google Translate, you send your text to Google servers located outside your organization. This is not acceptable for organizations like WFP, which require in-house deployment.
  • Commercial, closed-source engines are general and their lack of focus on the humanitarian context can lead to mistranslations with serious consequences.
  • Currently available engines are built to translate formal text, which, of course, is not the way people speak. In addition, they are built to serve educated, literate people, whereas people affected by crises have often not had the opportunity to live in a place where they could get an education.

Our proposal and approach

In the WFP challenge of the Humanitarian Action Challenge, we are prototyping a solution with the potential to achieve this dream: an open-source, computer-based translation engine developed jointly by crisis-context experts, translators, and machine learning experts. The project is a joint effort of Translators without Borders (TWB), Prompsit, and PNGK. It is part of TWB’s Gamayun: The Language Equality Initiative.

The project consists of prototyping a text-to-text machine translation engine for Levantine Arabic (the language of 25 million Syrians), focusing on the context of food security. The engine, which is currently being trained, is designed specifically for the food security context so that it can provide reliable and highly accurate translation where it is needed the most. Unlike human translators, this engine can be scaled indefinitely and trained continuously and collectively. Among other uses, the engine can support needs assessment and the giving and receiving of feedback. Our proposal offers a promising way to give voice to the currently unheard millions in dire need.

The datasets we collected focus on humanitarian domain language and colloquialisms. The team uses content from sources working in the Syrian humanitarian setting, including Mercy Corps, WFP, UNHCR and IFRC. With these well-edited datasets, the project team is currently training a machine translation model. The next step is to evaluate the performance of the engine and to assess its potential added value for needs assessment in the food security context.

Ultimately, we aim to radically change two-way communication in humanitarian response by scaling machine translation for marginalized languages, and we hope to unlock the voices of the millions currently unheard.