In Africa, rescuing the languages that Western tech ignores

Lagos, Dec 24 (AP): Computers have become remarkably accurate at transcribing spoken words into text messages and at searching vast amounts of information for answers to complex questions. At least, that is, if you speak English or one of the world's other dominant languages.

But try speaking to your phone in Yoruba, Igbo, or many other widely spoken African languages and you'll find glitches that can hamper access to information, commerce, personal communications, customer service, and other benefits of the global tech economy.

Vukosi Marivate, chair of data science at the University of Pretoria in South Africa, issued a call to action on the issue ahead of the December virtual gathering of the world's AI researchers.

American tech giants don’t have a great record of making their language technology work well outside of wealthier markets, a problem that has made it difficult for them to detect dangerous misinformation on their platforms, the AP reports.

Marivate is part of a coalition of African researchers trying to change that. Among their projects is one that found that machine translation tools failed to properly translate COVID-19 online surveys from English into several African languages.

“Most people want to be able to interact with the rest of the information highway in their local language,” Marivate said in an interview. He is a founding member of Masakhane, an African research effort to improve how dozens of languages are represented in the branch of artificial intelligence known as natural language processing. It is one of a number of grassroots language technology projects that have emerged in places from the Andes to Sri Lanka.

Tech giants offer their products in many languages, but they don't always attend to the nuances needed for those applications to work in the real world. Part of the problem is that there is not enough online data in these languages, including scientific and medical terms, for AI systems to learn them effectively.

Google, for example, offended members of the Yoruba community several years ago when its language app mistranslated Esu, a trickster deity, as Satan. Gaps in Facebook's language capabilities have been linked to political strife around the world and to its failure to curb harmful misinformation about COVID-19 vaccines. More mundane translation glitches have become fodder for online memes.

Omolewa Adedipe has been frustrated trying to share her thoughts on Twitter in Yoruba because her auto-translated tweets often end up meaning something different.

Once, the 25-year-old content designer tweeted, “T’Ílù ò bà dùn, T’Ílù ò bà t’òrò. Èyin l’ẹmò bí ṣ ṣe ṣé,” which means, “If the land (or country, in this context) is not peaceful or happy, you are responsible for it.” Twitter's automatic translation instead rendered it as: “If you are not happy, if you are not happy.”

For tonal Nigerian languages such as Yoruba, diacritical marks, which often indicate tone, make a huge difference in meaning. For example, ‘ogun’ is a Yoruba word meaning war, but it can also refer to a state in Nigeria (Ógún), the god of iron (Ógún), a stab (Ógún), or twenty or monarchy (Ogún).

“Some bias is intentional given our history,” said Marivate, who has devoted some of his AI research to Xitsonga and Setswana, South African languages spoken by members of his family, as well as to the common conversational practice of “code-switching” between languages.

“The history of the African continent, and of colonized countries in general, is that when a language had to be translated, it was translated in a very narrow way,” he said. “You were not allowed to write general texts in any language because the colonial power might worry that people would communicate and write books about rebellions or revolutions. But they would allow religious texts.”

Google and Microsoft are among the companies that say they are trying to improve technology for so-called “low-resource” languages for which AI systems don’t have enough data. Computer scientists at Meta, the company formerly known as Facebook, announced in November a major advance on the path to a “universal translator” that could translate multiple languages simultaneously and work better with low-resource languages such as Icelandic or Hausa.

This is an important step, said David Ifeoluwa Adelani, but for now only large tech companies and big AI labs in developed countries can build these models. Adelani, a researcher at Saarland University in Germany, is another member of Masakhane, whose mission is to advance and catalyze African-led research to tackle technology “that does not understand our names, our cultures, our places and our history.”

Improving systems requires not only more data but also careful human review from native speakers who are under-represented in the global technical workforce. It also requires a level of computing power that is difficult for independent researchers to access.

Writer and linguist Kola Tubosun created a multimedia dictionary of the Yoruba language as well as a text-to-speech engine for it. He is now working on similar speech recognition technologies for Nigeria's other two major languages, Hausa and Igbo, to help people who want to write short sentences and syllables.

“We finance ourselves,” he said. “The goal is to show that these things can be profitable.”

Tubosun led the team that created the Nigerian English voice and accent used in Google tools such as Maps. But he said it is still difficult to raise the money needed to build technology that could allow farmers to use an audio tool to follow market trends or the weather.

In Rwanda, software engineer Remy Muhire is helping build a new open-source speech dataset for Kinyarwanda that relies on volunteers who record themselves reading Kinyarwanda newspaper articles and other texts.

“They are native speakers. They understand the language,” said Muhire, a fellow at Mozilla, maker of the Firefox web browser. Part of the project involves collaborating with a government-backed smartphone app that answers questions about COVID-19.

To improve AI systems for languages across Africa, Masakhane researchers are also tapping news sources from around the continent, including Voice of America's Hausa service and BBC Igbo broadcasts.

Damián Blasi, who researches language diversity at the Harvard Data Science Initiative, said people are increasingly taking it upon themselves to develop technology for their own languages rather than waiting for elite institutions to solve the problem.

Blasi co-authored a recent study that analyzed the uneven development of language technology across more than 6,000 of the world's languages. It found, for example, that while Dutch and Swahili each have tens of millions of speakers, there are hundreds of scholarly papers on natural language processing in the Western European language and only about 20 in the East African one.
