Why Africa needs a ‘Google Translate for science’

Some Indigenous languages don't have words for common scientific concepts. AI could help.

By Sibusiso Biyela

A few years ago, I got an assignment to write an article about a new dinosaur discovery in South Africa — for a science website that publishes in my native language of isiZulu. It wasn’t easy. The first problem was that there was no isiZulu word for “dinosaur,” or for “fossil.” I had to figure out a way to convey the scientific discovery in an African language incapable of talking about science.

My workaround was coming up with a new term for dinosaur, isilwane sasemandulo, which translates to “ancient animal.” While this term served its purpose in the article, I still didn’t have a bespoke term for dinosaur. I knew that I couldn’t coin this and other science terms by myself. I’d need to build a consensus among scientists, linguists, and translators — who understand that, globally, knowledge often begins with having the right words.

Ask anyone what the language of science is and the answer you’ll most likely get is English. And to some degree, that is true. The biggest journals in science are published in English, and scientists use English to communicate among themselves from across the world. In casual conversation, the Zulu language speakers of South Africa often say, “izinto zabelungu” — literally, “a thing for the white people” — when referring to anything scientific or technological.

But now, Indigenous peoples in once-colonized African countries are developing their own solutions using science. Projects like AfricArXiv (pronounced “Africa archive”), a repository of African research by African scientists, aim to usher in an African Renaissance.

The natural next step is to enable scientists, teachers, journalists, and science communicators to discuss science in African languages. But so far, there haven’t been enough resources or political will to adapt African languages to scientific discourse, the same way Afrikaans was adapted for science in apartheid-era South Africa.

The Decolonise Science project, where I am the lead science writer and “decolonisation specialist,” aims to change that — by leveraging artificial intelligence.

Science communication in local languages democratizes science

The problem of Africa’s science-language barrier is much bigger than my inability to write about dinosaurs. African institutions contribute less than 1% of the world’s published research, but Africa suffers the highest disease burden on Earth. Clearly, the continent does not do enough science. The African Union has set a goal for African governments to contribute 1% of their GDP to research, in order to develop the continent.

Improving homegrown scientific understanding would not only boost Africans’ ability to conduct science on the continent, but also help local communities control their own destinies. For example, I helped a waste disposal company translate its complex emissions report into isiZulu, because the community did not trust a report that it could not understand.

I rewrote the complex scientific language about chemicals and atmospheric processes into simple English, then translated that version into isiZulu. In the end, this effort did not help the company, but it did help the community. They expelled the company from the area — and they did it with a better understanding of the science.

Still, the work that goes into making any language capable of explaining science can seem insurmountable, if it means starting from scratch. That’s where the Decolonise Science project comes in.

Decolonise Science was cofounded by Jade Abbott, a Johannesburg-based software engineer and data scientist, and was supported by the Lacuna Fund, a collaboration of several international foundations and the Canadian and German governments.

Abbott envisions a tool that could instantly translate a complex scientific text into any of six African languages: the West African languages Yoruba and Hausa, the East African languages Luganda and Amharic, and the southern African languages isiZulu and Northern Sotho. “The long-term goal of the project is to have a Google Translate for science in African languages,” she says.

Ultimately, Abbott wants Decolonise Science to create easily usable tools, such as plug-ins for Google Docs or Microsoft Word, to access the growing list of translated science terms the project is churning out. If we succeed, AfricArXiv will be able to translate the research submitted to it so that universities can create content in local languages, like the work the University of KwaZulu-Natal is doing.

The final product would be useful for anyone trying to create science content in an African language, including scientists who want to create knowledge in their native tongue. It could also help professional translators looking to standardize African languages for science.

“It could be a revolutionary idea,” says Heather Littlefield, the director of Northeastern University’s linguistics program. “It would basically be a democratization of information, which is really valuable.”

Littlefield, who has taught a class on African linguistics, says the effort reminds her of a debate in late-1600s and early-1700s Europe. “Right around Sir Isaac Newton’s time, there was a huge battle in Europe about whether or not to use Latin as the language of academia or whether to use local languages like English, French, German, and Russian,” she says. Translations in local languages won out because they outsold the Latin texts. “If you only write in Latin, you have constrained the information to a few people. But if you write in English and then translate it into French, some farmhand out in the middle of nowhere, who’s interested in this, could learn about it and revolutionize something.”

Similarly, Littlefield says of Decolonise Science, “you expand the brain power that is possible through this project.”

African languages are as capable as any others

The challenge of translating scientific terms became clear in the early stages of the project, when we appointed professional translators from each of the six languages to translate African research articles from their original English. The papers ranged from pharmacology to biochemistry to physics research from the last two years.

The translators had no issue understanding basic English. But we came to realize that a lot of field-specific science vocabulary requires a different approach to translation. For example, the words “work,” “energy,” and “power” have meanings that most people take for granted in everyday use, but in physics they have technical meanings.

Some academics, seeing that challenge, have asked whether science can be translated into African languages at all. In a 2009 study, Rosemary Wildsmith-Cromarty, a professor of applied linguistics at the University of KwaZulu-Natal in South Africa, analyzed professional translations of academic science documents into isiZulu, and found that many scientific terms were loosely translated in ways that lost their scientific meaning.

For instance, when translated directly into isiZulu, the term “condensation” becomes ukujiya, which means to “thicken” or “congeal” — not an accurate description if you’re talking about how water vapor changes into a liquid.

But we at Decolonise Science believe that African languages are as capable as any others of hosting scientific discourse. It’s just that, because they weren’t developed for science, they need an extra step.

The project has employed science-writing specialists to clarify scientific jargon for the translators. But this step, we found, can be painfully slow. So we aim to use machine learning to brute-force the translation of technical science texts into clearer language, then use those simplified texts as the basis for the African-language translation.

The task is not as hard as it would have been a decade ago, thanks to advances in natural language processing, says Byron Wallace, a professor of computer science at Northeastern University. Still, the best machine learning tools are only as good as the data that’s fed to them, he says, so they’re limited by how much data is available and how long it will take to create each plain-language summary.

At Decolonise Science, it takes about four hours for one writer to create each summary. In his own work, Wallace has used up to 4,000 summaries to create an effective program. At our pace, a dataset of that size would take about 16,000 hours of writing.

“It can be very expensive to compile these training datasets, especially for scientific articles,” Wallace says.

We still have a lot of work to do before we can even start automating the African language translations. Once we do, we’ll involve more linguists and translators, scientists who speak African languages, science communicators, and teachers, because we need all the help we can get.

“We have to do a lot of the groundwork now,” says Abbott, “to make this ultimate science translation tool a reality.”

Sibusiso Biyela is a writer based in Johannesburg, working as a science communicator at ScienceLink.

Illustration by Klawe Rzeczy

