A group of South African researchers is working to ensure African languages are not left behind as artificial intelligence reshapes the way people access information, communicate and learn.
Researchers at the University of Cape Town have joined colleagues from three other universities in a national collaboration to develop AI tools that better understand and serve African languages, including isiXhosa, isiZulu and Sepedi. The project is supported by the National Research Foundation and the Telkom Centres of Excellence Programme, which has funded information and communications technology research in South Africa for more than two decades.
The collaboration is led by Professor Matthew Adigun and Professor Alfredo Terzoli of the University of Zululand, Associate Professor Thippe Modipa of the University of Limpopo, Dr Phumzile Nomanga of the University of Fort Hare and Associate Professor Melissa Densmore of UCT. It will fund master's, doctoral and postdoctoral researchers at all participating institutions and will run until 2027.
“This is one of the first projects where centres of excellence are working across different institutions,” Densmore said. “The idea is to build collaboration between universities while developing new innovations and technologies in the ICT sector.”
At the heart of the project is the development of large language models, the AI systems that power chatbots and digital assistants. But building such systems for African languages poses particular challenges. Most existing AI language models are trained on vast amounts of digital text collected from the internet, and for many African languages such data is scarce.
“The amount of text available in languages like isiZulu or isiXhosa is much smaller than in English or other widely used languages,” said Dr Jan Buys, a UCT researcher involved in the project. “So one of the research challenges is how to develop models that still work effectively even when the available data is limited.”
To address the data gap, researchers are turning to underused sources of language data, such as printed materials in libraries and archives that have never been digitised, and are investigating new techniques for training language models more efficiently when data is scarce. The linguistic structure of African languages presents an additional technical challenge. “These languages are morphologically complex,” Buys said. “The structure of words can be quite complex, so we need algorithms that can handle that complexity.”
The researchers also emphasise that building AI systems for African languages raises important ethical and social questions. The team plans to consult linguists, AI experts and native speakers about the technology's broader implications.
“We want to talk to people who speak these languages about the potential impact of AI tools and what the trajectory of this kind of research should look like,” Densmore said. “It's about shaping global AI knowledge rather than importing and using technologies created in other parts of the world.”
The stakes are high. Many widely used AI systems struggle to give accurate responses when users ask questions in low-resource languages; their answers may be poorly translated, poorly phrased or simply wrong. In health care, Densmore said, this could have serious consequences. “If someone is looking for health information and the system gives wrong or misleading answers, that becomes a real problem from a misinformation perspective.”
Community involvement is also a major goal. Densmore said that in previous research she had observed that people want technologies that reflect the languages and dialects they use in daily life. “In the community we worked in, people said they wanted a chatbot that spoke their local dialect, the language they use at home,” she said. “It would feel like something that belongs to them.”
The team's long-term vision is for communities to be able to create their own digital tools in their own languages. “Whether they are powered by language models or other types of AI, the key is that communities have ownership over them,” Densmore said.
