South African businesses are increasingly introducing AI voice agents, but most were created in the United States – ones that sound American, “think” in English and process calls through servers on the other side of the world. Researchers at the University of Cape Town and Cape Town startup Untappd AI argue that the problem runs deeper than vendors want to admit.
Voice is still the dominant customer service channel in South Africa, and when it breaks down, customers hang up. Cape Town startup Untappd AI began investigating the issue 18 months ago. The company has a call center with over a decade of operational data, and that data points to a specific pattern: drop-off rates increase when customers hear robotic or foreign-sounding voices.
“Our data shows that drop-off rates are significantly higher when customers encounter robotic or foreign accents,” said Lloyd Matthews, CEO of Untappd AI. The company recorded 20 South African voices for its agents in a studio rather than generating them artificially. Matthews said the distinction matters because South Africans use slang and expressions that do not appear on international platforms. Untappd AI currently supports English and Afrikaans, with isiZulu and isiXhosa in development. For now, when a caller switches to an unsupported African language, the agent asks them to continue in English or Afrikaans.
Accent is only part of what makes voice AI fail locally. Bruce von Maltitz, CEO of South African contact center and hosted telephony provider 1Stream, argues that most businesses are treating AI as a software problem when it is actually a telephony engineering problem. The threshold that matters, von Maltitz said, is about 400 milliseconds – react faster than 300 ms and the bot feels abrasive; react slower than 700 ms and the customer realizes they are talking to a machine that can't keep up.
“You can start an AI business from scratch and be proficient in the software, but delivering a high-quality voice experience requires a deep understanding of legacy telephony, routing, and local infrastructure,” von Maltitz wrote in an analysis shared with TechCentral. The common failure mode, he said, is the “bolt-on” approach – companies simply layer AI on top of existing telephony without taking into account the latency that accrues at each step. In South Africa, where calls are often routed through international servers, that accumulated latency grows further.
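The way latency accrues across a bolted-on pipeline can be illustrated with a minimal sketch. The stage names, per-stage timings and the international round-trip cost below are hypothetical values chosen for illustration, not measurements from 1Stream or any vendor; only the 300 ms and 700 ms bounds come from von Maltitz's figures.

```python
# Hypothetical per-stage latencies in milliseconds; illustrative values, not measurements.
LOCAL_PIPELINE_MS = {
    "telephony_routing": 40,
    "speech_to_text": 120,
    "language_model": 150,
    "text_to_speech": 80,
}

# Assumed extra round-trip cost when a stage runs on an overseas server.
INTERNATIONAL_ROUND_TRIP_MS = 180


def total_latency_ms(stages, remote_stages=(), round_trip_ms=INTERNATIONAL_ROUND_TRIP_MS):
    """Sum per-stage latency, adding one round trip for each remotely hosted stage."""
    return sum(stages.values()) + round_trip_ms * len(remote_stages)


def classify_response(latency_ms, too_fast_ms=300, too_slow_ms=700):
    """Bucket a response time using the 300 ms and 700 ms bounds quoted in the article."""
    if latency_ms < too_fast_ms:
        return "too fast"
    if latency_ms > too_slow_ms:
        return "too slow"
    return "natural"


local = total_latency_ms(LOCAL_PIPELINE_MS)
remote = total_latency_ms(
    LOCAL_PIPELINE_MS, remote_stages=("speech_to_text", "language_model")
)
print(local, classify_response(local))
print(remote, classify_response(remote))
```

Under these assumed numbers, a fully local pipeline lands inside the acceptable window, while hosting two stages overseas pushes the same pipeline past the upper bound. Real figures would have to be measured per deployment.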
Untappd AI says local hosting is at the heart of its offering. Matthews said the company routes calls through South African data centers and holds ISO 27001 certification for its own voice platform. “By having control over the entire stack, from GPUs to distributed AI voice agents, we can provide the assurances needed to ensure data security,” Matthews said.
Accent and latency are solvable engineering problems. The third challenge – language – is not.
UCT's Department of Computer Science published research this month introducing MzansiLM, which it describes as the first publicly available AI language model trained on all 11 official written languages of South Africa. The team has made the model available for free to researchers and developers. “In language modeling, languages are primarily considered low-resource because there are very few and small textual datasets available in these languages for training language models,” said Jan Buys, a senior lecturer at UCT and one of the lead researchers on the project.
Nine of South Africa's 11 official languages fall into that low-resource category – and the gap is still stark, with isiNdebele and Sepedi severely under-represented even within MzansiLM. Asked what happens when a South African customer speaks isiZulu or Sepedi into a deployed AI system, Buys said some of the larger models have limited isiZulu support, while another common approach is to translate the query into English and process it there. He can't say for sure what businesses are doing in practice – and that uncertainty, he said, is the issue.
Buys said enterprises purchasing tools that claim to be “built for South Africa” should press vendors on two things: what language support actually means and how it can be verified, and whether the underlying model understands the South African context, regulation and business environment, or whether it is an international model with a local label. “The more transparency there is, the better one will be able to assess these things,” he said.
Buys also raised a risk that most enterprise discussions don't reach. As AI voices become more localised, they become more effective fraud tools – a voice agent that sounds authentically South African can target elderly consumers who would otherwise identify foreign-sounding calls as suspicious. “There are broader social and regulatory issues that one has to think about as they come up,” Buys said.
Untappd AI said it has more than 5,000 agents currently live – a figure Matthews said includes deployments across the group's own companies and call centers that are already offering services. He expects Microsoft and Amazon to properly localize for South Africa within 12 to 18 months, and is betting that the market share Untappd AI has built up by then will be difficult to displace.
Buys takes a longer-term view. MzansiLM is not a product but proof that South Africa needs to build its own AI capabilities rather than rely on the willingness of large US companies to support local languages, he said. “We should not rely solely on the support that a large American company can provide,” he said.
The MzansiLM model has 125 million parameters – small by commercial standards – and Buys acknowledged that larger models would still outperform it in most practical applications today. But the question for enterprise buyers, he said, is not whether the research model outperforms commercial alternatives. It is whether the commercial tools South African businesses are already paying for actually perform any better.
