Although Africa is home to a huge proportion of the world's languages (well over a quarter, according to some estimates), many are missing when it comes to the development of artificial intelligence (AI).

This is due both to a lack of investment and to a shortage of readily available data. Most AI tools in use today, such as ChatGPT, are trained on English and on other European and Chinese languages, which have vast quantities of online text to draw from.

But because many African languages are mostly spoken rather than written down, there is too little text to train AI that is useful for speakers of those languages. For millions of people across the continent, this means being left out.

Researchers trying to address this issue recently released the largest known dataset of African languages: AI-ready data in 18 African languages, developed through the African Next Voices project.

Over two years, the team recorded 9,000 hours of speech across Kenya, Nigeria, and South Africa, capturing everyday scenarios in farming, health, and education. The project is backed by a $2.2 million Gates Foundation grant and aims to make this data accessible to developers building tools in local languages.

Farmers such as Kelebogile Mosime are already benefiting from tools like the AI-Farmer app, which helps users solve farming problems in their home language. The hope is that these efforts will not only widen access to technology but also allow for greater representation of African culture and history through language.

As Prof. Marivate puts it, "Language is access to imagination." Integrating indigenous languages into AI is crucial for preserving knowledge and understanding within communities.