AfricaDailyAI
Research | Jul 5, 2026 | Benin, Nigeria, Niger | 93% confidence

New AI Models Boost Speech-to-Text for Low-Resource West African Languages Fongbe and Hausa

Many African languages lack the digital text corpora needed to train robust language models, which holds back AI development for them. This study investigates whether Automatic Speech Recognition (ASR) pipelines can expand those text resources, focusing on two typologically distinct West African languages: Fongbe and Hausa.

For Fongbe, the researchers fine-tuned the MMS-300M model on a curated 12.3-hour dataset. This produced a 78% relative reduction in Word Error Rate (WER) on the ALFFA benchmark while preserving the tonal diacritics that carry meaning in the language. For Hausa, they applied an existing fine-tuned Whisper-Small model to 45.49 hours of audio drawn from a catalog of 236 hours of YouTube video, yielding 6,770 transcribed segments.
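As a rough illustration of the metric behind the 78% figure, the sketch below computes WER as word-level edit distance divided by the number of reference words, then the relative reduction between two WER values. It is a minimal sketch, not the researchers' evaluation code: the function names are hypothetical, and because the article does not report the absolute WER values before and after fine-tuning, any numbers passed in are illustrative assumptions. Text is normalized to Unicode NFC before comparison so that the same diacritic encoded in two different ways is not counted as an error, while a missing or wrong diacritic still is.

```python
import unicodedata


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper for illustration; not taken from the study's code.
    """
    # NFC normalization makes precomposed and combining forms of the same
    # diacritic compare equal, so only genuine diacritic errors are counted.
    ref = unicodedata.normalize("NFC", reference).split()
    hyp = unicodedata.normalize("NFC", hypothesis).split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which WER fell relative to the baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer
```

Under these assumptions, a hypothetical drop from a baseline WER of 0.50 to 0.11 gives `relative_wer_reduction(0.50, 0.11)` of roughly 0.78. A relative figure on its own does not reveal the absolute error rate, which is one reason it is usually reported alongside the underlying WER values.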

Human evaluation of the transcribed segments gave mean quality scores of 57.4/100 for Hausa and 36.5/100 for Fongbe. These results suggest that Hausa transcriptions are nearing acceptable quality for direct corpus construction, while Fongbe transcriptions need further post-processing or more advanced models to reach production-ready quality. In keeping with their commitment to open science, the researchers are releasing the curated datasets, fine-tuned models, transcribed corpus, and full video catalog, and say the release adheres to ethical guidelines.

The work matters for Africa because open-source ASR models and foundational datasets for these low-resource languages directly widen linguistic inclusion in the global AI landscape. That in turn supports AI applications that can serve diverse African populations, preserve cultural and linguistic heritage, and help bridge the digital divide for millions of speakers of languages currently underrepresented in global AI systems.
