AfricaDailyAI

New Multilingual Dataset BOUTEF Advances AI Fight Against North African Fake News

The rapid spread of fake news on social media is a serious problem, particularly in linguistically diverse and under-resourced regions such as North Africa. To address it, researchers have built BOUTEF, a large-scale multilingual corpus designed to study how fake news spreads in Algeria and Tunisia, what characterizes it, and what impact it has. The resource targets the specific dynamics of misinformation in these two countries, which general-purpose datasets largely overlook.

BOUTEF combines three complementary components: fake and genuine narratives, the user-generated comments they attract, and verified debunking information. The corpus spans a wide range of languages and varieties, including Modern Standard Arabic (MSA), Algerian and Tunisian dialects, Arabizi (Arabic written in Latin characters and digits), French, English, and code-switched text. This breadth makes it well suited for training AI models to detect the nuanced, mixed-language forms of misinformation common in the region.

Using the dataset, the researchers carried out an empirical analysis combining quantitative and qualitative methods. They found that fake news relies heavily on emotionally charged narratives, sensational framing, and hybrid linguistic practices to boost its virality and audience engagement. Debunking content, by contrast, tends to use a more factual, verification-oriented style. A comparison between Algeria and Tunisia revealed both shared patterns and country-specific traits shaped by each country's sociopolitical environment.

The research underlines the central role that informal language practices play in how misinformation spreads and is received across North Africa. As a rich, annotated, and publicly available dataset, BOUTEF supports further work in fake news detection, low-resource language processing, and the study of information disorders in multilingual settings, giving future AI development in the region a foundation to build on.

