DialectSafe: Bridging the ASR Gap
David Romero, Euruviel Marquez, Leonardo Torres, and Roberto Balmes
DialectSafe: A Safety Audit of Multimodal ASR in the Global South addresses "Digital Deafness": the systematic failure of state-of-the-art automatic speech recognition (ASR) systems on peripheral, hyper-rural dialects that are underrepresented in standard metropolitan benchmarks. The project focuses on rural Yucatecan Spanish, a variety heavily shaped by Mayan language contact, and implements a fully reproducible evaluation pipeline that runs in a deterministic environment with a fixed random seed of 42 and temperature set to zero. The benchmark audits 15 field audio samples across two paradigms, comparing the transcriptions of OpenAI's Whisper, a model unexposed to local indigenous speech dynamics, against those of Google's multimodal Gemini 2.5 Flash, accessed via OpenRouter. Rather than relying on aggregate metrics alone, the framework uses alignment processing to conduct a granular qualitative analysis, classifying recognition failures as deletions, substitutions, or insertions. The evaluation exposes a dangerous upstream safety failure dubbed the "Domino Effect": heavily corrupted ASR text propagates into downstream large language models, leading standard AI assistants to generate confident, speculative, or completely irrelevant advice for requests the user never made. Ultimately, DialectSafe establishes a formal pre-deployment safety audit standard for Global South voice technologies, demonstrating that AI alignment and safety boundaries are ineffective if the system remains structurally incapable of hearing and respecting the cultural and linguistic reality of marginalized communities.
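The error taxonomy described above (deletions, substitutions, insertions) can be illustrated with a word-level edit-distance alignment. The sketch below is not the project's actual alignment code, which the abstract does not specify. It is a minimal pure-Python illustration, and the function name `classify_errors`, the whitespace tokenization, the lowercasing, and the sample sentences are all assumptions made for the example.

```python
from typing import Dict, Union


def classify_errors(reference: str, hypothesis: str) -> Dict[str, Union[int, float]]:
    """Align a hypothesis transcript to a reference and count error types.

    Hypothetical helper: tokenization is a simple lowercase whitespace split.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    n, m = len(ref), len(hyp)

    # dp[i][j] = minimum number of word edits turning ref[:i] into hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i  # delete every reference word
    for j in range(1, m + 1):
        dp[0][j] = j  # insert every hypothesis word
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j - 1] + cost,  # match or substitution
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
            )

    # Walk back through the table to label each edit on one optimal path.
    subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and ref[i - 1] == hyp[j - 1] and dp[i][j] == dp[i - 1][j - 1]:
            i, j = i - 1, j - 1  # correct word
        elif i > 0 and j > 0 and ref[i - 1] != hyp[j - 1] and dp[i][j] == dp[i - 1][j - 1] + 1:
            subs += 1
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1

    wer = (subs + dels + ins) / n if n else 0.0
    return {"substitutions": subs, "deletions": dels, "insertions": ins, "wer": wer}


# Invented sample sentences, not taken from the project's field recordings.
result = classify_errors("vamos a la milpa", "vamos a milpa")
print(result)
```

When several alignments share the same minimum edit count, the split between substitutions, deletions, and insertions depends on the tie-breaking order in the backtrace, so per-category counts are best read alongside the aligned word pairs themselves.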
Cite this work
@misc{
  title={(HckPrj) DialectSafe: Bridging the ASR Gap},
  author={David Romero, Euruviel Marquez, Leonardo Torres, and Roberto Balmes},
  date={},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


