A Human-in-the-Loop Audit Framework for Evaluating AI Application Safety in Latin America
Fabian Cespedes Severiche, Julio Cesar Severiche Orellana, Rodrigo Ricaldez Martinez, Victor Hugo Murillo Siles, Egnar Henry Chuquimia Mamani
Existing AI safety benchmarks evaluate foundation models in controlled, English-language environments. They do not assess whether AI applications are safe, accurate, and culturally appropriate for users in Latin America. We present the AI Assurance Standard (AIAS), a human-in-the-loop audit framework that evaluates AI responses along five weighted dimensions: Security (25%), Fairness (20%), Deployment (20%), Participatory (20%), and Cultural (15%). We implement the framework as an open-source Python tool and use it to audit ChatGPT, Claude, and Gemini on 24 Spanish-language prompts covering legal, regulatory, and financial queries from 11 Latin American countries. All three models score below 3.0 out of 5.0 (Claude: 2.86, Gemini: 2.86, ChatGPT: 2.82), indicating a measurable gap between benchmark performance and deployment-context safety for Latin American users.
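As an illustration of how the five dimension weights could combine into a single score out of 5.0, the following is a minimal sketch, not the authors' implementation. It assumes that a human reviewer assigns each dimension a rating between 0 and 5 and that the overall score is the weighted average of those ratings; the abstract does not state the aggregation rule or the lower bound of the scale. The function name `weighted_score` and the example ratings are hypothetical.

```python
# Dimension weights as stated in the abstract (they sum to 1.0).
WEIGHTS = {
    "security": 0.25,
    "fairness": 0.20,
    "deployment": 0.20,
    "participatory": 0.20,
    "cultural": 0.15,
}


def weighted_score(ratings):
    """Combine per-dimension reviewer ratings into one score.

    Assumption: each rating lies in [0, 5] and the overall score is the
    weighted average. The source does not confirm this rule.
    """
    if set(ratings) != set(WEIGHTS):
        missing = set(WEIGHTS) - set(ratings)
        extra = set(ratings) - set(WEIGHTS)
        raise ValueError(f"missing dimensions: {missing}, unknown: {extra}")
    for name, value in ratings.items():
        if not 0.0 <= value <= 5.0:
            raise ValueError(f"rating for {name!r} must be in [0, 5], got {value}")
    return sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS)


# Hypothetical ratings for a single response, for illustration only.
# These are not results from the audit.
example_ratings = {
    "security": 3.0,
    "fairness": 3.0,
    "deployment": 2.0,
    "participatory": 3.0,
    "cultural": 4.0,
}

if __name__ == "__main__":
    print(round(weighted_score(example_ratings), 2))
```

Under this assumed rule, a response can rate well on Cultural and still land below 3.0 overall when Deployment is weak, because Security, Fairness, Deployment, and Participatory together carry 85% of the weight.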
Cite this work
@misc{
  title={(HckPrj) A Human-in-the-Loop Audit Framework for Evaluating AI Application Safety in Latin America},
  author={Fabian Cespedes Severiche and Julio Cesar Severiche Orellana and Rodrigo Ricaldez Martinez and Victor Hugo Murillo Siles and Egnar Henry Chuquimia Mamani},
  date={},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


