Cultural Knowledge Gaps in LLMs: Geographic Hallucination Bias Across Latin American Countries
Karla Angelica Doctor Mauricio
Large language models (LLMs) are used in education, public services, and information tools across Latin America, yet it remains unclear how often they produce incorrect information about Latin American culture, or where they fail most. This paper evaluates two models, GPT-4o-mini and Claude Haiku 4.5, on 270 questions from CHOCLO, a benchmark of cultural knowledge covering 18 Latin American countries. We use semantic similarity scores and a four-category LLM-as-judge classifier (correct, partial, hallucination, abstention) to compare how each model fails. GPT-4o-mini hallucinates on 33.0% of questions overall and almost never abstains (1.5%). Claude Haiku 4.5 hallucinates less often (24.1%) but abstains far more (17.8%). Both models show geographic gaps of 33 to 40 percentage points across countries, and these gaps persist after controlling for category composition and topic diversity. The public_figure category has the highest hallucination rate (66.7% for GPT-4o-mini) and is also the least represented category in the benchmark. This points to a two-layer problem: models lack knowledge about specific Latin American individuals, and the benchmark itself covers them only sparsely. We also document infrastructure barriers that prevented evaluation of LatamGPT, and we propose three data governance models as a path forward for communities in the region.
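To make the reported quantities concrete, the following is a minimal sketch of how per-country rates and a geographic gap could be computed from judge labels. It is not the paper's pipeline: the function names, the record layout, and the toy data are assumptions made for illustration, and the gap is taken here as the difference between the highest and lowest per-country hallucination rate, which is only one plausible reading of the abstract.

```python
from collections import Counter, defaultdict

# The four judge categories named in the abstract.
LABELS = ("correct", "partial", "hallucination", "abstention")


def rates_by_country(records):
    """Return {country: {label: share}} from (country, label) pairs.

    The record layout is a hypothetical choice for this sketch.
    """
    counts = defaultdict(Counter)
    for country, label in records:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
        counts[country][label] += 1
    return {
        country: {label: c[label] / sum(c.values()) for label in LABELS}
        for country, c in counts.items()
    }


def geographic_gap(rates, label="hallucination"):
    """Gap in percentage points between the highest and lowest country rate."""
    values = [r[label] for r in rates.values()]
    return 100.0 * (max(values) - min(values))


# Toy, made-up labels for two placeholder countries (not benchmark data).
toy = [
    ("Country A", "correct"), ("Country A", "hallucination"),
    ("Country A", "correct"), ("Country A", "partial"),
    ("Country B", "hallucination"), ("Country B", "hallucination"),
    ("Country B", "abstention"), ("Country B", "correct"),
]
rates = rates_by_country(toy)
gap = geographic_gap(rates)
print(f"hallucination gap: {gap:.1f} percentage points")
```

Keeping abstention as its own label, rather than folding it into "incorrect", is what allows the two failure styles described above (confident errors versus refusals) to be compared separately.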
Cite this work
@misc{
  title={(HckPrj) Cultural Knowledge Gaps in LLMs: Geographic Hallucination Bias Across Latin American Countries},
  author={Karla Angelica Doctor Mauricio},
  date={},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


