TriloByte: Evaluating LLMs on Bolivian Quechua Through a Ground-Truth-Based Framework for Low-Resource Languages
Andy Brandon Garcia Espinoza, Sebastian Martinez, Niko Witczak, Max Baldiviezo, Joel Brugmann
This project evaluates the ability of Large Language Models (LLMs) to understand and define vocabulary from Bolivian Quechua, an underrepresented indigenous language. Using a bilingual Quechua–Spanish dictionary as ground truth, we compare model-generated definitions against dictionary entries through semantic similarity metrics and human evaluation. Our methodology combines embeddings and expert judgment to measure adequacy, completeness, and fluency, providing insights into how well modern LLMs capture the meaning of low-resource languages. The results highlight current limitations and opportunities for improving AI systems in linguistically underrepresented communities.
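As a rough illustration of the automatic half of this comparison, the sketch below scores a model-generated definition against a dictionary entry using cosine similarity. It is not the project's implementation: the `embed` function is a toy bag-of-words stand-in for whatever embedding model the authors used, and the function names and the example Spanish glosses are assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model (assumption): word-count vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty definition shares nothing with the reference
    return dot / (norm_a * norm_b)


def score_definition(model_definition: str, dictionary_definition: str) -> float:
    """Score a model-generated definition against the ground-truth entry."""
    return cosine_similarity(embed(model_definition), embed(dictionary_definition))


if __name__ == "__main__":
    # Hypothetical example: Spanish glosses for the Quechua word "wasi" (house).
    reference = "casa, vivienda"
    candidate = "casa o vivienda donde vive una familia"
    print(round(score_definition(candidate, reference), 3))
```

A real pipeline would replace `embed` with a multilingual sentence-embedding model, since word overlap cannot recognize paraphrases. Scores of this kind would complement, not replace, the human ratings of adequacy, completeness, and fluency.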
Cite this work
@misc{
  title={(HckPrj) TriloByte: Evaluating LLMs on Bolivian Quechua Through a Ground-Truth-Based Framework for Low-Resource Languages},
  author={Andy Brandon Garcia Espinoza and Sebastian Martinez and Niko Witczak and Max Baldiviezo and Joel Brugmann},
  date={},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


