GSM-PathEval: A Global South Robustness Benchmark for Telepathology Vision-Language Models
Anshuman Awasthi
Frontier multimodal medical AI systems are evaluated primarily on Western benchmarks built from pristine, high-resolution pathology scans. Clinical deployment in the Global South, however, relies on ad-hoc "telepathology": microscope views are captured with budget smartphones and transmitted over channels that compress images heavily, such as WhatsApp. We introduce GSM-PathEval, a targeted benchmark that programmatically applies real-world infrastructural degradations (eyepiece glare, lens blur, and heavy JPEG quantization) to the gold-standard PathVQA dataset. Using Adaption Labs, we demonstrate a critical "Reality Gap": frontier Vision-Language Models optimized for clean imagery suffer drastic drops in diagnostic accuracy (ΔA) under these simulated field conditions. This performance cliff poses a severe misdiagnosis risk and indicates that standard capability benchmarks are insufficient for safe medical AI deployment in low-resource environments.
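The abstract does not show how the degradations or the accuracy drop are computed, so the following is only a minimal, dependency-free sketch of the idea, not the benchmark's implementation. All function names and parameters are hypothetical. The image is assumed to be a grayscale grid of floats in [0, 255]. Glare is modelled as a radial brightness boost, lens blur as a box filter, and JPEG loss as plain intensity quantization, which is a crude stand-in (real JPEG quantizes DCT coefficients). The accuracy drop is assumed to mean accuracy on clean images minus accuracy on degraded images.

```python
import math


def clamp(v):
    """Keep an intensity inside the valid [0, 255] range."""
    return max(0.0, min(255.0, v))


def add_glare(img, strength=120.0, radius_frac=0.4):
    """Hypothetical glare model: brighten pixels near the image centre,
    with the boost falling off linearly to zero at `radius_frac` of the size."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    radius = radius_frac * max(h, w)
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            dist = math.hypot(y - cy, x - cx)
            boost = strength * max(0.0, 1.0 - dist / radius)
            row.append(clamp(img[y][x] + boost))
        out.append(row)
    return out


def box_blur(img, k=1):
    """Mean filter over a (2k+1) x (2k+1) window, clipped at the borders."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - k), min(h, y + k + 1))
                      for i in range(max(0, x - k), min(w, x + k + 1))]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


def quantize(img, step=32):
    """Assumed stand-in for JPEG loss: snap intensities to multiples of `step`."""
    return [[clamp(round(v / step) * step) for v in row] for row in img]


def degrade(img):
    """Apply glare, then blur, then quantization (order is an assumption)."""
    return quantize(box_blur(add_glare(img)))


def accuracy(preds, labels):
    """Fraction of predictions that exactly match the reference answers."""
    return sum(p == t for p, t in zip(preds, labels)) / len(labels)


def accuracy_drop(clean_preds, degraded_preds, labels):
    """Assumed definition: delta A = A_clean - A_degraded."""
    return accuracy(clean_preds, labels) - accuracy(degraded_preds, labels)
```

In use, a model would answer the same questions once on the original images and once on the output of `degrade`, and `accuracy_drop` would compare the two answer sets against the reference labels. A realistic pipeline would more likely use an imaging library with a true JPEG encoder at a low quality setting.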
Cite this work
@misc{
  title={(HckPrj) GSM-PathEval: A Global South Robustness Benchmark for Telepathology Vision-Language Models},
  author={Anshuman Awasthi},
  date={},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


