AfriSafeBench: Evaluating LLM Recognition of AI Safety and Governance Risks in African Healthcare AI Deployments
Kimberly Atta-Peters
AfriSafeBench is a benchmark and tool for testing whether LLMs can identify AI safety and governance risks in African healthcare AI deployment scenarios.
I created 25 scenarios across seven African countries and evaluated three models: llama-3.1-8b-instant, llama-3.3-70b-versatile, and openai/gpt-oss-20b. The models were scored on whether they detected expected risks such as bias, human oversight, local validation, privacy, vendor dependency, resource constraints, and unsafe medical advice.
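The scoring idea above can be sketched as follows. This is a minimal illustration, not the project's actual scorer: the abstract does not say how detection is decided, so the keyword matching, the `RISK_KEYWORDS` table, and the function names are all assumptions made for this example.

```python
from typing import Dict, List, Set

# Hypothetical keyword lists (assumption): the source does not specify
# how a model response is judged to have "detected" a risk.
RISK_KEYWORDS: Dict[str, List[str]] = {
    "bias": ["bias", "underrepresent"],
    "human_oversight": ["human oversight", "clinician review"],
    "local_validation": ["local validation", "validated locally"],
    "privacy": ["privacy", "data protection"],
}


def detected_risks(response: str) -> Set[str]:
    """Return the risk labels whose keywords appear in the response."""
    text = response.lower()
    return {
        risk
        for risk, keywords in RISK_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    }


def score_scenario(response: str, expected: Set[str]) -> float:
    """Fraction of the expected risks that the response mentions."""
    if not expected:
        return 0.0
    return len(detected_risks(response) & expected) / len(expected)
```

Under these assumptions, a response that mentions two of four expected risks scores 0.5 for that scenario. Keyword matching is only one way to decide detection; a rubric or a judge model could be substituted without changing the per-scenario score definition.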
The results showed that models often detected familiar risks like bias and privacy, but struggled with structural deployment risks such as resource-constrained deployment, vendor dependency, monitoring, and local validation. AfriSafeBench also includes a framework-guided report mode that uses WHO, NIST, UNESCO, OECD, and African Union guidance to generate governance recommendations.
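A comparison between familiar and structural risks like the one described above relies on a detection rate per risk category. The sketch below shows one way such a rate could be computed, assuming each scenario yields a pair of (expected risks, detected risks). The function name and the input shape are hypothetical, and any data passed to it here would be toy data, not the benchmark's results.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def detection_rate_by_risk(
    results: Iterable[Tuple[Set[str], Set[str]]],
) -> Dict[str, float]:
    """For each risk, the share of scenarios expecting it where it was detected.

    `results` holds one (expected, detected) pair per scenario (assumed shape).
    """
    expected_counts: Dict[str, int] = defaultdict(int)
    hit_counts: Dict[str, int] = defaultdict(int)
    for expected, detected in results:
        for risk in expected:
            expected_counts[risk] += 1
            if risk in detected:
                hit_counts[risk] += 1
    return {risk: hit_counts[risk] / n for risk, n in expected_counts.items()}
```

Grouping by risk category rather than by scenario is what makes a gap between categories visible: a model can have a reasonable average score per scenario while consistently missing one category.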
The project shows where LLMs can support AI safety review, where they miss important risks, and why human oversight remains necessary in healthcare AI governance.
Cite this work
@misc{
  title={(HckPrj) AfriSafeBench: Evaluating LLM Recognition of AI Safety and Governance Risks in African Healthcare AI Deployments},
  author={Kimberly Atta-Peters},
  date={},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


