Three Detectors, Three Failure Profiles: Detector-Specific Demographic Bias in Public Deepfake Detection and Its Implications for Content Moderation in Latin America
Jose Ernesto Ramirez Garcia, Angel Pineda Aguirre, Daniel Alberto Curiel Vargas
This project audits whether three public Hugging Face deepfake detectors protect demographic groups equally in content moderation contexts relevant to Latin America and Mexico’s Ley Olimpia. It measures false-negative rates (synthetic faces classified as real) using real FFHQ/Flickr faces and synthetic StyleGAN faces from the 140k Real and Fake Faces dataset. Apparent demographics were labeled with FairFace. The main finding is that bias is detector-specific. The group with the highest false-negative rate changes by model: Black for prithiv v1, Asian for dima806, and White for prithiv v2. Only dima806 shows a statistically significant difference across racial groups, while prithiv v1 shows a significant difference across gender groups. The audit also finds that strong advertised model metrics do not transfer reliably: dima806 and prithiv v2 perform near chance at the default threshold. The project concludes that detector fairness must be audited per deployment, using disaggregated error rates and testing under real operating conditions, before a detector is used as a protection tool.
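The disaggregated false-negative rate described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record layout `(group, is_fake, fake_score)`, the function names, the 0.5 threshold, and the toy records are all assumptions made for the example, and the Pearson chi-square statistic is only one common way to compare groups (the abstract does not say which test was used).

```python
from collections import defaultdict


def miss_counts_by_group(records, threshold=0.5):
    """Count false negatives per group.

    records: iterable of (group, is_fake, fake_score) tuples (hypothetical layout).
    A false negative is a synthetic face whose fake_score falls below threshold.
    Returns {group: (misses, total_synthetic)}.
    """
    misses = defaultdict(int)
    totals = defaultdict(int)
    for group, is_fake, fake_score in records:
        if not is_fake:
            continue  # real faces do not enter the false-negative rate
        totals[group] += 1
        if fake_score < threshold:
            misses[group] += 1
    return {g: (misses[g], totals[g]) for g in totals}


def chi_square_statistic(counts):
    """Pearson chi-square statistic for a groups x {missed, caught} table.

    counts: {group: (misses, total)}. Returns (statistic, degrees_of_freedom).
    """
    total_miss = sum(m for m, _ in counts.values())
    total_n = sum(n for _, n in counts.values())
    pooled_rate = total_miss / total_n
    stat = 0.0
    for m, n in counts.values():
        for observed, expected in ((m, n * pooled_rate),
                                   (n - m, n * (1 - pooled_rate))):
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat, len(counts) - 1


# Toy records with placeholder group names; NOT data from the audit.
records = [
    ("A", True, 0.9), ("A", True, 0.2), ("A", True, 0.8), ("A", True, 0.7),
    ("B", True, 0.1), ("B", True, 0.3), ("B", True, 0.6), ("B", True, 0.4),
    ("A", False, 0.1),  # real face, ignored when computing FNR
]
counts = miss_counts_by_group(records)
fnr = {g: m / n for g, (m, n) in counts.items()}
stat, dof = chi_square_statistic(counts)
print(fnr, stat, dof)
```

Reporting the per-group rate next to the raw counts matters because small groups can show large rate differences that are not statistically distinguishable; the statistic would then be compared against a chi-square distribution with the returned degrees of freedom.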
Cite this work
@misc{
  title={(HckPrj) Three Detectors, Three Failure Profiles: Detector-Specific Demographic Bias in Public Deepfake Detection and Its Implications for Content Moderation in Latin America},
  author={Jose Ernesto Ramirez Garcia and Angel Pineda Aguirre and Daniel Alberto Curiel Vargas},
  date={},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


