Feb 1, 2026

Global AI Bias Audit for Technical Governance

Jason Hung

This project is the exploratory phase of Phases 3-4 of my milestone-based, ongoing Global AI Dataset (GAID) Project. I used the version 2 GAID dataset (published on Harvard Dataverse) as a ground-truth framework to stress-test the open-weight Llama-3 8B model and evaluate geographic and socioeconomic biases in its awareness of technical AI governance. Stress-testing the model with 1,704 queries across 213 countries and eight technical metrics revealed a significant digital barrier separating the Global North and South. The model provided a numeric or factual answer to only 11.4% of queries, and the empirical validity of even those answers remains unverified. The findings show that the model's technical knowledge is heavily concentrated in higher-income regions, while lower-income countries in the Global South face disproportionate, systemic information gaps. This disparity poses concerning risks for global AI safety and inclusive governance: policymakers in underserved regions may lack reliable data-driven insights or be misled by hallucinated facts. The results suggest that current AI models (at least Llama-3 8B) are not yet mature enough to serve as reliable tools for global technical governance. The digital barriers and gaps identified in this paper must be addressed through more inclusive data representation in model training and more transparent alignment processes, so that AI's benefits—including safety, fairness, and readiness—are accessible to all countries regardless of geographic location or income classification.
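The audit protocol described above—querying the model per country and metric, then classifying each response as a numeric/fact answer or not—can be sketched minimally as follows. This is an illustrative reconstruction, not the paper's actual code: the names `query_model`, `classify`, and the country/metric lists are hypothetical stand-ins (the real study used 213 countries, eight metrics, and Llama-3 8B).

```python
# Minimal sketch of the bias-audit loop. All names are illustrative;
# query_model is a stub standing in for a real Llama-3 8B call.

COUNTRIES = ["Norway", "Chad"]     # stand-ins for the 213 countries audited
METRICS = ["AI readiness index"]   # stand-in for the eight technical metrics

def query_model(prompt: str) -> str:
    """Placeholder for an LLM call; returns canned replies for the sketch."""
    return "I don't have data on that." if "Chad" in prompt else "Score: 72.1"

def classify(response: str) -> str:
    """Coarse taxonomy: a numeric/fact answer vs. a refusal or no-data reply."""
    has_digit = any(ch.isdigit() for ch in response)
    return "number/fact" if has_digit else "no-data"

results = {}
for country in COUNTRIES:
    for metric in METRICS:
        prompt = f"What is the 2025 {metric} for {country}?"
        results[(country, metric)] = classify(query_model(prompt))

# Share of queries that received a numeric/fact answer (11.4% in the study).
fact_rate = sum(v == "number/fact" for v in results.values()) / len(results)
print(f"number/fact response rate: {fact_rate:.1%}")
```

A real run would replace the stub with an inference call and, as the reviewers note, would benefit from a finer-grained response taxonomy that separates refusals, hallucinations, and genuine no-data answers.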

Reviewer's Comments


The author proposes a reasonable strategy for measuring a policy-relevant model capability—ensuring models are not substantively worse at answering questions about some countries than others. My core criticism is that the paper does not think carefully enough about what the evaluation strategy actually captures. The query year postdates the model's training cutoff, so refusals may reflect temporal limitations rather than geographic bias, and the response categories are too coarse. More time spent reading through outputs and interpreting what the different failure modes actually mean would strengthen the conclusions considerably.

The research question is important and well-motivated. If policymakers increasingly rely on LLMs, geographic knowledge gaps have real governance consequences, and the GAID dataset provides a solid ground-truth benchmark. However, the experimental design conflates the model's training data limitations with bias. The author acknowledges that Llama-3 8B may not have access to 2025 data, but still draws strong conclusions about geographic exclusion without controlling for this confound.

Cite this work

@misc{
  title={(HckPrj) Global AI Bias Audit for Technical Governance},
  author={Jason Hung},
  date={2026-02-01},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}

Recent Projects


Feb 2, 2026

Prototyping an Embedded Off-Switch for AI Compute

This project prototypes an embedded off-switch for AI accelerators. The security block requires periodic cryptographic authorization to operate: the chip generates a nonce, an external authority signs it, and the chip verifies the signature before granting time-limited permission. Without valid authorization, outputs are gated to zero. The design was implemented in HardCaml and validated in simulation.


Feb 2, 2026

Fingerprinting All AI Cluster I/O Without Mutually Trusted Processors

We design and simulate a "border patrol" device for generating cryptographic evidence of data traffic entering and leaving an AI cluster, while eliminating the specific analog and steganographic side-channels that post-hoc verification cannot close. The device eliminates the need for any mutually trusted logic, while still meeting the security needs of the prover and verifier.


Feb 2, 2026

Modelling the impact of verification in cross-border AI training projects

This paper develops a stylized game-theoretic model of cross-border AI training projects in which multiple states jointly train frontier models while retaining national control over compute resources. We focus on decentralized coordination regimes, where actors publicly pledge compute contributions but privately choose actual delivery, creating incentives to free-ride on a shared public good. To address this, the model introduces explicit verification mechanisms, represented as a continuous monitoring intensity that improves the precision of noisy signals about each actor's true compute contribution. Our findings suggest that policymakers designing international AI governance institutions face a commitment problem: half-measures in verification are counterproductive, and effective regimes require either accepting some free-riding or investing substantially in monitoring infrastructure.


This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.