Mar 20, 2026
Geographic Blind Spots in AI Control Monitors: A Cross-National Audit of Claude Opus 4.6
Jason Hung
Artificial intelligence (AI) control protocols assume that trusted large language model (LLM) monitors reliably assess proposed actions across all deployment contexts. This project tests that assumption in the geographic dimension. We audit Claude Opus 4.6—the monitor specified in Apart Research’s AI Control Hackathon Track 3 benchmark—for systematic gaps in its factual knowledge of the global AI landscape. We develop the AI Control Knowledge Framework (ACKF), a six-dimension thematic scheme, and operationalise it with 17 verified indicators drawn from the Global AI Dataset v2 (GAID v2): 24,453 indicators across 227 countries published on Harvard Dataverse. A five-category response classification scheme distinguishes verifiable fabrication (VF) from honest refusal (HR); logistic regression with country-clustered standard errors combined with difference-in-differences (DiD) estimation quantifies geographic disparities in monitor accuracy across 2,820 country-metric-year observations. Contrary to our initial hypothesis, Claude Opus 4.6 produces higher fabrication rates for Global North queries than for Global South counterparts—a pattern consistent with a partial-knowledge mechanism in which the model attempts answers more frequently for Global North contexts but commits to incorrect values. This fabrication profile constitutes an exploitable vulnerability, where an adversarial AI system could frame harmful actions in governance or public attitude terms to reduce the probability of detection. This project provides the first cross-national, multi-domain audit of an AI control monitor’s geographic knowledge gaps, with direct implications for the design of control protocols.
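The core regression setup described above can be sketched as follows. This is a minimal, hypothetical illustration using simulated data and the statsmodels library; the variable names (`fabricated`, `global_north`, `country`) and the simulated dataset are assumptions for demonstration, not the study's actual schema or results.

```python
# Hypothetical sketch: logistic regression of a binary fabrication
# outcome on a Global North indicator, with standard errors clustered
# by country. Data below are simulated, not the study's observations.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 600
df = pd.DataFrame({
    "country": rng.integers(0, 30, n),      # cluster identifier
    "global_north": rng.integers(0, 2, n),  # 1 = Global North query
})
# Simulate a fabrication outcome loosely tied to region
p = 0.2 + 0.15 * df["global_north"]
df["fabricated"] = rng.binomial(1, p)

# Logit of fabrication on region; cluster-robust SEs by country
model = smf.logit("fabricated ~ global_north", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["country"]}, disp=0
)
print(model.summary())
```

A difference-in-differences term (e.g. region interacted with domain or year) would enter the formula the same way, as `global_north * year`, with inference still clustered at the country level.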
The statistical methodology is thorough: the five-category classification scheme, the query variants, and the sensitivity analysis showing close-range fabrication are all well executed. The finding that Global North queries produce higher fabrication rates than Global South ones is an interesting result.
The work would benefit from a clearer threat model: an adversarial agent exploiting the monitor's inability to recall specific statistics about AI governance or public attitudes may not fully match how monitors actually evaluate proposed actions or examine artifacts for safety. The adversarial-framing study listed in future work would bridge this gap and strengthen the contribution.
This paper asks whether an AI monitor used in control protocols (specifically Claude Opus 4.6) has systematic gaps in its factual knowledge depending on geography and domain, and whether those gaps could be exploited by an adversarial agent. The author tests this against a large set of structured quantitative queries with verified answers in areas including compute capacity, governance, and public attitudes.
The key findings (Claude refused to give an answer in 91.4% of cases and, when it did answer, was wrong 62.5% of the time) are analyzed in detail by category and tested for statistical significance.
The connection to AI control is that knowledge of a monitor's weak spots could be leveraged by an attacker, especially if the monitor confidently believes a false fact. Mitigations for such vulnerabilities are also discussed.
While the findings are interesting and relevant, the connection to control could be strengthened. For example, sketching a concrete scenario in which a monitor must rely on numerical facts (that it presumably cannot look up) to perform supervision would have been helpful; such a scenario is only hinted at in the future-work section. It would also be an interesting question whether similar knowledge gaps exist for the more commonly considered monitors in coding settings, and whether they could be exploited there.
Cite this work
@misc{hung2026geographic,
  title={(HckPrj) Geographic Blind Spots in AI Control Monitors: A Cross-National Audit of Claude Opus 4.6},
  author={Jason Hung},
  date={2026-03-20},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


