Nov 21, 2025

ZYNQ — Verifiable Zero-Knowledge AI Red-Team and Auditing Platform

Adithyan Madhu

ZYNQ is a rigorously engineered, zero-knowledge-verified AI red-teaming platform designed to break open today's opaque, black-box model auditing ecosystem, in which safety evaluations are confined within vendor boundaries and external stakeholders must trust unverifiable claims. The system couples automated adversarial discovery, spanning 12 attack families, structured fuzzing, multi-shot prompting, semantic drift exploits, and token-budget pressure methods, with a cryptographically anchored verification pipeline that canonicalizes evidence and produces Groth16 zero-knowledge proofs publicly attesting that a breach was discovered without revealing sensitive exploit content. Across diverse model classes (local LLaMA variants, Ollama deployments, GPT-4o, Claude 3.5, Gemini Flash, and a domain-tuned Titan-Reasoner), ZYNQ demonstrates reproducible safety-critical failure modes, quantifies vulnerability via time-to-breach and success-rate metrics under controlled query budgets, and provides mathematically auditable proof artifacts verifiable on-chain. By transforming ephemeral red-team results into permanent, privacy-preserving, publicly verifiable audit records, ZYNQ directly addresses the systemic trust deficit in AI safety claims and establishes a scalable, defensible blueprint for transparent, independently verifiable AI security assessments.
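To make the verification pipeline concrete, the following is a minimal sketch of the evidence-canonicalization and commitment step described above. It is an assumed reconstruction rather than ZYNQ's actual code: the field names (attack_family, model_id, judge_verdict) are hypothetical, and the Groth16 proof itself, generated over a circuit that proves knowledge of a preimage of this digest, is outside the scope of the snippet.

# Minimal sketch (assumed structure, not ZYNQ's code): canonicalize one
# red-team evidence record and derive the hash commitment that a Groth16
# proof would later attest to, without revealing the exploit content.
import hashlib
import json

def canonicalize(evidence: dict) -> bytes:
    # Deterministic serialization: sorted keys, no whitespace drift.
    return json.dumps(evidence, sort_keys=True, separators=(",", ":")).encode("utf-8")

def commitment(evidence: dict) -> str:
    # SHA-256 digest over the canonical encoding; only this digest (not the
    # prompt or output) would appear in the public proof and on-chain record.
    return hashlib.sha256(canonicalize(evidence)).hexdigest()

# Hypothetical evidence record for one discovered breach.
record = {
    "attack_family": "multi_shot",
    "model_id": "llama-3-8b-local",
    "timestamp": "2025-11-21T10:32:00Z",
    "prompt": "<redacted adversarial prompt>",
    "output": "<redacted unsafe completion>",
    "judge_verdict": "breach",
}

print(commitment(record))  # public value; the ZK circuit proves knowledge of a preimage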

Reviewer's Comments


Strengths: I was impressed by the breadth of the adversarial discovery infrastructure: 12 attack families, structured evaluation across model classes, and reproducible metrics. The finding that managed/proprietary models are harder to break but not immune to escalation/many-shot strategies is useful. The proof latency and on-chain costs achieved for the ZKP element in such a short turnaround are also impressive.

Suggestions: I couldn't identify the real-world user for on-chain audit verification. Who needs cryptographic proof that a red-team engagement happened, vs. a report from a trusted auditor? Basically, what makes cryptographic proofs in audits necessarily more valuable and broadly beneficial than an audit from, say, METR, Redwood, or AI Underwriting Company? The ZKP layer is technically impressive but the trust problem it solves isn't clearly motivated. Similarly, "epistemic breach" is introduced but not connected to a specific defensive priority, so why this failure mode over others?

Bottom line from a Halcyon Ventures investor: To me, this feels like strong red-teaming research with a crypto layer that adds complexity without obvious defensive leverage. For d/acc framing, we'd want to see what the offensive/defensive asymmetry is here and how verifiable auditing shifts that balance.

I think the core idea of the project is interesting and novel. I can see how this could matter for non-repudiation (labs can't easily deny certain failures) and/or for privacy-preserving audits in high-sensitivity domains, where you want to hide dangerous content but still prove that some behavior occurred.

Some areas where I see limitations: from a practical perspective, most research and auditing needs the raw logs; people need actual prompts and outputs for reproducibility and to catch eval issues (like prompt sensitivity and confounding variables). Also, it looks like your current ZK scheme doesn't actually prove "model X produced output Y"; it only proves knowledge of *some* log with a given hash, without cryptographically binding it to a specific model or run. The attack component looks like it's from FuzzyAI rather than original work, though building out a working red-teaming pipeline with a functional UI is still good work.
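To make the binding concern concrete, here is a small illustrative sketch (not the submitted scheme; all identifiers are hypothetical) contrasting a bare hash commitment with one whose public statement also names a model, run, and verifier-supplied nonce.

# Illustrative sketch of the reviewer's point (hypothetical data, not ZYNQ's scheme):
# a bare hash commitment proves knowledge of *some* preimage, while a bound
# commitment folds model and run identifiers into the public statement, tying
# the proof to a claimed engagement rather than an arbitrary log.
import hashlib
import json

def digest(obj: dict) -> str:
    return hashlib.sha256(
        json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()
    ).hexdigest()

log = {"prompt": "<exploit>", "output": "<unsafe completion>"}

# Unbound: any party holding any log that hashes to this value can produce the proof.
unbound_commitment = digest(log)

# Bound: public inputs name the model, run, and a verifier nonce, so a verifier
# knows which engagement the proof refers to. Note that proving the named model
# actually emitted the output still requires something stronger (e.g. provider-signed
# transcripts or attested inference); a hash alone cannot supply that.
bound_commitment = digest({
    "model_id": "gpt-4o",            # hypothetical identifier
    "run_id": "zynq-run-0042",       # hypothetical identifier
    "verifier_nonce": "a1b2c3d4",
    "log_digest": digest(log),
})

print(unbound_commitment, bound_commitment)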

More broadly, even if these technical issues were fixed, I'm not sure that verifiable jailbreak auditing solves an important AI safety problem. I think the current challenges in jailbreak research lie more in things like clearer threat models, better harm definitions and evaluation, reproducible attack/defense benchmarks, and research with higher conceptual novelty that actually advances our understanding or solves the underlying issues (see Rando et al., "Adversarial ML Problems Are Getting Harder to Solve and to Evaluate" and Rando's "Do not write that jailbreak paper").

I like the novelty and originality of this work - it stands out from typical hackathon submissions. The technical execution is solid, and cross-domain thinking between ZK cryptography and AI security can lead to interesting results, even if the immediate applications are limited. Overall, this is great work, especially for a hackathon project done solo.

Cite this work

@misc{madhu2025zynq,
  title={(HckPrj) ZYNQ — Verifiable Zero-Knowledge AI Red-Team and Auditing Platform},
  author={Adithyan Madhu},
  date={2025-11-21},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.