Nov 23, 2025

Ghost Marks in the Machine: A Critical Review of SynthID for Code Provenance Monitoring

Eve Sherratt-Cross, Theo Farrell, Sam Ogden, Oscar Ryley

AI-generated code is increasingly common in software and prone to security vulnerabilities, so it is critical to monitor the origins of code used in secure applications. SynthID is a Google DeepMind method for watermarking AI-generated text, images, and video, but no comparable mechanism currently exists for code. We adapt various SynthID schemes to Python code and analyse their effectiveness using Bayesian detectors. We find that longer n-grams support more robust watermark detection, but the corresponding generated code is more prone to syntax and runtime errors. This work lays the groundwork for future research, because code-origin monitoring forms part of robust cyber defences against vulnerable or backdoored AI-generated code.
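The project's pipeline is not reproduced here, but the core idea of n-gram-keyed watermark scoring can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation and not SynthID's actual tournament-sampling scheme: the g_value hash, the watermark_score helper, the key string, and the ngram_size parameter are all illustrative choices.

import hashlib
from collections import deque

def g_value(context, candidate, key="demo-key"):
    # Pseudorandom score in [0, 1), keyed on the preceding n-gram.
    # Illustrative stand-in for SynthID's keyed g-values; the real scheme
    # biases sampling at generation time rather than scoring after the fact.
    payload = "|".join([key, *context, candidate]).encode()
    digest = hashlib.sha256(payload).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def watermark_score(tokens, ngram_size=4, key="demo-key"):
    # Mean g-value over the sequence. Unwatermarked text should average
    # around 0.5; text generated with g-value-biased sampling scores higher.
    context = deque(maxlen=ngram_size)
    scores = []
    for tok in tokens:
        if len(context) == ngram_size:
            scores.append(g_value(list(context), tok, key))
        context.append(tok)
    return sum(scores) / len(scores) if scores else 0.0

# Example: score a (crudely whitespace-tokenised) Python snippet.
snippet = "def add(a, b):\n    return a + b\n"
print(round(watermark_score(snippet.split(), ngram_size=2), 3))

In an actual SynthID-style setup, detection compares the observed g-values against the expected distributions for watermarked and unwatermarked text (the Bayesian detector mentioned in the abstract) rather than eyeballing a mean, and a longer n-gram context makes chance collisions with the key less likely, which is consistent with the robustness versus error trade-off the authors report.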

Reviewer's Comments

Mackenzie Puig-Hall

Great project name and really solid attempt at addressing an important problem. Adapting SynthID-style watermarking to code is a clever starting point, but the current method still feels too fragile to handle complex or production systems.

I see value in contexts where an entire organization wants to track which parts of their codebase are AI-generated versus human-written. For example, a company encouraging developers to use AI coding tools might want clear provenance for maintainability and accountability.

As a broader defense against malicious or low-quality AI code entering the global software ecosystem, this approach seems too easy to evade. It would likely only catch developers who are not intentionally hiding their use of AI, and anyone who wanted to hide it could, it seems, remove the watermarks fairly easily.

Reworr

Nice project; it’d be more compelling if you tied it more closely to concrete security/safety threats and use cases.

Cite this work

@misc{
  title={(HckPrj) Ghost Marks in the Machine: A Critical Review of SynthID for Code Provenance Monitoring},
  author={Eve Sherratt-Cross, Theo Farrell, Sam Ogden, Oscar Ryley},
  date={11/23/25},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}

Recent Projects

Jan 11, 2026

Eliciting Deception on Generative Search Engines

Large language models (LLMs) with web browsing capabilities are vulnerable to adversarial content injection—where malicious actors embed deceptive claims in web pages to manipulate model outputs. We investigate whether frontier LLMs can be deceived into providing incorrect product recommendations when exposed to adversarial pages.

We evaluate four OpenAI models (gpt-4.1-mini, gpt-4.1, gpt-5-nano, gpt-5-mini) across 30 comparison questions spanning 10 product categories, comparing responses between baseline (truthful) and adversarial (injected) conditions. Our results reveal significant variation: gpt-4.1-mini showed a 45.5% deception rate, while gpt-4.1 demonstrated complete resistance. Even frontier gpt-5 models exhibited non-zero deception rates (3.3–7.1%), confirming that adversarial injection remains effective against current models.

These findings underscore the need for robust defenses before deploying LLMs in high-stakes recommendation contexts.

Read More

Jan 11, 2026

SycophantSee - Activation-based diagnostics for prompt engineering: monitoring sycophancy at prompt and generation time

Activation monitoring reveals that prompt framing affects a model's internal state before generation begins.

Read More

Jan 11, 2026

Who Does Your AI Serve? Manipulation By and Of AI Assistants

AI assistants can be both instruments and targets of manipulation. In our project, we investigated both directions across three studies.

AI as Instrument: Operators can instruct AI to prioritise their interests at the expense of users. We found models comply with such instructions 8–52% of the time (Study 1, 12 models, 22 scenarios). In a controlled experiment with 80 human participants, an upselling AI reliably withheld cheaper alternatives from users - not once recommending the cheapest product when explicitly asked - and ~one third of participants failed to detect the manipulation (Study 2).

AI as Target: Users can attempt to manipulate AI into bypassing safety guidelines through psychological tactics. Resistance varied dramatically - from 40% (Mistral Large 3) to 99% (Claude 4.5 Opus) - with strategic deception and boundary erosion proving most effective (Study 3, 153 scenarios, AI judge validated against human raters r=0.83).

Our key finding was that model selection matters significantly in both settings. Some models complied with manipulative requests at much higher rates than others, and some readily followed operator instructions that come at the user's expense, highlighting a tension for model developers between serving paying operators and protecting end users.

Read More

This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.