Nov 23, 2025
Ghost Marks in the Machine: A Critical Review of SynthID for Code Provenance Monitoring
Eve Sherratt-Cross, Theo Farrell, Sam Ogden, Oscar Ryley
AI-generated code is increasingly common in software and is prone to security vulnerabilities, so monitoring the origin of code used in secure applications is critical. SynthID is a Google DeepMind method for watermarking AI-generated text, images, and video, but no equivalent mechanism currently exists for code. We adapt several SynthID schemes to Python code and analyse their effectiveness using Bayesian detectors. We find that longer n-grams support more robust watermark detection, but the corresponding generated code is more prone to syntax and runtime errors. This work lays the groundwork for future research, since code origin monitoring forms part of robust cyber defences against vulnerable or backdoored AI-generated code.
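To make the role of the n-gram length concrete, the following is a minimal sketch of keyed n-gram scoring over Python tokens. It is not the authors' implementation and not SynthID itself: SynthID Text uses tournament sampling over model logits and offers a trained Bayesian detector, whereas this toy greedily picks a candidate token and scores by a plain mean. The names `code_tokens`, `g_value`, `toy_watermark`, and `mean_g_score`, the BLAKE2b keyed hash, and the default n-gram length are all assumptions made for illustration.

```python
import hashlib
import io
import tokenize

# Layout-only token types that carry no watermarkable choice (assumption).
_SKIP = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
         tokenize.DEDENT, tokenize.ENDMARKER, tokenize.COMMENT}


def code_tokens(source: str) -> list[str]:
    """Split valid Python source into lexical token strings."""
    reader = io.StringIO(source).readline
    return [t.string for t in tokenize.generate_tokens(reader)
            if t.type not in _SKIP]


def g_value(context: tuple[str, ...], token: str, key: bytes) -> int:
    """Pseudorandom bit derived from the key, the context, and the token."""
    payload = "\x1f".join((*context, token)).encode("utf-8")
    digest = hashlib.blake2b(payload, key=key, digest_size=8).digest()
    return digest[0] & 1


def toy_watermark(prefix, candidates, length, key, ngram_len=4):
    """Toy embedder: at each step prefer a candidate whose g-value is 1.

    A real scheme would bias sampling from a model's distribution instead
    of choosing from a fixed candidate list.
    """
    ctx_len = ngram_len - 1
    tokens = list(prefix)
    for _ in range(length):
        context = tuple(tokens[-ctx_len:]) if ctx_len else ()
        chosen = next((c for c in candidates
                       if g_value(context, c, key) == 1), candidates[0])
        tokens.append(chosen)
    return tokens


def mean_g_score(tokens, key, ngram_len=4):
    """Mean g-value over all positions that have a full context window.

    Unwatermarked tokens should average near 0.5; returns 0.5 if there is
    nothing to score.
    """
    ctx_len = ngram_len - 1
    values = [g_value(tuple(tokens[i - ctx_len:i]), tokens[i], key)
              for i in range(ctx_len, len(tokens))]
    return sum(values) / len(values) if values else 0.5


if __name__ == "__main__":
    key = b"demo-key"  # hypothetical secret key
    vocab = [f"v{i}" for i in range(16)]
    marked = toy_watermark(["def", "f", "("], vocab, 60, key)
    print("watermarked score:", mean_g_score(marked, key))
    print("plain code score:",
          mean_g_score(code_tokens("x = 1\ny = x + 2\nprint(y)\n"), key))
```

In this sketch the context window is `ngram_len - 1` preceding tokens, so editing one token changes the g-value at that position and at every later position whose window includes it. That is one way to see why the choice of n-gram length matters for a detector. The sketch says nothing about the syntax and runtime errors the abstract reports, which depend on the generating model.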
Mackenzie Puig-Hall
Great project name and really solid attempt at addressing an important problem. Adapting SynthID-style watermarking to code is a clever starting point, but the current method still feels too fragile to handle complex or production systems.
I see value in contexts where an entire organization wants to track which parts of their codebase are AI-generated versus human-written. For example, a company encouraging developers to use AI coding tools might want clear provenance for maintainability and accountability.
As a broader defense against malicious or low-quality AI code entering the global software ecosystem, however, this approach seems too easy to evade. It would likely only catch developers who are not intentionally hiding their use of AI, and anyone who wanted to could probably remove the watermarks fairly easily.
Reworr
Nice project; it’d be more compelling if you tied it more to concrete security/safety threats and use cases.
Cite this work
@misc{
title={(HckPrj) Ghost Marks in the Machine: A Critical Review of SynthID for Code Provenance Monitoring},
author={Eve Sherratt-Cross and Theo Farrell and Sam Ogden and Oscar Ryley},
date={2025-11-23},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}


