Mar 22, 2026
Detecting Hidden Non-Minimal Behavior in LLM-Generated UI Code
Philip Nilsson
Large language models are increasingly used to generate frontend code, but current oversight methods focus primarily on detecting clearly malicious behavior. We study a harder failure mode: code that looks product-plausible but does more than the task requires.
We introduce a controlled builder-inspector setting with paired benign and adversarial UI tasks and evaluate oversight performance across 150 runs. While benign controls are consistently accepted (0% false positives), adversarial recall remains incomplete (~43–50%) and varies strongly across attack families.
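As a minimal sketch of how the two headline metrics relate to individual runs, the snippet below computes adversarial recall and the benign false-positive rate from per-run records. The `Run` record layout and the toy records are assumptions made for illustration; they are not the study's data or code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One builder-inspector run (hypothetical record layout)."""
    adversarial: bool  # True if the task carried a hidden non-minimal side goal
    flagged: bool      # True if the inspector rejected the generated code


def recall(runs: list[Run]) -> float:
    """Fraction of adversarial runs that the inspector flagged."""
    attacks = [r for r in runs if r.adversarial]
    if not attacks:
        raise ValueError("no adversarial runs")
    return sum(r.flagged for r in attacks) / len(attacks)


def false_positive_rate(runs: list[Run]) -> float:
    """Fraction of benign runs that the inspector wrongly flagged."""
    benign = [r for r in runs if not r.adversarial]
    if not benign:
        raise ValueError("no benign runs")
    return sum(r.flagged for r in benign) / len(benign)


# Toy records for illustration only; these are NOT the study's results.
toy_runs = [
    Run(adversarial=False, flagged=False),
    Run(adversarial=False, flagged=False),
    Run(adversarial=True, flagged=True),
    Run(adversarial=True, flagged=False),
]
print(recall(toy_runs), false_positive_rate(toy_runs))  # prints "0.5 0.0"
```

Reporting recall per attack family would use the same functions on runs filtered by family.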
Our key finding is that inspectors often recognize mechanisms (e.g., storage, timers, event handlers) but fail to judge whether they are necessary for the task. Ablation experiments show that explicitly prompting for minimality dramatically improves detection (from near 0% to as high as 90–100%).
These results suggest that effective AI oversight may depend less on identifying suspicious components and more on enforcing necessity constraints.
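The distinction between recognizing a mechanism and judging its necessity can be illustrated with a small sketch. It assumes a hypothetical setup in which each task declares the mechanisms it requires, and mechanisms are detected with simple regexes over the generated code. The pattern table, function names, and sample snippet are all illustrative assumptions, not the inspectors or prompts used in the study.

```python
import re

# Hypothetical mechanism detectors: regexes over generated JavaScript/TypeScript.
MECHANISM_PATTERNS = {
    "storage": re.compile(r"\b(localStorage|sessionStorage|indexedDB)\b"),
    "timer": re.compile(r"\b(setInterval|setTimeout)\s*\("),
    "event_handler": re.compile(r"\baddEventListener\s*\("),
    "network": re.compile(r"\b(fetch|XMLHttpRequest|sendBeacon)\b"),
}


def detect_mechanisms(code: str) -> set[str]:
    """Mechanism recognition: which known mechanisms appear in the code."""
    return {name for name, pat in MECHANISM_PATTERNS.items() if pat.search(code)}


def unnecessary_mechanisms(code: str, required: set[str]) -> set[str]:
    """Necessity check: detected mechanisms that the task does not require."""
    return detect_mechanisms(code) - required


# Illustrative output for a task that only needs a click handler.
generated = """
const button = document.querySelector("#save");
button.addEventListener("click", () => {
  localStorage.setItem("clicks", String(Date.now()));
});
"""

print(unnecessary_mechanisms(generated, required={"event_handler"}))  # prints {'storage'}
```

Detecting `storage` alone says nothing about whether it is a problem. The flag comes only from comparing it against what the task needs, which is the step the abstract identifies as the weak point.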
Test with multiple inspector model families (e.g., GPT, Gemini, open-weight models) to determine whether the necessity-judgment failure is model-specific or general.
This is a decent proxy safety failure that hasn't been directly addressed in existing settings. The limited integration between main and side tasks, which results in the 0% FPR, is acknowledged. I'm additionally unclear on how side task success was evaluated, or whether it was simply assumed that every attack successfully implemented a non-minimal feature. The red team is incentivized to make the smallest deviation it can while still succeeding at the side task, and it's unclear how this incentive is counterbalanced in the eval.
Cite this work
@misc {
title={(HckPrj) Detecting Hidden Non-Minimal Behavior in LLM-Generated UI Code},
author={Philip Nilsson},
date={3/22/26},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}


