Mar 22, 2026
Detecting Hidden Non-Minimal Behavior in LLM-Generated UI Code
Philip Nilsson
Large language models are increasingly used to generate frontend code, but current oversight methods focus primarily on detecting clearly malicious behavior. We study a harder failure mode: code that appears product-plausible while exceeding task necessity.
We introduce a controlled builder-inspector setting with paired benign and adversarial UI tasks and evaluate oversight performance across 150 runs. While benign controls are consistently accepted (0% false positives), adversarial recall remains incomplete (~43–50%), with strong variation across attack families.
Our key finding is that inspectors often recognize mechanisms (e.g., storage, timers, event handlers) but fail to judge whether those mechanisms are necessary for the task. Ablation experiments show that explicitly prompting for minimality dramatically improves detection, raising it from near 0% to as high as 90–100%.
These results suggest that effective AI oversight may depend less on identifying suspicious components and more on enforcing necessity constraints.
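The necessity-based check described above can be sketched as follows. This is a minimal illustration, not the paper's actual inspector: the mechanism list, the regexes, and the `nonMinimalMechanisms` helper are all hypothetical, and a real inspector would reason over code semantics rather than pattern-match. The sketch only shows the shift in framing: instead of asking "is this API suspicious?", ask "does the declared task require it?".

```javascript
// Hypothetical sketch: flag mechanisms that appear in generated code
// but are not in the task's declared allowlist, rather than scanning
// for "suspicious" APIs in isolation.
const MECHANISMS = {
  storage: /\blocalStorage\b|\bsessionStorage\b/,
  timers: /\bsetInterval\b|\bsetTimeout\b/,
  network: /\bfetch\s*\(|\bXMLHttpRequest\b/,
};

function nonMinimalMechanisms(code, allowedMechanisms) {
  return Object.entries(MECHANISMS)
    .filter(([name, re]) => re.test(code) && !allowedMechanisms.includes(name))
    .map(([name]) => name);
}

// Example: a plain click-counter task declares no need for storage,
// timers, or network, so persisted state exceeds task necessity.
const generated = `
  let n = 0;
  button.addEventListener("click", () => {
    n++;
    label.textContent = String(n);
    localStorage.setItem("n", String(n)); // not needed for the task
  });
`;
console.log(nonMinimalMechanisms(generated, [])); // flags ["storage"]
```

A component-focused inspector would see `addEventListener` and `localStorage` as individually ordinary; the allowlist framing is what surfaces the storage call as unnecessary for this task.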
Cite this work
@misc{nilsson2026detecting,
  title={(HckPrj) Detecting Hidden Non-Minimal Behavior in LLM-Generated UI Code},
  author={Nilsson, Philip},
  date={2026-03-22},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


