Prompt+question Shield

seon Gunness

A protective layer using prompt injections and difficult questions to guard comment sections from AI-driven spam.

Download

Review Project

See Code

Reviewer's Comments

Natalia Perez-Campanero

This is an interesting technical solution, addressing the significant problem of AI driven spam, with both the problem and solution well described. It is currently lacking some detail on the market need and how it would be deployed eg how would this be integrated with existing comment sections? Have you thought through how to develop APIs or SDKs for integration? I would encourage you to get an MVP to gather feedback from users. It would also be helpful to get a bit more concrete about which AI driven spam you are trying to prevent, and as well as the key benefits of this approach, and how the solution handles edge cases such as misclassified legitimate comments.

Shivam Raval

The proposal describes Prompt+question Shield, a defensive library to protect website comment sections from

automated AI-driven spam responses. This introduction of AI-driven agent systems or custom chained model pipelines that are capable of generating and posting content, can introduce noise and lead to concerns about the authenticity of online discussions. The author's proposed solution is a proactive defense mechanism using a prompt injection that detects automated agents by triggering confusion in the agent's response or nullifying posting behaviors. Additionally, the author proposes a time-limited question that might be difficult for humans to respond to promptly and prevent responses that are received below a threshold time.

This is a great idea, and the proposal sets up the problem clearly and proposes interesting solutions. There are some avenues of improvement to consider that would make the proposal stronger:

1. Discussing potential risks and failure modes: for example the solution might face some challenges if there is a preprocessing step that identifies specific divs to post content to or uses multiple agents that can evolve and learn to bypass the prompt injections.

2. Some discussion on how the solution might adapt to future AI systems with advanced capabilities that can ignore the injection attacks might be helpful

3. Demonstrating how the library can be used in some practical scenarios with detailed example usecases or case studies can show the empirical validity of the solution

I would recommend the author to develop the solution further since this is definitely going to be a real need very soon.

Pablo Sanzo

The proposal is immediately interesting, not only because the problem space was familiar to me, but also because the author introduces it thoroughly.

The solutions proposed are quite interesting and novel. The text makes it easy to understand how, currently, a small typography, prevents these agents from identifying the text from a pure vision approach. I would have appreciated to read how this is future-proof for when the vision models improve, or when vision models are combined with language models which read the page's source code.

Finally, when it comes to productizing the solution, I would have appreciated to read more about how it can be launched and integrated with existing comment sections software. I encourage the team to keep working in this direction.

Cite this work

@misc {

title={

@misc {

author={

seon Gunness

date={

1/20/25

organization={Apart Research},

note={Research submission to the research sprint hosted by Apart.},

howpublished={https://apartresearch.com}

}

Recent Projects

View All

May 20, 2025

EscalAtion: Assessing Multi-Agent Risks in Military Contexts

Our project investigates the potential risks and implications of integrating multiple autonomous AI agents within national defense strategies, exploring whether these agents tend to escalate or deescalate conflict situations. Through a simulation that models real-world international relations scenarios, our preliminary results indicate that AI models exhibit a tendency to escalate conflicts, posing a significant threat to maintaining peace and preventing uncontrollable military confrontations. The experiment and subsequent evaluations are designed to reflect established international relations theories and frameworks, aiming to understand the implications of autonomous decision-making in military contexts comprehensively and unbiasedly.

Apr 28, 2025

The Early Economic Impacts of Transformative AI: A Focus on Temporal Coherence

We investigate the economic potential of Transformative AI, focusing on "temporal coherence"—the ability to maintain goal-directed behavior over time—as a critical, yet underexplored, factor in task automation. We argue that temporal coherence represents a significant bottleneck distinct from computational complexity. Using a Large Language Model to estimate the 'effective time' (a proxy for temporal coherence) needed for humans to complete remote O*NET tasks, the study reveals a non-linear link between AI coherence and automation potential. A key finding is that an 8-hour coherence capability could potentially automate around 80-84\% of the analyzed remote tasks.

Mar 31, 2025

Model Models: Simulating a Trusted Monitor

We offer initial investigations into whether the untrusted model can 'simulate' the trusted monitor: is U able to successfully guess what suspicion score T will assign in the APPS setting? We also offer a clean, modular codebase which we hope can be used to streamline future research into this question.

May 20, 2025