Jan 20, 2025
Prompt+question Shield
seon Gunness
A protective layer using prompt injections and difficult questions to guard comment sections from AI-driven spam.
Natalia Perez-Campanero
This is an interesting technical solution, addressing the significant problem of AI driven spam, with both the problem and solution well described. It is currently lacking some detail on the market need and how it would be deployed eg how would this be integrated with existing comment sections? Have you thought through how to develop APIs or SDKs for integration? I would encourage you to get an MVP to gather feedback from users. It would also be helpful to get a bit more concrete about which AI driven spam you are trying to prevent, and as well as the key benefits of this approach, and how the solution handles edge cases such as misclassified legitimate comments.
Shivam Raval
The proposal describes Prompt+question Shield, a defensive library to protect website comment sections from
automated AI-driven spam responses. This introduction of AI-driven agent systems or custom chained model pipelines that are capable of generating and posting content, can introduce noise and lead to concerns about the authenticity of online discussions. The author's proposed solution is a proactive defense mechanism using a prompt injection that detects automated agents by triggering confusion in the agent's response or nullifying posting behaviors. Additionally, the author proposes a time-limited question that might be difficult for humans to respond to promptly and prevent responses that are received below a threshold time.
This is a great idea, and the proposal sets up the problem clearly and proposes interesting solutions. There are some avenues of improvement to consider that would make the proposal stronger:
1. Discussing potential risks and failure modes: for example the solution might face some challenges if there is a preprocessing step that identifies specific divs to post content to or uses multiple agents that can evolve and learn to bypass the prompt injections.
2. Some discussion on how the solution might adapt to future AI systems with advanced capabilities that can ignore the injection attacks might be helpful
3. Demonstrating how the library can be used in some practical scenarios with detailed example usecases or case studies can show the empirical validity of the solution
I would recommend the author to develop the solution further since this is definitely going to be a real need very soon.
Pablo Sanzo
The proposal is immediately interesting, not only because the problem space was familiar to me, but also because the author introduces it thoroughly.
The solutions proposed are quite interesting and novel. The text makes it easy to understand how, currently, a small typography, prevents these agents from identifying the text from a pure vision approach. I would have appreciated to read how this is future-proof for when the vision models improve, or when vision models are combined with language models which read the page's source code.
Finally, when it comes to productizing the solution, I would have appreciated to read more about how it can be launched and integrated with existing comment sections software. I encourage the team to keep working in this direction.
Cite this work
@misc {
title={
@misc {
},
author={
seon Gunness
},
date={
1/20/25
},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}