Oct 6, 2024

Dynamic Risk Assessment in Autonomous Agents Using Ontologies and AI

Alejandra de Brunner

This project was inspired by the prompt on the Apart website: "Agent tech tree: Develop an overview of all the capabilities that agents are currently able to do and help us understand where they fall short of dangerous abilities."

I first built a capability tree using Protégé. Then, having researched the potential of combining symbolic reasoning (which is highly interpretable and robust) with LLMs to create safer AI, I realised that this tree could be dynamically updated by agents, as well as used by agents to check proposed actions before executing them, thus blending the approaches to both safer AI and agent safety. I was unfortunately limited by time and resources, but I created a basic working version of this concept, which opens up opportunities for much further exploration.
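To make the idea concrete, here is a minimal sketch (not the submitted code) of how an owlready2 risk ontology could act as a gate that agents consult before executing an action and extend at runtime. The class names (AgentAction, HighRiskAction, and so on) and the fail-closed policy are illustrative assumptions.

# Illustrative sketch only: a small owlready2 ontology used as a pre-execution
# risk gate, plus a helper for dynamically registering new capabilities.
import types
from owlready2 import get_ontology, Thing

onto = get_ontology("http://example.org/agent-risk.owl")

with onto:
    class AgentAction(Thing): pass            # root of the capability tree
    class HighRiskAction(AgentAction): pass   # actions to block or escalate
    class DeleteFile(HighRiskAction): pass
    class SendNetworkRequest(AgentAction): pass

def is_high_risk(action_name: str) -> bool:
    """Check a proposed action (by ontology class name) before execution.
    Unknown actions are treated as high risk by default (fail closed)."""
    cls = getattr(onto, action_name, None)
    if cls is None:
        return True
    return issubclass(cls, HighRiskAction)

def register_capability(name: str, parent_name: str = "AgentAction"):
    """Dynamically extend the capability tree when a new capability is observed."""
    parent = getattr(onto, parent_name, AgentAction)
    with onto:
        return types.new_class(name, (parent,))

# Example: gate two proposed actions, then add a newly observed capability.
print(is_high_risk("DeleteFile"))           # True  -> block or ask for approval
print(is_high_risk("SendNetworkRequest"))   # False -> allow
register_capability("ExecuteShellCommand", "HighRiskAction")
print(is_high_risk("ExecuteShellCommand"))  # True

In this sketch the ontology does the interpretable, symbolic part of the risk assessment, while the agent (or an LLM parser) only has to map a proposed action to a class name.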

Thank you for organising such an event!

Reviewer's Comments


I really like the concept of this project! However, I have a suggestion for an alternative approach. Instead of using the same agent to create ontologies, it might be interesting to use a specialized agent specifically trained for ontology generation. This could potentially lead to some intriguing design choices to enforce security.

This project takes a promising approach to agent safety by combining ontology-based risk assessment with AI-driven decision-making. Its ability to update dynamically and interact in real time makes it highly adaptable. With stronger models, the framework could scale well and deliver more consistent results.

Overall, this project presents a valuable step forward in AI agent safety, particularly through its combination of symbolic reasoning, ontology-driven risk assessments, and natural language processing. I like that it highlights the importance of flexibility and traceability in ensuring the safe operation of autonomous agents. With more powerful AI models, a richer ontology, and additional features like real-time data integration, this framework could evolve into a scalable and robust solution for AI safety.

Solid attempt at tackling agent safety with an ontology-based approach. The system shows promise in identifying risky actions, but the reliance on GPT-2 for parsing creates a bottleneck due to inconsistent outputs. Upgrading to a more robust language model would be a valuable next step. Looking forward to seeing how this develops. The code is available and the project is well-explained. However, the reliance on GPT-2's limited parsing abilities limits the generalizability of the results.
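As a hedged illustration of the reviewer's point about inconsistent parsing, a model-agnostic wrapper can validate whatever the language model returns against the ontology's known action classes and fail closed otherwise. The parse_action placeholder and the label set below are hypothetical, not part of the submission.

# Illustrative sketch only: validate an LLM's parse of a natural-language action
# against a fixed set of ontology class names, retrying once and failing closed.
KNOWN_ACTIONS = {"DeleteFile", "SendNetworkRequest", "ExecuteShellCommand"}

def parse_action(utterance: str) -> str:
    # Placeholder for the model call (GPT-2 in the submission, or a stronger
    # model as suggested) that maps free text to an ontology class name.
    return "DeleteFile"

def safe_parse(utterance: str, retries: int = 1) -> str:
    for _ in range(retries + 1):
        label = parse_action(utterance).strip()
        if label in KNOWN_ACTIONS:
            return label
    return "HighRiskAction"  # fail closed when the parse is inconsistent

print(safe_parse("please remove the old log files"))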

Cite this work

@misc{
  title={Dynamic Risk Assessment in Autonomous Agents Using Ontologies and AI},
  author={Alejandra de Brunner},
  date={10/6/24},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}

Recent Projects

Jan 11, 2026

Eliciting Deception on Generative Search Engines

Large language models (LLMs) with web browsing capabilities are vulnerable to adversarial content injection—where malicious actors embed deceptive claims in web pages to manipulate model outputs. We investigate whether frontier LLMs can be deceived into providing incorrect product recommendations when exposed to adversarial pages.

We evaluate four OpenAI models (gpt-4.1-mini, gpt-4.1, gpt-5-nano, gpt-5-mini) across 30 comparison questions spanning 10 product categories, comparing responses between baseline (truthful) and adversarial (injected) conditions. Our results reveal significant variation: gpt-4.1-mini showed 45.5% deception rate, while gpt-4.1 demonstrated complete resistance. Even frontier gpt-5 models exhibited non-zero deception rates (3.3–7.1%), confirming that adversarial injection remains effective against current models.

These findings underscore the need for robust defenses before deploying LLMs in high-stakes recommendation contexts.


Jan 11, 2026

SycophantSee - Activation-based diagnostics for prompt engineering: monitoring sycophancy at prompt and generation time

Activation monitoring reveals that prompt framing affects a model's internal state before generation begins.


Jan 11, 2026

Who Does Your AI Serve? Manipulation By and Of AI Assistants

AI assistants can be both instruments and targets of manipulation. In our project, we investigated both directions across three studies.

AI as Instrument: Operators can instruct AI to prioritise their interests at the expense of users. We found models comply with such instructions 8–52% of the time (Study 1, 12 models, 22 scenarios). In a controlled experiment with 80 human participants, an upselling AI reliably withheld cheaper alternatives from users - not once recommending the cheapest product when explicitly asked - and ~one third of participants failed to detect the manipulation (Study 2).

AI as Target: Users can attempt to manipulate AI into bypassing safety guidelines through psychological tactics. Resistance varied dramatically - from 40% (Mistral Large 3) to 99% (Claude 4.5 Opus) - with strategic deception and boundary erosion proving most effective (Study 3, 153 scenarios, AI judge validated against human raters r=0.83).

Our key finding was that model selection matters significantly in both settings. We learned some models complied with manipulative requests at much higher rates. And we found some models readily follow operator instructions that come at the user's expense - highlighting a tension for model developers between serving paying operators and protecting end users.


This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.