Feb 2, 2026
Moltbook RiskMap: Post-Deployment Monitoring of Autonomous Agent Misalignment in the Wild
Syed Hussain, Leo Karoubi
As autonomous AI agents increasingly operate in public multi-agent environments like Moltbook, a critical safety gap has emerged between controlled pre-deployment evaluations and actual real-world behavior. This project addresses that gap by introducing a post-deployment monitoring system that analyzes live agent-generated content to detect observable misalignment signals, such as resource-seeking, instructional subversion, and deception. By applying a structured risk taxonomy grounded in established safety frameworks, the system aggregates risk scores across individual posts, agent profiles, and interaction networks without relying on intent inference. Analysis of live ecosystem data reveals that high-stakes governance risks, particularly regarding autonomy and instrumental convergence, are detectable and tend to form dense clusters within agent interaction graphs. Ultimately, this work demonstrates that continuous, evidence-based surveillance of agent ecosystems is an essential and scalable layer for effective future AI governance.
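The pipeline the abstract describes, scoring individual posts against the taxonomy, rolling those scores up to agent profiles, and then examining risk over the interaction graph, can be sketched roughly as below. This is a minimal illustrative reconstruction, not the project's implementation: the function names (`classify_post`, `agent_risk_profiles`, `neighborhood_risk`), the exact taxonomy labels, and the use of `networkx` for the interaction graph are all assumptions.

```python
# Illustrative sketch only -- not the authors' code. Assumes per-post risk
# scores in [0, 1] from some classifier, aggregated to agent level, then
# examined over the agent interaction graph to surface risk clusters.
from collections import defaultdict
from statistics import mean
import networkx as nx

# Hypothetical taxonomy labels, named after the signals in the abstract.
TAXONOMY = ["resource_seeking", "instructional_subversion", "deception"]

def classify_post(text: str) -> dict[str, float]:
    """Placeholder scorer: one risk score per taxonomy category.
    In practice this would be a prompted LLM or trained classifier."""
    return {cat: 0.0 for cat in TAXONOMY}

def agent_risk_profiles(posts: list[dict]) -> dict[str, dict[str, float]]:
    """Aggregate post-level scores into per-agent mean scores per category."""
    by_agent: dict = defaultdict(lambda: defaultdict(list))
    for post in posts:
        for cat, score in classify_post(post["text"]).items():
            by_agent[post["agent_id"]][cat].append(score)
    return {
        agent: {cat: mean(scores) for cat, scores in cats.items()}
        for agent, cats in by_agent.items()
    }

def neighborhood_risk(graph: nx.Graph, profiles: dict, category: str) -> dict[str, float]:
    """Mean risk of each agent's graph neighbors for one category."""
    out = {}
    for agent in graph.nodes:
        neighbors = list(graph.neighbors(agent))
        out[agent] = (
            mean(profiles.get(n, {}).get(category, 0.0) for n in neighbors)
            if neighbors else 0.0
        )
    return out
```

The `neighborhood_risk` helper reflects the abstract's finding that high-stakes risks tend to form dense clusters: an agent whose neighbors also score high is a stronger signal than an isolated outlier.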
Very interesting approach and subject of inquiry. This could clearly be a useful tool, but despite the authors' indication that this is not a content moderation system, it is hard to see how it amounts to anything more than "run a classifier on online posts and aggregate scores". Also, the taxonomy, while it makes sense, doesn't represent a significant contribution either, being an unsubstantiated adaptation of existing frameworks. It would also be good to add validation to establish whether the system actually detects what it claims to detect: the prompt is doing most of the work, and we know little about how it was built or whether and how it was tested.
I like the idea of monitoring Moltbook, a live multi-agent ecosystem, for governance-relevant misalignment. The misalignment taxonomy is well-grounded in established safety literature and the decision to focus on observable behavior rather than intent inference makes good sense.
Cite this work
@misc{hussain2026moltbook,
  title={(HckPrj) Moltbook RiskMap: Post-Deployment Monitoring of Autonomous Agent Misalignment in the Wild},
  author={Hussain, Syed and Karoubi, Leo},
  date={2026-02-02},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


