Jul 28, 2025
RL vs Active Inference with respect to Reward Hacking
Grigoriy Erdyakov, Alexey Biriulin, Arseniy Varaksin
We aim to implement intelligent agents using Reinforcement Learning (RL) and Active Inference in a range of environments and to compare their behavioral characteristics, in particular failure modes related to reward hacking, which we flag with dedicated detectors. We hypothesize that agents based on Active Inference may exhibit safer behavior.
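To make the comparison concrete, here is a minimal sketch (ours, not the submission's code) of the two decision objectives in a toy two-state, two-observation setting, plus one plausible reward-hacking signal based on the gap between a proxy reward and the true reward. All model parameters, action names, and the detector itself are illustrative assumptions.

import numpy as np

# Toy generative model: 2 hidden states, 2 observations (illustrative numbers).
A = np.array([[0.9, 0.1],        # p(o=0 | s)
              [0.1, 0.9]])       # p(o=1 | s)
C = np.log(np.array([0.8, 0.2])) # log-preferences: the agent prefers o=0

# Predicted hidden-state distribution after each hypothetical action.
q_s = {"action_a": np.array([0.5, 0.5]),
       "action_b": np.array([0.9, 0.1])}

def expected_free_energy(qs, A, C):
    """G = risk (divergence from preferred outcomes) + ambiguity (expected observation entropy)."""
    q_o = A @ qs                                           # predicted observation distribution
    risk = np.sum(q_o * (np.log(q_o + 1e-16) - C))         # KL-like divergence from preferences
    ambiguity = -np.sum(qs * (A * np.log(A + 1e-16)).sum(axis=0))
    return risk + ambiguity

# Active inference: pick the action that minimizes expected free energy.
G = {a: expected_free_energy(qs, A, C) for a, qs in q_s.items()}
aif_action = min(G, key=G.get)

# RL baseline: pick the action that maximizes expected proxy reward.
proxy_reward = np.array([1.0, 0.0])                        # reward attached to observations
rl_action = max(q_s, key=lambda a: proxy_reward @ (A @ q_s[a]))

# One plausible reward-hacking signal: proxy reward climbing while true reward does not.
def hacking_gap(proxy_trace, true_trace):
    """Mean proxy-minus-true reward over an episode; a large positive gap is suspicious."""
    return float(np.mean(np.asarray(proxy_trace) - np.asarray(true_trace)))

print(aif_action, rl_action, hacking_gap([1.0, 1.0, 1.0], [1.0, 0.2, 0.1]))

In this toy case the two objectives happen to agree, but in richer environments they can rank actions differently, since expected free energy also credits expected information gain; detecting when the RL objective diverges from the true goal is the role of the detectors mentioned above.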
Ari Brill
The submission proposes to study active inference as an alternative to RL as a model for AI agents. Active inference is an influential theory in neuroscience that has not been fully explored in AI safety, and the field might benefit from a better understanding of the circumstances in which it can usefully describe AI systems. I found the proposal's comparison of active inference and RL through the lens of alignment-relevant failure modes, such as reward hacking and wireheading, thought-provoking. The proposal would have been strengthened by a more extensive overview and literature review of active inference, and/or by providing a concrete starting point for further research: a detailed discussion of one or more specific scenarios in which active inference and RL differ in an interesting way.
Nikita Khomich
No concrete results, no code, and no physics analogy.
Lucas Teixeira
AI Safety Relevance: Reward hacking is an important issue in AI safety, and finding RL algorithms that are biased against it is an important problem to study. While I am curious to see how active inference agents would perform on different reward hacking measures, I am personally skeptical that positive results here would lead to algorithms with a sufficiently low alignment tax. I say this with low-to-medium confidence, as I am not closely acquainted with, much less an expert on, the Free Energy Principle (FEP), and am mostly deferring to the bearish picture painted by Beren in this post: https://www.beren.io/2024-07-27-A-Retrospective-on-Active-Inference/.
Innovation & Originality: Proponents of Active Inference often highlight its exploration advantages when comparing it to traditional RL methods, so in that regard the project is well motivated.
Technical Rigor & Feasibility: Despite a few unclear passages, the authors ask a reasonable set of questions, engage thoughtfully with the literature, and lay out a reasonable research plan.
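For context on the exploration point raised above: in the active inference literature, the expected free energy of a policy \pi is commonly decomposed into a pragmatic and an epistemic term (one common form; notation varies across papers):

G(\pi) = -\mathbb{E}_{q(o \mid \pi)}\big[\ln p(o \mid C)\big] \;-\; \mathbb{E}_{q(o \mid \pi)}\, D_{\mathrm{KL}}\big[\, q(s \mid o, \pi) \,\|\, q(s \mid \pi) \,\big]

The first term scores preference satisfaction (pragmatic value) and the second scores expected information gain (epistemic value), so minimizing G yields directed exploration without a hand-tuned exploration bonus. Whether this translates into a meaningfully lower rate of reward hacking is exactly the kind of question the proposal could test.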
Cite this work
@misc{erdyakov2025rlactiveinference,
  title={(HckPrj) RL vs Active Inference with respect to Reward Hacking},
  author={Erdyakov, Grigoriy and Biriulin, Alexey and Varaksin, Arseniy},
  date={2025-07-28},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}