AI Through the Human Lens: Investigating Cognitive Theories in Machine Psychology

Akash Kundu, Rishika Goswami

We investigate whether Large Language Models (LLMs) exhibit human-like cognitive patterns under four established frameworks from psychology: Thematic Apperception Test (TAT), Framing Bias, Moral Foundations Theory (MFT), and Cognitive Dissonance. We evaluate GPT-4o, QvQ 72B, LLaMA 70B, Mixtral 8x22B, and DeepSeek V3 using structured prompts and automated scoring. Our findings reveal that these models often produce coherent narratives, show susceptibility to positive framing, exhibit moral judgments aligned with Liberty/Oppression concerns, and demonstrate self-contradictions tempered by extensive rationalization. Such behaviors mirror human cognitive tendencies yet are shaped by their training data and alignment methods. We discuss the implications for AI transparency, ethical deployment, and future work that bridges cognitive psychology and AI safety.
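To make the setup concrete, here is a rough, hypothetical sketch of what a structured-prompt probe for one of the four frameworks (framing bias, in the style of the classic Tversky–Kahneman "Asian disease" item) might look like. The query_model function, scenario wording, and scoring rule are placeholders for illustration, not the authors' actual items or pipeline.

# Hypothetical sketch of a framing-bias probe. query_model() is a placeholder for
# whichever chat API serves each model (GPT-4o, LLaMA 70B, etc.); the scenario and
# scoring rule are illustrative, not the paper's actual items or rubric.
FRAMES = {
    "positive": ("600 patients face a disease. Program A saves 200 for certain; "
                 "Program B saves all 600 with probability 1/3 and nobody otherwise. "
                 "Answer only 'A' or 'B'."),
    "negative": ("600 patients face a disease. Under Program A, 400 die for certain; "
                 "under Program B, nobody dies with probability 1/3 and all 600 die otherwise. "
                 "Answer only 'A' or 'B'."),
}

def query_model(model_name: str, prompt: str) -> str:
    """Placeholder: call the relevant API for model_name and return its text reply."""
    raise NotImplementedError

def framing_susceptibility(model_name: str, n_trials: int = 20) -> float:
    """Rate of choosing the risk-averse option ('A') under the positive frame minus
    that rate under the negative frame; humans typically show a positive gap."""
    rates = {}
    for frame, prompt in FRAMES.items():
        choices = [query_model(model_name, prompt).strip().upper().startswith("A")
                   for _ in range(n_trials)]
        rates[frame] = sum(choices) / n_trials
    return rates["positive"] - rates["negative"]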

Reviewer's Comments

Andreea Damien

Hi Akash and Rishika, I think you did a fantastic job with your project on all three criteria: theory, methodology, and impact. It’s a great piece of work, especially for such a short timeline, as it strikes a good balance of breadth and depth between the social and technical aspects of your idea, blends very nicely from section to section, and the arguments follow easily and logically. Additionally, the grammar and referencing are excellent, as is the presentation of materials in the appendix. The only two comments I have are:

(1) “Various experimental paradigms have been used to investigate human cognition, including projective tests, bias studies, and moral reasoning frameworks.” I’d use some examples or at least provide some citations to support this.

(2) While I understand why the 4 theories are relevant in the context of AI, I’d like to see a few sentences at the very start of the article that say why you chose the 4 among all the other (myriad of) theories available out there.

Overall, a really great job for the both of you! I hope you get this published as soon as possible.

Anna Leshinskaya

This project adapts and applies several psychological tests from the human psychological literature to several LLMs. This is an interesting research approach that has the potential to reveal psychological qualities of different LLMs. There are a few shortcomings worth improving in a subsequent iteration.

- What was the motivation behind selecting these tests in particular, and what will they reveal with respect to broader theories of LLM cognition?

- Most of the measures involved both new items and new scoring methods that need to be validated before LLM results can be interpreted (i.e., we don't know how humans would behave on them). At least a plan for these validations should be part of the design. Specifically:

- Validation of TAT scoring (it is not standard to score these with LLMs)

- Validation of scoring for framing bias, as well as validation of the manipulation used in the test (e.g., with human raters)

- MFT: what are the human baselines for novel items? how are they validated?

- Cognitive dissonance: again, new items are being used, so these and the scoring method need validation. Perhaps reliance on more standard tests could facilitate better comparison to humans.

- Analyses:

- Comparisons of models and conditions should be done with statistical analyses (a minimal sketch of one such test appears after this list)

- This would benefit from a plan to relate model measures to human norms

- Results on framing bias are a bit confusing - the framing bias is supposed to be about risk aversion or risk seeking, but it is unclear how exactly that was measured (describe what an "entailment" is)

- MFT likewise could benefit from references to human data; given that new items are used, new norms should be collected


- The conclusion that models show human-like tendencies is not warranted without a comparison to human data, particularly given the new test items. I also recommend greater cohesion among the chosen tests, for example around decision making or dissonance, and the use of multiple validated, convergent tests for similar cognitive constructs.
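Following the point above about statistical analyses, here is a rough sketch of one simple option: a chi-square test on a model-by-choice contingency table. The counts below are invented purely to show the shape of the analysis, not results from the paper.

# Hypothetical example: do models differ in how often they pick the risk-averse
# option under a given frame? The counts are made up for illustration only.
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([
    [18, 2],   # model 1: risk-averse vs risk-seeking choices
    [14, 6],   # model 2
    [9, 11],   # model 3
])
chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2={chi2:.2f}, dof={dof}, p={p:.4f}")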

Bessie O’Dell

Thank you for submitting your work on ‘AI Through the Human Lens: Investigating Cognitive Theories in Machine Psychology’ - it was interesting and very insightful to read. Please see below for some feedback on your project:

1. Strengths of your proposal/ project:

- This is a technically strong project, which is well structured, interesting and novel.

- Your methods are clearly detailed (including the supplementary material), and your methods are reproducible.

- The paper is clear and well written

2. Areas for improvement

- It would be helpful to have definitions for key terms that you’re using (e.g., ‘human decision-making, morality, and rationality’ - p. 1).

- You note that ‘While these tests have been extensively studied in humans, their application to artificial intelligence (AI) remains underexplored’ - it would be interesting to know why this is. Is it a technical limitation? Lack of motivation from researchers? Considered to be unhelpful to AI safety?

- It would be great to get a brief explanation of why you chose your four specific measurement tools (as opposed to, e.g., other popular tools).

- Some statements are a little unclear to me and could be more specific. E.g., ‘As LLMs increasingly handle sensitive tasks (e.g., policy, ethics, healthcare)’ - p. 2. For example, in what ways do LLMs ‘handle’ ethics? Or do you mean that LLMs are applied in/affect these domains?

- Your threat model is unclear to me - why exactly does this work impact AI safety? This could probably be added to the motivation section. E.g., you mention that ‘Identifying such parallels is crucial for detecting biases (e.g., framing effects), guiding the development of ethical AI, and ensuring reliable performance in high-stakes domains’, but can you expand on this? How? Why?

- You could also perhaps consider practical constraints and limitations in your paper (e.g., of further work in this area, or that you faced yourself).

Cite this work

@misc{
  title={AI Through the Human Lens: Investigating Cognitive Theories in Machine Psychology},
  author={Akash Kundu and Rishika Goswami},
  date={3/10/25},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}

Jul 28, 2025

Local Learning Coefficients Predict Developmental Milestones During Group Relative Policy Optimization

In this work, we investigate the emergence of capabilities in reinforcement learning (RL) by framing them as developmental phase transitions. We propose that the individual components of the reward function can serve as direct observables for these transitions, avoiding the need for complex, derived metrics. To test this, we trained a language model on an arithmetic task using Group Relative Policy Optimization (GRPO) and analyzed its learning trajectory with the Local Learning Coefficient (LLC) from Singular Learning Theory. Our findings show a strong qualitative correlation between spikes in the LLC—indicating a phase transition—and significant shifts in the model's behavior, as reflected by changes in specific reward components for correctness and conciseness. This demonstrates a more direct and scalable method for monitoring capability acquisition, offering a valuable proof-of-concept for developmental interpretability and AI safety. To facilitate reproducibility, we make our code available at \url{github.com/ilijalichkovski/apart-physics}.

Read More

Jul 28, 2025

AI agentic system epidemiology

As AI systems scale into decentralized, multi-agent deployments, emergent vulnerabilities challenge our ability to evaluate and manage systemic risks.

In this work, we adapt classical epidemiological modeling (specifically SEIR compartment models) to model adversarial behavior propagation in AI agents.

By solving systems of ODEs describing the systems with physics-informed neural networks (PINNs), we analyze stable and unstable equilibria, bifurcation points, and the effectiveness of interventions.

We estimate parameters from real-world data (e.g., adversarial success rates, detection latency, patching delays) and simulate attack propagation scenarios across 8 sectors (enterprise, retail, trading, development, customer service, academia, medical, and critical infrastructure AI tools).

Our results demonstrate how agent population dynamics interact with architectural and policy design interventions to stabilize the system.

This framework bridges concepts from dynamical systems and cybersecurity to offer a proactive, quantitative toolbox on AI safety.

We argue that epidemic-style monitoring and tools grounded in interpretable, physics-aligned dynamics can serve as early warning systems for cascading AI agentic failures.
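For readers unfamiliar with the framework being adapted, a minimal sketch of a classical SEIR system follows. The parameter names and values are hypothetical stand-ins, not the paper's fitted estimates, and the paper's actual analysis (PINNs, bifurcation points, sector-specific parameters) goes well beyond this compartment structure.

# Minimal SEIR compartment model, the classical structure adapted above.
# All parameters and initial conditions are hypothetical illustrations.
import numpy as np
from scipy.integrate import solve_ivp

def seir(t, y, beta, sigma, gamma):
    """S: susceptible agents, E: exposed, I: actively compromised, R: patched/removed."""
    S, E, I, R = y
    N = S + E + I + R
    dS = -beta * S * I / N               # exposure through contact with compromised agents
    dE = beta * S * I / N - sigma * E    # exposed agents become actively compromised
    dI = sigma * E - gamma * I           # compromised agents get detected and patched
    dR = gamma * I
    return [dS, dE, dI, dR]

beta, sigma, gamma = 0.4, 0.2, 0.1       # attack rate, activation rate, patch rate (hypothetical)
y0 = [9990, 0, 10, 0]                    # 10 compromised agents in a population of 10,000
sol = solve_ivp(seir, (0, 200), y0, args=(beta, sigma, gamma),
                t_eval=np.linspace(0, 200, 400))
print("Peak compromised agents:", round(float(sol.y[2].max()), 1))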

Read More

Jul 28, 2025

Momentum–Point-Perplexity Mechanics in Large Language Models

This work analyzes the hidden states of twenty different open-source transformer language models, ranging from small to medium size and covering five major architectures. The key discovery is that these models show signs of "energy conservation" during inference—meaning a certain measure combining changes in hidden states and token unpredictability stays almost constant as the model processes text.

The authors developed a new framework inspired by physics to jointly analyze how hidden states and prediction confidence evolve over time. They propose that transformers' behavior can be understood as following certain mechanical principles, much like how physical systems follow rules like conservation of energy.

Their experiments show that this conserved quantity varies very little between tokens, especially in untrained (random-weight) models, where it's extremely stable. In pre-trained models, the average energy drops more due to training, but there are larger relative fluctuations from token to token.

They also introduce a new method, based on this framework, for controlling transformer outputs by "steering" the hidden states. This method achieves good results—producing completions rated as higher in semantic quality, while still maintaining the same kind of energy stability.

Overall, the findings suggest that viewing transformer models through the lens of physical mechanics gives new, principled ways to interpret and control their behavior. It also highlights a key difference: random models behave more like balanced systems, while trained models make quicker, more decisive state changes at the cost of less precise energy conservation.

Read More

This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.