Summary
This project presents a lightweight and interpretable routing system that selects among multiple reasoning strategies—Zero-Shot, Chain-of-Thought (CoT), Program-Aided Language (PAL), ReAct, and Few-Shot—based on features of the user query. It supports the vision of expert orchestration by treating large language model (LLM) prompting strategies as modular, agent-like components.
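As an illustration of what handcrafted, human-interpretable query features can look like, the sketch below extracts a small binary feature vector from a query string. The specific features (digit presence, reasoning cues, code keywords, length, interrogative form) are assumptions for illustration; the summary does not specify the project's actual feature set.

```python
import re

# Hypothetical feature set; the project's actual features are not
# specified in the summary.
def extract_features(query: str) -> list[float]:
    q = query.lower()
    return [
        float(bool(re.search(r"\d", query))),                      # contains digits (math-leaning)
        float("step by step" in q),                                # explicit reasoning cue
        float(bool(re.search(r"\b(code|function|program)\b", q))), # programming keywords (PAL-leaning)
        float(len(query.split()) > 30),                            # long, multi-part query
        float("?" in query),                                       # interrogative form
    ]
```

Each feature stays directly readable by a human, which is what keeps the learned routing policy interpretable.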
We frame the problem as a reinforcement learning task using a custom Gym environment, training a PPO agent on a small synthetic dataset with handcrafted, human-interpretable features. The router achieves ~23% accuracy (above the random baseline of 20%), an early signal that the appropriate reasoning strategy can be learned and predicted from query structure.
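The RL formulation above can be sketched as a one-step, contextual-bandit-style episode: the observation is one query's feature vector, the action picks a strategy, and the reward is 1 when the pick matches the labeled best strategy. The sketch below is dependency-free and only mirrors the Gym interface; the reward scheme and data layout are assumptions, not the project's actual environment.

```python
import random

STRATEGIES = ["zero_shot", "cot", "pal", "react", "few_shot"]

class RouterEnv:
    """Minimal Gym-style routing environment (one decision per episode).

    Observation: the handcrafted feature vector for one query.
    Action: an index into STRATEGIES.
    Reward: 1.0 if the chosen strategy matches the labeled best one.
    This is a simplified sketch; the project trains PPO on a real
    custom Gym environment.
    """

    def __init__(self, features, labels, seed=0):
        self.features = features   # list of feature vectors, one per query
        self.labels = labels       # index of the best strategy per query
        self.rng = random.Random(seed)
        self.idx = 0

    def reset(self):
        # Sample a query uniformly for the next episode.
        self.idx = self.rng.randrange(len(self.features))
        return self.features[self.idx]

    def step(self, action):
        reward = 1.0 if action == self.labels[self.idx] else 0.0
        done = True  # single-step episode: one routing decision, then reset
        return self.features[self.idx], reward, done, {}
```

A PPO agent (e.g. from stable-baselines3) would then be trained against such an environment; a uniformly random policy over the five strategies yields the 20% baseline mentioned above.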
Our prototype is built on the CrewAI framework, allowing it to generalize to multi-agent setups and agent routing and keeping it aligned with production use. The system records full agent traces for debugging and interpretability.
Though trained on just 39 examples, this demo shows potential for scaling up with semantic features, local LLMs, and more realistic workloads. Future work includes routing across nested agents, evaluation on local models, and applying mechanistic interpretability to policy analysis.
Cite this work:
@misc{
  title={},
  author={Henry Luan},
  date={6/2/25},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}