Jan 24, 2025
CoTEP: A Multi-Modal Chain of Thought Evaluation Platform for the Next Generation of SOTA AI Models
Alyssia J, Martin CL
Summary
As advanced state-of-the-art models like OpenAI's o-1 series, the upcoming o-3 family, Gemini 2.0 Flash Thinking and DeepSeek display increasingly sophisticated chain-of-thought (CoT) capabilities, our safety evaluations have not yet caught up. We propose building a platform that allows us to gather systematic evaluations of AI reasoning processes to create comprehensive safety benchmarks. Our Chain of Thought Evaluation Platform (CoTEP) will help establish standards for assessing AI reasoning and ensure development of more robust, trustworthy AI systems through industry and government collaboration.
Cite this work:
@misc {
title={
CoTEP: A Multi-Modal Chain of Thought Evaluation Platform for the Next Generation of SOTA AI Models
},
author={
Alyssia J, Martin CL
},
date={
1/24/25
},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}