Nov 24, 2024
Encouraging Chain-of-Thought Reasoning
Shreyans Jain, Thomas Walker, Kutay Buyruk, Soumyadeep Bose
Encouraging Chain-of-Thought Reasoning via Feature Steering in Large Language Models
Jason Schreiber
Cool idea with relevance to AI safety (model oversight / reasoning transparency), though with a slight caveat regarding the faithfulness of CoT. I think this deserves further exploration and could potentially shed light on important methodological questions, such as the faithfulness of model reasoning. These questions are not easy to study, but this seems like a great first step!
Tom McGrath
This is a really nice project on chain of thought. The experiments are logical and well conducted, and the presentation of the results is clear. The uplift in chain-of-thought performance is quite surprising; I'd be interested to know whether the authors tuned the feature strengths or left them at the default intervention strength. Feature steering curves (feature strength vs. performance) often peak at somewhat different points for different features (even semantically very similar ones), so tuning can be well worth doing. The findings on uncertainty at the first tokens of a direct response are intriguing and worth further investigation.
A very interesting extension would be to test the generalisation of these features to another domain where CoT reasoning is important (ideally something non-mathematical, for example logic puzzles). Seeing a scatter plot of performance on one domain vs. performance on another domain would be very informative; my concern is that steering might improve one kind of performance at the expense of another.
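For concreteness, here is a minimal sketch of the strength sweep Tom McGrath suggests, i.e. measuring task accuracy across a grid of steering strengths for one feature. The helpers `generate_with_steering` and `is_correct` are hypothetical placeholders standing in for the project's actual steering and grading code, which is not specified here.

```python
# Minimal sketch of a feature-strength sweep. The two helpers below are
# hypothetical placeholders, not the project's actual implementation.

def generate_with_steering(prompt: str, feature_id: int, strength: float) -> str:
    """Placeholder: generate a completion while adding `strength` times the
    chosen feature's decoder direction to the model's residual stream."""
    raise NotImplementedError

def is_correct(completion: str, reference: str) -> bool:
    """Placeholder: grade a completion against the reference answer."""
    raise NotImplementedError

def steering_curve(prompts, answers, feature_id, strengths):
    """Accuracy at each steering strength for one feature: the data behind
    a feature-strength vs. performance curve."""
    curve = []
    for strength in strengths:
        hits = sum(
            is_correct(generate_with_steering(p, feature_id, strength), a)
            for p, a in zip(prompts, answers)
        )
        curve.append(hits / len(prompts))
    return curve

# Sweeping a grid rather than relying on a single default strength exposes
# the per-feature peak; semantically similar features can peak at quite
# different strengths.
strengths = [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]
```

Plotting the resulting curves per feature (and, for the cross-domain question, scattering accuracy on one domain against accuracy on another at matched strengths) would make any trade-off between domains directly visible.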
Alana Xiang
Very cool stab at increasing CoT via steering. I would like to see a fuller investigation of how the faithfulness of the steered CoT compares to that of prompted CoT.
Cite this work
@misc{jain2024encouraging,
  title={Encouraging Chain-of-Thought Reasoning via Feature Steering in Large Language Models},
  author={Shreyans Jain and Thomas Walker and Kutay Buyruk and Soumyadeep Bose},
  date={2024-11-24},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}