Jul 29, 2024
Alignment Research Critiquer
Nancy Vigil, Ashish Rai
Alignment Research Critiquer is a tool that gives early-career and independent alignment researchers access to high-quality feedback loops.
Marc Carauleanu
Great project! I like that every claim the model makes is supported by evidence from the paper. I would be excited to see the evidence section collapsed by default for convenience and expanded when needed. Beyond that, I would like to see features that collate evidence from other papers bearing on the claims, both supporting and contradicting them.
Jonny Spicer
This is a great project, and I’m excited to see that you have detailed next steps to continue it. I think it’s reasonably likely that a lack of senior mentorship is the largest bottleneck in safety research right now, so this project is trying to solve the most important problem in the domain. Further, I think you’ve made an excellent start. As an additional next step, I’d be excited to see you explore the feasibility of fine-tuning a model. Given the relatively small number of examples fine-tuning runs usually require, I think that by, e.g., collaborating with MATS mentors, you could plausibly get the example data you need.
Jason Schreiber
Working on a good AIS research critique agent seems very promising to me. I’d be excited to see some experimentation on the UX side here and some thoughts on how to collect information on the tool’s performance.
Jacques Thibodeau
I think this project is headed in the right direction! One possible issue with paper critiques from language models is that the model lacks context on what makes a good research paper within a particular niche of alignment research. By focusing on specific areas of alignment research (such as mechanistic interpretability), we might therefore get significantly better critiques from the models.
Cite this work
@misc{
title={Alignment Research Critiquer},
author={Nancy Vigil and Ashish Rai},
date={2024-07-29},
organization={Apart Research},
note={Research submission to the research sprint hosted by Apart.},
howpublished={https://apartresearch.com}
}