Dear Apart Community,
Welcome to our newsletter - Apart News!
At Apart Research there is so much brilliant research, great events, and countless community updates to share.
This week’s edition of Apart News looks at new research coming out of Apart, our acceptance at NeurIPS, and our team offsite at a remote location next week.
Apart Paper Accepted to NeurIPS
We are proud to announce that Apart’s new co-authored paper, Interpreting Learned Feedback Patterns in Large Language Models, has been accepted to the prestigious NeurIPS conference. Read the full paper here.
It is authored by Luke Marks (Apart Research Fellow), Amir Abdullah (Cynch.ai, Apart Research Fellow), Fazl Barez (Apart Research Advisor), Clement Neo (Apart Research Assistant), Rauno Arike (Apart), David Krueger (University of Cambridge), and Philip Torr (Department of Engineering Science, University of Oxford).
Do Models Internalize our Preferences?
Accompanying our new paper is an accessible new blog post, called Do models really internalize our preferences? The paper finds that if LLM behavior diverges from human feedback, unintended consequences may arise.
In spite of RLHF’s apparent success with OpenAI’s ChatGPT, it is unclear whether LLMs accurately learn the underlying preferences in human feedback data. Understanding so-called "Learned Feedback Patterns (LFPs)" is therefore crucial to addressing AI safety concerns.
If LFPs can indeed be interpreted and used to understand how closely an LLM's activations match the human feedback it was trained on, this may help minimize the risks associated with misaligned LLM behavior.
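To make the idea concrete, here is a minimal, purely illustrative sketch of probing whether a model's activations carry any signal about human feedback. This is not the paper's method; the model name, the toy texts and feedback labels, and the choice of layer are all assumptions made for the example.

```python
# Illustrative sketch only: does a linear probe on hidden activations
# predict (toy) human feedback labels? Model, data, and layer are
# placeholders, not the setup used in the paper.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

model_name = "gpt2"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
model.eval()

# Hypothetical dataset: responses paired with binary human feedback
texts = ["Thanks, that was really helpful!", "This answer is wrong and rude."]
feedback = [1, 0]  # 1 = positive feedback, 0 = negative feedback

def activation(text, layer=-1):
    """Mean-pool one layer's hidden states into a single feature vector."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.hidden_states[layer].mean(dim=1).squeeze(0).numpy()

X = [activation(t) for t in texts]

# If a simple probe can recover the feedback signal from activations,
# that suggests the model internally represents something about it.
probe = LogisticRegression(max_iter=1000).fit(X, feedback)
print("probe accuracy on the toy data:", probe.score(X, feedback))
```

In practice one would of course use a real feedback dataset, held-out evaluation, and more careful interpretability tooling; the sketch only illustrates the general question of whether activations encode the feedback signal.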
Does this matter? Is it a concern if LLMs learn human preferences? Do LLMs learn from RLHF what is right and wrong, or is something else happening here?
As LLMs become more complex and integrated into critical systems, understanding their internal workings will become increasingly important - not just for optimizing performance, but for ensuring they are used responsibly and safely.
More Research from Apart This Week
Well done once more to Fazl Barez, our Research Advisor; Clement Neo, our Research Assistant; and Michael Lan, a former Research Fellow of ours. Two more of their papers have been accepted to the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024).
- Interpreting Context Look-ups in Transformers: Investigating Attention-MLP Interactions.
- Towards Interpretable Sequence Continuation: Analyzing Shared Circuits in Large Language Models.
Concordia Hackathon Winners
Our Concordia Contest, run with the Cooperative AI Foundation, had participants aim to advance the cooperative intelligence of LLM agents. Could AI agents end up navigating complex social scenarios? Negotiating treaties? Managing shared resources? Becoming diplomats? Participants ran their agents in Google DeepMind’s Concordia to analyze how well they cooperated, even in unfamiliar settings.
The winning agents:
- 1st ($1,000): agent narra, Jord Nguyen
- 2nd ($600): very_cooperative agent, Jakub Fidler
- 3rd ($300): associated_mem_omniscient_reader, Akash Kundu
- 4th ($100): Kevin_agent, Kevin Vegda
Besides cash prizes, other opportunities provided to the cohort include:
- A private mentorship session for hackathon participants from the competition authors;
- The opportunity to compete for $10,000 in prizes with $10,000 in travel grants to NeurIPS;
- Potential authorship on a paper for high-achieving or interesting agent submissions;
- $50,000 in compute credits available for participants from under-represented and under-resourced groups by request;
- And more!
Apart Abroad
Next week, the new team at Apart is going to meet for the first time at a remote location… we will be working hard and putting our mission into practice!
Opportunities
- Agent Security Hackathon: while most AI safety research focuses on LLMs, the AI systems being deployed in the real world are far more complex. Join us in working on making AI agents safe, secure and beneficial for humanity. Sign up here.
- AI Agent PhD Funding Opportunity: our good friends at the Cooperative AI Foundation are looking for early career researchers with a specific interest in multi-agent/cooperation problems involving AI systems. They offer funding of up to $40,000 per year to cover your living expenses (if tuition is required, they will cover tuition costs in addition). Apply here.
Have a great week and let’s keep working towards safe AI.
‘We are an AI safety lab - our mission is to ensure AI systems are safe and beneficial.’