Join Apart Research's alignment hackathons, exploring new and interesting research agendas alongside professional alignment researchers. Participate in the next alignment jam below!
Join this AI safety hackathon and compete to uncover novel aspects of how language models work! It follows Buck Shlegeris's "black box interpretability" agenda. Read more.
Compete for $1,000 to create the best research projects in Black Box Interpretation! Read more and participate! See project ideas.
Create a platform for students, researchers, entrepreneurs, and developers to get free AI safety project ideas, potentially with pre-committed funding attached. In collaboration with Nonlinear.
A website with leaderboards for major issues in AI safety, with rewards or prizes for solving or making progress on them. Draws on ideas from AIcrowd.com, OpenPhil's adversarial learning challenge, and the ELK competition.
Many in the community argue that we need more for-profit ventures focused on helping the long-term future, since they can scale quickly and drive massive impact. Apart Research works on developing ideas within the space of AI safety and disseminating them.
By interviewing and surveying AI safety researchers and longtermists, we can pinpoint gaps in the community's current tooling.
We generally attempt to make our research and projects publicly accessible while also creating dedicated AI safety outreach projects.
Create several datasets with vastly differing types of data, fine-tune language models on each, and compare which domains lead to better-aligned behavior and how the resulting models ground their values differently.
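A minimal sketch of what one such comparison could look like, assuming Hugging Face transformers and datasets; the base model, dataset names, and split sizes are placeholders to be replaced with the domains you want to compare:

```python
# Sketch: fine-tune a small language model on one of several candidate
# datasets, so the resulting checkpoints can later be probed and compared
# on alignment-relevant behavior. Dataset choice is a placeholder.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

def finetune_on(dataset_name: str, output_dir: str):
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Any text dataset with a "text" column works here; swap in the domains
    # you want to compare (e.g. ethics writing vs. technical forum posts).
    raw = load_dataset(dataset_name, split="train[:1%]")
    tokenized = raw.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=256),
        batched=True, remove_columns=raw.column_names)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir=output_dir, num_train_epochs=1,
                               per_device_train_batch_size=4),
        train_dataset=tokenized,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False))
    trainer.train()
    return model, tokenizer
```

Running this once per dataset gives a family of models trained with an identical recipe, so differences in their value-laden completions can be attributed to the data domain rather than the training setup.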
Study how aggressive behavior decays when rules or context change for simulated agents, e.g. have agents fight in a fighting game and then let the same characters, with the same control scheme, collaborate in another type of game. How quickly do they adapt, and is adaptation easier in one direction than the other?
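One rough way to measure this, assuming Gymnasium and Stable-Baselines3; "FightEnv-v0" and "CoopEnv-v0" are hypothetical environment IDs standing in for the fighting and cooperative games, which would need to share observation and action spaces:

```python
# Sketch: pretrain a policy in one game, then track reward while it
# fine-tunes in the other, to see how fast behavior adapts in each direction.
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

def adaptation_curve(source_id: str, target_id: str, pretrain=100_000,
                     finetune_chunks=10, chunk_steps=10_000):
    """Pretrain in `source_id`, then record mean reward per fine-tuning chunk in `target_id`."""
    model = PPO("MlpPolicy", gym.make(source_id), verbose=0)
    model.learn(total_timesteps=pretrain)

    target_env = gym.make(target_id)
    model.set_env(target_env)
    curve = []
    for _ in range(finetune_chunks):
        model.learn(total_timesteps=chunk_steps, reset_num_timesteps=False)
        mean_reward, _ = evaluate_policy(model, target_env, n_eval_episodes=5)
        curve.append(mean_reward)
    return curve

# Compare both transfer directions to see whether one adapts faster.
fight_to_coop = adaptation_curve("FightEnv-v0", "CoopEnv-v0")
coop_to_fight = adaptation_curve("CoopEnv-v0", "FightEnv-v0")
```

Comparing the two curves (and inspecting rollouts for residual aggressive actions) gives a first-pass answer to whether aggression decays faster than cooperation, or vice versa.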
Create an overview of the main ways the safety problems of neural, probabilistic models differ from those of symbolic, logical AI. Relate these to contemporary models and use cases.
Study how the process of researching alignment differs between the image domain and the text domain.
Design an alignment experiment and study that uses CLIP+StyleGAN to generate images and then attempts to "align" the output with what we actually intend it to generate. This is part of a bigger analysis of multimodal alignment work.
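A minimal sketch of the CLIP-guided part of such an experiment, assuming OpenAI's CLIP package and a separately obtained pretrained StyleGAN generator; the generator's call signature and z_dim attribute are assumptions that depend on the specific StyleGAN implementation used:

```python
# Sketch: optimize a StyleGAN latent so the generated image matches a text
# prompt under CLIP, then inspect how far the output drifts from the intent.
import torch
import torch.nn.functional as F
import clip  # OpenAI's CLIP package

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, preprocess = clip.load("ViT-B/32", device=device)
clip_model = clip_model.float()  # keep CLIP in fp32 so gradients flow cleanly

def align_latent(generator, prompt: str, steps: int = 200, lr: float = 0.05):
    """Nudge a latent vector toward a text prompt, using CLIP as the critic."""
    text = clip.tokenize([prompt]).to(device)
    with torch.no_grad():
        text_features = clip_model.encode_text(text)
        text_features = text_features / text_features.norm(dim=-1, keepdim=True)

    latent = torch.randn(1, generator.z_dim, device=device, requires_grad=True)
    optimizer = torch.optim.Adam([latent], lr=lr)

    for _ in range(steps):
        # Call signature depends on the StyleGAN implementation; assumed here
        # to return a (1, 3, H, W) image in [-1, 1].
        image = generator(latent)
        image = F.interpolate(image, size=(224, 224))
        # Proper CLIP pixel normalization is omitted for brevity.
        image_features = clip_model.encode_image((image + 1) / 2)
        image_features = image_features / image_features.norm(dim=-1, keepdim=True)
        loss = 1 - (image_features * text_features).sum()  # cosine distance
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return latent.detach()
```

The alignment question then becomes how often, and in what ways, the optimized image satisfies CLIP's similarity score while still missing what the prompt actually intended.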
Analyze the capabilities of current interpretability tools at three depths: 1) no-code tools, 2) library implementations, 3) custom programming solutions. Depth (3) will contribute to the development of new interpretability tools, and the analysis as a whole plays into exploring productization of AI safety theory.
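As an illustration of depth (2), a library-level attribution pass might look like the sketch below, assuming Captum and Hugging Face transformers; the sentiment model and example sentence are just placeholders:

```python
# Sketch of "depth 2" (library implementation): per-token feature attribution
# on a small text classifier using Captum's layer integrated gradients.
import torch
from captum.attr import LayerIntegratedGradients
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

def forward_fn(input_ids, attention_mask):
    return model(input_ids, attention_mask=attention_mask).logits

def attribute(text: str, target: int):
    enc = tokenizer(text, return_tensors="pt")
    lig = LayerIntegratedGradients(forward_fn, model.distilbert.embeddings)
    attributions = lig.attribute(
        inputs=enc["input_ids"],
        additional_forward_args=(enc["attention_mask"],),
        target=target)
    # Sum attribution over the embedding dimension to get one score per token.
    scores = attributions.sum(dim=-1).squeeze(0)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    return list(zip(tokens, scores.tolist()))

print(attribute("The model behaves safely in this context.", target=1))
```

Comparing how much effort the same question takes at each of the three depths is exactly the kind of capability gap the analysis would document.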
In collaboration with the Stanford Center for AI Safety, we hope to make it easier for end-user engineers to understand models for geological marking, which will play a part in the wider project of interpretability tooling.
A website with the major issues in AI safety as categories and a timeline view of the progress being made on each issue.
Create massive public outreach projects meant to elicit funding from philanthropists, interest and lawmaking from governments, and active research from researchers and developers. Examples: documentaries, viral memes, etc.
A*PART does applied alignment research on how to align artificial intelligence with human values. We use our own and others' public tools and hope for a future with a benevolent relationship between humans and AI.