All Apart hackathon projects
Here you can see all hackathon projects submitted to the Apart hackathons.
Omniscient Narrative Agent
Jord Nguyen, Akash Kundu, Gayatri K
December 13, 2024
The Concordia Contest: Advancing the Cooperative Intelligence of Language Model Agents
October 3, 2024
AutoSteer: Weight-Preserving Reinforcement Learning for Interpretable Model Control
Jeremias Lino Ferrao
December 3, 2024
Reprogramming AI Models Hackathon
Promoting School-Level Accountability for the Responsible Deployment of AI and Related Systems in K-12 Education: Mitigating Bias and Increasing Transparency
Chloe Jefferson
November 27, 2024
Howard University AI Safety Summit & Policy Hackathon
Diamonds are Not All You Need
Michael Andrzejewski, Melwina Albuquerque
November 10, 2024
Agent Security Hackathon
Speculative Consequences of A.I. Misuse
Joseph Karam, Charlie Nguyen, Andrew Lam
November 10, 2024
AI capabilities and risks demo-jam: Creating visceral interactive demonstrations
AI Alignment Knowledge Graph
Matin Mahmood, Samuel Ratnam, Sruthi Kuriakose, Pandelis Mouratoglou
November 10, 2024
Research Augmentation Hackathon: Supercharging AI Alignment
Robust Machine Unlearning for Dangerous Capabilities
Neel Jay, Austin Meek, Joshua Ehizibolo
November 7, 2024
AI Policy Hackathon at Johns Hopkins University
EscalAtion: Assessing Multi-Agent Risks in Military Contexts
Gabriel Mukobi, Anka Reuel, Juan-Pablo Rivera, Chandler Smith
September 12, 2024
October 2, 2023
DarkForest - Defending the Authentic and Humane Web
Mustafa Yasir
September 5, 2024
Hackathon for Technical AI Safety Startups
Sandbag Detection through Model Degradation
Cam Tice, Philipp Alexander Kreer, Fedor Ryzhenkov, Nathan Helm-Burger, Prithviraj Singh Shahani
July 8, 2024
Deception Detection Hackathon: Preventing AI deception
Unsupervised Recovery of Hidden Markov Models from Transformers with Evolutionary Algorithms
Dylan Bowman, Colin Lu
June 14, 2024
Computational Mechanics Hackathon!
rAInboltBench: Benchmarking user location inference through single images
Le "Qronox" Lam, Aleksandr Popov, Jord Nguyen, Trung Dung "mogu" Hoang, Marcel M, Felix Michalak
May 31, 2024
AI Security Evaluation Hackathon: Measuring AI Capability
Beyond Refusal: Scrubbing Hazards from Open-Source Models
Kyle Gabriel Reynoso, Ivan Enclonar, Lexley Maree Villasis
May 8, 2024
AI and Democracy Hackathon: Demonstrating the Risks
We Discovered An Neuron
Joseph Miller, Clement Neo
February 24, 2024
January 25, 2023
Investigating Neuron Behaviour via Dataset Example Pruning and Local Search
Alex Foote
February 24, 2024
November 15, 2022
Seemingly Human: Dark Patterns in ChatGPT
Jin Suk Park, Angela Lu, Esben Kran
February 24, 2024
February 12, 2024
Model Cards for AI Algorithm Governance
Jaime Raldua Veuthey; Gediminas Dauderis; Chetan Talele
February 24, 2024
January 7, 2024
Detecting Implicit Gaming through Retrospective Evaluation Sets
Jacob Haimes, Lucie Philippon, Alice Rigg, Cenny Wenner
February 24, 2024
November 27, 2023
Preserving Agency in Reinforcement Learning under Unknown, Evolving and Under-Represented Intentions
Harry Powell, Luigi Berducci
February 24, 2024
September 25, 2023
In the Mirror: Using Chess to Simulate Agency Loss in Feedback Loops
Helios Lyons
February 24, 2024
September 24, 2023
Evaluating Myopia in Large Language Models
Marco Bazzani, Felix Binder
February 24, 2024
September 10, 2023
Data Taxation
Joshua Sammet, Per Ivar Friborg, William Wale
February 24, 2024
July 21, 2023
Relating induction heads in Transformers to temporal context model in human free recall
Ji-An Li
February 24, 2024
July 17, 2023
Exploring the Robustness of Model-Graded Evaluations of Language Models
Simon Lermen, Ondřej Kvapil
February 24, 2024
July 2, 2023
Solving the CNN Mech Int Challenge
Stefan Heimersheim, Marius Hobhahn
February 24, 2024
May 10, 2023
Automated Sandwiching: Efficient Self-Evaluations of Conversation-Based Scalable Oversight Techniques
Sophia Pung, Gabriel Mukobi
February 24, 2024
February 16, 2023
Discovering Latent Knowledge in Language Models Without Supervision - extensions and testing
Agatha Duzan, Matthieu David, Jonathan Claybrough
February 24, 2024
December 19, 2022
Agreeableness vs. Truthfulness
February 24, 2024
October 18, 2022
Classification on Latent Feature Activation for Detecting Adversarial Prompt Vulnerabilities
Hoang-Long Tran, Jack Kaunismaa, Edward Stevinson, Parv Mahajan, Oliver Clive-Griffin
December 3, 2024
Reprogramming AI Models Hackathon
Dynamic Risk Assessment in Autonomous Agents Using Ontologies and AI
Alejandra de Brunner
November 10, 2024
Agent Security Hackathon
Very Cooperative Agent
Jakub Fidler
November 10, 2024
The Concordia Contest: Advancing the Cooperative Intelligence of Language Model Agents
October 6, 2024
CoPirate
Mia Hopman, Carissa Cullen, Jack Wittmayer, Vaishnavi Pamulapati
November 10, 2024
AI capabilities and risks demo-jam: Creating visceral interactive demonstrations
Grant Application Simulator
Michaël Trazzi
November 10, 2024
Research Augmentation Hackathon: Supercharging AI Alignment
SafeBites
Arman Mahjoor, Vansh Bataviya, Charles Landreaux, Ryan Mahjoor
October 29, 2024
AI Policy Hackathon at Johns Hopkins University
Simulation Operators: The Next Level of the Annotation Business
Ardy Haroen
September 5, 2024
Hackathon for Technical AI Safety Startups
Detecting and Controlling Deceptive Representation in LLMs with Representational Engineering
Avyay M Casheekar, Kaushik Sanjay Prabhakar, Kanishk Rath, Sienka Dounia
August 29, 2024
Deception Detection Hackathon: Preventing AI deception
RNNs represent belief state geometry in hidden state
Keenan Pepper
June 14, 2024
Computational Mechanics Hackathon!
Cybersecurity Persistence Benchmark
Davide Zani, Felix Michalak, Jeremias Ferrao
May 31, 2024
AI Security Evaluation Hackathon: Measuring AI Capability
Jekyll and HAIde: The Better an LLM is at Identifying Misinformation, the More Effective it is at Worsening It.
Mayowa Osibodu
May 8, 2024
AI and Democracy Hackathon: Demonstrating the Risks
Fishing for the answer: Mapping the flow of information in LLM agent groups using lessons from fish schools
Matthew Lutz, Nyasha Duri
February 24, 2024
February 12, 2024
Obsolescent Souls
Markov
February 24, 2024
January 7, 2024
Visual Prompt Injection Detection
Yoann Poupart, Imene Kerboua
February 24, 2024
November 27, 2023
Jailbreaking the Overseer
Alexander Meinke
February 24, 2024
October 1, 2023
Discovering Agency Features as Latent Space Directions in LLMs via SVD
max max
February 24, 2024
September 25, 2023
Agency as Shanon information. Unveiling limitations and common misconceptions
Ivan Madan, Hennadii Madan
February 24, 2024
September 24, 2023
Against Agency
Catherine Brewer
February 24, 2024
September 21, 2023
The AI governance gaps in developing countries
N Tran
February 24, 2024
July 21, 2023
Who cares about brackets?
Theo Clark, Alex Roman, Hannes Thurnherr
February 24, 2024
July 17, 2023
From Sparse to Dense: Refining the MACHIAVELLI Benchmark for Real-World AI Safety
Heramb Podar, Vladislav Bargatin
February 24, 2024
July 4, 2023
Dropout Incentivizes Privileged Bases
Edoardo Pona, Victor Levoso Fernàndez, Abhay, Kunvar
February 24, 2024
May 10, 2023
Player Of Games
Samuel Knoche
February 24, 2024
February 16, 2023
Identifying a Preliminary Circuit for Predicting Gendered Pronouns in GPT-2 Small
Chris Mathwin, Guillaume Corlouer
February 24, 2024
January 25, 2023
Investigating Training Dynamics via Token Loss Trajectories
Alex Foote
February 24, 2024
December 19, 2022
Backup Transformer Heads are Robust to Ablation Distribution
Lucas Sato, Gabe Mukobi, Mishika Govil
February 24, 2024
November 15, 2022
AI: My Partner in Crime
February 24, 2024
October 18, 2022
Utilitarian Decision-Making in Models - Evaluation and Steering
Adam Newgas, Sinem Erisken, Pandelis Mouratoglou
December 3, 2024
Reprogramming AI Models Hackathon
Cop N' Shop
Vaishnavi Pamulapati, Diego Sabajo, Andres Sepulveda Morales, Elsa Donnat, Paul Vautravers
November 10, 2024
Agent Security Hackathon
LLM Research Collaboration Recommender
David McSharry
November 10, 2024
Research Augmentation Hackathon: Supercharging AI Alignment
Sue-Per GPT: Legal RAG Assistant
Atir Petkar, Jay Liu, Chelsea Wong, Nancy Vigil
November 7, 2024
AI Policy Hackathon at Johns Hopkins University
AI Safety Collective - Crowdsourcing Solutions for Critical AI Safety Challenges
Lye Jia Jun, Dhruba Patra, Philipp Blandfort
September 5, 2024
Hackathon for Technical AI Safety Startups
Detecting Deception in GPT-3.5-turbo: A Metadata-Based Approach
Siddharth Reddy Bakkireddy, Rakesh Reddy Bakkireddy
July 8, 2024
Deception Detection Hackathon: Preventing AI deception
Handcrafting a Network to Predict Next Token Probabilities for the Random-Random-XOR Process
Rick Goldstein
June 14, 2024
Computational Mechanics Hackathon!
Investigating the Effect of Model Capacity Constraints on Belief State Representations
Ari Brill, Chu Chen
June 14, 2024
Computational Mechanics Hackathon!
Say No to Mass Destruction: Benchmarking Refusals to Answer Dangerous Questions
Alexander Pino, Carl Vinas, Joseph Dantes, Zmavli Caimle, Kyle Reynoso
May 31, 2024
AI Security Evaluation Hackathon: Measuring AI Capability
Artificial Advocates: Biasing Democratic Feedback using AI
Sam Patterson, Jeremy Dolan, Simon Wisdom, Maten
May 8, 2024
AI and Democracy Hackathon: Demonstrating the Risks
OthelloScope
Albert Garde, Esben Kran
February 24, 2024
May 10, 2023
Automated Identification of Potential Feature Neurons
Michelle Wai Man Lo
February 24, 2024
January 25, 2023
Model editing hazards at the example of ROME
Oscar Persson, Jochem Hölscher
February 24, 2024
November 15, 2022
Iterated contract negotiation
Robert Klassert
February 24, 2024
February 11, 2024
2030 - The CEO Dilemna
Pierina Camarena, Leon Nyametso, Capucine Marteau
February 24, 2024
January 8, 2024
Cross-Lingual Generalizability of the SADDER Benchmark
Siddhant Arora, Jord Nguyen, Akash Kundu
February 24, 2024
November 27, 2023
LLMs With Knowledge of Jailbreaks Will Use Them
Jack Foxabbott, Marcel Hedman, Kaspar Senft, Kianoosh Ashouritaklimi
February 24, 2024
October 2, 2023
Uncertainty about value naturally leads to empowerment
Filip Sondej
February 24, 2024
September 26, 2023
Comparing truthful reporting, intent alignment, agency preservation and value identification
Aksinya Bykova
February 24, 2024
September 24, 2023
Building brakes for a speeding car: A global coordination proposal for AI safety
Charles Martinet, Blanche Freudenreich, Henry Papadatos, Manuel Bimich
February 24, 2024
July 21, 2023
Embedding and Transformer Synthesis
Rick Goldstein
February 24, 2024
July 16, 2023
MAXIAVELLI: Thoughts on improving the MACHIAVELLI benchmark
Roman Leventov, Jason Hoelscher-Obermaier
February 24, 2024
July 2, 2023
Reverse Word Wizards: Pitting Language Models Against the Art of Reversal
Ingrid Backman, Asta Rassmussen, Klara Nielsen
February 24, 2024
February 16, 2023
Counting Letters, Chaining Premises & Solving Equations: Exploring Inverse Scaling Problems with GPT-3
D. Chipping, J. Harding, H. Mannering, P. Selvaraj
February 24, 2024
December 19, 2022
All Fish are Trees
February 24, 2024
October 18, 2022
Steering Swiftly to Safety with Sparse Autoencoders
Agatha Duzan, Guillaume Martres, Syrine Noame, Abhinand Shibu, Flavia Wallenhorst, Arthur Wuhrmann
December 3, 2024
Reprogramming AI Models Hackathon
OCAP Agents
Dmitrii Volkov
November 10, 2024
Agent Security Hackathon
Phish Tycoon: phishing using voice cloning
Craig Albuquerque, Melwina Albuquerque
November 10, 2024
AI capabilities and risks demo-jam: Creating visceral interactive demonstrations
PurePrompt - An easy tool for prompt robustness and eval augmentation
Axel Sorensen
November 10, 2024
Research Augmentation Hackathon: Supercharging AI Alignment
Understanding Incentives To Build Uninterruptible Agentic AI Systems
Damin Curtis, M.A. International Affairs Norman Piotriowski, B.Sc. Data Science
October 29, 2024
AI Policy Hackathon at Johns Hopkins University
Identity System for AIs
Artem Grigor
September 5, 2024
Hackathon for Technical AI Safety Startups
Modelling the oversight of automated interpretability against deceptive agents on sparse autoencoders
Simon Lermen, Mateusz Dziemian
July 8, 2024
Deception Detection Hackathon: Preventing AI deception
Benchmarking Dark Patterns in LLMs
Jord Nguyen, Akash Kundu, Sami Jawhar
May 31, 2024
AI Security Evaluation Hackathon: Measuring AI Capability
Unleashing Sleeper Agents
Nora Petrova, Jord Nguyen
May 8, 2024
AI and Democracy Hackathon: Demonstrating the Risks
Towards High-Quality Model-Written Evaluations
Jannes Elstner, Jaime Raldua Veuthey
February 24, 2024
November 27, 2023
Second-order Jailbreaks
Mikhail Terekhov, Romain Graux, Denis Rosset, Eduardo Neville, Gabin Kolly
February 24, 2024
October 2, 2023
ILLUSION OF CONTROL
Mary Osuka
February 24, 2024
September 25, 2023
Agency, value and empowerment.
Benjamin Sturgeon, Leo Hyams
February 24, 2024
September 24, 2023
Premortem AI
Alvin Ånestrand, Matthias Endres, Harry Powell, Chris Lonsberry
February 24, 2024
July 21, 2023
Interpreting Planning in Transformers
Victor Levoso Fernandez, Abhay Sheshadri
February 24, 2024
July 17, 2023
Exploitation of LLM's to Elicit Misaligned Outputs
Desik Mandava, Jayanth Santosh, Aishwarya Gurung
February 24, 2024
July 2, 2023
Improving TransformerLens Head Detector
Mateusz Bagiński, Jay Bailey
February 24, 2024
May 10, 2023
Soft Prompts are a Convex Set
Amir Sarid, Bary Levy, Dan Barzily, Edo Arad, Gal Hyams, Geva Kipper, Guy Dar, Itay Yona, Yossi Gandelsman
February 24, 2024
January 25, 2023
Trojan detection and implementation on transformers
Clément Dumas, Charbel-Raphaël Segerie, Liam Imadache
February 24, 2024
December 19, 2022
Probing Conceptual Knowledge on Solved Games
Amir Sarid, Bary Levy, Dan Barzilay, Edo Arad, Itay Yona, Joey Geralnik
February 24, 2024
November 15, 2022
Reducing hindsight neglect with "Let's think step by step"
February 24, 2024
October 18, 2022
AI ADVISORY COUNCIL FOR SUSTAINABLE ECONOMIC GROWTH AND ETHICAL INNOVATION IN THE DOMINICAN REPUBLIC (CANIA)
Said Saillant, Kay Kozaronek, Shaun Pexton, Cyra Alesha, Elise Racine
October 29, 2024
AI Policy Hackathon at Johns Hopkins University
Multifaceted Benchmarking
Eduardo Neville, George Golynskyi, Tetra Jones
February 24, 2024
November 27, 2023
Exploring multi-agent interactions in the dollar auction
Thomas Broadley, Allison Huang
February 24, 2024
October 2, 2023
AutoAdminsteredAntidotes: Circuit detection in a poisoned model for MNIST classification
Kola Ayonrinde, Denizhan “Dennis” Akar, Kitti Kovács, Adam Newgas, David Quarel
February 24, 2024
May 10, 2023
Reasoning with Chain of Thought
February 24, 2024
October 18, 2022
Modernizing DC's Emergency Communications
Thane Douglass, Anuoluwapo Soneye
November 26, 2024
AI Policy Hackathon at Johns Hopkins University
Wording influences truthfulness
February 27, 2024
October 18, 2022
Othello Mechint playground
Victor Levoso Fernandez, Edoardo Pona, Abhay Sheshadri, Kunvar
February 24, 2024
May 10, 2023
Detecting Phase Transitions
Jesse Hoogland, Lucas Texeira, Benjamin Gerraty, Rumi Salazar, Samuel Knoche
February 24, 2024
May 10, 2023
Simulating an Alien
February 24, 2024
October 18, 2022
Exploring OthelloGPT
Yeu-Tong Lau
February 24, 2024
May 10, 2023
Glia
Chloe Anderson, Mashrur Wasek, Aidan Schurr, Chris Gagnon
December 6, 2024
AI Policy Hackathon at Johns Hopkins University
SAGE: Safe, Adaptive Generation Engine for Long Form Document Generation in Collaborative, High Stakes Domains
Abrar Rahman, Anish Sundar
December 6, 2024
Reprogramming AI Models Hackathon
Glia for Healthcare Organisations
Chloe Anderson, Mashrur Wasek, Aidan Schurr, Chris Gagnon
December 6, 2024
AI Policy Hackathon at Johns Hopkins University
Analyzing Dataset Bias with SAEs
Nick Jiang, Joseph Tey
November 25, 2024
Reprogramming AI Models Hackathon
Investigate arithmetic features in Multi-lingual LLMs
Akash Kundu, Ashish Rai, Suhas K R
November 25, 2024
Reprogramming AI Models Hackathon
Bias Mitigation in LLM by Steering Features
Akanksha Devkar
November 25, 2024
Reprogramming AI Models Hackathon
Faithful or Factual? Tuning Mistake Acknowledgment in LLMs
Daniel Donnelly, Mia Hopman, Jack Wittmayer
November 24, 2024
Reprogramming AI Models Hackathon
Improving Llama-3-8B-Instruct Hallucination Robustness in Medical Q&A Using Feature Steering
Diego Sabajo, Eitan Sprejer, Matas Zabaljauregui, Oliver Morris
November 24, 2024
Reprogramming AI Models Hackathon
Unveiling Latent Beliefs Using Sparse Autoencoders
Carlos Cortez, Eivind Otto Hjelle, Sanchit Kalhan
November 24, 2024
Reprogramming AI Models Hackathon
Can we steer a model's behavior with just one prompt? Investigating SAE-driven auto-steering
Nicole Nobili, Davide Ghilardi, Wen Xing
November 24, 2024
Reprogramming AI Models Hackathon
Improving Llama-3-8b Hallucination Robustness in Medical Q&A Using Feature Steering
Diego Sabajo, Eitan Sprejer, Matas Zabaljauregui, Oliver Morris
November 24, 2024
AI Policy Hackathon at Johns Hopkins University
Sparse Autoencoders and Gemma 2-2B: Pioneering Demographic-Sensitive Language Modeling for Opinion QA
Qianmian Guo
November 24, 2024
Reprogramming AI Models Hackathon
Assessing Language Model Cybersecurity Capabilities with Feature Steering
Stefan Jones
November 24, 2024
Reprogramming AI Models Hackathon
Math Speaks All Languages: Enhancing LLM Problem-Solving Across Multilingual Contexts
Maksim Kostritsya, Kseniia Kuvshinova, Rauf Parchiev, Konstantin Polev
November 24, 2024
Reprogramming AI Models Hackathon
Edufire - Personalized Education Platform Using LLM Steering
Atir Petkar
November 24, 2024
Reprogramming AI Models Hackathon
Explaining Latents in Turing-LLM-1.0-254M with Pre-Defined Function Types
Daniel Davies, Ashwarya Maratha
November 24, 2024
Reprogramming AI Models Hackathon
Tentative proposal for AI control with weak supervisors trough Mechanistic Inspection
Gerard
November 24, 2024
Reprogramming AI Models Hackathon
Clear Thought and Clear Speech: Reducing Grammatical Scope Ambiguity
Zmavli Caimle
November 24, 2024
Reprogramming AI Models Hackathon
BBLLM
Joey SKAF, Mickaël Boillaud, Thaïs Distinguin
November 24, 2024
Reprogramming AI Models Hackathon
Investigating Feature Effects on Manipulation Susceptibility
Nishchal Prabhakar, Stefan Trnjakov, Mo Aziz
November 24, 2024
Reprogramming AI Models Hackathon
Let LLM Agents Perform LLM Surgery
Sharat Jacob Jacob
November 24, 2024
Reprogramming AI Models Hackathon
Feature Tuning versus Prompting for Ambiguous Questions
Elis Grahn, Axel Ahlqvist, Elliot Gestrin, Hemming Gong
November 24, 2024
Reprogramming AI Models Hackathon
Auto Prompt Injection
Yingjie Hu, Daniel Williams, Carmen Gavilanes, William Hesslefors Nairn
November 24, 2024
Reprogramming AI Models Hackathon
Feature based unlearning
Patrick Quinn, Yucheng Sun
November 24, 2024
Reprogramming AI Models Hackathon
Recovering Goodfire's SAE feature vectors from their API
Lovkush Agarwal
November 24, 2024
Reprogramming AI Models Hackathon
Encouraging Chain-of-Thought Reasoning
Shreyans Jain, Thomas Walker, Kutay Buyruk, Soumyadeep Bose
November 24, 2024
Reprogramming AI Models Hackathon
User Transparency Within AI
Jonathan King, Robert Hardy, Jeremiah Bailey, Amir Abdulgadir
November 20, 2024
Howard University AI Safety Summit & Policy Hackathon
Community-First: A Rights-Based Framework for AI Governance in India's Welfare Systems
Sanjnah Ananda Kumar
November 20, 2024
Howard University AI Safety Summit & Policy Hackathon
National Data Privacy and Governance Act
Erica Okeh
November 20, 2024
Howard University AI Safety Summit & Policy Hackathon
Implementing a Human-centered AI Assessment Framework (HAAF) for Equitable AI Development
Elise Racine
November 20, 2024
Howard University AI Safety Summit & Policy Hackathon
A Critical Review of "Chips for Peace": Lessons from "Atoms for Peace"
Amritanshu Prasad
November 20, 2024
Howard University AI Safety Summit & Policy Hackathon
AI Monitoring as a Rapid and Scalable Policy Solution: Weekly Global Bulletins on AI Developments
Ilke Masa
November 20, 2024
Howard University AI Safety Summit & Policy Hackathon
Grandfather Paradox in AI – Bias Mitigation & Ethical AI1
Maha Vishnu Sura
November 20, 2024
Howard University AI Safety Summit & Policy Hackathon
A Fundamental Rethinking to AI Evaluations: Establishing a Constitution-Based Framework
Arrow Paquera and Paul Ivan Enclonar
November 20, 2024
Howard University AI Safety Summit & Policy Hackathon
Hero Journey: Personalized Health Interventions for the Incarcerated
Anthony Li, Samuel Ntow, Antonio Bandeira, Niroj Bhandari
October 29, 2024
AI Policy Hackathon at Johns Hopkins University
mHeatlth Ai
Patrick Puma, Ethan Graber, Will Kim
October 28, 2024
AI Policy Hackathon at Johns Hopkins University
AI Parliament
Anurag Dhungana, Prakriti Bista and Sunil Shah
October 27, 2024
AI Policy Hackathon at Johns Hopkins University
Policy Analysis: AI and Sustainability: Climate Impact Monitoring
Parikirt Oggu & Shawn Reginauld
October 27, 2024
AI Policy Hackathon at Johns Hopkins University
applai
Pratham Ashar and Vir Trivedi
October 27, 2024
AI Policy Hackathon at Johns Hopkins University
Digital Rebellion: Analyzing misaligned AI agent cooperation for virtual labor strikes
Michael Andrzejewski, Melwina Albuquerque
October 27, 2024
AI Policy Hackathon at Johns Hopkins University
Next-Gen AI-Enhanced Epidemic Intelligence
Axby Loh, Waikit Fung, Anthony Li
October 27, 2024
AI Policy Hackathon at Johns Hopkins University
AI and Public Health: TSA Pre Health Check
Jacob Lin, Habib Aina
October 27, 2024
AI Policy Hackathon at Johns Hopkins University
Mapping Intent: Documenting Policy Adherence with Ontology Extraction
Alejandra de Brunner, Mia Hopman, Jack Wittmayer
October 27, 2024
AI Policy Hackathon at Johns Hopkins University
EcoNavix
Sachin Kumar, Anitej Suklikar, Samarth Parekh, Roshni Kainthan
October 27, 2024
AI Policy Hackathon at Johns Hopkins University
Towards a Unified Framework for Cybersecurity and AI Safety: Recommendations for Secure Development of Large Language Models
Lexley Maree Villasis, Srishti Dutta, Yohan Mathew
October 27, 2024
AI Policy Hackathon at Johns Hopkins University
Enviro - A Comprehensive Environmental Solution Using Policy and Technology
Arun Nimmagadda, Sohil Shah, Tushar Gidadhubli, Arnav Patel
October 27, 2024
AI Policy Hackathon at Johns Hopkins University
Enhancing Human Verification Systems to Address AI Agent Circumvention and Attributability Concerns
Yogev Angelovici, Anish Ganga, Saathvik Kannan, Zihao Zhou
October 27, 2024
AI Policy Hackathon at Johns Hopkins University
Politicians on AI Safety
Liam Robins, Elise Racine, Manikanta Revuri, Bhanu Reddy
October 27, 2024
AI Policy Hackathon at Johns Hopkins University
Policy Framework for Sustainable AI: Repurposing Waste Heat from Data Centers in the USA
Asmita Mehta
October 27, 2024
AI Policy Hackathon at Johns Hopkins University
Predictive Analytics & Imagery for Environmental Monitoring
Shambhavi Adhikari, Yeji Kim, Dilrose Karakattil
October 27, 2024
AI Policy Hackathon at Johns Hopkins University
Proposal for U.S.-China Technical Cooperation on AI Safety
Angel Shen, Raghav Akula
October 27, 2024
AI Policy Hackathon at Johns Hopkins University
Proposal for a Provisional FDA Designation Targeting Biomedical Products Evaluated with Novel Methodologies
Gerard Boxo Corominas, Lucia Tortosa Nesterovich
October 27, 2024
AI Policy Hackathon at Johns Hopkins University
Infectious Disease Outbreak Prediction and Dashboard
Sukanya Krishna, Nikhil Dhanankam, Joyanta Jyoti Mondal
October 27, 2024
AI Policy Hackathon at Johns Hopkins University
Cross-model surveillance for emails handling
Le Ngoc Mai
October 7, 2024
Agent Security Hackathon
Inference-Time Agent Security
Nicholas Chen
October 6, 2024
Agent Security Hackathon
Intent Inspector - Protecting Against Prompt Injections for Agent Tool Misuse
Oliver Morris, Gerard Boxo Corominas
October 6, 2024
Agent Security Hackathon
AI Honeypot
Reworr
October 6, 2024
Agent Security Hackathon
AI Agent Capabilities Evolution
Ekaterina Krupkina
October 6, 2024
Agent Security Hackathon
An Autonomous Agent for Model Attribution
Jord Nguyen
October 6, 2024
Agent Security Hackathon
Using ARC-AGI puzzles as CAPTCHa task
Mikolaj Kniejski
October 6, 2024
Agent Security Hackathon
LLM Agent Security: Jailbreaking Vulnerabilities and Mitigation Strategies
Mohammed Arsalan, Vishwesh Bhat
October 6, 2024
Agent Security Hackathon
AdGPT
Khaidar Bikmaev, Nikolaj Kotov, Dmitrii Volkov
October 5, 2024
AI capabilities and risks demo-jam: Creating visceral interactive demonstrations
Interpreting a toy model for finding the maximum element in a list
Tinuade Adeleke
September 17, 2024
ARENA 4.0 Interpretability Hackathon
Finding Circular Features in Gemma 2 2B
Leo, Misha
September 17, 2024
ARENA 4.0 Interpretability Hackathon
nnsight transparent debugging
Aaron Kaufman, David Steinberg
September 17, 2024
ARENA 4.0 Interpretability Hackathon
minTranscoders
Bronson Schoen
September 17, 2024
ARENA 4.0 Interpretability Hackathon
Latent Space Clustering and Summarization
Matthew Shinkle
September 17, 2024
ARENA 4.0 Interpretability Hackathon
GuardianAI
Andres Sepulveda Morales, Doroteya Stoyanova, Patrick Huang, Wanjie Zhong
September 3, 2024
Hackathon for Technical AI Safety Startups
Devising Effective Bechmarks
Nancy Vigil, Ashish Rai
September 1, 2024
Hackathon for Technical AI Safety Startups
WELMA: Open-world environments for Language Model agents
Sohaib Imran, Mujeeb Nawaz
September 1, 2024
Hackathon for Technical AI Safety Startups
CAMARA: A Comprehensive & Adaptive Multi-Agent framework for Red-Teaming and Adversarial Defense
Vishnu Vardhan Lanka, Era Sarda, Raghav Ravishankar
September 1, 2024
Hackathon for Technical AI Safety Startups
ÆLIGN: Aligned Agent-based Workflows via Collaboration & Safety Protocols
Samantha Guerriero and Nora Petrova
September 1, 2024
Hackathon for Technical AI Safety Startups
General Pervasiveness
Andres Sepulveda Morales, Patrick Huang
August 27, 2024
AI capabilities and risks demo-jam: Creating visceral interactive demonstrations
Demonstrating LLM Code Injection Via Compromised Agent Tool
Kevin Vegda, Oliver Chamberlain, William Baird
August 27, 2024
AI capabilities and risks demo-jam: Creating visceral interactive demonstrations
Misinformational AI-Generated Academic Papers
Aaron Sandoval, Akash Kundu, Layla Adam
August 26, 2024
AI capabilities and risks demo-jam: Creating visceral interactive demonstrations
GrandSlam usecases not technology
jmari wyatt
August 25, 2024
AI capabilities and risks demo-jam: Creating visceral interactive demonstrations
AI Agents for Personalized Interaction and Behavioral Analysis
Chetan Talele, Jasper Timm
August 25, 2024
AI capabilities and risks demo-jam: Creating visceral interactive demonstrations
RedFluence
Pushkal Ahluwalia, Aaryan Purohit
August 25, 2024
AI capabilities and risks demo-jam: Creating visceral interactive demonstrations
BBC News Impersonator
Kyal Pindolia, Funmi “Finn” Okuleye
August 25, 2024
AI capabilities and risks demo-jam: Creating visceral interactive demonstrations
Unsolved AI Safety Concepts Explorer
Tewodros Mesfin
August 25, 2024
AI capabilities and risks demo-jam: Creating visceral interactive demonstrations
AI Research Paper Processor
Lincoln Vann-Wakelin, Ian Osorio Perez, Bhairavi Ahilan
August 25, 2024
AI capabilities and risks demo-jam: Creating visceral interactive demonstrations
Sleeper Agents Detector
Michaël Trazzi and Saahir Vazirani
August 25, 2024
AI capabilities and risks demo-jam: Creating visceral interactive demonstrations
VerifyStream
Kailash Balasubramaniyam
August 25, 2024
AI capabilities and risks demo-jam: Creating visceral interactive demonstrations
Web App for Interacting with Refusal-Ablated Language Model Agents
Simon Lermen
August 25, 2024
AI capabilities and risks demo-jam: Creating visceral interactive demonstrations
Alignment Research Critiquer
Nancy Vigil, Ashish Rai
July 29, 2024
Research Augmentation Hackathon: Supercharging AI Alignment
Alignment Research Critiquer
Nancy Vigil, Ashish Rai
July 28, 2024
Research Augmentation Hackathon: Supercharging AI Alignment
Data Massager
Giles Edkins, Ayush Jain
July 28, 2024
Research Augmentation Hackathon: Supercharging AI Alignment
AI Alignment Toolkit Research Assistant
Luciano Hanyon Wu
July 28, 2024
Research Augmentation Hackathon: Supercharging AI Alignment
Neurons and Attention Heads that Look for Sentence Structure in GPT2
Harvey Mannering, James Harding, Praveen Selvaraj
February 24, 2024
November 15, 2022
Natural language descriptions for natural language directions
Joshua Reiners
February 24, 2024
November 15, 2022
Observing and Validating Induction heads in SOLU-8l-old
Brian Muhia
February 24, 2024
November 15, 2022
Optimising image patches to change RL-agent behaviour
Clément Dumas, Louis Develle, Robert Scholz, Amaury Lorin
February 24, 2024
November 15, 2022
Regularly Oversimplifying Neural Networks
Nick Kross, Mica White
February 24, 2024
November 15, 2022
Sparsity Lens
Vladimir Ivanov, Siméon Campos
February 24, 2024
November 15, 2022
Top-Down Interpretability Through Eigenspectra
Jan Wehner, Rauno Arike, Jesse Hoogland, Simon Marshall
February 24, 2024
November 15, 2022
Trying to make GPT2 dream
Gordei Pribõtkin, Mykyta Baliesnyi, Joonas X
February 24, 2024
November 15, 2022
Visualizing the effect prompt design has on text-davinci-002 mode collapse and social biases
Gloria, Sam, Ingrid, Nina
February 24, 2024
November 15, 2022
War is 15% conflic, 15% DragonMagazine
Giles Edkins, Lauren Greenspan, Dan Valentine
February 24, 2024
November 15, 2022