This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.
Howard University AI Safety Summit & Policy Hackathon
November 21, 2024
Accepted at the Howard University AI Safety Summit & Policy Hackathon research sprint.

A Fundamental Rethinking of AI Evaluations: Establishing a Constitution-Based Framework

While artificial intelligence (AI) presents transformative opportunities across many sectors, current safety evaluation approaches remain inadequate for preventing misuse and ensuring ethical alignment. This paper proposes a novel two-layer evaluation framework based on Constitutional AI principles. The first phase develops a comprehensive AI constitution through international collaboration, incorporating diverse stakeholder perspectives in an inclusive, collaborative, and interactive process. This constitution then serves as the foundation for an LLM-based evaluation system that generates and validates safety assessment questions. The second phase implements the framework by applying advanced mechanistic interpretability techniques specifically to frontier base models. The framework mandates safety evaluations on model platforms and introduces more rigorous testing of major AI companies' base models. Our analysis suggests this approach is technically feasible, cost-effective, and scalable while addressing current limitations in AI safety evaluation. The solution offers a practical path toward keeping AI development aligned with human values while preventing the proliferation of potentially harmful systems.
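The two-layer pipeline described in the abstract — a constitution feeding an LLM that generates and then validates safety assessment questions — can be sketched in outline. This is a minimal illustrative sketch, not the authors' implementation: the constitution articles, the `query_llm` helper, and the toy validation rule are all hypothetical stand-ins.

```python
# Hypothetical sketch of the constitution-based two-layer evaluation loop.
# The constitution articles, query_llm(), and the validation rule below are
# illustrative assumptions, not the framework's actual components.

from dataclasses import dataclass


@dataclass
class EvalQuestion:
    article: str      # constitution article the question probes
    prompt: str       # question to pose to the model under evaluation
    validated: bool   # whether it passed the second-layer validation check


# Layer 1: a constitution drafted through stakeholder collaboration
# (two placeholder articles for illustration).
CONSTITUTION = [
    "The model must refuse to assist with biological weapons development.",
    "The model must not produce targeted political disinformation.",
]


def query_llm(prompt: str) -> str:
    """Stand-in for a call to an evaluator LLM (assumed API, not real)."""
    return f"Generated probe for: {prompt}"


def generate_questions(constitution: list[str]) -> list[EvalQuestion]:
    """Layer 2a: an evaluator LLM turns each article into a probe question."""
    return [
        EvalQuestion(article=article, prompt=query_llm(article), validated=False)
        for article in constitution
    ]


def validate_questions(questions: list[EvalQuestion]) -> list[EvalQuestion]:
    """Layer 2b: keep only probes still grounded in their article (toy rule)."""
    for q in questions:
        q.validated = q.article in q.prompt
    return [q for q in questions if q.validated]


if __name__ == "__main__":
    probes = validate_questions(generate_questions(CONSTITUTION))
    print(f"{len(probes)} validated probes ready for the model under test")
```

In a real deployment the validation layer would itself be an LLM or human-review step scoring each generated question against the constitution, rather than the string check used here for brevity.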

By Arrow Paquera and Paul Ivan Enclonar
