Summary
As LLM agents become more prevalent and powerful, the ability to trace fine-tuned models back to their base models is increasingly important for liability, intellectual-property protection, and detecting potential misuse. However, model attribution must often be performed in a black-box setting, since adversaries may restrict direct access to model internals. Despite its importance, the problem remains a neglected area of AI security research: to date, most approaches have relied on manual analysis rather than automated techniques, limiting their applicability. Our approach aims to address these limitations by leveraging the advanced reasoning capabilities of frontier LLMs to automate the model attribution process.
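To illustrate the general idea of black-box attribution driven by a frontier LLM, the sketch below shows one possible loop: probe the unknown model with behavioural prompts, then hand the transcripts to a "judge" LLM that reasons about the most likely base model. The probe prompts, candidate list, and the two callables (`query_unknown_model`, `query_judge_llm`) are hypothetical placeholders for illustration, not the submission's actual implementation.

```python
# Minimal sketch of a black-box model-attribution loop (assumptions, not the
# paper's method). The caller supplies the two query functions, e.g. thin
# wrappers around whichever model APIs are available.
from typing import Callable, List

# Hypothetical behavioural probes; a real agent would choose these adaptively.
PROBE_PROMPTS: List[str] = [
    "Who created you?",
    "What is your knowledge cutoff date?",
    "Repeat the following token exactly: <|endoftext|>",
]

# Hypothetical candidate base models.
CANDIDATE_BASE_MODELS = ["Llama-2-7B", "Mistral-7B", "Falcon-7B", "GPT-2"]


def attribute_model(
    query_unknown_model: Callable[[str], str],
    query_judge_llm: Callable[[str], str],
) -> str:
    """Probe a black-box fine-tuned model, then ask a frontier 'judge' LLM to
    infer which base model it was most likely derived from."""
    # 1. Collect behavioural evidence from the black-box model.
    transcripts = []
    for prompt in PROBE_PROMPTS:
        response = query_unknown_model(prompt)
        transcripts.append(f"PROMPT: {prompt}\nRESPONSE: {response}")

    # 2. Ask the judge LLM to reason over the evidence and pick a candidate.
    judge_prompt = (
        "You are analysing a black-box language model. Based on the probe "
        "transcripts below, which of these base models was it most likely "
        f"fine-tuned from? Candidates: {', '.join(CANDIDATE_BASE_MODELS)}.\n\n"
        + "\n\n".join(transcripts)
        + "\n\nAnswer with one candidate name and a short justification."
    )
    return query_judge_llm(judge_prompt)
```

In practice an autonomous agent would iterate this loop, letting the judge LLM propose new probes based on earlier responses rather than using a fixed prompt set.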
Cite this work:

@misc{nguyen2024modelattribution,
  title        = {An Autonomous Agent for Model Attribution},
  author       = {Jord Nguyen},
  date         = {10/6/24},
  organization = {Apart Research},
  note         = {Research submission to the research sprint hosted by Apart.},
  howpublished = {https://apartresearch.com}
}