Nov 24, 2024
Recovering Goodfire's SAE feature vectors from their API
Lovkush Agarwal
Summary
In this project, we carry out an early trial to see whether Goodfire’s SAE feature vectors can be recovered using the information available from their API.
The strategy is to pick a feature of interest, construct a contrastive dataset using Goodfire’s API, then use TransformerLens to compute a steering vector from that dataset by averaging the difference in activations across the pairs.
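The averaging step can be illustrated with a minimal sketch. It assumes the activations for each side of every contrastive pair have already been extracted (for example with TransformerLens) and reduced to one vector per example; the function name `mean_difference_vector` and the toy numbers are hypothetical and are not taken from the project's code.

```python
from typing import List, Sequence


def mean_difference_vector(
    positive: Sequence[Sequence[float]],
    negative: Sequence[Sequence[float]],
) -> List[float]:
    """Average (positive - negative) over paired activation vectors.

    positive[i] and negative[i] are assumed to be the activations of the
    two examples in the i-th contrastive pair, taken at the same layer.
    """
    if not positive or len(positive) != len(negative):
        raise ValueError("need the same non-zero number of examples on each side")

    dim = len(positive[0])
    totals = [0.0] * dim
    for pos, neg in zip(positive, negative):
        if len(pos) != dim or len(neg) != dim:
            raise ValueError("all activation vectors must have the same length")
        for i in range(dim):
            totals[i] += pos[i] - neg[i]

    n_pairs = len(positive)
    return [total / n_pairs for total in totals]


# Toy example with two pairs of 2-dimensional "activations" (made-up values).
with_feature = [[1.0, 2.0], [3.0, 5.0]]
without_feature = [[0.0, 1.0], [1.0, 2.0]]
steering_vector = mean_difference_vector(with_feature, without_feature)
print(steering_vector)  # [1.5, 2.0]
```

In practice the same computation would more likely run on tensors, but the pure-Python version keeps the arithmetic explicit: each pair contributes one difference vector, and the steering vector is their mean.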
Cite this work:
@misc {
  title={Recovering Goodfire's SAE feature vectors from their API},
  author={Lovkush Agarwal},
  date={11/24/24},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}