Apr 27, 2026
Sparse Autoencoder Interpretability of the METAGENE-1 Genomic Foundation Model
Mannat Vikramaditya Jain, Peyton Jackson, Bridget Liu, Ciaran Walsh, Astrid Teo
We apply the mechanistic interpretability method of sparse autoencoders (SAEs) to the genomic foundation model METAGENE-1.
One of the more rigorous executions I've seen so far. The cross-delivery validation is very robust. I would like to see the raw residual probe baseline, but I understand that it is already acknowledged in the limitations section. If there's one thing to add, it would be a dual-use section addressing high-consequence pathogens. Not much else to comment on here; I really like the paper and how it was executed.
The organism-specific detector result is the strongest finding. The depth analysis showing early layers maintain sparse organism detectors while later layers compress into distributed encoding is also a useful observation. The gap is that the paper doesn’t show the SAE decomposition adds value over simply probing the raw model representations. Demonstrating that SAE features match or beat raw probes while being more interpretable would close the argument. The paper could also be about half its current length without losing anything essential.
Strong submission. Applying SAE interpretability to a metagenomic foundation model is a natural extension of recent work on protein language models, but hasn't been done at this scale or in this biosecurity context. The organism-specific detector finding is the standout result. Individual latents firing selectively on Norovirus or Human astrovirus sequences with 97-99% BLAST identity and zero false positives on non-pathogen sequences is a concrete, meaningful finding for anyone building auditable surveillance systems.
The main thing missing is a simple baseline. They train a classifier on SAE features and achieve an AUROC of 0.987, but they never train the same classifier directly on the raw model activations for comparison. That experiment takes only a few lines of code, given that they had already extracted the activations. Without it, you can't tell whether the SAE decomposition adds anything beyond what the base model already encodes. Classification performance alone doesn't prove the SAE is doing useful work; it might just be reorganizing signal that was already there. That comparison is the experiment that would have made this substantially stronger, and it was well within reach on a hackathon weekend.
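The baseline the reviewer asks for can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the raw activations and the SAE latents are already extracted as NumPy arrays (one row per sequence) with a binary pathogen label, fits the identical logistic-regression probe to both, and compares held-out AUROC. The helper names (`auroc`, `fit_logreg`, `probe_auroc`) and the synthetic data (including the random ReLU projection standing in for a trained SAE) are hypothetical; in practice scikit-learn's `LogisticRegression` and `roc_auc_score` would replace the hand-written pieces.

```python
import numpy as np


def auroc(scores, labels):
    """Rank-based AUROC (Mann-Whitney U statistic), averaging ranks over ties."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(int)
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(len(scores), dtype=float)
    i = 0
    while i < len(scores):
        j = i
        while j + 1 < len(scores) and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # average 1-based rank of the tie group
        i = j + 1
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def fit_logreg(X, y, lr=0.1, epochs=500, l2=1e-3):
    """L2-regularized logistic regression fit by full-batch gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ w + b, -30, 30)))
        err = p - y
        w -= lr * (X.T @ err / len(y) + l2 * w)
        b -= lr * err.mean()
    return w, b


def probe_auroc(X_train, y_train, X_test, y_test):
    """Standardize with train statistics, fit the probe, score held-out AUROC."""
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0) + 1e-8
    w, b = fit_logreg((X_train - mu) / sd, y_train)
    return auroc(((X_test - mu) / sd) @ w + b, y_test)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d_model, d_sae = 2000, 64, 256
    raw = rng.normal(size=(n, d_model))  # hypothetical stand-in for raw model activations
    direction = rng.normal(size=d_model)
    y = (raw @ direction + rng.normal(scale=2.0, size=n) > 0).astype(int)
    W_enc = rng.normal(size=(d_model, d_sae)) / np.sqrt(d_model)
    sae = np.maximum(raw @ W_enc, 0.0)  # hypothetical stand-in for SAE latents (ReLU encoder)
    half = n // 2
    for name, X in [("raw activations", raw), ("SAE latents", sae)]:
        print(name, round(probe_auroc(X[:half], y[:half], X[half:], y[half:]), 3))
```

Keeping the probe, regularization, and train/test split identical across the two feature sets is the point of the design: any AUROC gap can then be attributed to the representation rather than to the classifier.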
The cross-delivery validation is well-executed, and the multi-layer analysis is a genuinely interesting addition. The finding that organism-level specificity concentrates in early layers and compresses into distributed pathogen encoding by layer 32 is the kind of mechanistic insight this line of work needs more of.
The broader direction here is worth pursuing. Metagenomic surveillance models that can explain which specific features drove a pathogen flag and map those features to known organisms are meaningfully different from black-box classifiers. That's the gap between a research finding and something a public health official can actually act on.
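One simple way to surface which features drove a flag, assuming the pathogen classifier is a linear probe over SAE latents, is to decompose its logit into per-latent terms. The sketch below is an assumption about how such an explanation could be produced, not the paper's method; the function names, the `latent_to_organism` lookup, and all numbers are hypothetical.

```python
import numpy as np


def top_contributing_latents(activations, weights, bias, k=5):
    """Decompose a linear probe's logit into per-latent contributions.

    Since logit = bias + sum_i weights[i] * activations[i], each term
    weights[i] * activations[i] is that latent's exact additive share.
    Returns the logit and the k latents pushing hardest toward a positive flag.
    """
    activations = np.asarray(activations, dtype=float)
    weights = np.asarray(weights, dtype=float)
    contributions = weights * activations
    logit = bias + contributions.sum()
    top = np.argsort(contributions)[::-1][:k]  # largest contributions first
    return float(logit), [(int(i), float(contributions[i])) for i in top]


def explain_flag(activations, weights, bias, latent_to_organism, k=3):
    """Attach human-readable labels (hypothetical lookup) to the top latents."""
    logit, top = top_contributing_latents(activations, weights, bias, k)
    return logit, [(latent_to_organism.get(i, "unlabeled latent"), c) for i, c in top]


if __name__ == "__main__":
    # Hypothetical 5-latent example; real SAEs have thousands of latents.
    acts = np.array([0.0, 2.0, 0.0, 1.0, 0.5])
    w = np.array([1.0, 0.5, -3.0, 2.0, -1.0])
    labels = {3: "organism A detector", 1: "organism B detector"}
    print(explain_flag(acts, w, bias=-1.0, latent_to_organism=labels, k=2))
```

Because the probe is linear, the contributions sum exactly to the logit minus the bias, so the explanation is faithful to the classifier; a nonlinear classifier would need a separate attribution method.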
Cite this work
@misc{
  title={(HckPrj) Sparse Autoencoder Interpretability of the METAGENE-1 Genomic Foundation Model},
  author={Mannat Vikramaditya Jain and Peyton Jackson and Bridget Liu and Ciaran Walsh and Astrid Teo},
  date={2026-04-27},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


