Jul 28, 2025
Comments & Extensions of Subliminal Learning
Clark Miyamoto, Tets Nagashima, Alex Jiang
Here we perform empirical studies of subliminal learning. We find evidence that representations are important for subliminal learning to occur.
Andrew Mack
The authors focus on a nice, well-scoped project which is achievable within the constraints of a hackathon. They study a toy version of subliminal learning in which a student model is distilled using *only* the "auxiliary logits" from a trained teacher model (possibly on a different dataset), with the auxiliary logits designed to be completely meaningless. Nevertheless, they find that teacher accuracy can be transmitted to the student via these meaningless auxiliary logits.
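To make the setup concrete, here is a minimal sketch of this kind of distillation in PyTorch; the architecture, head sizes, and MSE objective are illustrative assumptions rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLP(nn.Module):
    """MLP with a main classification head plus extra "auxiliary" logits."""
    def __init__(self, width=512, n_classes=10, n_aux=32):
        super().__init__()
        self.body = nn.Sequential(nn.Flatten(), nn.Linear(784, width), nn.ReLU())
        self.main_head = nn.Linear(width, n_classes)  # used for the actual task
        self.aux_head = nn.Linear(width, n_aux)       # meaningless by construction

    def forward(self, x):
        h = self.body(x)
        return self.main_head(h), self.aux_head(h)

def distill_step(teacher, student, x, optimizer):
    """One distillation step on the teacher's auxiliary logits ONLY;
    the student's main head never sees labels or class logits here."""
    with torch.no_grad():
        _, t_aux = teacher(x)
    _, s_aux = student(x)
    loss = F.mse_loss(s_aux, t_aux)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this framing, subliminal learning would then be measured by the student's main-head accuracy on the original task after distilling on the auxiliary logits alone.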
It would have been nice to have a short discussion of how this toy version of subliminal learning relates to that of Cloud et al. (I lean towards them being related, but I'm not sure). In any case, the authors' definition of "subliminal learning" is interesting enough to warrant study on its own merits.
They present some preliminary evidence that increasing width decreases subliminal learning (as they define it). The most compelling evidence is Figure 4, where increasing the number of auxiliary logits increases subliminal learning for student models of width ≤2048 but barely affects the width-4096 model. The authors draw a nice connection to the NTK as a hypothesis that may explain this result.
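For context, the NTK connection presumably refers to the lazy-training limit: as width grows, training is increasingly well described by the model's linearization around its initialization,

$$
f(x;\theta) \approx f(x;\theta_0) + \nabla_\theta f(x;\theta_0)^\top (\theta - \theta_0),
$$

so the internal features barely move during training, plausibly leaving less room for the shared-representation effects that subliminal learning appears to rely on.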
Some of the results and presentation are preliminary and rough around the edges (for example, the trend in Figure 3 is not very clear; some sort of p-value would be nice), but this is to be expected for a hackathon. The submission's strongest selling point is its focus on a minimal example of a "subliminal learning-type phenomenon" that still yields interesting results. This could be very useful for eventually building up to a tractable "physics of DL"-style theory of subliminal learning.
Ari Brill
The project presents a replication and extension of recent experiments investigating subliminal learning. The experiments are well-motivated and rigorous, and appear to lay a solid foundation for further investigations of the origin of subliminal learning. The project could be improved by making the connection with physics more explicit. The possible connection between feature learning and subliminal learning that was briefly mentioned is interesting and could be a fruitful topic to analyze further in future work.
Nikita Khomich
Excellent result. I've seen results like this for LLMs that were somewhat vague, and this is a very reasonable attempt to explore the phenomenon with MLPs on the MNIST dataset. A very good result, especially within 3 days; good code as well, and well written.
Lauren
This is a timely project, well scoped for a hackathon, exploring subliminal learning in vision models as width, the number of auxiliary logits, and noise are varied in a clean way, a reasonably ‘physics-like’ approach to controlled experiments. The results (representations should match; accuracy flattens as width approaches the NTK regime) were not surprising, but they are a good sanity check; follow-up work could explore mechanisms for why this happens, or safety-relevant limits (e.g., the finite-width onset of NTK-like behavior) to subliminal learning or other failure modes.
The paper could benefit from a clearer abstract and more of an introduction/background about the safety relevance, and there could have been a stronger connection to physics (which, as it stands, was sort of implicit). The authors also seem surprised that their accuracy plateau didn’t match the paper, but as these use different metrics (MSE loss v. KL divergence), I wouldn’t expect them to capture the same information. In particular, KL maximizes information transfer, so the MSE is likely missing some small auxiliary logits that nevertheless carry subliminal signal.
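To illustrate the distinction being drawn here, a minimal sketch of the two distillation objectives (MSE on raw logits vs. temperature-softened KL), assuming PyTorch tensors of student and teacher logits; the temperature and reduction choices are illustrative, not taken from either paper.

```python
import torch.nn.functional as F

def mse_distill_loss(student_logits, teacher_logits):
    # Penalizes absolute differences, so small-magnitude auxiliary logits
    # contribute little to the loss even if they carry signal.
    return F.mse_loss(student_logits, teacher_logits)

def kl_distill_loss(student_logits, teacher_logits, T=2.0):
    # Matches the full softened distribution (Hinton-style distillation),
    # so the relative structure among small logits still matters.
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
```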
Cite this work
@misc{
  title={(HckPrj) Comments & Extensions of Subliminal Learning},
  author={Clark Miyamoto, Tets Nagashima, Alex Jiang},
  date={7/28/25},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}