This work was done during one weekend by research workshop participants and does not represent the work of Apart Research.
Accepted at the Interpretability Hackathon research sprint on November 15, 2022

Natural language descriptions for natural language directions

When a network embeds a token, it has to be able to unembed it again. Giving the same task to a human would be like saying: "I am thinking of a word or part of a word. You may now choose 700 questions that I have to answer, and from my answers you must determine the word. Also, spelling and pronunciation don't exist; you have to identify it through its meaning." Framed this way, there are questions it intuitively makes sense to ask, and we'd strongly expect many of them to be represented in the embedding. Moreover, any computation the network performs will cut into this same space, so it will share room with the semantic meanings and give rise to new ones like "this word is the third in the sentence". All of this is more a property of the information theory of the language than of any specific transformer, so we should expect overlap between transformers trained on the same language.
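
One concrete way to probe this idea: if the "questions" correspond to directions in embedding space, a direction can be described in natural language by listing the vocabulary tokens whose embeddings align with it most strongly. The sketch below is illustrative, not code from this project; the choice of GPT-2 and the hot-minus-cold example direction are assumptions made for the example.

```python
# Minimal sketch (illustrative, not the project's code): describe a direction
# in a transformer's token-embedding space by its best-aligned tokens.
import torch
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")

# Token-embedding matrix: one row per vocabulary entry, shape [50257, 768].
E = model.wte.weight.detach()

# Hypothetical direction to describe: the difference of two word embeddings,
# which often captures a rough semantic contrast.
hot = E[tokenizer.encode(" hot")[0]]
cold = E[tokenizer.encode(" cold")[0]]
direction = hot - cold

# Score every token by cosine alignment with the direction; the top-scoring
# tokens serve as a crude natural-language description of that direction.
scores = torch.nn.functional.cosine_similarity(E, direction.unsqueeze(0), dim=1)
top = scores.topk(10)
for score, idx in zip(top.values, top.indices):
    print(f"{score:.3f}  {tokenizer.decode([int(idx)])!r}")
```

In the same spirit, one could fit directions from labeled questions (e.g. "is this word a verb?") and ask how much of the embedding's limited budget of dimensions such intuitive questions account for.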

By Joshua Reiners

This project is private