Increasing Trust in Language Models through the Reuse of Verified Circuits

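We define a standard for trustworthy language models (LMs) that requires verifying both the task a model performs and the circuits that implement it. A verified addition model is inserted into an untrained model, and the resulting composite model performs both addition and subtraction. Reusing the verified circuits makes the composite model easier to verify and therefore safer.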
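As a rough illustration of what "inserting" a verified model into an untrained one could look like at the level of a single weight matrix, the sketch below copies a small verified block into the corner of a larger, randomly initialised matrix and records which entries came from the verified model. The function names, the top-left placement, the initialisation scale, and the choice to freeze the inserted entries during updates are all assumptions made for illustration; they are not taken from the paper.

```python
import random


def insert_verified_block(verified, rows, cols, seed=0):
    """Embed a small verified weight matrix in a larger untrained one.

    Hypothetical helper: the verified block is placed in the top-left
    corner, and a boolean mask marks which entries were inserted.
    """
    rng = random.Random(seed)
    v_rows, v_cols = len(verified), len(verified[0])
    if v_rows > rows or v_cols > cols:
        raise ValueError("verified block does not fit in the larger matrix")

    # Untrained weights: small random values (assumed initialisation).
    weights = [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]
    frozen = [[False] * cols for _ in range(rows)]

    # Overwrite the corner with the verified weights and mark them.
    for i in range(v_rows):
        for j in range(v_cols):
            weights[i][j] = verified[i][j]
            frozen[i][j] = True
    return weights, frozen


def masked_update(weights, grads, frozen, lr=0.1):
    """One gradient step that leaves the inserted entries unchanged.

    Freezing is an assumption of this sketch, one simple way to keep
    the verified block intact while the rest of the matrix trains.
    """
    return [
        [w if f else w - lr * g for w, g, f in zip(w_row, g_row, f_row)]
        for w_row, g_row, f_row in zip(weights, grads, frozen)
    ]


if __name__ == "__main__":
    verified = [[1.0, 2.0], [3.0, 4.0]]
    weights, frozen = insert_verified_block(verified, rows=3, cols=4)
    grads = [[1.0] * 4 for _ in range(3)]
    updated = masked_update(weights, grads, frozen)
    print(updated[0][:2], updated[1][:2])  # inserted block is preserved
```

Keeping a mask alongside the weights makes it easy to check after training that the inserted entries still match the verified model, which is the property that lets the earlier verification carry over.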

"Increasing Trust in Language Models through the Reuse of Verified Circuits" was written during the Apart Lab Fellowship.
