We're thrilled to announce our collaboration with Arc Institute, a nonprofit research organization pioneering long-context biological foundation models (the "Evo" series). Through our partnership, we've developed methods to understand their models with unprecedented precision, enabling the extraction of meaningful units of model computation (i.e., features). Preliminary experiments suggest that steering these features can guide DNA sequence generation, though this work is still in its early stages.
Today, Arc announced their next-generation biological foundation model, Evo 2, featuring 7B and 40B parameter architectures capable of processing sequences of up to 1M base pairs at single-nucleotide resolution. Trained across all domains of life, it enables both prediction and generation at multiple levels of biological complexity. Through our collaboration, Goodfire and Arc have made exciting progress in applying interpretability techniques to Evo 2, discovering numerous biologically relevant features in the models, ranging from semantic elements like exon-intron boundaries to higher-level concepts such as protein secondary structure.
Biological foundation models represent a unique challenge and opportunity for AI interpretability. Unlike language models that process human-readable text, these neural networks operate on DNA sequences—a biological code that even human experts struggle to read and understand directly. Evo 2 presents an especially complex version of this challenge, processing multiple layers of biological information: from raw DNA sequences to the proteins they encode and the intricate RNA structures they form. By applying state-of-the-art interpretability techniques (similar to those detailed in our Understanding and Steering Llama 3 paper), we hope to decode how Evo 2 represents and processes this information.
This interpretability breakthrough could deepen our understanding of biological systems while enabling new approaches to genome engineering. These advances open possibilities for developing better disease treatments and improving human health.
We provide a high-level overview of the work we’ve done below. The Mechanistic Interpretability sections of the preprint contain more detailed information on our findings.
Training an Evo 2 interpreter model (sparse autoencoder or SAE)
In our collaboration with Arc, we trained BatchTopK sparse autoencoders (SAEs) (why BatchTopK?) on layer 26 (why layer 26?) of Evo 2, applying techniques we've developed while interpreting language models. Working closely with Arc Institute scientists, we used these tools to understand how Evo 2 processes genetic information internally.
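To make the setup concrete, here is a minimal sketch of a BatchTopK SAE in PyTorch. Unlike per-token TopK, BatchTopK keeps the top `k × batch_size` activations across the whole batch, so sparsity can vary per token while the batch-level average stays fixed. All dimensions and hyperparameters below are illustrative, not the ones used for Evo 2.

```python
import torch
import torch.nn as nn

class BatchTopKSAE(nn.Module):
    """Minimal BatchTopK sparse autoencoder sketch (illustrative sizes only)."""

    def __init__(self, d_model: int, d_hidden: int, k: int):
        super().__init__()
        self.k = k
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor):
        # x: (batch, d_model) residual-stream activations from one layer
        pre = torch.relu(self.encoder(x))
        # BatchTopK: keep the top k * batch activations across the whole
        # batch, rather than the top k per token.
        n_keep = self.k * x.shape[0]
        threshold = torch.topk(pre.flatten(), n_keep).values.min()
        latents = torch.where(pre >= threshold, pre, torch.zeros_like(pre))
        recon = self.decoder(latents)
        return recon, latents

# Usage: train with reconstruction loss on activations gathered from the model.
sae = BatchTopKSAE(d_model=512, d_hidden=8192, k=32)
acts = torch.randn(64, 512)  # stand-in for layer-26 activations
recon, latents = sae(acts)
loss = ((recon - acts) ** 2).mean()
```

In practice the encoder/decoder weights are typically tied to unit-norm decoder columns and trained over activations streamed from many sequences; this sketch only shows the sparsity mechanism.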
We discovered a wide range of features corresponding to sophisticated biological concepts. We also validated the relevance of many of these features with a large-scale alignment analysis between canonical biological concepts and SAE features, quantified by measuring the domain F1 score (as in InterPLM) between features and concepts.
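As a sketch of how such an alignment score can be computed: binarize a feature's activations over nucleotide positions, compare them against a binary concept annotation (e.g., "this position is in an exon"), and report the F1 of that agreement. The thresholding, labels, and toy data below are illustrative assumptions, not the preprint's exact pipeline.

```python
import numpy as np

def feature_concept_f1(feature_active: np.ndarray, concept_label: np.ndarray) -> float:
    """F1 between a binarized SAE feature and a binary concept annotation.

    Both arguments are (n_positions,) boolean arrays over nucleotide
    positions. A feature aligns well with a concept when it fires on
    the annotated positions (recall) and mostly only there (precision).
    """
    tp = np.sum(feature_active & concept_label)
    fp = np.sum(feature_active & ~concept_label)
    fn = np.sum(~feature_active & concept_label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy example: a feature that fires on most, but not all, exon positions.
concept = np.array([0, 0, 1, 1, 1, 1, 0, 0], dtype=bool)  # e.g. "exon"
feature = np.array([0, 0, 1, 1, 1, 0, 0, 0], dtype=bool)  # binarized activations
score = feature_concept_f1(feature, concept)  # → 0.857 (rounded)
```

In a large-scale analysis, this score would be computed for every feature-concept pair, with the best-scoring feature per concept indicating how cleanly the SAE has isolated that concept.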
We have some early signs of life on steering Evo 2 to precisely engineer new protein structures, but steering this model is considerably more complex than steering a language model, and further research is needed to unlock the full potential of this approach. The potential impact of steering Evo 2 is particularly significant: while language models can be prompted toward desired behaviors, a model that 'only speaks nucleotide' cannot be prompted in the same way. Learning to steer through features would unlock entirely new capabilities.
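One common way to steer through a feature, which we sketch here under illustrative assumptions (the layer, feature direction, and scale are stand-ins, not Evo 2 specifics), is to add a scaled copy of that feature's decoder direction to a layer's output during generation via a forward hook:

```python
import torch

def make_steering_hook(direction: torch.Tensor, scale: float):
    """Return a forward hook that adds `scale * direction` to a layer's
    output, nudging generation toward a chosen SAE feature."""
    def hook(module, inputs, output):
        # Returning a value from a forward hook replaces the module output.
        return output + scale * direction
    return hook

# Toy stand-in for one transformer layer's output (d_model = 8).
layer = torch.nn.Linear(8, 8)
direction = torch.randn(8)
direction = direction / direction.norm()  # unit-norm decoder column
handle = layer.register_forward_hook(make_steering_hook(direction, scale=4.0))

x = torch.randn(2, 8)
steered = layer(x)
handle.remove()
unsteered = layer(x)
# steered - unsteered equals scale * direction at every position.
```

For a nucleotide-only model, this kind of activation-level intervention plays the role that prompting plays for a language model: it is the interface through which a desired behavior is specified.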
Why did we switch to BatchTopK?