Covariance pooling research
New research proposes using covariance pooling instead of standard mean pooling to preserve feature co‑occurrences and improve interpretability in AI models, with particular relevance for genomics and sequence analysis. The method aims to keep richer statistical structure from input features, which researchers argue can help downstream tasks that depend on feature interactions. (x.com)
Artificial intelligence models that read DNA usually squash thousands of positions into one average. A new paper argues they should keep a map of which features appear together instead. (goodfire.ai)
The paper, published April 10 by Thomas Dooms, Nicholas K. Wang, and Michael T. Pearce at Goodfire, proposes “covariance pooling” for genomic foundation models including Evo 2, AlphaGenome, and Nucleotide Transformer v3. The method replaces mean pooling, the common step used to turn a sequence of token embeddings into one fixed-length vector. (goodfire.ai)
In plain terms, mean pooling keeps the center of a cloud of features and throws away its shape. Covariance pooling keeps second-order structure: a record of which model features rise and fall together across a sequence. (goodfire.ai)
That distinction shows up most clearly in genomics, where gene regulation depends on combinations of motifs spread across long stretches of DNA rather than on single positions alone. Foundation models in this field now read contexts as long as 1 million DNA letters and are being used for tasks including variant-effect prediction and functional track prediction. (nature.com 1) (nature.com 2)
Goodfire said it computes covariance matrices from model activations and then learns a low-rank bottleneck to compress them into compact embeddings. The paper reports stronger results than mean pooling on Gene Ontology prediction and genomic track prediction using Nucleotide Transformer v3 representations. (goodfire.ai)
The authors frame the approach as a new baseline for gene-level probing, which is the practice of training lightweight models on top of frozen foundation-model embeddings. They also say the same pooling idea can be used with supervision, without supervision, or with a mix of both. (goodfire.ai)
The idea itself is not new to machine learning. Earlier computer-vision work showed that covariance pooling can beat average pooling by capturing richer statistics, though it also raised practical problems around estimating and normalizing large covariance matrices. (arxiv.org)
The genomics angle is arriving as larger DNA models move from representation learning toward concrete biological tasks. Evo 2 was reported in Nature last month as a model trained on 9 trillion DNA base pairs, and AlphaGenome was reported in Nature in February as a 1-megabase DNA model for predicting functional genomic tracks at single-base resolution. (nature.com 1) (nature.com 2)
A separate April 10 bioRxiv preprint from Goodfire and Mayo Clinic used Evo 2 embeddings for variant-effect prediction and said a probe on those embeddings reached an overall area under the receiver operating characteristic curve (AUROC) of 0.997 on 833,000 ClinVar variants. That paper describes training on the Gram matrix X⊤X rather than on a mean-pooled vector, pointing to the same bet on feature co-occurrence. (biorxiv.org)
The immediate question is whether covariance pooling holds up outside Goodfire’s own evaluations and outside genomics. The core claim is narrower than it sounds: when the interaction between features carries the signal, averaging may be the part that throws it away. (goodfire.ai)
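For readers who want to see the difference concretely, here is a minimal sketch of mean pooling versus covariance pooling on toy data. It is not Goodfire's implementation: the function names, the two made-up sequences, and the fixed projection matrix `proj` are all assumptions for illustration, and the paper's bottleneck is learned rather than hand-set. The sketch uses plain Python lists so it runs without extra libraries, and it divides by the sequence length (population covariance), which is also an assumption.

```python
def mean_pool(tokens):
    """Average each feature over sequence positions (first-order statistics)."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]


def covariance_pool(tokens):
    """Covariance between features over positions (second-order statistics).

    Divides by n (population covariance); this choice is an assumption.
    """
    n = len(tokens)
    mu = mean_pool(tokens)
    centered = [[x - m for x, m in zip(tok, mu)] for tok in tokens]
    cols = list(zip(*centered))  # one tuple per feature
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in cols] for ci in cols]


def low_rank_covariance_embedding(tokens, proj):
    """Project d features down to r dimensions, then pool the r x r covariance.

    `proj` is a hypothetical d x r matrix; in a real probe it would be learned.
    Returns the upper triangle of the (symmetric) r x r covariance as a flat list.
    """
    projected = [
        [sum(x * w for x, w in zip(tok, col)) for col in zip(*proj)]
        for tok in tokens
    ]
    cov = covariance_pool(projected)
    r = len(cov)
    return [cov[i][j] for i in range(r) for j in range(i, r)]


# Two toy "sequences" of four positions with two features each.
# In seq_a the features rise and fall together; in seq_b they move in opposition.
seq_a = [[1.0, 1.0], [-1.0, -1.0], [1.0, 1.0], [-1.0, -1.0]]
seq_b = [[1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, 1.0]]

# Mean pooling cannot tell the two sequences apart.
print(mean_pool(seq_a), mean_pool(seq_b))

# Covariance pooling can: the off-diagonal entries have opposite signs.
print(covariance_pool(seq_a))
print(covariance_pool(seq_b))

# A rank-1 bottleneck (illustrative weights) still separates them.
proj = [[1.0], [1.0]]
print(low_rank_covariance_embedding(seq_a, proj))
print(low_rank_covariance_embedding(seq_b, proj))
```

Both toy sequences average to the same vector, so a probe trained on mean-pooled embeddings would see them as identical, while the covariance matrices differ in the sign of the cross-feature term. Projecting before pooling keeps the pooled output at r × r instead of d × d, which is one simple way to sidestep the large-matrix problem noted in the earlier computer-vision work.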