Scientists find dark proteome layer

- Researchers in the TransCODE consortium reported on May 6 that overlooked DNA regions produce more than 1,700 previously undetected microproteins in humans. (nature.com)
- The central figure is 1,785 molecules: about 25% of 7,264 non-canonical open reading frames showed protein-level evidence in 95,520 proteomics experiments. (nature.com)
- The consortium said peptidein annotations will be added to GENCODE, UniProt and PeptideAtlas, with open data available for follow-on research. (embl.org)
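As a quick check on the headline figures, the share can be recomputed from the two reported counts. This is a minimal sketch that assumes the 1,785 count is the numerator behind the "about 25%" figure; the variable and function names are illustrative and do not come from the study.

```python
# Counts as reported in the summary above; the names are illustrative only.
NCORFS_EXAMINED = 7_264          # non-canonical open reading frames examined
MICROPROTEINS_DETECTED = 1_785   # ncORFs with protein-level evidence (assumed numerator)


def detection_share(detected: int, examined: int) -> float:
    """Return detected / examined expressed as a percentage."""
    if examined <= 0:
        raise ValueError("examined must be positive")
    return 100.0 * detected / examined


share = detection_share(MICROPROTEINS_DETECTED, NCORFS_EXAMINED)
print(f"{share:.1f}% of examined ncORFs showed protein-level evidence")
```

Under that assumption the share works out to roughly 24.6%, which is consistent with the rounded "about 25%" in the reporting.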

An international research consortium said this month it had identified a previously overlooked layer of the human proteome, finding that thousands of short, protein-like molecules are produced from stretches of DNA long treated as noncoding. (nature.com)

The work, published in *Nature* on May 6, came from the TransCODE consortium, whose members include the Princess Máxima Center for Pediatric Oncology, the University of Michigan Medical School, EMBL’s European Bioinformatics Institute and the Institute for Systems Biology. The researchers said the newly detected molecules sit in what they call the “dark proteome” because they have largely escaped standard protein catalogs and conventional proteomics workflows. (embl.org)

The study identified more than 1,700 such molecules, and a related report put the count at 1,785. The authors said many are so small and atypical that they do not fit neatly into the usual binary categories of either recognized proteins or nonfunctional genetic output. They proposed a new label, “peptideins,” for this class of microproteins with still-unclear biological roles.

### Where did these hidden molecules come from?

The researchers examined 7,264 non-canonical open reading frames, or ncORFs, which are understudied stretches of DNA that can in principle be translated into amino-acid chains. About 25% of those regions showed evidence of producing detectable protein-like molecules, according to the consortium and institutional summaries of the paper. (nature.com)

Earth.com, citing the study and researchers, reported that the team combed through 95,520 protein-detection experiments and 3.7 billion raw molecular data points. The consortium said computers ran for about 20,000 hours to process the material. (earth.com)

### Why were they missed before?

Standard protein catalogs contain about 19,500 human proteins, a number many researchers have treated as close to complete.
The new work argues that this view left out a large set of very small molecules, many encoded by regions previously dismissed as biologically silent. (embl.org)

Jonathan Mudge of EMBL-EBI, a co-first author on the paper, said the molecules were “effectively invisible before” and that researchers had been looking at biology through “an incomplete lens.” EMBL said their small size helped keep them out of reference databases and standard analyses. (earth.com)

### What makes the new molecules unusual?

Earth.com reported that about 65% of the newly detected molecules are shorter than 50 amino acids, versus less than 1% of proteins already in the standard list. The report also said only about a dozen resembled familiar proteins, while the rest occupied what the authors described as an intermediate category. (earth.com)

The *Nature* paper summary said the consortium created an annotation framework that distinguishes between ncORF-encoded microproteins that can be treated as human proteins and “peptideins,” which it defined as microproteins with indeterminate potential as functional proteins. (embl.org)

### Why are cancer researchers paying attention?

EMBL said many of the newly detected peptideins appear on immune cell surfaces, making them possible targets for cancer immunotherapy. Jonathan Mudge said cancer cells express high levels of some of these molecules, creating a potential source of biomarkers and therapeutic targets. (earth.com)

The consortium also used CRISPR screens to test whether some peptideins are essential for cell survival. The *Nature* summary said the group characterized a pan-essential cellular phenotype for one peptidein encoded by the OLMALINC long non-coding RNA. (nature.com)

### What happens next?

The consortium said its peptidein data are being released in open-source form and added to reference resources including GENCODE, UniProt and PeptideAtlas.
Those database updates are intended to let other researchers test whether the molecules are involved in disease, drug response or cell regulation. (embl.org)

Sebastiaan van Heesch of the Princess Máxima Center said, in comments reported by GenEngNews, that peptideins are already at the center of multiple drug-development efforts and are increasingly appearing in studies of diseases including childhood cancers. The next step is likely to come through those public databases and follow-up functional studies by consortium members and outside labs. (embl.org) (genengnews.com)
