Scientists identify more than 1,700 dark proteins
- Researchers in the TransCODE consortium reported on May 6 that 1,785 previously overlooked protein-like molecules are produced from human non-canonical genome sequences. (nature.com)
- The study examined 7,264 non-canonical open reading frames and found about 25% yielded detectable molecules, after analyzing 3.7 billion data points. (embl.org)
- The consortium said the new molecules, called peptideins, will be added to GENCODE, UniProt and PeptideAtlas databases. (embl.org)

An international research consortium has identified 1,785 previously overlooked protein-like molecules made from parts of the human genome long treated as noncoding, according to a Nature paper published on May 6. The study, led through the TransCODE consortium, examined non-canonical open reading frames (stretches of sequence outside standard gene annotations) and concluded that many of them are translated in human cells. (nature.com)

The researchers said the findings expand the known human proteome and introduce a new category of molecules they call “peptideins.” EMBL said the work involved collaborators from EMBL-EBI, the Princess Máxima Center for Pediatric Oncology, the University of Michigan, the Institute for Systems Biology and the Massachusetts Institute of Technology. (embl.org)

### Where did these newly identified molecules come from?

The study focused on 7,264 non-canonical open reading frames, or ncORFs, which are genomic sequences outside the conventional protein-coding catalog. EMBL said the consortium found that around 25% of those ncORFs produced small protein-like molecules, yielding a final set of 1,785 candidates. ScienceAlert reported that these molecules come from genome regions “not usually thought to have this kind of biological machinery.” That framing reflects a broader shift in genomics, where DNA once dismissed as “junk” has increasingly been shown to contain regulatory elements and, in some cases, translated sequences. (nature.com)

### Why are researchers not just calling them ordinary proteins?

The newly cataloged molecules are mostly very small and do not fit neatly into the standard reference set of human proteins, the consortium said. The authors introduced the term “peptidein” to describe this category and to distinguish it from conventional proteins already represented in major databases.
(embl.org)

Jonathan Mudge, co-first author and annotation project leader at EMBL-EBI, said the term was meant to bring molecules that had been “effectively invisible before” into reference annotation. EMBL said the consortium plans to add peptideins to GENCODE, UniProt and PeptideAtlas, where ncORFs and their products have largely been absent because of their small size. (sciencealert.com)

### How did the team decide these molecules were real?

ScienceAlert said the researchers analyzed 3.7 billion data points from 95,520 experiments, a computational effort it described as taking about 20,000 hours. That analysis narrowed a broader candidate list to 1,785 detected microproteins or peptideins. (embl.org)

Sebastiaan van Heesch of the Princess Máxima Center said the researchers used new techniques to formally define the molecules and make them accessible to other scientists. The Nature paper’s summary described the work as a “large-scale proteomics analysis of the dark proteome,” indicating that the evidence came from protein-detection methods rather than sequence prediction alone. (embl.org)

### What do the authors say these molecules might be useful for?

EMBL said many of the newly detected peptideins appear on immune cell surfaces, making them potential targets for cancer immunotherapy. Jonathan Mudge said cancer cells express high levels of some of these molecules, which could make them a source of biomarkers and therapeutic targets. (sciencealert.com)

CRISPR screens were also used to test whether some peptideins are essential for cell survival, according to EMBL. That places the work beyond simple cataloging and into early functional testing, although the public summaries do not say that all 1,785 molecules have defined biological roles. (sciencealert.com)

### What changes next for researchers following this field?

The TransCODE consortium said all data on the newly identified molecules have been made publicly available to support follow-on research.
EMBL said the next step is incorporation into reference resources including GENCODE, UniProt and PeptideAtlas, which would make the sequences easier for other groups to study in disease, cell biology and proteomics work. (embl.org)

Nature and related coverage published on May 6 and May 18 place the finding in a broader effort to map the “dark proteome,” a term used for protein products from overlooked genomic regions. The immediate milestone is database inclusion; the longer research path will depend on which peptideins can be tied to specific cellular functions or disease settings in subsequent studies. (embl.org) (nature.com)