Scientists find 'dark proteome' in humans

- On May 6, the TransCODE consortium reported in Nature that it had identified more than 1,700 previously overlooked protein-like molecules in humans.
- The study examined 7,264 non-canonical open reading frames and found roughly 25% produced tiny molecules researchers named “peptideins,” EMBL said.
- Next, the consortium plans to add peptideins to GENCODE, UniProt and PeptideAtlas and continue experimental validation.

Scientists have identified more than 1,700 previously overlooked protein-like molecules in human cells, adding a new layer to the map of the human proteome. The findings were published in Nature on May 6 by the TransCODE consortium, a collaboration involving EMBL-EBI, the Princess Máxima Center for Pediatric Oncology, the University of Michigan, the Institute for Systems Biology and MIT. The work focuses on stretches of DNA long treated as noncoding or biologically marginal, and argues that some of them are translated into small molecules with potential roles in disease. Researchers said the results still require broader experimental follow-up, but they have already begun folding the new entries into public reference databases.

### What exactly did the scientists find?

The Nature paper analyzed 7,264 non-canonical open reading frames, or ncORFs, and found that about 25% of them produced small protein-like molecules. That yielded 1,785 newly identified entries in what researchers described as a previously undercounted part of the human proteome. The researchers coined a new term, “peptideins,” for many of these molecules because they did not fit neatly into standard protein categories. (nature.com)

Jonathan Mudge, a co-first author and annotation project leader at EMBL-EBI, said the molecules had been “effectively invisible before” because of their size and because the genomic regions that encode them were often excluded from standard annotations.

### Why are these molecules being called part of a “dark proteome”?

EMBL said the newly mapped molecules come from parts of the genome historically thought to be nonfunctional or noncoding. In practical terms, that means the proteins were missed not because they were absent, but because the search rules and reference catalogs were built around larger, conventional proteins. (embl.org)

Nature and other coverage of the work used “dark proteins” or “dark proteome” as shorthand for that hidden category.
The label refers to molecules that are poorly annotated, hard to detect and largely absent from major databases, rather than to a separate biochemical class.

### Why does this matter for cancer and disease research?

STAT reported on May 6 that the team identified some of the newly found molecules in processes including cell division and DNA repair, and found several that appear unique to cancer cells. (embl.org)

EMBL said many peptideins were detected on immune cell surfaces, making them possible candidates for cancer immunotherapy targets or biomarkers. (nature.com)

A number of peptideins are already under development as drug targets, according to EMBL. Mudge said cancer cells express high levels of some of these molecules, which could make them useful in efforts to distinguish diseased cells from healthy tissue.

### Did the study prove what all of these molecules do?

The paper did not assign a full biological function to every newly cataloged molecule. (statnews.com) EMBL said the consortium used CRISPR gene-editing screens to test whether some peptideins were essential for cell survival, but the broader functional picture remains incomplete. (embl.org)

STAT described the findings as the start of a larger effort to determine what the molecules do in gene regulation, immune responses and cancer biology. That means the headline result is the scale of the mapping effort, while the next phase is experimental validation molecule by molecule. (embl.org)

### What changes next for researchers?

The consortium said it will add peptideins to reference resources including GENCODE, UniProt and PeptideAtlas. Making those entries public could change how other scientists analyze sequencing, proteomics and disease data, because future studies will be able to search for these molecules directly instead of filtering them out. (statnews.com)

The next concrete step is database integration and follow-up validation by the TransCODE consortium and other labs.
EMBL said the group plans to keep releasing peptidein data publicly as it becomes available, while researchers test which molecules are biologically active and which might be useful in cancer or other disease studies. (embl.org)
