Researchers identify 1,700+ 'dark proteins'

- On May 18, ScienceAlert reported that researchers identified more than 1,700 previously uncharacterized “dark proteins” produced in human cells from overlooked DNA regions.
- The study’s central number was 1,785 microproteins, detected from 7,264 non-canonical open reading frames after analysis of 95,520 experiments.
- The Nature paper says the newly named “peptideins” will be added to the GENCODE, UniProt and PeptideAtlas databases.

ScienceAlert reported on May 18 that researchers have identified more than 1,700 previously uncharacterized “dark proteins” in human cells, pointing to a larger human proteome than standard reference catalogs have recognized. The work comes from the TransCODE consortium and was published in Nature on May 6. The researchers said the molecules arise from overlooked stretches of DNA known as non-canonical open reading frames, or ncORFs. They coined a new term, “peptideins,” for many of the newly cataloged molecules because the group does not fit neatly into conventional protein definitions.

### Where were these proteins hiding?

The newly reported molecules were traced to parts of the genome that were long treated as noncoding or biologically marginal. In the new analysis, the TransCODE team examined 7,264 ncORFs and found that about 25% produced small protein-like molecules. EMBL said those molecules were found in the “dark proteome,” a term used for gene products from previously overlooked sections of DNA. (sciencealert.com)

Nature’s paper summary said the study used large-scale proteomics to show that many translated ncORFs encode microproteins and peptideins. ScienceAlert described this as a “previously hidden layer” of the human genome, adding that these molecules are often much smaller or more unusual than standard proteins.

### What did the researchers actually find?

The headline figure was 1,785 microproteins. (embl.org) ScienceAlert said the team reached that number after analyzing 3.7 billion data points from 95,520 experiments, a process that took about 20,000 hours of computing time. The candidate list started with 7,264 ncORFs that earlier work had flagged as possible protein-producing sequences. (nature.com)

Jonathan Mudge of EMBL-EBI, a co-first author on the paper, said the field had been “looking at biology through an incomplete lens.” Sebastiaan van Heesch of the Princess Máxima Center said the study showed that “thousands of overlooked genetic sequences contribute to the dark proteome” by producing a new class of protein-like molecules. (sciencealert.com)

### Why invent a new term like “peptideins”?

The researchers said many of the newly found molecules do not resemble conventional proteins closely enough to be grouped with them without qualification. EMBL said the consortium introduced “peptidein” as a new category for these microproteins. ScienceAlert said the name reflects their ambiguity: they are protein-like, but many are so small or atypical that the team wanted a separate label. (embl.org)

That naming step matters because reference databases have historically excluded many ncORFs and their products. EMBL said the consortium plans to add peptideins to GENCODE, UniProt and PeptideAtlas so other researchers can work from a shared annotation framework.

### Are these just curiosities, or could they matter in disease?

Cancer was one of the clearest disease links cited by the researchers. (embl.org) EMBL said many newly detected peptideins appear on immune cell surfaces and could become targets for cancer immunotherapy. Mudge said cancer cells express high levels of some of these molecules, making them a potential source of biomarkers and therapeutic targets. The consortium also used CRISPR screens to test whether some peptideins are essential for cell survival, according to EMBL. ScienceAlert and related institutional summaries framed the work as a way to widen the search for disease mechanisms that may have been missed because standard protein catalogs were incomplete. (embl.org)

### What happens next?

The next step is database integration and follow-up biology. EMBL said all data on the new proteins have been made publicly available, and the consortium plans to keep releasing peptidein data as it becomes available. The Nature paper and consortium statements identify GENCODE, UniProt and PeptideAtlas as the main databases that will incorporate the new category. (embl.org)
