Day in the life: data janitor work
- Computational biologist Chirag Gupta shared that typical days are about 80% 'data janitor' tasks and 20% actual science.
- He argued AI agents can automate routine cleaning and plotting, potentially flipping that ratio toward more analysis.
- The post frames AI as reshaping daily workflows for computational biologists, with automation handling mundane data prep (x.com).

Computational biologists often spend more time cleaning spreadsheets and fixing file formats than testing biological ideas, and Chirag Gupta says artificial intelligence could shift that balance. (cngupta.github.io, pipette.bio)

Computational biology turns experiments into data problems: researchers take large genomics or proteomics files, standardize them, run software pipelines, and turn the outputs into charts and tables. The National Center for Biotechnology Information describes the field as using analytical and applied methods to extract meaning from biological information. (ncbi.nlm.nih.gov, pmc.ncbi.nlm.nih.gov)

That workflow is usually a chain of many small steps, not one clean command. Reviews of bioinformatics pipelines say analyses depend on multiple software packages, file conversions, and intermediate outputs, which makes routine preparation work hard to avoid and hard to reproduce. (pmc.ncbi.nlm.nih.gov, pmc.ncbi.nlm.nih.gov)

Gupta, a Wisconsin-based computational biologist who now builds an artificial intelligence genomics platform at Pipette.bio, wrote in a recent X post that a typical day can be roughly 80% “data janitor” work and 20% science. His public biography lists more than 10 years in genomics analysis and current work on machine-learning models for biomarker discovery. (cngupta.github.io, pipette.bio, scholar.google.com)

By “data janitor,” Gupta was referring to the chores around analysis: cleaning messy inputs, checking formats, and making standard plots before interpretation starts. Google’s experimental Data Science Agent markets the same tasks (data cleaning, exploration, plotting, question answering, and predictive modeling) as work an agent can draft into a Colab notebook. (labs.google.com, pmc.ncbi.nlm.nih.gov)

Biology-focused agent systems are now being built around that idea.
A 2026 Nature Biotechnology paper on BioMedAgent said the framework learns to use bioinformatics tools and chain them into executable workflows, while a 2025 Scientific Reports paper on BioAgents reported human-comparable performance on some conceptual genomics tasks. (nature.com, nature.com)

Specialized tools are also moving into narrower parts of the job. A 2026 bioRxiv paper describing PlotGDP presented an agent-based server for bioinformatics plotting, aimed at turning uploaded data and plain-language prompts into publication-style figures without local setup. (biorxiv.org)

The catch is that automation does not remove the need for careful checking. Reproducibility guides in bioinformatics stress that transparent reporting, documented environments, and validated workflows are still required because large datasets and multistep software chains can fail in quiet ways. (pmc.ncbi.nlm.nih.gov, pmc.ncbi.nlm.nih.gov)

That leaves Gupta’s 80/20 split less as a measurement of one lab than as a description of a common bottleneck in data-heavy biology. If agents reliably handle the janitorial work, the part of the day reserved for actual analysis gets bigger. (cngupta.github.io, nature.com)
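
For readers unfamiliar with what “data janitor” chores look like in practice, the sketch below shows one small, hypothetical instance: tidying a messy comma-separated expression table before any analysis starts. It is not taken from Gupta's post or from any of the tools named above. The sample data, the column names, the `clean_table` helper, and the choice to uppercase gene symbols are all assumptions made for illustration, and it uses only the Python standard library.

```python
import csv
import io

# Hypothetical messy export: padded headers, inconsistent case, a blank
# line, and an "NA" placeholder. Invented purely for illustration.
RAW = """ Gene_ID , Sample A ,sample b
BRCA1, 12.5 , 10.1
tp53 ,NA, 8.0

 MYC , 3.2 , 4.4
"""

# Strings treated as missing values (an assumption; real files vary).
MISSING = {"", "na", "n/a", "nan"}


def normalize_header(name):
    """Lowercase a column name and join its words with underscores."""
    return "_".join(name.strip().lower().split())


def clean_table(text):
    """Return (clean_rows, dropped_count) for a comma-separated table."""
    reader = csv.reader(io.StringIO(text))
    header = [normalize_header(h) for h in next(reader)]
    clean, dropped = [], 0
    for row in reader:
        cells = [c.strip() for c in row]
        if not any(cells):
            continue  # skip fully blank lines without counting them
        values = cells[1:]
        if len(cells) != len(header) or any(v.lower() in MISSING for v in values):
            dropped += 1  # wrong column count or a missing measurement
            continue
        try:
            numbers = [float(v) for v in values]
        except ValueError:
            dropped += 1  # non-numeric measurement
            continue
        # Assumption: identifiers are gene symbols that should be uppercase.
        record = {header[0]: cells[0].upper()}
        record.update(zip(header[1:], numbers))
        clean.append(record)
    return clean, dropped


if __name__ == "__main__":
    rows, dropped = clean_table(RAW)
    for r in rows:
        print(r)
    print("rows dropped:", dropped)
```

Even this toy version has to make judgment calls, such as what counts as missing and whether to drop or repair a bad row. Those are the kinds of quiet decisions the reproducibility guides say should be documented, whether a person or an agent makes them.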