GPT-4o prefers resumes it rewrote itself

- Researchers from Maryland, NUS, and Ohio State posted a hiring study showing that resume-screening LLMs systematically favored resumes rewritten by the same model.
- The test used 2,245 pre-ChatGPT human resumes; across models, same-model applicants were 23% to 60% likelier to be shortlisted in simulations.
- The problem is not just AI polish. It looks like a model-specific “dialect” can distort hiring decisions at scale.
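A quick note on reading that 23% to 60% range: “X% likelier” is normally a relative comparison (the same-model group's shortlist rate divided by the human-written group's rate, minus one), not a gap in percentage points. This brief does not spell out the paper's exact metric, so the sketch below simply assumes that ratio reading. The function name `relative_lift` and the 20% baseline are hypothetical, chosen for illustration only; they are not taken from the study.

```python
def relative_lift(rate_same_model: float, rate_human: float) -> float:
    """Relative increase in shortlisting rate, as a fraction (0.23 == 23%).

    Assumption: "X% more likely" means rate_same_model / rate_human - 1,
    i.e. a ratio of rates, not a difference in percentage points.
    """
    if rate_human <= 0:
        raise ValueError("baseline shortlist rate must be positive")
    return rate_same_model / rate_human - 1.0


# Hypothetical baseline, NOT from the paper: suppose 20% of human-written
# resumes get shortlisted. Apply the reported low and high lifts to it.
baseline = 0.20
for lift in (0.23, 0.60):
    boosted = baseline * (1 + lift)
    gap_points = 100 * (boosted - baseline)
    print(f"{lift:.0%} likelier: {baseline:.1%} -> {boosted:.1%} "
          f"(+{gap_points:.1f} percentage points)")
```

Under that assumption, the same relative lift produces a larger absolute gap when the baseline shortlist rate is higher, so the practical effect depends on how selective the screening stage already is.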

Resume screening is the domain here, and the stakes are obvious: who gets seen first, and who quietly disappears. The twist is that companies are starting to use LLMs to help evaluate candidates at the same time candidates are using LLMs to polish their resumes. A new preprint from researchers at the University of Maryland, the National University of Singapore, and Ohio State says those systems can end up rewarding their own style rather than the applicant’s underlying quality. In plain English: if the screener and the writer come from the same AI family, that candidate may get a built-in boost. (arxiv.org)

### What did the researchers actually test?

They built the study around 2,245 real resumes written before large language models became common, which matters because the starting material was genuinely human-written. They then had several models rewrite or summarize those resumes and asked AI evaluators to choose between versions in controlled comparisons. The lineup included GPT-4o, GPT-4 … and Mistral-family models. (arxiv.org)

### What was the core result?

The core result is self-preference. Models tended to pick resumes they had generated themselves over human-written versions and, in many cases, over versions produced by rival models. The paper says this held even when resume quality was controlled, which is what makes the finding more serious than “AI rewrites are just cleaner.” It suggests the evaluator is responding to its own style, not only to substance. (arxiv.org)

### Why is that different from normal resume polishing?

Because normal polishing is supposed to make your case clearer to a human. This looks more like matching the screener’s accent. If GPT-4o is screening and GPT-4o also helped write the resume, the model may recognize patterns it tends to produce and score them more favorably. That is less like proofreading and more like teaching to the answer key, written in the grader’s own dialect. (arxiv.org)

### How big could the effect get?
The paper’s hiring-pipeline simulations across 24 occupations found that candidates using the same LLM as the evaluator were 23% to 60% more likely to be shortlisted than equally qualified candidates submitting human-written resumes. The biggest gaps showed up in business-heavy roles like sales and accounting. So this is not a tiny lab quirk: it could change who reaches a recruiter at all. (arxiv.org)

### Does this mean AI resumes are always better?

Not really. The catch is that “preferred by the model” and “better for the job” are not the same thing. A screening system can mistake fluency, structure, and model-familiar phrasing for stronger qualifications. That creates an odd arms race in which candidates are pushed to use the same tools employers use just to avoid being filtered out. (arxiv.org)

### Can the bias be reduced?

Maybe, and that part is important. The researchers say simple interventions aimed at reducing a model’s ability to recognize its own generated text cut the bias by more than 50% in many cases. So this does not look like an unsolvable flaw. But it does mean companies cannot treat “LLM screening” as neutral out of the box. (arxiv.org)

### Why does this matter?

Because it changes the fairness question. The old worry was that AI might inherit human bias from its training data. This adds a new one: AI-to-AI bias, where one model quietly favors content shaped like itself. If the finding holds up beyond the preprint stage, recruiters using GPT-4o-style screeners may need blinders, normalization steps, or human review before trusting top-of-funnel rankings. (arxiv.org)

### Bottom line

The uncomfortable takeaway is simple: once AI writes the resume and AI screens the resume, style matching can start to beat merit. That is a bad default for hiring, and a strong argument for redesigning these systems before they become invisible gatekeepers everywhere. (arxiv.org)

Shared from Scout - Be the smartest in the room.