TL;DR: A research paper from researchers at the University of Maryland, National University of Singapore, and Ohio State University quantified something every job seeker now needs to understand. AI hiring tools systematically prefer resumes generated by the same LLM that is evaluating them, by 67-82% across major commercial and open-source models. In simulated hiring pipelines across 24 occupations, candidates whose resume model matched the employer's evaluator were 23-60% more likely to be shortlisted than equally qualified applicants submitting human-written resumes. The disadvantage is largest in business fields like sales, accounting, and finance, and you usually cannot find out which LLM the employer is using. OneResume.ai is the only resume platform engineered to hedge this bias by writing with all three frontier model families: OpenAI, Anthropic Claude, and xAI Grok.

Key Takeaways

AI evaluators show 67-82% self-preference bias for resumes generated by the same model, even when underlying content quality is held constant — meaning identical credentials get systematically different scores depending on which AI wrote the resume [1]
Same-model match produces a 23-60% shortlist advantage in simulated hiring pipelines across 24 occupations; the largest disadvantages fall on business-related fields like sales, accounting, and finance [1]
The major commercial and open-source models tested (GPT-4o, GPT-4-turbo, DeepSeek-V3, Qwen-2.5-72B, LLaMA 3.3-70B) all exceed 65% self-preference bias against human-written resumes; GPT-4o exceeds 80% [1]
ATS vendors and employers rarely disclose which LLM they use to screen resumes, so candidates cannot practically reverse-engineer which single model to write with [2]
Multi-LLM resume engineering — writing with OpenAI, Claude, and Grok rather than betting on one model — is the only structural hedge against an evaluator-model you cannot identify

What Self-Preference Bias Is and Why It Matters

In a paper titled "AI Self-preferencing in Algorithmic Hiring: Empirical Evidence and Insights," researchers Jiannan Xu (University of Maryland), Gujie Li (National University of Singapore), and Jane Yi Jiang (Ohio State University) documented a behavior in large language models that has direct, immediate consequences for anyone applying for a job in 2026.

Self-preference bias is the documented tendency of an LLM to favor content it generated itself over equivalent content produced by a human or by a different AI model — even when the underlying quality is held constant. It is not a bug. It is a property that emerges from how these models learn to recognize linguistic patterns that align with their own generative style [1].

In a hiring context, this matters because LLMs are increasingly used on both sides of the resume screening process. Candidates use ChatGPT, Claude, or Grok to draft and refine their resumes. Employers deploy similar AI tools to screen and rank those same resumes. When the candidate's writing model matches the employer's evaluator model, the resume gets favored. When it does not, the resume gets penalized — even if the candidate is equally qualified.

The researchers ran a controlled correspondence experiment with 2,245 human-written resumes from a professional resume-building platform sourced before generative AI was widely adopted. For each resume, they generated counterfactual versions using a range of state-of-the-art LLMs: GPT-4o, GPT-4o-mini, GPT-4-turbo, LLaMA 3.3-70B, Mistral-7B, Qwen-2.5-72B, and DeepSeek-V3. Then they had each LLM evaluate every version. The measured bias was consistent and substantial.

The Numbers: Just How Big Is the Same-Model Advantage?

The research distinguishes two forms of bias. LLM-vs-Human is when a model prefers its own output over a human-written equivalent. LLM-vs-LLM is when a model prefers its own output over a different LLM's version.

The LLM-vs-Human bias is severe across the board:

GPT-4o: more than 80% self-preference bias against human-written resumes
GPT-4-turbo, DeepSeek-V3, Qwen-2.5-72B, LLaMA 3.3-70B: all exceed 65% self-preference bias even after controlling for content quality
Average range across models tested: 67-82%

The LLM-vs-LLM bias is more variable but still significant. DeepSeek-V3 prefers its own outputs by 69% over LLaMA 3.3-70B versions and by 28% over GPT-4o versions. Some models (notably GPT-4o and LLaMA 3.3-70B in this dataset) do not consistently favor their own output over other models' versions, even though they strongly favor it over human-written resumes.
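
One way to picture both comparisons is as a pairwise win rate: out of all head-to-head comparisons in which one of the two resumes was written by the evaluator itself, how often did the evaluator pick its own version? The sketch below is a simplification under that assumption. The function name and the toy data are hypothetical, and the paper's actual bias metric, which controls for content quality, may be defined differently.

```python
from typing import Iterable, Tuple


def self_preference_rate(choices: Iterable[Tuple[str, str]]) -> float:
    """Share of pairwise comparisons in which the evaluator chose its own resume.

    Each item is (evaluator_model, author_of_chosen_resume) for one comparison
    where exactly one of the two resumes was written by the evaluator.
    """
    choices = list(choices)
    if not choices:
        raise ValueError("no comparisons supplied")
    own_picks = sum(1 for evaluator, chosen in choices if evaluator == chosen)
    return own_picks / len(choices)


# Made-up illustration data, not results from the study.
sample = [
    ("gpt-4o", "gpt-4o"),
    ("gpt-4o", "human"),
    ("gpt-4o", "gpt-4o"),
    ("gpt-4o", "gpt-4o"),
]
print(self_preference_rate(sample))  # 0.75
```

An unbiased evaluator comparing equally good resumes would land near 0.5 on this measure, so values well above that suggest self-preference.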

When the researchers translated these biases into simulated hiring pipelines across 24 occupations, the operational impact was unmistakable: candidates using the same LLM as the evaluator were 23-60% more likely to be shortlisted than equally qualified candidates submitting human-written resumes. In practical terms, that is the difference between getting an interview and never hearing back — for the same person, with the same experience, applying to the same job.
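
It helps to be precise about what "23-60% more likely" means. A minimal sketch, assuming the figure is a relative lift in shortlist probability for the same-model candidate: the same gap, viewed from the other candidate's side, is lift / (1 + lift), which is smaller than the headline number. The function name is illustrative and does not come from the paper.

```python
def mismatch_shortfall(lift: float) -> float:
    """Convert "X more likely" for one candidate into "Y less likely" for the other.

    If the same-model candidate is `lift` (e.g. 0.23 for 23%) more likely to be
    shortlisted, the other candidate is lift / (1 + lift) less likely, measured
    relative to the same-model candidate.
    """
    if lift <= -1:
        raise ValueError("lift must be greater than -1")
    return lift / (1.0 + lift)


for lift in (0.23, 0.60):
    print(f"{lift:.0%} more likely -> {mismatch_shortfall(lift):.1%} less likely")
# 23% more likely -> 18.7% less likely
# 60% more likely -> 37.5% less likely
```

Under this reading, a 23-60% advantage for the matched candidate corresponds to roughly a 19-38% relative shortfall for the unmatched one.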

The disadvantage was most severe in business-related fields: sales, accounting, and finance showed the largest gaps. These are exactly the fields where LLM-based screening has been adopted most aggressively, and where the volume of applicants per role is highest.

The Real Problem: You Do Not Know Which LLM Your Employer Is Using

Here is where the research gets uncomfortable for job seekers who are betting their applications on a single AI tool.

ATS vendors generally do not disclose which evaluator LLM powers their screening modules. Some use OpenAI's models. Some use Anthropic's. Some use open-source models like LLaMA, DeepSeek, or Qwen run on their own infrastructure. Many use a combination, depending on the job category and the customer. Employers themselves often do not know which model is doing what at each stage of their hiring funnel — they buy a screening product and trust the vendor [2].

So if you wrote your resume entirely in ChatGPT, you have effectively bet your application on the employer using OpenAI as their evaluator. If they use a different model, you may face an LLM-vs-LLM disadvantage: the study measured gaps of 28% and 69% for DeepSeek-V3 against GPT-4o and LLaMA 3.3-70B versions. For evaluators outside the tested set, such as Anthropic Claude, the picture is even more uncertain. And if they ensemble multiple models (a growing pattern that the researchers actually identify as a mitigation strategy), the answer depends on which models are in the ensemble and how they are weighted.

The single-model resume strategy that worked acceptably in 2024 — when ATS systems were primarily keyword scanners — has become a structural gamble in 2026, when those same systems use LLMs that have measurable preferences about which other LLMs write the content they read. Without knowing which model is on the other side, betting on one model is betting blind.

Why Industries Hit Hardest Are Where You Need the Edge Most

The research identified a clear pattern: self-preference bias has the largest operational impact on business-related occupations. Sales, accounting, and finance showed the most pronounced shortlist advantages for same-model candidates. Less screening-intensive fields like agriculture, arts, and automotive showed smaller gaps.

This pattern is not random. The fields most exposed to AI hiring bias are exactly the fields where:

Candidate volume is highest, so LLM screening is most aggressively deployed
Job descriptions are most standardized, making LLM-based evaluation feel reliable to hiring managers
Resume language tends toward measurable, quantitative achievements that LLMs are most calibrated to score
Internal tooling budgets support multi-model AI procurement

If you are applying for sales, finance, accounting, consulting, marketing, operations, or any white-collar role at a company with more than a few hundred employees, the probability that an LLM will read your resume before any human does is now above 60% (Gartner, 2025) [3]. The probability that it will be one specific model is unknown to you. The cost of mismatch is measurable: in the study's simulations, same-model candidates were 23-60% more likely to be shortlisted than equally qualified candidates with human-written resumes [1].

How OneResume.ai Hedges With OpenAI, Claude, and Grok

OneResume.ai is built around the structural insight that this research validates: in a world where you cannot reliably predict which LLM evaluates your resume, the only sound strategy is to write with all of them. We use the three frontier LLM families — OpenAI (GPT-4 family), Anthropic (Claude family), and xAI (Grok family) — across the resume engineering pipeline so that whichever evaluator the employer uses, your resume reflects the linguistic patterns and stylistic features that family is most likely to reward.

The way it works practically:

Drafting layer. When OneResume.ai generates a tailored resume from your master profile, it does not pick a single LLM and run with the output. It evaluates how each of the three model families would phrase, structure, and emphasize the experience for the specific job description you are targeting. The resulting draft synthesizes the strengths of each family — using the precision of GPT-4-class models for technical phrasing, the balanced reasoning of Claude for narrative coherence, and Grok's pattern recognition for matching contemporary job-description language.

Evaluation layer. Before you export your resume, OneResume.ai runs it through a multi-model evaluation. Each frontier family scores the document against the target posting. If any model gives the resume a low score, the system surfaces the specific phrasing or structural choices that triggered the gap — and offers an alternate version that scores well across all three families. You see and approve every change.

Export layer. When you download or apply, you can choose to export the version that scored highest on the cross-family blend (recommended for unknown-evaluator situations), or — if you happen to know the company uses a specific provider — a version optimized for that family.
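
To make the evaluation step concrete, here is a minimal sketch of a cross-family check. It is an illustration only, not OneResume.ai's actual implementation: the `Scorer` interface, the 0.7 threshold, the use of the lowest score as the blend, and the fixed-value stand-in scorers are all assumptions. A real pipeline would call each provider's model in place of the stubs.

```python
from typing import Callable, Dict, List, TypedDict

# Hypothetical interface: (resume_text, job_posting) -> score between 0 and 1.
Scorer = Callable[[str, str], float]


class Report(TypedDict):
    scores: Dict[str, float]
    flagged: List[str]
    blend: float


def cross_family_report(
    resume: str,
    posting: str,
    scorers: Dict[str, Scorer],
    threshold: float = 0.7,
) -> Report:
    """Score one resume with every model family and flag the weak spots."""
    if not scorers:
        raise ValueError("at least one scorer is required")
    scores = {family: score(resume, posting) for family, score in scorers.items()}
    # Families scoring below the threshold need a closer look.
    flagged = sorted(family for family, s in scores.items() if s < threshold)
    # Conservative blend (an assumption): a resume is only as strong as its
    # weakest family score.
    return {"scores": scores, "flagged": flagged, "blend": min(scores.values())}


# Stand-in scorers with fixed outputs, for illustration only.
stub_scorers: Dict[str, Scorer] = {
    "openai": lambda resume, posting: 0.82,
    "anthropic": lambda resume, posting: 0.64,
    "xai": lambda resume, posting: 0.75,
}

report = cross_family_report("resume text", "job posting text", stub_scorers)
print(report["flagged"])  # ['anthropic']
print(report["blend"])    # 0.64
```

Taking the minimum rather than the average is one possible design choice: it rewards drafts that no family scores poorly, which suits the unknown-evaluator case.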

This is structurally different from any single-LLM resume tool, including the obvious mainstream products built around just one provider. ChatGPT-only resume builders bet on OpenAI being the evaluator. Claude-only tools bet on Anthropic. The research now shows, quantitatively, across 24 occupations and seven major models, that a same-model match raised shortlisting likelihood by 23% to 60% over equally qualified human-written resumes. A single-model bet gives up that advantage whenever the evaluator turns out to be a different model.

Why "Just Use AI to Write It" Is No Longer Enough Advice

For the past two years, the conventional career advice has been: use AI to write or refine your resume. That advice is not wrong — AI-written resumes outperform human-written ones in most algorithmic screening contexts. But this research shows that the advice is incomplete in a way that costs real interviews.

The new question is not whether to use AI. It is which AI, or how many. A candidate who used ChatGPT exclusively for their resume and is applying to a company whose ATS uses an Anthropic-based evaluator may gain nothing from the AI rewrite, and may even be worse off than if they had uploaded their own draft, because they have introduced a stylistic signature that the evaluator model can recognize as "not mine" and downweight accordingly.

The researchers tested two mitigation strategies in their paper. The first — system prompting evaluator models to ignore the origin of resumes — reduced bias by 17-63%. The second — using a majority voting ensemble that combines the evaluator with smaller models exhibiting weaker self-recognition — also produced substantial reductions. Both strategies depend on the employer implementing them. Job seekers cannot rely on every employer adopting these mitigations.
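
The second strategy can be pictured with a short sketch. This is an assumption-laden illustration of majority voting in general, not the paper's exact setup: the `Evaluator` interface and the three stand-in evaluators are hypothetical, and a real deployment would call actual models.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical interface: given two resume texts, return "A" or "B".
Evaluator = Callable[[str, str], str]


def majority_vote(evaluators: Sequence[Evaluator], resume_a: str, resume_b: str) -> str:
    """Return the label ("A" or "B") preferred by most evaluators."""
    if len(evaluators) % 2 == 0:
        raise ValueError("use an odd number of evaluators to avoid ties")
    votes = Counter(evaluate(resume_a, resume_b) for evaluate in evaluators)
    return votes.most_common(1)[0][0]


# Stand-in evaluators with fixed answers, for illustration only.
def primary_model(a: str, b: str) -> str:
    return "A"  # imagine a large model that recognizes and favors its own style


def small_model_1(a: str, b: str) -> str:
    return "B"


def small_model_2(a: str, b: str) -> str:
    return "B"


winner = majority_vote([primary_model, small_model_1, small_model_2], "resume A", "resume B")
print(winner)  # B
```

The point of the design is that one evaluator's preference for its own style can be outvoted by models that recognize that style less strongly.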

What job seekers can do is the candidate-side analog: write resumes that perform well across all major frontier models, so the employer's choice of evaluator becomes irrelevant. That is what multi-LLM resume engineering is. That is what OneResume.ai does.

Why This Matters Right Now

LLM-based resume screening is not a future trend. It is current operational reality. As of Q1 2026, 79% of enterprises have adopted some form of AI agent in their hiring funnel, and Gartner projects that 40% of enterprise applications will embed task-specific AI agents by the end of 2026 [3]. Major ATS vendors including iCIMS, Greenhouse, Workday, and Lever have all introduced LLM-based screening modules in the past 18 months.

The candidates being interviewed in 2026 are, by selection effect, the ones whose resumes survived the AI evaluator stage. The candidates not being interviewed include both genuinely under-qualified applicants and equally qualified ones whose resume model did not match the employer's evaluator model. The size of that second group is hard to pin down, but the study's simulations put the same-model shortlist advantage at 23 to 60%, and that gap is entirely a function of which AI tool the candidate happened to use to draft their resume.

Job seekers who learn this pattern and adapt — using multi-LLM platforms, hedging their model exposure, treating the choice of writing tool as a strategic decision rather than a convenience — will compound their interview advantage over the next two to three years. Those who continue using whichever single LLM happens to be most familiar will compound their disadvantage.

The research is published. The numbers are documented. The strategic implication is direct: do not bet your career on one AI's writing style.

FAQ

Q: What is AI self-preference bias in hiring? A: AI self-preference bias is the documented tendency of large language models to favor content generated by themselves over equivalent content from humans or other AI models, even when content quality is held constant. In algorithmic resume screening, this means an AI evaluator gives systematically higher scores to resumes written by the same model doing the evaluation.

Q: How big is the bias against human-written or different-model resumes? A: The study, a manuscript dated February 2026, measured self-preference bias of 67-82% across major LLMs, with GPT-4o exceeding 80%. Candidates whose resumes were written by the same model as the employer's evaluator were 23-60% more likely to be shortlisted than equally qualified applicants submitting human-written resumes.

Q: How am I supposed to know which LLM an employer is using? A: You usually cannot. ATS vendors do not disclose which evaluator models they use, and many companies use multiple tools across the hiring funnel. Multi-LLM optimization is the only way to hedge against not knowing.

Q: Are some industries hit harder by this bias than others? A: Yes. The disadvantage is most severe in business-related fields including sales, accounting, and finance. It is less pronounced in agriculture, arts, and automotive — fields with lower LLM-screening adoption.

Q: How does OneResume.ai address this problem? A: OneResume.ai uses all three frontier LLM families — OpenAI, Anthropic Claude, and xAI Grok — to engineer your resume rather than betting your application on a single model. Whichever evaluator the employer uses, your resume reflects the linguistic patterns and stylistic features that family rewards.

Sources

[1] Xu, J., Li, G., Jiang, J. Y., "AI Self-preferencing in Algorithmic Hiring: Empirical Evidence and Insights," Manuscript submitted to Manufacturing & Service Operations Management, February 2026. https://arxiv.org/abs/2509.00462
[2] iCIMS, Greenhouse, Workday, Lever ATS LLM Screening Module Documentation, 2026. https://www.icims.com / https://www.greenhouse.io / https://www.workday.com / https://www.lever.co
[3] Gartner CHRO Workforce Priorities Survey 2026. https://www.gartner.com
[4] Panickssery et al., "LLM Self-Preference in Evaluation Tasks," 2024. https://arxiv.org
[5] ResumeBuilder.com Industry Survey on AI in Hiring, 2024. https://www.resumebuilder.com

AI Self-Preference Bias: Why One LLM Cannot Write Your Resume