Methodology · Whitepaper v1.0
The Irreplaceable Score Methodology
A 4-dimension framework for career AI readiness. Anchored in primary research from Anthropic, WEF, Goldman Sachs, McKinsey, and Stanford HAI.
Version 1.0 · April 2026 · 30+ peer-reviewed and industry sources
Executive Summary
The Irreplaceable Score is a 0–100 composite index that quantifies how well-positioned an individual is to thrive as AI reshapes knowledge work. It is computed from four weighted dimensions, each scored 0–100 against a transparent rubric anchored in primary research from Anthropic, the World Economic Forum, Goldman Sachs, McKinsey, Stanford HAI, and peer-reviewed labor economics (Eloundou et al., 2023).
The four dimensions are: (1) Personal AI Fluency — the respondent's actual use of AI; (2) Company AI Maturity — their employer's organizational readiness; (3) Industry AI Disruption — sector-level exposure based on observed task-level data; and (4) Role Amplification — how much the respondent's specific job can be leveraged (vs. displaced) by AI.
Each dimension is decomposed into five sub-criteria with explicit weights, five level labels (Spectator → Native, or equivalent), and example behaviors. Composite scoring produces a score, a 95% confidence interval, and one of four readiness categories: Behind (<30), At Risk (30–50), Emerging (50–75), Leveraged (75+). The rubric is designed for repeatable LLM scoring, psychometric defensibility, and enterprise reporting.
The 4 Dimensions
Each dimension is scored 0–100 against a transparent rubric with sub-criteria, five level labels, and example behaviors. The four dimensions combine into a weighted composite described under “The Composite Score”.
Dimension 1 · Weight 30%
Personal AI Fluency
Personal AI Fluency measures the respondent's actual hands-on proficiency with AI tools in their own work. It is not self-reported confidence — it is inferred from behavior: what tools they use, how often, what they produce with them, and whether they collaborate, direct, or build. It is the only dimension fully under the individual's control, and therefore the lever the assessment emphasizes most.
Sub-criteria and weights
| Sub-criterion | What it measures | Assessment input | Weight |
|---|---|---|---|
| 1a. Tenure & Frequency | Length and cadence of AI use | “How long have you used AI tools?” + “How often?” | 15% |
| 1b. Tool Diversity | Breadth of stack (chat, coding, image, agents, automation) | Tool checklist + verticals used | 20% |
| 1c. Collaboration Mode | Where they sit on Anthropic's 5-mode spectrum | Task-description prompts → LLM classifies | 25% |
| 1d. Builder Index | Do they create reusable AI artifacts (prompts, GPTs, agents, scripts)? | “Have you built…” behavior checklist | 20% |
| 1e. Workflow Integration | Is AI embedded in daily work or sporadic? | % of weekly work hours + integration scenarios | 20% |
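As a rough illustration of how the sub-criterion weights above could roll up into a dimension score, the sketch below takes a weighted average of five 0–100 sub-scores. The names `D1_WEIGHTS` and `dimension_score` are hypothetical, and the linear aggregation rule is our assumption: the rubric states the weights but not how sub-scores are combined.

```python
# Hypothetical sketch: linear aggregation of the D1 sub-criteria.
# Weights are copied from the table above; the linear rule is an assumption.
D1_WEIGHTS = {
    "tenure_frequency": 0.15,      # 1a
    "tool_diversity": 0.20,        # 1b
    "collaboration_mode": 0.25,    # 1c
    "builder_index": 0.20,         # 1d
    "workflow_integration": 0.20,  # 1e
}


def dimension_score(sub_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Return the weighted average of 0-100 sub-scores."""
    if set(sub_scores) != set(weights):
        raise ValueError("sub_scores and weights must have the same keys")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    for name, value in sub_scores.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be in [0, 100]")
    return sum(weights[k] * sub_scores[k] for k in weights)


# Illustrative sub-scores (made up for this example).
example = {
    "tenure_frequency": 80,
    "tool_diversity": 60,
    "collaboration_mode": 70,
    "builder_index": 40,
    "workflow_integration": 50,
}
d1 = dimension_score(example, D1_WEIGHTS)
```

With these made-up inputs the weighted average is about 59.5. The same function would apply to the other dimensions by swapping in their weight tables.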
Levels
| Score | Level | Description | Example behaviors |
|---|---|---|---|
| 0–30 | Spectator | Passive awareness; little to no hands-on use. | Tried ChatGPT once or twice; trivia/curiosity only; no work tasks; doesn't know what “prompt engineering” means. |
| 30–55 | Dabbler | Occasional use, mostly single-shot prompts for content. | Uses ChatGPT/Claude 1–2×/week for emails or drafts; copies outputs without editing; one tool only; treats AI as autocomplete. |
| 55–75 | Operator | Daily user with multiple tools and iterative workflows. | 2–3 AI tools (chat + coding/image); multi-turn conversations; validation/iteration loops; integrates AI into ≥3 work tasks. |
| 75–90 | Builder | Composes AI into repeatable systems. | Writes reusable prompts/skills; chains tools; uses coding agents; has built an internal GPT or automation; mentors colleagues. |
| 90–100 | Native | AI is the default operating layer for knowledge work. | Ships agentic workflows in production; evaluates models against benchmarks; spends >50% of work hours with AI in the loop. |
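The level table lists shared boundaries (30 appears in both the Spectator and Dabbler ranges), so any implementation has to pick a tie-break. The sketch below assumes bands are lower-inclusive, meaning a score of exactly 30 is a Dabbler; that convention and the name `d1_level` are assumptions, not part of the rubric.

```python
from bisect import bisect_right

# Lower bound of each D1 level, taken from the table above.
# Assumption: bands are lower-inclusive, so exactly 30 maps to "Dabbler".
D1_LEVEL_BOUNDS = [0, 30, 55, 75, 90]
D1_LEVEL_NAMES = ["Spectator", "Dabbler", "Operator", "Builder", "Native"]


def d1_level(score: float) -> str:
    """Map a 0-100 D1 score to its level label."""
    if not 0 <= score <= 100:
        raise ValueError("score must be in [0, 100]")
    # bisect_right counts how many lower bounds are <= score.
    return D1_LEVEL_NAMES[bisect_right(D1_LEVEL_BOUNDS, score) - 1]
```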
Research anchoring
- Anthropic Economic Index — 5-mode collaboration taxonomy (Directive / Feedback Loop / Task Iteration / Learning / Validation)
- Anthropic Economic Index (March 2026) — tenure / learning-curve findings on compounding sophistication
- Workera AI Skills Framework — 4-domain proficiency ladder
- Stanford HAI AI Index 2025 — Functional / Critical / Ethical literacy pillars
- WEF Future of Jobs 2025 — AI & big data literacy as a top-7 rising skill
Dimension 2 · Weight 20%
Company AI Maturity
Company AI Maturity measures how far the respondent's employer has moved along the AI-adoption S-curve. A highly fluent individual trapped in an “Experimenter” company will under-realize their leverage; a moderately fluent individual in a “Leader” company will be pulled forward by organizational momentum. This dimension captures that environmental lift (or drag).
Sub-criteria and weights
| Sub-criterion | What it measures | Assessment input | Weight |
|---|---|---|---|
| 2a. Deployment Breadth | How many teams use AI, not just IT | “Which teams at your company use AI?” | 25% |
| 2b. Tooling & Access | Do employees have enterprise AI tools and models? | “Does your company pay for Copilot/Claude/ChatGPT Enterprise?” | 20% |
| 2c. Workflow Redesign | Has AI changed how work gets done, not just layered on top? | Scenarios: meetings, reviews, hiring, coding, support | 25% |
| 2d. Leadership Signal | Is AI a CEO/exec priority with measurable targets? | “Has leadership set AI goals/KPIs?” | 15% |
| 2e. Governance & Safety | Policies, red-team, data handling — signals operational maturity | “Does your company have an AI policy? Approved-tools list?” | 15% |
Levels
| Score | Level | Description | Example behaviors |
|---|---|---|---|
| 0–30 | AI-Dark (Pre-Experimenter) | No official tools; AI use is shadow IT or banned. | “We're not allowed to use ChatGPT”; no license; no policy; leadership silent. |
| 30–55 | Experimenter | Pilots in 1–2 teams (usually engineering); no scale. | Engineering has Copilot; marketing sneaks ChatGPT; no enterprise contract; no KPIs. |
| 55–75 | Practitioner | Enterprise licenses, multi-team usage, early workflow changes. | Claude/Copilot deployed org-wide; policies in place; some redesigned workflows; CEO mentions AI in all-hands. |
| 75–90 | Scaler | AI is central to 2+ P&L lines with measured impact. | AI features shipped in product; customer-facing agents live; 20%+ productivity targets tied to AI; internal AI platform team. |
| 90–100 | Leader | AI-native — org structure and economics reshaped by AI. | AI-native org design; AI-first products; EBIT impact attributed; referenced in investor letters; hiring “AI-native” as a criterion. |
Research anchoring
- McKinsey — Superagency in the Workplace (Jan 2025): 3× leader/employee gap; ~1% of companies are “mature”
- McKinsey — The State of AI 2024/2025: 65% of orgs regularly use GenAI
- McKinsey 4-archetype model: Experimenters, Practitioners, Scalers, Leaders
- Gartner AI Maturity Index — 5-stage model
- Anthropic Economic Index (Sept 2025) — enterprise API usage is 77% automation-dominant
Dimension 3 · Weight 25%
Industry AI Disruption
Industry AI Disruption measures sector-level exposure: how much of the typical work done in the respondent's industry is at risk of automation or augmentation over the next 24–36 months. This is where high individual fluency can't fully compensate — if the whole sector reprices labor, earnings scarring is the base rate. Note: in the composite formula this dimension is flipped (100 − D3) so that a more-disrupted industry lowers the Irreplaceable Score.
Sub-criteria and weights
| Sub-criterion | What it measures | Assessment input | Weight |
|---|---|---|---|
| 3a. Sector Task Exposure | % of typical tasks in sector that AI can do today | Industry → lookup in the occupation/industry exposure table | 35% |
| 3b. Labor Market Signal | Hiring freezes, layoffs, reorgs citing AI | Recent news index + BLS projections | 25% |
| 3c. Pricing Power Shift | Is AI compressing margins / unit economics? | Sector pricing trend proxies | 15% |
| 3d. Incumbent vs. AI-Native Competition | Are AI-native challengers winning share? | Market-structure cue (e.g., Harvey vs. legacy law; Cursor vs. legacy IDE) | 15% |
| 3e. Regulatory Moat | Does regulation protect human labor (healthcare, legal licensure)? | Regulated-profession flag (inverse) | 10% |
Levels
| Score | Level | Description | Example behaviors |
|---|---|---|---|
| 0–30 | Insulated | Sector largely physical / regulated / localized. | Skilled trades, hands-on healthcare, social work; AI assists admin but not core work. |
| 30–55 | Partial Exposure | Some tasks automatable; core human judgment still load-bearing. | Management, sales, mid-level healthcare; AI compresses admin but not the relationship. |
| 55–75 | High Exposure | Majority of tasks AI-addressable; repricing pressure already visible. | Finance analysis, legal drafting, education content — hiring freezes and role re-scoping evident. |
| 75–90 | Disrupted | Sector in active restructuring; headcount flat or down despite revenue growth. | Customer support, copywriting, graphic design, first-line coding. |
| 90–100 | Existential | AI is the product; human labor per unit of output is collapsing fast. | Translation, basic content mills, SEO content, first-draft code, template design. |
Research anchoring
- Anthropic Economic Index — observed task-level usage by SOC code (gold standard for measured exposure)
- Eloundou, Manning, Mishkin, Rock (2023) — GPTs are GPTs: O*NET 19,265-task exposure rubric (arXiv:2303.10130)
- Goldman Sachs — generative AI could expose ~300M FTE jobs globally (2023, updated 2025)
- WEF Future of Jobs 2025 — 22% of jobs disrupted by 2030; industry-specific cuts
- BLS Occupational Outlook Handbook (2024–2034 projections)
Occupation / industry exposure reference (Dimension 3)
Derived from Anthropic's observed SOC-level usage (Claude conversations) and Eloundou et al. (2023) task-level exposure, cross-checked against Goldman Sachs industry cuts. Exposure is converted to a 0–100 Disruption Score (higher = more disruption / repricing risk).
| Occupation / Industry cluster | SOC group | Anthropic observed use | Eloundou α>0.5 task share | Disruption Score |
|---|---|---|---|---|
| Software / IT services | 15-0000 | 37.2% | ~70% | 85–95 |
| Marketing, PR, content, creative writing | 27-0000 / 11-2000 | 10.3% | ~65% | 75–90 |
| Education, instruction, tutoring | 25-0000 | 12.4% | ~55% | 65–80 |
| Finance, banking, insurance analysis | 13-2000 | ~8% | ~60% | 70–85 |
| Legal, paralegal, compliance | 23-0000 | ~5% | ~63% | 70–85 |
| Customer support, call centers | 43-4000 | ~6% | ~55% | 75–90 |
| Management, operations | 11-0000 | 3–5% | ~40% | 45–60 |
| Sales | 41-0000 | ~3% | ~35% | 40–55 |
| Healthcare practitioners (dx/admin) | 29-0000 | <2% | ~30% | 35–55 |
| Healthcare support / aides | 31-0000 | <1% | ~10% | 15–30 |
| Skilled trades, installation, repair | 49-0000 | <1% | <10% | 10–25 |
| Transportation (drivers) | 53-0000 | <1% | ~15% | 20–35 |
| Personal care, cleaning, food prep | 35-0000 / 37-0000 / 39-0000 | <0.5% | <8% | 5–20 |
| Construction, farming, fishing, forestry | 47-0000 / 45-0000 | 0.1–0.3% | <5% | 5–15 |
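One possible way to consume the reference table is as a lookup that returns a point estimate plus an uncertainty half-width. The sketch below copies four rows from the table; using the range midpoint as the point estimate is our assumption, and `DISRUPTION_RANGES` and `disruption_estimate` are hypothetical names.

```python
# A few rows copied from the reference table above: cluster -> (low, high).
DISRUPTION_RANGES = {
    "Software / IT services": (85, 95),
    "Legal, paralegal, compliance": (70, 85),
    "Sales": (40, 55),
    "Skilled trades, installation, repair": (10, 25),
}


def disruption_estimate(cluster: str) -> tuple[float, float]:
    """Return (point estimate, half-width) for a cluster's Disruption Score.

    Assumption: the midpoint of the published range is the point estimate,
    and half the range width is a crude uncertainty proxy.
    """
    low, high = DISRUPTION_RANGES[cluster]
    return (low + high) / 2, (high - low) / 2
```

An unknown cluster raises `KeyError` here; a production lookup would need an explicit fallback, which the table does not specify.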
Dimension 4 · Weight 25%
Role Amplification
Where Industry Disruption asks “is the sector being repriced?”, Role Amplification asks “within this sector, does this specific role get amplified (leveraged) or compressed (automated away) by AI?” A junior lawyer doing doc review sits in the same industry as a senior litigator, but the litigator's role is being amplified by AI while the doc-review role is being absorbed by it. This dimension captures that split.
Sub-criteria and weights
| Sub-criterion | What it measures | Assessment input | Weight |
|---|---|---|---|
| 4a. Amplification Ratio | Productivity lift from AI on this role's core tasks (e.g., SWE +55%) | Role-to-lift lookup from research | 30% |
| 4b. Judgment Density | Share of role that's high-context decision-making vs. executable tasks | Task decomposition from job description | 25% |
| 4c. Human-Capital Leverage | Does seniority/network/trust compound in this role? | Seniority + relationship-facing flags | 20% |
| 4d. Creative/Strategic Mix | WEF rising skills (creative thinking, leadership, complex problem solving) as % of role | Scenario-based self-report | 15% |
| 4e. AI-Complement vs. AI-Substitute | Does AI make this person more hireable (complement) or less (substitute)? | Derived from 4a–4d | 10% |
Levels
| Score | Level | Description | Example behaviors |
|---|---|---|---|
| 0–30 | Compressed | Role is largely executable tasks AI already does well. | Junior copywriter, template designer, first-line support, basic data entry, doc review paralegal. |
| 30–55 | Shrinking | Role still needed but team sizes being cut as AI takes the repeatable core. | Mid-level analyst, junior coder, content marketer, entry-level recruiter screening. |
| 55–75 | Stable | Role changes substantially but headcount holds; humans curate AI output. | Experienced consultants, account managers, teachers, mid-career product managers. |
| 75–90 | Amplified | AI makes this person 2–5× more productive; demand rising. | Senior engineers using agents, principal designers, investigative journalists, senior sales, founders, senior clinicians with AI dx assist. |
| 90–100 | Leveraged | Role is an AI-leverage point — one person now does what a team used to. | Founder-engineers shipping multi-agent products; solo operators running AI-native businesses; chief-of-staff humans orchestrating agents. |
Research anchoring
- Anthropic Economic Index — automation vs. augmentation breakdown per SOC code
- WEF Future of Jobs 2025 — top growing roles (tech, care, education, green)
- McKinsey Superagency — productivity benchmarks by role (coding +55%, support +14%, writing +40%)
- Eloundou et al. 2023 — β-exposure: with LLM tooling, 47–56% of tasks accelerated
- BLS Occupational Outlook projections
The Composite Score
The four dimensions are not weighted equally. Weights reflect (a) the lever the individual controls, and (b) the prognostic power of each dimension in published research.
| Dimension | Weight | Rationale |
|---|---|---|
| D1 — Personal AI Fluency | 30% | Highest individual agency; the actionable lever. Strongest research link to short-term career outcomes (Anthropic tenure data, McKinsey productivity). |
| D2 — Company AI Maturity | 20% | Environmental lift; matters, but can be changed by switching jobs. Lower weight to avoid penalizing great individuals stuck in laggard firms. |
| D3 — Industry AI Disruption | 25% | Strong structural determinant of earnings trajectory per Goldman and Eloundou. Enters the composite as a risk term (flipped in the formula), so higher disruption lowers the score. |
| D4 — Role Amplification | 25% | Complements D3 — same industry can have amplified and compressed roles. Anthropic's within-occupation variance justifies weighting it equally to industry. |
Formula
D3_flipped = 100 − D3
S_raw = 0.30 · D1 + 0.20 · D2 + 0.25 · D3_flipped + 0.25 · D4
Irreplaceable Score = round(S_raw), clipped to [0, 100]
Industry Disruption (D3) is a risk score — a more-disrupted industry should lower the Irreplaceable Score unless personal fluency or role amplification compensate. Flipping it (100 − D3) makes the composite point in the correct direction.
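A minimal sketch of the composite, assuming S_raw is the weighted sum implied by the dimension weights in the table above (30/20/25/25) with D3 flipped. The function name `irreplaceable_score` is hypothetical. Python's built-in `round` rounds exact halves to the nearest even integer, which may differ from the rounding rule a production scorer uses.

```python
def irreplaceable_score(d1: float, d2: float, d3: float, d4: float) -> int:
    """Weighted composite of four 0-100 dimension scores, with D3 flipped."""
    for name, value in (("d1", d1), ("d2", d2), ("d3", d3), ("d4", d4)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be in [0, 100]")
    d3_flipped = 100 - d3  # higher disruption should lower the composite
    s_raw = 0.30 * d1 + 0.20 * d2 + 0.25 * d3_flipped + 0.25 * d4
    # round() uses round-half-to-even; clip to [0, 100] as the formula states.
    return int(min(100, max(0, round(s_raw))))


# Illustrative inputs: fluent individual, mid-maturity firm,
# heavily disrupted industry, middling role amplification.
score = irreplaceable_score(d1=72, d2=60, d3=80, d4=50)
```

For these inputs the raw sum is 21.6 + 12 + 5 + 12.5 = 51.1, so the rounded score is 51. Floors, ceilings, and caps from the guardrails are deliberately left out of this sketch.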
Confidence interval
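Each dimension is estimated from a small number of inputs with known variance. We treat the final score as a weighted sum of four independent estimates, so the weighted standard errors combine in quadrature (w_i is the weight of dimension i and SE_i its standard error):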
SE_total = sqrt( Σ (w_i · SE_i)² )
CI_95 = Irreplaceable Score ± 1.96 · SE_total
In practice this yields a ±5 to ±9 confidence band for most respondents. The band widens when dimension scores sit on level boundaries, consistency checks flag ambiguous inputs, or free-text responses are short / low-signal.
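The interval calculation can be sketched as follows. The per-dimension standard errors in the example are invented for illustration; how SE_i is estimated for each dimension is not specified here, and `confidence_interval` is a hypothetical name.

```python
import math

# Dimension weights in the order D1, D2, D3, D4.
WEIGHTS = (0.30, 0.20, 0.25, 0.25)


def confidence_interval(score: float, dimension_ses, z: float = 1.96):
    """Return (lower, upper, se_total) for the composite score.

    Assumes the four dimension estimates are independent, so weighted
    standard errors add in quadrature. Flipping D3 does not change its SE.
    """
    if len(dimension_ses) != len(WEIGHTS):
        raise ValueError("expected one standard error per dimension")
    se_total = math.sqrt(sum((w * se) ** 2 for w, se in zip(WEIGHTS, dimension_ses)))
    half_width = z * se_total
    return score - half_width, score + half_width, se_total


# Hypothetical per-dimension standard errors, in score points.
lower, upper, se_total = confidence_interval(62, (6, 8, 5, 7))
```

With these hypothetical inputs the half-width comes out a little above 6 points, inside the ±5 to ±9 range quoted above. The sketch does not clip the interval to the 0–100 scale.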
Peer benchmarking
Percentile is computed against a rolling cohort of prior respondents, segmented by (industry, role seniority, region). We report the percentile only when ≥50 peers are present in the triad; otherwise we fall back to industry-only, then global. The approach is inspired by Item Response Theory scoring practice: rather than z-scoring raw totals, we percentile-rank within calibrated subgroups so comparisons stay fair across cohorts.
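The fallback order can be sketched as below. Several details are assumptions on our part: that the 50-peer minimum also applies to the industry-only segment, that the global segment is used whenever it is non-empty, and that ties are handled with a mid-rank percentile. The names `benchmark` and `percentile_rank` and the tuple layout of the cohort are hypothetical.

```python
from bisect import bisect_left, bisect_right

MIN_PEERS = 50  # minimum segment size stated in the text


def percentile_rank(score: float, sorted_peers: list[float]) -> float:
    """Mid-rank percentile of score within an ascending list of peer scores."""
    below = bisect_left(sorted_peers, score)
    ties = bisect_right(sorted_peers, score) - below
    return 100.0 * (below + 0.5 * ties) / len(sorted_peers)


def benchmark(score, respondent, cohort):
    """Return (segment_label, percentile) using triad -> industry -> global.

    respondent: (industry, seniority, region)
    cohort: iterable of (industry, seniority, region, score) tuples
    """
    industry, seniority, region = respondent
    segments = [
        ("triad", lambda r: r[:3] == (industry, seniority, region)),
        ("industry", lambda r: r[0] == industry),
        ("global", lambda r: True),
    ]
    for label, matches in segments:
        peers = sorted(r[3] for r in cohort if matches(r))
        # Assumption: the global fallback is used whenever it is non-empty.
        if len(peers) >= MIN_PEERS or (label == "global" and peers):
            return label, percentile_rank(score, peers)
    return "none", None


# Toy cohort: 10 peers in the respondent's triad, 60 in the industry overall.
cohort = [("legal", "senior", "EU", s) for s in range(10)]
cohort += [("legal", "junior", "US", s) for s in range(50)]
label, pct = benchmark(25, ("legal", "senior", "EU"), cohort)
```

In this toy cohort the triad has only 10 peers, so the function falls back to the industry segment.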
Readiness Categories
The Irreplaceable Score maps onto four readiness bands, each with a characteristic behavioral profile and a research-anchored 12-month outlook. Thresholds are soft: scores of 49 and 51 should be treated similarly, and surfaces should show distance to the next band to encourage action.
| Band | Label | Population share | Behavioral profile | 12-month outlook (research-anchored) |
|---|---|---|---|---|
| <30 | Behind | ~25–30% of knowledge workers | Little or no hands-on AI use; employer is AI-dark; role is in a compressed/shrinking band in a disrupted sector. | Highest earnings-scarring risk per Goldman 2023 and WEF (11% of workers unlikely to get reskilling). Anthropic enterprise hiring data shows a roughly 14% lower relative job-finding rate for highly exposed roles without AI fluency. Priority: upskill immediately. |
| 30–50 | At Risk | ~30–35% | Some exposure, single-tool use; company is an Experimenter; role is partially exposed. | Likely to feel “AI anxiety” without measurable productivity gain. Risk of lateral churn as roles re-scope. Priority: tool diversification + workflow integration. |
| 50–75 | Emerging | ~25–30% | Daily operator across multiple tools; company is a Practitioner; role is stable or amplified. | Positioned to capture WEF “rising skill” premium. McKinsey reports 40–55% productivity gains for this cohort. Priority: move from Operator → Builder. |
| 75+ | Leveraged | ~10–15% | Builder or Native; company is a Scaler/Leader or the respondent is a founder; role is Amplified/Leveraged. | Compounding advantage. Anthropic directive-mode data shows this cohort ships 2–5× more output per week. Outcomes: promotion, equity upside, startup optionality. Priority: compound leverage — ship AI-native work publicly, mentor, recruit. |
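Because thresholds are soft and surfaces should show distance to the next band, a band lookup would naturally return both pieces. The sketch below assumes lower-inclusive bands, which is consistent with "<30" and "75+" in the table; `readiness` is a hypothetical name.

```python
# (lower bound, label) for each readiness band, from the table above.
BANDS = [(0, "Behind"), (30, "At Risk"), (50, "Emerging"), (75, "Leveraged")]


def readiness(score: float):
    """Return (band label, points to the next band or None if in the top band)."""
    if not 0 <= score <= 100:
        raise ValueError("score must be in [0, 100]")
    label, next_threshold = BANDS[0][1], None
    for i, (lower, name) in enumerate(BANDS):
        if score >= lower:
            label = name
            next_threshold = BANDS[i + 1][0] if i + 1 < len(BANDS) else None
    points_to_next = None if next_threshold is None else next_threshold - score
    return label, points_to_next
```

For example, `readiness(49)` reports At Risk with 1 point to go, which lets a UI present 49 and 51 as near neighbors rather than as different outcomes.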
Scoring Guardrails
The rubric is transparent, which means respondents can try to game it. We apply three classes of guardrails.
Gameability detection
- Fluency vs. output mismatch: D1 inputs claim “Native” use but free-text descriptions show generic AI vocabulary → cap D1 at Operator ceiling (75).
- Tool-list bloat: claiming 10+ tools used weekly without describing one concrete workflow → cap Tool Diversity (1b) at 60.
- Builder claims without evidence: “I build agents” with no artifact URL / repo / internal link → cap Builder Index (1d) at 60.
- Company-maturity inflation: Scaler/Leader claim but role inputs don't reflect AI integration → discount D2 by 15%.
- Keyword stuffing: repetition of buzzwords (“agentic, LLM, RAG, fine-tuned”) without concrete nouns → consistency flag + widen CI.
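Once the detectors have fired, the caps and discounts listed above reduce to a few clamps. The sketch below covers only that final step: the boolean flags are assumed to come from upstream classification that is out of scope here, the 15% discount is read as multiplicative (an assumption), and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GamingFlags:
    """Detector outputs; how each flag is detected is out of scope here."""
    generic_free_text: bool = False    # "Native" claim, generic vocabulary
    tool_list_bloat: bool = False      # 10+ tools, no concrete workflow
    no_builder_evidence: bool = False  # builder claim, no artifact link
    maturity_inflation: bool = False   # Scaler/Leader claim not reflected in role
    keyword_stuffing: bool = False     # buzzwords without concrete nouns


def apply_gaming_guardrails(scores: dict, flags: GamingFlags):
    """Return (adjusted scores, widen_ci). Keys: 'd1', '1b', '1d', 'd2'."""
    out = dict(scores)
    if flags.generic_free_text:
        out["d1"] = min(out["d1"], 75)  # Operator ceiling
    if flags.tool_list_bloat:
        out["1b"] = min(out["1b"], 60)
    if flags.no_builder_evidence:
        out["1d"] = min(out["1d"], 60)
    if flags.maturity_inflation:
        out["d2"] = out["d2"] * 0.85    # assumption: 15% is a relative discount
    return out, flags.keyword_stuffing


adjusted, widen_ci = apply_gaming_guardrails(
    {"d1": 92, "1b": 90, "1d": 50, "d2": 80},
    GamingFlags(generic_free_text=True, tool_list_bloat=True, maturity_inflation=True),
)
```

Recomputing D1 from capped sub-criteria is omitted; in a full pipeline the sub-criterion caps would presumably be applied before the dimension score is aggregated.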
Consistency checks
- D1 internal: Tenure × frequency × tools should tell one story. A 3-month user claiming 10 tools and Builder maturity is inconsistent — widen CI and discount the outlier sub-criterion by 20%.
- D1 × D2: Builder/Native individual at an AI-Dark company → flagged as “misaligned environment” (often a future job-switcher signal; noted, not penalized).
- D3 × D4: A role Leveraged in an Existential sector is rare (solo AI-native operators). Requires concrete evidence in free text; otherwise D4 is capped at Amplified (90).
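The two cross-dimension checks can be sketched as a small function. The numeric thresholds are read off the level tables (Builder starts at 75, AI-Dark ends at 30, Existential and Leveraged start at 90); treating them as hard cut-offs is an assumption, the `has_leverage_evidence` flag stands in for the free-text review, and the D1 internal check is not covered.

```python
def consistency_checks(d1, d2, d3, d4, has_leverage_evidence):
    """Return (possibly capped D4, list of notes) for the cross-dimension checks."""
    notes = []
    # D1 x D2: Builder/Native individual at an AI-Dark company.
    # Noted only; no score is changed.
    if d1 >= 75 and d2 < 30:
        notes.append("misaligned environment")
    # D3 x D4: a Leveraged role in an Existential sector needs concrete evidence.
    if d3 >= 90 and d4 > 90 and not has_leverage_evidence:
        d4 = 90  # cap at the Amplified ceiling
        notes.append("D4 capped at Amplified ceiling")
    return d4, notes
```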
Floor & ceiling logic
- Floor of 20: every completed assessment scores ≥20. A raw 0 doesn't reflect reality — even AI-dark workers have some career optionality.
- Ceiling of 98 (v1): 99–100 is reserved for validated AI-native operators (shipped AI products, public AI work with traction). This keeps the top band credible and prevents self-scored perfection.
- Hard cap: D3 ≥ 75 and D4 ≤ 30 → Irreplaceable Score is capped at 45 regardless of D1. Industry-role fit dominates when the role is being actively eliminated.
- Hard floor lift: D1 ≥ 85 → minimum Irreplaceable Score of 50. A Native/Builder individual can always move, even from a bad sector/role.
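The four rules above interact, and the text does not fully pin down their precedence: a respondent with D1 ≥ 85, D3 ≥ 75, and D4 ≤ 30 qualifies for both the minimum of 50 and the cap of 45. The sketch below assumes the cap is applied last because it is stated to hold "regardless of D1"; that ordering is our assumption, as are the function name and the `validated_ai_native` flag.

```python
def apply_floor_ceiling(score, d1, d3, d4, validated_ai_native=False):
    """Apply floor, ceiling, D1 lift, and the D3/D4 cap to a composite score."""
    # Ceiling of 98 unless the respondent is a validated AI-native operator.
    ceiling = 100 if validated_ai_native else 98
    # Floor of 20 for every completed assessment.
    score = max(20, min(score, ceiling))
    # Lift: D1 >= 85 guarantees at least 50.
    if d1 >= 85:
        score = max(score, 50)
    # Cap: disrupted industry plus compressed role limits the score to 45.
    # Assumption: applied last, so it overrides the lift when both trigger.
    if d3 >= 75 and d4 <= 30:
        score = min(score, 45)
    return score
```

Under this ordering, a raw score of 70 with D1 = 90, D3 = 80, and D4 = 25 ends at 45. Swapping the last two steps would yield 50 instead, so the precedence should be confirmed before relying on either result.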
Sources & Citations
Primary Research (Anthropic, WEF, Goldman, Eloundou, BLS, Stanford HAI)
- Anthropic (Feb 2025) — Anthropic Economic Index: Mapping AI use across the economy
- Anthropic (Sept 2025) — Uneven geographic and enterprise AI adoption
- Anthropic Economic Index — landing page (updated Mar 2026)
- Handa et al. (2025) — Which Economic Tasks are Performed with AI? (arXiv:2503.04761)
- Anthropic Economic Index dataset (Hugging Face)
- WEF — The Future of Jobs Report 2025
- WEF press release — 78 Million New Job Opportunities by 2030
- Goldman Sachs (2023, updated 2024–25) — Generative AI could raise global GDP by 7%
- Eloundou, Manning, Mishkin, Rock (2023) — GPTs are GPTs (arXiv:2303.10130)
- US Bureau of Labor Statistics — Occupational Outlook Handbook (2024–2034)
- O*NET OnLine — occupational task taxonomy
- Maslej et al. (Apr 2025) — Stanford HAI 2025 AI Index Report
Industry Frameworks (McKinsey, Gartner, Workera, Coursera, a16z, Sequoia)
- McKinsey (Jan 2025) — Superagency in the Workplace
- McKinsey — The State of AI (2024, 2025)
- McKinsey Global Institute (2018) — Notes from the AI Frontier
- WEF Reskilling Revolution initiative
- Workera — AI Skills Framework
- Andrew Ng — Generative AI for Everyone (Coursera / DeepLearning.AI)
- Coursera — Google AI Essentials
- a16z — Top 100 GenAI Consumer Apps (2024/2025)
- Sequoia Capital — Generative AI's Act Two
Academic Foundations (IRT, SJT methodology, labor economics)
- Autor, Levy, Murnane (2003) — The Skill Content of Recent Technological Change (QJE)
- Lord, F. M. (1980) — Applications of Item Response Theory to Practical Testing Problems
- Embretson & Reise (2000) — Item Response Theory for Psychologists
- Motowidlo, Dunnette & Carter (1990) — Low-Fidelity Simulation (foundational SJT, J. Applied Psychology)
- Whetzel & McDaniel (2009) — Situational Judgment Tests: An Overview (HRM Review)
- Brynjolfsson, Li, Raymond (2023) — Generative AI at Work (NBER WP 31161)
- Bick, Blandin, Deming (2024) — The Rapid Adoption of Generative AI (NBER)
Document version: v1.0 · Last updated 2026-04-18 · Maintainer: Human in Residence