Methodology · Whitepaper v1.0

The Irreplaceable Score Methodology

A 4-dimension framework for career AI readiness. Anchored in primary research from Anthropic, WEF, Goldman Sachs, McKinsey, and Stanford HAI.

Version 1.0 · April 2026 · 30+ peer-reviewed + industry sources

Executive Summary

The Irreplaceable Score is a 0–100 composite index that quantifies how well-positioned an individual is to thrive as AI reshapes knowledge work. It is computed from four weighted dimensions, each scored 0–100 against a transparent rubric anchored in primary research from Anthropic, the World Economic Forum, Goldman Sachs, McKinsey, Stanford HAI, and peer-reviewed labor economics (Eloundou et al., 2023).

The four dimensions are: (1) Personal AI Fluency, the respondent's actual use of AI; (2) Company AI Maturity, their employer's organizational readiness; (3) Industry AI Disruption, sector-level exposure based on observed task-level data; and (4) Role Amplification, how much the respondent's specific job can be leveraged (vs. displaced) by AI.

Each dimension is decomposed into five sub-criteria with explicit weights, five level labels (Spectator → Native, or equivalent), and example behaviors. Composite scoring produces a score, a 95% confidence interval, and one of four readiness categories: Behind (<30), At Risk (30–50), Emerging (50–75), Leveraged (75+). The rubric is designed for repeatable LLM scoring, psychometric defensibility, and enterprise reporting.

The 4 Dimensions

Each dimension is scored 0–100 against a transparent rubric with sub-criteria, five level labels, and example behaviors. The four dimensions combine into a weighted composite described under The Composite Score.

Dimension 1 · Weight 30%

Personal AI Fluency

Personal AI Fluency measures the respondent's actual hands-on proficiency with AI tools in their own work. It is not self-reported confidence — it is inferred from behavior: what tools they use, how often, what they produce with them, and whether they collaborate, direct, or build. It is the only dimension fully under the individual's control, and therefore the lever the assessment emphasizes most.

Sub-criteria and weights

Sub-criterion	What it measures	Assessment input	Weight
1a. Tenure & Frequency	Length and cadence of AI use	“How long have you used AI tools?” + “How often?”	15%
1b. Tool Diversity	Breadth of stack (chat, coding, image, agents, automation)	Tool checklist + verticals used	20%
1c. Collaboration Mode	Where they sit on Anthropic's 5-mode spectrum	Task-description prompts → LLM classifies	25%
1d. Builder Index	Do they create reusable AI artifacts (prompts, GPTs, agents, scripts)?	“Have you built…” behavior checklist	20%
1e. Workflow Integration	Is AI embedded in daily work or sporadic?	% of weekly work hours + integration scenarios	20%
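
The sub-criterion weights above can be rolled up into a single dimension score. The sketch below assumes the roll-up is a plain weighted average of 0–100 sub-scores; the text gives the weights but does not spell out the aggregation rule, so treat that as an assumption. The key names and the example sub-scores are hypothetical.

```python
# Weights for Dimension 1, copied from the sub-criteria table.
D1_WEIGHTS = {
    "tenure_frequency": 0.15,      # 1a
    "tool_diversity": 0.20,        # 1b
    "collaboration_mode": 0.25,    # 1c
    "builder_index": 0.20,         # 1d
    "workflow_integration": 0.20,  # 1e
}


def dimension_score(sub_scores: dict, weights: dict) -> float:
    """Weighted average of 0-100 sub-criterion scores (assumed aggregation)."""
    if set(sub_scores) != set(weights):
        raise ValueError("sub_scores and weights must cover the same sub-criteria")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * sub_scores[name] for name in weights)


# Hypothetical respondent: 9 + 10 + 17.5 + 8 + 13 = 57.5
example = {
    "tenure_frequency": 60,
    "tool_diversity": 50,
    "collaboration_mode": 70,
    "builder_index": 40,
    "workflow_integration": 65,
}
d1 = dimension_score(example, D1_WEIGHTS)
```

The same function works for Dimensions 2–4 by passing their weight tables, since each set of weights also sums to 100%.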

Levels

Score	Level	Description	Example behaviors
0–30	Spectator	Passive awareness; little to no hands-on use.	Tried ChatGPT once or twice; trivia/curiosity only; no work tasks; doesn't know what “prompt engineering” means.
30–55	Dabbler	Occasional use, mostly single-shot prompts for content.	Uses ChatGPT/Claude 1–2×/week for emails or drafts; copies outputs without editing; one tool only; treats AI as autocomplete.
55–75	Operator	Daily user with multiple tools and iterative workflows.	2–3 AI tools (chat + coding/image); multi-turn conversations; validation/iteration loops; integrates AI into ≥3 work tasks.
75–90	Builder	Composes AI into repeatable systems.	Writes reusable prompts/skills; chains tools; uses coding agents; has built an internal GPT or automation; mentors colleagues.
90–100	Native	AI is the default operating layer for knowledge work.	Ships agentic workflows in production; evaluates models against benchmarks; spends >50% of work hours with AI in the loop.
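
A score can be mapped to its level label with a sorted list of band edges. The bands in the table share their endpoints (30, 55, 75, 90), so this sketch assumes each boundary value belongs to the higher band; the text does not state a tie rule. The names `D1_BOUNDS`, `D1_LEVELS`, and `level_for` are illustrative.

```python
from bisect import bisect_right

# Upper edges of the first four bands; a score equal to an edge is
# assigned to the higher band (an assumption, not stated in the rubric).
D1_BOUNDS = [30, 55, 75, 90]
D1_LEVELS = ["Spectator", "Dabbler", "Operator", "Builder", "Native"]


def level_for(score: float, bounds=D1_BOUNDS, labels=D1_LEVELS) -> str:
    """Return the level label for a 0-100 dimension score."""
    if not 0 <= score <= 100:
        raise ValueError("score must be within [0, 100]")
    return labels[bisect_right(bounds, score)]


level_for(57.5)  # "Operator"
level_for(30)    # "Dabbler" under the boundary assumption above
```

Passing a different `labels` list reuses the same edges for the other dimensions, which share the 30 / 55 / 75 / 90 cut points.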

Research anchoring

Anthropic Economic Index — 5-mode collaboration taxonomy (Directive / Feedback Loop / Task Iteration / Learning / Validation)
Anthropic Economic Index (March 2026) — tenure / learning-curve findings on compounding sophistication
Workera AI Skills Framework — 4-domain proficiency ladder
Stanford HAI AI Index 2025 — Functional / Critical / Ethical literacy pillars
WEF Future of Jobs 2025 — AI & big data literacy as a top-7 rising skill

Dimension 2 · Weight 20%

Company AI Maturity

Company AI Maturity measures how far the respondent's employer has moved along the AI-adoption S-curve. A highly fluent individual trapped in an “Experimenter” company will under-realize their leverage; a moderately fluent individual in a “Leader” company will be pulled forward by organizational momentum. This dimension captures that environmental lift (or drag).

Sub-criteria and weights

Sub-criterion	What it measures	Assessment input	Weight
2a. Deployment Breadth	How many teams use AI, not just IT	“Which teams at your company use AI?”	25%
2b. Tooling & Access	Do employees have enterprise AI tools and models?	“Does your company pay for Copilot/Claude/ChatGPT Enterprise?”	20%
2c. Workflow Redesign	Has AI changed how work gets done, not just layered on top?	Scenarios: meetings, reviews, hiring, coding, support	25%
2d. Leadership Signal	Is AI a CEO/exec priority with measurable targets?	“Has leadership set AI goals/KPIs?”	15%
2e. Governance & Safety	Policies, red-team, data handling — signals operational maturity	“Does your company have an AI policy? Approved-tools list?”	15%

Levels

Score	Level	Description	Example behaviors
0–30	AI-Dark (Pre-Experimenter)	No official tools; AI use is shadow IT or banned.	“We're not allowed to use ChatGPT”; no license; no policy; leadership silent.
30–55	Experimenter	Pilots in 1–2 teams (usually engineering); no scale.	Engineering has Copilot; marketing sneaks ChatGPT; no enterprise contract; no KPIs.
55–75	Practitioner	Enterprise licenses, multi-team usage, early workflow changes.	Claude/Copilot deployed org-wide; policies in place; some redesigned workflows; CEO mentions AI in all-hands.
75–90	Scaler	AI is central to 2+ P&L lines with measured impact.	AI features shipped in product; customer-facing agents live; 20%+ productivity targets tied to AI; internal AI platform team.
90–100	Leader	AI-native — org structure and economics reshaped by AI.	AI-native org design; AI-first products; EBIT impact attributed; referenced in investor letters; hiring “AI-native” as a criterion.

Research anchoring

McKinsey — Superagency in the Workplace (Jan 2025): 3× leader/employee gap; ~1% of companies are “mature”
McKinsey — The State of AI 2024/2025: 65% of orgs regularly use GenAI
McKinsey 4-archetype model: Experimenters, Practitioners, Scalers, Leaders
Gartner AI Maturity Index — 5-stage model
Anthropic Economic Index (Sept 2025) — enterprise API usage is 77% automation-dominant

Dimension 3 · Weight 25%

Industry AI Disruption

Industry AI Disruption measures sector-level exposure: how much of the typical work done in the respondent's industry is at risk of automation or augmentation over the next 24–36 months. This is where high individual fluency can't fully compensate — if the whole sector reprices labor, earnings scarring is the base rate. Note: in the composite formula this dimension is flipped (100 − D3) so that a more-disrupted industry lowers the Irreplaceable Score.

Sub-criteria and weights

Sub-criterion	What it measures	Assessment input	Weight
3a. Sector Task Exposure	% of typical tasks in sector that AI can do today	Industry → lookup in the occupation/industry exposure table	35%
3b. Labor Market Signal	Hiring freezes, layoffs, reorgs citing AI	Recent news index + BLS projections	25%
3c. Pricing Power Shift	Is AI compressing margins / unit economics?	Sector pricing trend proxies	15%
3d. Incumbent vs. AI-Native Competition	Are AI-native challengers winning share?	Market-structure cue (e.g., Harvey vs. legacy law; Cursor vs. legacy IDE)	15%
3e. Regulatory Moat	Does regulation protect human labor (healthcare, legal licensure)?	Regulated-profession flag (inverse)	10%

Levels

Score	Level	Description	Example behaviors
0–30	Insulated	Sector largely physical / regulated / localized.	Skilled trades, hands-on healthcare, social work; AI assists admin but not core work.
30–55	Partial Exposure	Some tasks automatable; core human judgment still load-bearing.	Management, sales, mid-level healthcare; AI compresses admin but not the relationship.
55–75	High Exposure	Majority of tasks AI-addressable; repricing pressure already visible.	Finance analysis, legal drafting, education content — hiring freezes and role re-scoping evident.
75–90	Disrupted	Sector in active restructuring; headcount flat or down despite revenue growth.	Customer support, copywriting, graphic design, first-line coding.
90–100	Existential	AI is the product; human labor per unit of output is collapsing fast.	Translation, basic content mills, SEO content, first-draft code, template design.

Research anchoring

Anthropic Economic Index — observed task-level usage by SOC code (gold standard for measured exposure)
Eloundou, Manning, Mishkin, Rock (2023) — GPTs are GPTs: O*NET 19,265-task exposure rubric (arXiv:2303.10130)
Goldman Sachs — generative AI could expose ~300M FTE jobs globally (2023, updated 2025)
WEF Future of Jobs 2025 — 22% of jobs disrupted by 2030; industry-specific cuts
BLS Occupational Outlook Handbook (2024–2034 projections)

Occupation / industry exposure reference (Dimension 3)

Derived from Anthropic's observed SOC-level usage (Claude conversations) and Eloundou et al. (2023) task-level exposure, cross-checked against Goldman Sachs industry cuts. Exposure is converted to a 0–100 Disruption Score (higher = more disruption / repricing risk).

Occupation / Industry cluster	SOC major group	Anthropic observed use	Eloundou α>0.5 task share	Disruption Score
Software / IT services	15-0000	37.2%	~70%	85–95
Marketing, PR, content, creative writing	27-0000 / 11-2000	10.3%	~65%	75–90
Education, instruction, tutoring	25-0000	12.4%	~55%	65–80
Finance, banking, insurance analysis	13-2000	~8%	~60%	70–85
Legal, paralegal, compliance	23-0000	~5%	~63%	70–85
Customer support, call centers	43-4000	~6%	~55%	75–90
Management, operations	11-0000	3–5%	~40%	45–60
Sales	41-0000	~3%	~35%	40–55
Healthcare practitioners (dx/admin)	29-0000	<2%	~30%	35–55
Healthcare support / aides	31-0000	<1%	~10%	15–30
Skilled trades, installation, repair	49-0000	<1%	<10%	10–25
Transportation (drivers)	53-0000	<1%	~15%	20–35
Personal care, cleaning, food prep	35-0000 / 37-0000	<0.5%	<8%	5–20
Construction, farming, fishing, forestry	47-0000 / 45-0000	0.1–0.3%	<5%	5–15
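
Sub-criterion 3a reads its value from this table. The sketch below shows one possible lookup keyed by SOC major group. Only four rows are copied, and using the midpoint of the published range as a default point estimate is an assumption of this example rather than something the methodology specifies.

```python
# Disruption Score ranges copied from four rows of the reference table.
DISRUPTION_RANGES = {
    "15-0000": (85, 95),  # Software / IT services
    "25-0000": (65, 80),  # Education, instruction, tutoring
    "41-0000": (40, 55),  # Sales
    "49-0000": (10, 25),  # Skilled trades, installation, repair
}


def disruption_range(soc_major_group: str) -> tuple:
    """Return the (low, high) Disruption Score range for a SOC major group."""
    try:
        return DISRUPTION_RANGES[soc_major_group]
    except KeyError:
        raise KeyError(f"no exposure row for SOC group {soc_major_group!r}") from None


def disruption_midpoint(soc_major_group: str) -> float:
    """Midpoint of the range, used here as an assumed point estimate."""
    low, high = disruption_range(soc_major_group)
    return (low + high) / 2


disruption_midpoint("15-0000")  # 90.0
disruption_midpoint("49-0000")  # 17.5
```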

Dimension 4 · Weight 25%

Role Amplification

Where Industry Disruption asks “is the sector being repriced?”, Role Amplification asks “within this sector, does this specific role get amplified (leveraged) or compressed (automated away) by AI?” A junior lawyer doing doc review sits in the same industry as a senior litigator, but the litigator's role is being amplified by AI while the junior lawyer's is being absorbed by it. This dimension captures that split.

Sub-criteria and weights

Sub-criterion	What it measures	Assessment input	Weight
4a. Amplification Ratio	Productivity lift from AI on this role's core tasks (e.g., SWE +55%)	Role-to-lift lookup from research	30%
4b. Judgment Density	Share of role that's high-context decision-making vs. executable tasks	Task decomposition from job description	25%
4c. Human-Capital Leverage	Does seniority/network/trust compound in this role?	Seniority + relationship-facing flags	20%
4d. Creative/Strategic Mix	WEF rising skills (creative thinking, leadership, complex problem solving) as % of role	Scenario-based self-report	15%
4e. AI-Complement vs. AI-Substitute	Does AI make this person more hireable (complement) or less (substitute)?	Derived from 4a–4d	10%

Levels

Score	Level	Description	Example behaviors
0–30	Compressed	Role is largely executable tasks AI already does well.	Junior copywriter, template designer, first-line support, basic data entry, doc review paralegal.
30–55	Shrinking	Role still needed but team sizes being cut as AI takes the repeatable core.	Mid-level analyst, junior coder, content marketer, entry-level recruiter screening.
55–75	Stable	Role changes substantially but headcount holds; humans curate AI output.	Experienced consultants, account managers, teachers, mid-career product managers.
75–90	Amplified	AI makes this person 2–5× more productive; demand rising.	Senior engineers using agents, principal designers, investigative journalists, senior sales, founders, senior clinicians with AI dx assist.
90–100	Leveraged	Role is an AI-leverage point — one person now does what a team used to.	Founder-engineers shipping multi-agent products; solo operators running AI-native businesses; chief-of-staff humans orchestrating agents.

Research anchoring

Anthropic Economic Index — automation vs. augmentation breakdown per SOC code
WEF Future of Jobs 2025 — top growing roles (tech, care, education, green)
McKinsey Superagency — productivity benchmarks by role (coding +55%, support +14%, writing +40%)
Eloundou et al. 2023 — β-exposure: with LLM tooling, 47–56% of tasks accelerated
BLS Occupational Outlook projections

The Composite Score

The four dimensions are not weighted equally. Weights reflect (a) how much control the individual has over the dimension, and (b) the prognostic power of each dimension in published research.

Dimension	Weight	Rationale
D1 — Personal AI Fluency	30%	Highest individual agency; the actionable lever. Strongest research link to short-term career outcomes (Anthropic tenure data, McKinsey productivity).
D2 — Company AI Maturity	20%	Environmental lift; matters, but can be changed by switching jobs. Lower weight to avoid penalizing great individuals stuck in laggard firms.
D3 — Industry AI Disruption	25%	Strong structural determinant of earnings trajectory per Goldman and Eloundou. Enters the formula inverted (100 − D3), so higher disruption lowers the composite.
D4 — Role Amplification	25%	Complements D3 — same industry can have amplified and compressed roles. Anthropic's within-occupation variance justifies weighting it equally to industry.

Formula

S_raw = 0.30·D1 + 0.20·D2 + 0.25·D3_flipped + 0.25·D4
D3_flipped = 100 − D3
Irreplaceable Score = round(S_raw), clipped to [0, 100]

Industry Disruption (D3) is a risk score — a more-disrupted industry should lower the Irreplaceable Score unless personal fluency or role amplification compensate. Flipping it (100 − D3) makes the composite point in the correct direction.
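
The formula translates directly into code. This is a minimal sketch with hypothetical dimension scores. One detail the formula leaves open is how exact halves are rounded; Python's built-in `round` sends them to the nearest even integer, which is an artifact of this sketch, not a stated rule.

```python
def irreplaceable_score(d1: float, d2: float, d3: float, d4: float) -> int:
    """Composite score from four 0-100 dimension scores."""
    d3_flipped = 100 - d3  # D3 is a risk score, so it is inverted
    s_raw = 0.30 * d1 + 0.20 * d2 + 0.25 * d3_flipped + 0.25 * d4
    # round() uses round-half-to-even; the tie rule is not specified in the text.
    return max(0, min(100, round(s_raw)))


# Hypothetical respondent:
# 0.30*70 + 0.20*60 + 0.25*(100 - 80) + 0.25*65 = 21 + 12 + 5 + 16.25 = 54.25
score = irreplaceable_score(d1=70, d2=60, d3=80, d4=65)  # 54
```

Lowering D3 from 80 to 40 in the same example adds 0.25 × 40 = 10 points, which shows how much a less-disrupted industry moves the composite.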

Confidence interval

Each dimension is estimated from a small number of inputs with known variance. We treat the final score as a weighted sum of four independent estimates:

SE_i = (range of level band) / 4 # ≈ ±6 for a 25-point band
SE_total = sqrt( Σ (w_i · SE_i)² )
CI_95 = Irreplaceable Score ± 1.96 · SE_total

In practice this yields a ±5 to ±9 confidence band for most respondents. The band widens when dimension scores sit on level boundaries, consistency checks flag ambiguous inputs, or free-text responses are short / low-signal.
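
As a worked check of the interval formulas, the sketch below computes the half-width of the 95% interval from the width of the level band each dimension score falls in. It covers only the base formula; the widening applied for boundary scores, consistency flags, or low-signal text is not quantified in the text and is left out. Function names are illustrative.

```python
import math

WEIGHTS = (0.30, 0.20, 0.25, 0.25)  # D1, D2, D3, D4


def total_standard_error(band_widths, weights=WEIGHTS) -> float:
    """SE_total = sqrt(sum((w_i * SE_i)^2)) with SE_i = band width / 4."""
    if len(band_widths) != len(weights):
        raise ValueError("need one band width per dimension")
    return math.sqrt(sum((w * width / 4) ** 2
                         for w, width in zip(weights, band_widths)))


def ci_95(score: float, band_widths) -> tuple:
    """Return (lower, upper) bounds of the 95% confidence interval."""
    half_width = 1.96 * total_standard_error(band_widths)
    return (score - half_width, score + half_width)


# All four dimensions in 25-point bands: SE_i = 6.25 each,
# SE_total = 6.25 * sqrt(0.255), so the half-width is about 6.2.
lower, upper = ci_95(54, [25, 25, 25, 25])
```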

Peer benchmarking

Percentile is computed against a rolling cohort of prior respondents, segmented by (industry, role seniority, region). We report the percentile for the full segment only when it contains ≥50 peers; otherwise we fall back to industry-only, then global. The approach is inspired by Item Response Theory scoring practice: rather than z-scoring raw totals, we percentile-rank within calibrated subgroups so comparisons stay fair across cohorts.
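
The fallback order can be sketched as a loop over progressively coarser segments. The record layout (dicts with `industry`, `seniority`, `region`, `score`) is hypothetical, and defining the percentile as the share of peers scoring strictly below the respondent is an assumption, since the text does not define it. The sketch also applies the 50-peer threshold to the industry-only step, which the text implies but does not state.

```python
MIN_PEERS = 50  # threshold stated in the text

SEGMENTS = [
    ("industry+seniority+region", ("industry", "seniority", "region")),
    ("industry", ("industry",)),
    ("global", ()),
]


def peer_percentile(score, respondent, cohort):
    """Return (segment_used, percentile), falling back to coarser segments."""
    for label, keys in SEGMENTS:
        peers = [p["score"] for p in cohort
                 if all(p[k] == respondent[k] for k in keys)]
        # The global segment is the last resort and is used at any size.
        if len(peers) >= MIN_PEERS or label == "global":
            if not peers:
                return label, None  # empty cohort: nothing to rank against
            below = sum(1 for s in peers if s < score)
            return label, 100.0 * below / len(peers)


# Hypothetical cohort: 60 legal peers, only 10 of them in the same region.
respondent = {"industry": "legal", "seniority": "senior", "region": "EU"}
cohort = [
    {"industry": "legal", "seniority": "senior",
     "region": "EU" if i < 10 else "US", "score": i}
    for i in range(60)
]
segment, pct = peer_percentile(30, respondent, cohort)  # ("industry", 50.0)
```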

Readiness Categories

The Irreplaceable Score maps onto four readiness bands, each with a characteristic behavioral profile and a research-anchored 12-month outlook. Thresholds are soft: scores of 49 and 51 should be treated similarly, and reporting surfaces should show the distance to the next band to encourage action.

Band	Label	Population share	Behavioral profile	12-month outlook (research-anchored)
<30	Behind	~25–30% of knowledge workers	Little or no hands-on AI use; employer is AI-dark; role is in a compressed/shrinking band in a disrupted sector.	Highest earnings-scarring risk per Goldman 2023 and WEF (11% of workers unlikely to get reskilling). Anthropic enterprise hiring data shows ~-14% relative job-finding rate for highly exposed roles without AI fluency. Priority: upskill immediately.
30–50	At Risk	~30–35%	Some exposure, single-tool use; company is an Experimenter; role is partially exposed.	Likely to feel “AI anxiety” without measurable productivity gain. Risk of lateral churn as roles re-scope. Priority: tool diversification + workflow integration.
50–75	Emerging	~25–30%	Daily operator across multiple tools; company is a Practitioner; role is stable or amplified.	Positioned to capture WEF “rising skill” premium. McKinsey reports 40–55% productivity gains for this cohort. Priority: move from Operator → Builder.
75+	Leveraged	~10–15%	Builder or Native; company is a Scaler/Leader or the respondent is a founder; role is Amplified/Leveraged.	Compounding advantage. Anthropic directive-mode data shows this cohort ships 2–5× more output per week. Outcomes: promotion, equity upside, startup optionality. Priority: compound leverage — ship AI-native work publicly, mentor, recruit.
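
The band lookup and the "distance to the next band" can be sketched together. The bands share their endpoints, so this example assumes a boundary score belongs to the higher band, which is consistent with "<30" and "75+" in the table but is not stated for 50.

```python
from bisect import bisect_right

BAND_EDGES = [30, 50, 75]
BAND_LABELS = ["Behind", "At Risk", "Emerging", "Leveraged"]


def readiness(score: int):
    """Return (band label, points needed to reach the next band or None)."""
    if not 0 <= score <= 100:
        raise ValueError("score must be within [0, 100]")
    idx = bisect_right(BAND_EDGES, score)
    to_next = BAND_EDGES[idx] - score if idx < len(BAND_EDGES) else None
    return BAND_LABELS[idx], to_next


readiness(49)  # ("At Risk", 1)
readiness(51)  # ("Emerging", 24)
readiness(80)  # ("Leveraged", None)
```

Showing the gap alongside the label is what lets a 49 and a 51 read as near neighbors rather than as two different categories.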

Scoring Guardrails

The rubric is transparent, which means respondents can try to game it. We apply three classes of guardrails.

Gamability detection

Fluency vs. output mismatch: D1 inputs claim “Native” use but free-text descriptions show generic AI vocabulary → cap D1 at Operator ceiling (75).
Tool-list bloat: claiming 10+ tools used weekly without describing one concrete workflow → cap Tool Diversity (1b) at 60.
Builder claims without evidence: “I build agents” with no artifact URL / repo / internal link → cap Builder Index (1d) at 60.
Company-maturity inflation: Scaler/Leader claim but role inputs don't reflect AI integration → discount D2 by 15%.
Keyword stuffing: repetition of buzzwords (“agentic, LLM, RAG, fine-tuned”) without concrete nouns → consistency flag + widen CI.
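
The caps and discounts above could be applied as follows. This is a sketch under several assumptions: the flag names are hypothetical and would be set by upstream detection that is not shown, D1 is recomputed as a weighted average after sub-criterion caps, and "discount D2 by 15%" is read as multiplying D2 by 0.85. How much to widen the confidence interval is not specified, so the function only reports that widening is needed.

```python
D1_WEIGHTS = {"1a": 0.15, "1b": 0.20, "1c": 0.25, "1d": 0.20, "1e": 0.20}


def apply_gamability_guardrails(d1_subs: dict, d2: float, flags: set):
    """Return (guarded D1, guarded D2, widen_ci) for a set of raised flags."""
    subs = dict(d1_subs)  # copy so the caller's inputs are not mutated
    if "tool_list_bloat" in flags:
        subs["1b"] = min(subs["1b"], 60)   # cap Tool Diversity
    if "builder_without_evidence" in flags:
        subs["1d"] = min(subs["1d"], 60)   # cap Builder Index
    d1 = sum(D1_WEIGHTS[k] * subs[k] for k in D1_WEIGHTS)
    if "fluency_output_mismatch" in flags:
        d1 = min(d1, 75)                   # Operator ceiling
    if "company_maturity_inflation" in flags:
        d2 = d2 * 0.85                     # assumed reading of "discount by 15%"
    widen_ci = "keyword_stuffing" in flags
    return d1, d2, widen_ci


# Hypothetical respondent claiming 95 on every D1 sub-criterion.
subs = {k: 95 for k in D1_WEIGHTS}
d1, d2, widen = apply_gamability_guardrails(
    subs, 80, {"tool_list_bloat", "company_maturity_inflation"}
)
# d1 = 0.8 * 95 + 0.2 * 60 = 88.0, d2 = 68.0, widen = False
```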

Consistency checks

D1 internal: Tenure × frequency × tools should tell one story. A 3-month user claiming 10 tools and Builder maturity is inconsistent — widen CI and discount the outlier sub-criterion by 20%.
D1 × D2: Builder/Native individual at an AI-Dark company → flagged as “misaligned environment” (often a future job-switcher signal; noted, not penalized).
D3 × D4: A role Leveraged in an Existential sector is rare (solo AI-native operators). Requires concrete evidence in free text; otherwise D4 is capped at Amplified (90).

Floor & ceiling logic

Floor of 20: every completed assessment scores ≥20. A raw 0 doesn't reflect reality — even AI-dark workers have some career optionality.
Ceiling of 98 (v1): 99–100 is reserved for validated AI-native operators (shipped AI products, public AI work with traction). This keeps the top band credible and prevents self-scored perfection.
Hard cap: D3 ≥ 75 and D4 ≤ 30 → Irreplaceable Score is capped at 45 regardless of D1. Industry-role fit dominates when the role is being actively eliminated.
Hard floor lift: D1 ≥ 85 → minimum Irreplaceable Score of 50. A Native/Builder individual can always move, even from a bad sector/role.
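
These four rules can be sketched as a post-processing step on the composite. The rules do not say which one wins when a respondent triggers both the cap at 45 (D3 ≥ 75 and D4 ≤ 30) and the minimum of 50 (D1 ≥ 85), so this example makes the caller choose through a required `lift_beats_cap` argument. That parameter, like `validated_native`, is a hypothetical name introduced for illustration.

```python
def apply_floor_and_ceiling(score: int, d1: float, d3: float, d4: float, *,
                            lift_beats_cap: bool,
                            validated_native: bool = False) -> int:
    """Apply the floor, ceiling, cap, and lift rules to a composite score."""
    # General floor of 20; ceiling of 98 unless the respondent is validated.
    ceiling = 100 if validated_native else 98
    score = max(20, min(ceiling, score))

    capped = d3 >= 75 and d4 <= 30   # disrupted sector, compressed role
    lifted = d1 >= 85                # Native/Builder-level fluency

    # When both rules fire, exactly one is applied, chosen by lift_beats_cap.
    if capped and not (lifted and lift_beats_cap):
        score = min(score, 45)
    if lifted and not (capped and not lift_beats_cap):
        score = max(score, 50)
    return score


apply_floor_and_ceiling(5, d1=10, d3=50, d4=50, lift_beats_cap=True)    # 20
apply_floor_and_ceiling(60, d1=50, d3=80, d4=20, lift_beats_cap=True)   # 45
apply_floor_and_ceiling(48, d1=90, d3=80, d4=20, lift_beats_cap=True)   # 50
apply_floor_and_ceiling(48, d1=90, d3=80, d4=20, lift_beats_cap=False)  # 45
```

Making the precedence an explicit argument keeps the ambiguity visible instead of silently resolving it one way.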

Sources & Citations

Primary Research (Anthropic, WEF, Goldman, Eloundou, BLS, Stanford HAI)

Industry Frameworks (McKinsey, Gartner, Workera, Coursera, a16z, Sequoia)

Academic Foundations (IRT, SJT methodology, labor economics)

Document version: v1.0 · Last updated 2026-04-18 · Maintainer: Human in Residence
