Methodology

The Stunt Double Index is produced by Stunt Double, an AI testing platform that dispatches real agents to carry out real user tasks. Every domain is measured with two evidence layers, and every number on a scorecard traces back to one of them.

Layer 1: Deterministic protocol probes

Cheap, repeatable HTTP measurements run on every score request. We fetch the homepage and key task paths (pricing, signup, cart, contact), robots.txt and sitemap, and the discovery endpoints agents actually use in 2026: MCP server cards (/.well-known/mcp/server-card.json), OAuth/OIDC discovery metadata (RFC 8414, RFC 9728), A2A agent cards, llms.txt, and Universal Commerce Protocol discovery. We also test markdown content negotiation, detect bot-challenge interstitials from headers rather than keywords, and fetch the homepage with each provider’s declared user-agent to measure reachability per vendor. Every check records pass/fail evidence you can see on the scorecard.
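As a rough illustration of the probe layer, the sketch below builds the set of URLs a probe run might request for a domain and checks whether a response honoured markdown content negotiation. Only the MCP server card path comes from the text above. The other paths are the well-known locations defined by RFC 8414, RFC 9728 and the llms.txt proposal, and the names DISCOVERY_PATHS, probe_urls and served_markdown are hypothetical, not part of the Stunt Double platform.

```python
from typing import Dict, Optional
from urllib.parse import urljoin

# Hypothetical probe table. Only the MCP server card path is taken from the
# methodology text; the rest are standard well-known locations.
DISCOVERY_PATHS: Dict[str, str] = {
    "robots": "/robots.txt",
    "llms_txt": "/llms.txt",
    "mcp_server_card": "/.well-known/mcp/server-card.json",
    "oauth_authorization_server": "/.well-known/oauth-authorization-server",  # RFC 8414
    "oauth_protected_resource": "/.well-known/oauth-protected-resource",  # RFC 9728
}


def probe_urls(domain: str) -> Dict[str, str]:
    """Return the absolute URL to fetch for each named probe."""
    base = f"https://{domain}"
    return {name: urljoin(base, path) for name, path in DISCOVERY_PATHS.items()}


def served_markdown(content_type: Optional[str]) -> bool:
    """True if a response's Content-Type header declares text/markdown.

    Intended for a request that was sent with 'Accept: text/markdown'.
    """
    if not content_type:
        return False
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/markdown"
```

A real probe would issue the requests and record status codes and headers as evidence; this sketch deliberately stops short of any network access.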

Layer 2: Live agent sessions

Real browser agents run the sector’s task suite against the domain (e.g. find a product and reach checkout, extract the refund policy, locate API docs). Sessions run on hosted browsers, are recorded step by step, and are scored by a rubric grader independent of the agent that ran the task. Session replays are available to verified owners.

How the two layers combine

Each category keeps its probe score and its session score separately. The headline category score blends them: 60% agent sessions, 40% probes, where both exist. Domains that have only been probed show probe evidence alone until their first session cohort completes. The overall score is the weighted average of the eight category scores using the weights below.
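The blend described above can be sketched as a small function. This is a minimal illustration, assuming scores on a 0 to 100 scale; the function and constant names are hypothetical.

```python
from typing import Optional

SESSION_WEIGHT = 0.6  # share given to live agent sessions
PROBE_WEIGHT = 0.4    # share given to deterministic probes


def category_score(probe: float, session: Optional[float] = None) -> float:
    """Blend probe and session scores for one category.

    If no session cohort has completed yet, the probe score stands alone.
    """
    if session is None:
        return probe
    return SESSION_WEIGHT * session + PROBE_WEIGHT * probe
```

For example, a category with a probe score of 50 and a session score of 100 would blend to roughly 80, while a probe-only domain with the same probe score would show 50.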

Agent providers

Session cohorts run weekly on the providers below. Each category is measured across every provider and the per-provider scores are averaged. Only providers with a live, configured harness are listed; everything shown here contributes real session results.

  • Claude
    Anthropic
  • ChatGPT Agent
    OpenAI
  • Gemini
    Google
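The per-provider averaging can be sketched as a plain arithmetic mean. This assumes every listed provider contributes equally and that scores are on a 0 to 100 scale; the function name and the example numbers are hypothetical.

```python
from statistics import mean
from typing import Dict


def provider_average(scores_by_provider: Dict[str, float]) -> float:
    """Average one category's session scores across providers (equal weights assumed)."""
    if not scores_by_provider:
        raise ValueError("at least one provider score is required")
    return mean(list(scores_by_provider.values()))
```

With illustrative scores of 80, 70 and 60 for the three providers, the averaged category session score would be 70.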

Categories & weights

  1. Brand awareness (10%)

    Measures how often and how accurately leading agents mention your brand in unprompted research and comparison tasks.

  2. Discovery (15%)

    How well agents discover, shortlist and recommend your product when a user asks for a solution in your category.

  3. Information retrieval (15%)

    Quality of structured data, semantics and navigability for agents extracting pricing, specs and policies.

  4. Market ranking (15%)

    Your relative placement vs. competitors when agents produce ranked lists for buyers in your market.

  5. Accuracy (15%)

    How often agents report correct pricing, features, availability and policies, and how often they hallucinate.

  6. Task completion (15%)

    Whether an agent can carry out a delegated interaction end-to-end: a primary call-to-action is exposed, the destination is reachable without a login wall, no bot-challenge blocks the first step, and the service signals support for agent-friendly auth.

  7. Delegated access (10%)

    Availability of scoped auth (OIDC, API keys, MCP), rate limits, and tooling for delegated agent sessions.

  8. Contact & communication (5%)

    Whether an agent acting on a user’s behalf can find a way to contact or communicate with the business: contact page, support email, live chat, phone, help hub, or a social profile it can DM.
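The eight weights above sum to 100, so the overall score is a straightforward weighted average. The sketch below shows one way to compute it, assuming category scores on a 0 to 100 scale; the dictionary keys and function name are hypothetical labels for the categories listed above.

```python
from typing import Dict

# Weights in percent, as listed in the categories above.
CATEGORY_WEIGHTS: Dict[str, int] = {
    "brand_awareness": 10,
    "discovery": 15,
    "information_retrieval": 15,
    "market_ranking": 15,
    "accuracy": 15,
    "task_completion": 15,
    "delegated_access": 10,
    "contact_communication": 5,
}


def overall_score(category_scores: Dict[str, float]) -> float:
    """Weighted average of the eight category scores."""
    total_weight = sum(CATEGORY_WEIGHTS.values())
    weighted_sum = sum(
        weight * category_scores[name] for name, weight in CATEGORY_WEIGHTS.items()
    )
    return weighted_sum / total_weight
```

Because contact and communication carries only 5% of the weight, a domain scoring 100 everywhere else and 0 there would still reach 95 overall.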

Opportunity-cost projections

Unclaimed scorecards show a modelled “value at risk” figure. It is a directional estimate, not a measurement: we apply a projected 2027 agent-initiated share of sector traffic (synthesised from Gartner’s agentic-AI forecasts and Adobe Analytics’ AI-referred retail traffic trend) to an illustrative sector revenue baseline, scaled by the domain’s gap to a perfect score. Owners who claim their domain can replace the baseline with actuals or hide the model entirely. Sources are cited inline wherever the figure appears.
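One plausible reading of this model is a simple product of the three inputs. The sketch below is an assumption about the functional form, not the published formula: it treats the gap as a linear fraction of a 0 to 100 score, and all names and numbers are hypothetical.

```python
def value_at_risk(
    sector_revenue_baseline: float,
    agent_traffic_share: float,
    overall_score: float,
) -> float:
    """Directional estimate: baseline x projected agent share x gap to a perfect score.

    agent_traffic_share is a fraction in [0, 1]; overall_score is assumed to be 0-100.
    """
    if not 0.0 <= agent_traffic_share <= 1.0:
        raise ValueError("agent_traffic_share must be between 0 and 1")
    if not 0.0 <= overall_score <= 100.0:
        raise ValueError("overall_score must be between 0 and 100")
    gap = (100.0 - overall_score) / 100.0
    return sector_revenue_baseline * agent_traffic_share * gap
```

Under these assumptions, an illustrative baseline of 1,000,000, a projected share of 10% and a score of 60 would give a figure of about 40,000, and a perfect score would give zero.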

Ethics

We identify our agents with a stable UA token and honour opt-outs declared in robots.txt and headers. We don’t store PII, we never complete payments or submit real registration details, and we rate-limit against target domains. Anything that appears in a public friction report is something an agent acting on behalf of a real customer could also encounter.
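The robots.txt opt-out check can be sketched with Python's standard-library parser. The UA token StuntDoubleBot below is a hypothetical placeholder, since the actual token is not given here, and the helper name is likewise an assumption.

```python
from urllib.robotparser import RobotFileParser

AGENT_UA_TOKEN = "StuntDoubleBot"  # hypothetical token, for illustration only


def allowed_by_robots(robots_txt: str, url: str, ua_token: str = AGENT_UA_TOKEN) -> bool:
    """Return True if the given robots.txt body permits ua_token to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(ua_token, url)
```

A site that lists the token under a User-agent line with "Disallow: /" would be skipped, while other agents remain governed by the wildcard rules. Header-based opt-outs and rate limiting are separate mechanisms not covered by this sketch.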