Methodology

The Stunt Double Index is produced by Stunt Double, an AI testing platform that dispatches real agents to carry out real user tasks. Every score on the index comes from observed agent behaviour — not crawling heuristics.

Agent providers

We run weekly cohorts of sessions on the following providers. Each category is measured across every provider and the per-provider scores are averaged.
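As an illustration of the averaging step, here is a minimal sketch that assumes each provider yields one score per category on a 0-100 scale and that the average is an unweighted mean. The function name and the sample values are hypothetical and are not part of the published index.

```python
from statistics import mean


def category_score(per_provider: dict[str, float]) -> float:
    """Return the unweighted mean of one category's per-provider scores.

    Assumption: every provider contributes exactly one score and all
    providers count equally.
    """
    if not per_provider:
        raise ValueError("at least one provider score is required")
    return mean(per_provider.values())


# Made-up scores for a single category, keyed by provider name.
sample = {"Claude": 80.0, "ChatGPT Agent": 70.0, "Gemini": 60.0}
print(category_score(sample))
```

If providers ran different numbers of sessions, a session-weighted mean would be another reasonable reading. The text above does not say which one is used.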

  • Claude (Anthropic)
  • ChatGPT Agent (OpenAI)
  • Gemini (Google)
  • Perplexity (Perplexity)
  • Copilot (Microsoft)
  • Browserbase Operator (Browserbase)

Categories & weights

  1. Brand awareness (10%)

    Measures how often and how accurately leading agents mention your brand in unprompted research and comparison tasks.

  2. Discovery (15%)

    How well agents discover, shortlist and recommend your product when a user asks for a solution in your category.

  3. Information retrieval (15%)

    Quality of structured data, semantics and navigability for agents extracting pricing, specs and policies.

  4. Market ranking (15%)

    Your relative placement vs. competitors when agents produce ranked lists for buyers in your market.

  5. Accuracy (15%)

    How often agents report correct pricing, features, availability and policies — and how often they hallucinate.

  6. Checkout (15%)

    End-to-end completion rate for agentic checkout across guest, logged-in and delegated-access flows.

  7. Delegated access (10%)

    Availability of scoped auth (OIDC, API keys, MCP), rate limits, and tooling for delegated agent sessions.

  8. Support & returns (5%)

    Completion rate and clarity when agents attempt to get help, cancel, return, or manage a subscription.
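The eight weights above sum to 100%. One plausible way to combine them into a single index value is a weighted sum of the category scores. The sketch below assumes that combination rule and a 0-100 scale per category; neither is stated explicitly here, and the function name is hypothetical.

```python
# Category weights in percent, copied from the list above.
WEIGHTS = {
    "Brand awareness": 10,
    "Discovery": 15,
    "Information retrieval": 15,
    "Market ranking": 15,
    "Accuracy": 15,
    "Checkout": 15,
    "Delegated access": 10,
    "Support & returns": 5,
}


def overall_score(category_scores: dict[str, float]) -> float:
    """Combine per-category scores into one value using the weights.

    Assumption: the index is a plain weighted sum; a missing category
    is treated as an error rather than silently scored as zero.
    """
    missing = WEIGHTS.keys() - category_scores.keys()
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    total = sum(WEIGHTS[name] * category_scores[name] for name in WEIGHTS)
    return total / 100


# Made-up example: a site scoring 50 in every category scores 50 overall.
print(overall_score({name: 50.0 for name in WEIGHTS}))
```

Under this reading, a five-point gain in Checkout moves the overall score by 0.75 points, while the same gain in Support & returns moves it by 0.25.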

Task protocol

Each session runs a reference task for the sector (e.g. purchase a product, research a return policy, price-compare a loan). Outcomes are scored by a rubric agent independent of the one running the task. All sessions are recorded and replayable for claimed domains.

Ethics

We identify our agents with a stable user-agent (UA) token and honour opt-outs declared in robots.txt and in headers. We don't store PII, and we rate-limit our requests to target domains. Anything that appears in a public friction report is something an agent acting on behalf of a real customer could also encounter.
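For site owners who want to see how a robots.txt opt-out is evaluated, here is a minimal sketch using Python's standard-library robots.txt parser. The token `StuntDoubleAgent` is a placeholder, not the real UA token, and the sketch covers only robots.txt rules, not header-based opt-outs.

```python
from urllib.robotparser import RobotFileParser

# Placeholder token for illustration only; the real UA token is not given here.
AGENT_TOKEN = "StuntDoubleAgent"


def is_allowed(robots_txt: str, url: str, agent: str = AGENT_TOKEN) -> bool:
    """Return True if the given robots.txt body lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Example robots.txt that opts out of the placeholder agent only.
ROBOTS = """\
User-agent: StuntDoubleAgent
Disallow: /

User-agent: *
Allow: /
"""

print(is_allowed(ROBOTS, "https://example.com/checkout"))
print(is_allowed(ROBOTS, "https://example.com/checkout", agent="OtherBot"))
```

In this example the first call reports that the placeholder agent is blocked, and the second shows that other agents remain unaffected.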