How to Measure Your AI Visibility Score — Method and Metrics

There's a fundamental problem with AI visibility: unlike SEO, there's no Google Search Console for LLMs. ChatGPT doesn't tell you how many times your brand was mentioned. Perplexity has no impressions dashboard. And according to BrandMentions.link data (2026), only 20% of brand mentions in ChatGPT include a clickable link — the remaining 80% are invisible to your traditional analytics.

Yet measuring AI visibility is possible — provided you use the right metrics and a reproducible methodology. This guide gives you the complete framework.

Why traditional SEO metrics aren't enough

Domain Authority, backlink counts, Google rankings: these metrics measure signals that Google values. LLMs don't consult Ahrefs before generating a response. A brand with a Domain Rating of 8 can be the most cited in ChatGPT for its category if its content is correctly structured and its entity signals are clear.

The correlation between Google ranking and LLM citations is rapidly eroding: in 2025, 76% of Google AI Overview citations came from the organic top 10. By early 2026, this figure had fallen to 38%. The two systems are decoupling — and require distinct metrics.

The 5 metrics that matter

Metric 1 — Mention Rate (Presence Rate)

Definition: the share of tested prompts in which your brand appears, per platform.

Calculation: (number of prompts where you appear ÷ total number of prompts tested) × 100

Example: you test 20 prompts on ChatGPT, you appear in 8 responses → 40% mention rate

Interpretation:

0-20%: weak visibility, structural problem
20-50%: partial visibility, clear opportunities
50-80%: good visibility, fine-tuning possible
80-100%: strong visibility, maintenance objective

This is the fundamental metric — it establishes your baseline and is comparable over time.
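The calculation and bands above can be sketched in a few lines of Python. The function names are illustrative, and because the bands share their boundary values (20, 50, 80), this sketch assumes a boundary value falls into the higher band.

```python
def mention_rate(appearances: list[bool]) -> float:
    """Percentage of tested prompts in which the brand appears."""
    if not appearances:
        raise ValueError("at least one prompt result is required")
    return 100 * sum(appearances) / len(appearances)


def interpret_mention_rate(rate: float) -> str:
    """Map a mention rate to the interpretation bands.

    Assumption of this sketch: a boundary value (20, 50, 80)
    is placed in the higher band.
    """
    if rate < 20:
        return "weak visibility"
    if rate < 50:
        return "partial visibility"
    if rate < 80:
        return "good visibility"
    return "strong visibility"


# 20 prompts tested on one platform, brand present in 8 responses.
results = [True] * 8 + [False] * 12
rate = mention_rate(results)
print(f"{rate:.0f}% -> {interpret_mention_rate(rate)}")  # 40% -> partial visibility
```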

Metric 2 — AI Share of Voice (AI SoV)

Definition: your share of all citations (yours plus your competitors') across your prompt set.

Calculation: (number of times you're cited ÷ total number of citations, you + all competitors) × 100

Example: on 20 ChatGPT prompts, there were 40 citations total (you and your competitors). You appear 8 times → 20% AI SoV

Why it matters: the mention rate tells you whether you're visible; the AI SoV tells you what share of the perceived market you hold. A company with a 40% mention rate can still have a 10% SoV if its competitors collectively earn nine times as many citations.
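As a minimal sketch, the AI SoV can be computed from a flat list of every brand citation observed across the prompt set. The brand names below are placeholders, and the split between the two competitors is invented purely to reach the 40 citations of the example.

```python
from collections import Counter


def share_of_voice(citations: list[str], brand: str) -> float:
    """Brand's share (in %) of all citations observed across the prompt set."""
    counts = Counter(citations)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no citations recorded")
    return 100 * counts[brand] / total


# 40 citations in total across 20 prompts; the brand accounts for 8 of them.
citations = ["YourBrand"] * 8 + ["CompetitorA"] * 20 + ["CompetitorB"] * 12
print(f"AI SoV: {share_of_voice(citations, 'YourBrand'):.0f}%")  # AI SoV: 20%
```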

Metric 3 — Position score

Definition: where you appear within a response: cited first, in the middle of the list, or last.

Calculation: assign 3 points for a first citation, 2 for a middle citation, 1 for a last citation. Calculate your average score across all prompts.

Why it matters: LLMs tend to present the first entity mentioned as the default recommendation. AirOps research found that only 30% of brands maintain their visibility from one response to the next, and only 20% maintain their presence across 5 consecutive responses. Average position is an indicator of how firmly your entity is established in the models' answers.
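The point scheme can be sketched as follows. The text does not say whether prompts where you are absent count as zero, so this sketch assumes the average is taken only over responses in which the brand appears; the labels "first", "middle" and "last" are this example's own.

```python
POSITION_POINTS = {"first": 3, "middle": 2, "last": 1}


def position_score(positions: list[str]) -> float:
    """Average position points over the responses where the brand appears.

    Assumption: responses without a mention are excluded rather than scored 0.
    """
    if not positions:
        raise ValueError("the brand was not cited in any response")
    return sum(POSITION_POINTS[p] for p in positions) / len(positions)


# Four responses citing the brand: once first, twice mid-list, once last.
print(position_score(["first", "middle", "middle", "last"]))  # 2.0
```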

Metric 4 — Sentiment and description quality score

Definition: how you're described when cited — positive, neutral, with reservations.

Calculation: for each response where you appear, classify the mention into one of three categories:

Positive: explicitly recommended, described with favourable attributes
Neutral: mentioned without evaluation
With reservations: cited with limitations or negative nuances

Why it matters: a high mention rate with a majority of neutral or reserved mentions signals a problem with how your positioning is perceived, not a visibility problem.
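Once each mention has been labelled by hand, the breakdown is a simple tally. The sketch below assumes the three category labels are stored as lowercase strings; the sample counts are invented for illustration.

```python
from collections import Counter

CATEGORIES = ("positive", "neutral", "with reservations")


def sentiment_breakdown(labels: list[str]) -> dict[str, float]:
    """Percentage of mentions falling into each of the three categories."""
    if not labels:
        raise ValueError("no mentions to classify")
    unknown = set(labels) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    counts = Counter(labels)
    return {category: 100 * counts[category] / len(labels) for category in CATEGORIES}


# Eight mentions: 4 positive, 3 neutral, 1 with reservations.
labels = ["positive"] * 4 + ["neutral"] * 3 + ["with reservations"]
print(sentiment_breakdown(labels))
# {'positive': 50.0, 'neutral': 37.5, 'with reservations': 12.5}
```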

Metric 5 — Multi-platform coverage

Definition: the number of the 5 main AI engines on which you appear at least once for your sector discovery prompts.

Calculation: score of 0 to 5 according to the number of platforms where you're present.

Why it matters: according to an analysis of 680 million citations, only 11% of domains are cited by both ChatGPT and Perplexity. Being visible on a single platform exposes you to concentration risk: if that platform changes its algorithm, you can lose all visibility.
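A possible way to count coverage, assuming presence on the discovery prompts is stored per platform as a list of yes/no results. The five platform names are the ones tested in the scoring method; the sample data is invented.

```python
PLATFORMS = ("ChatGPT", "Perplexity", "Gemini", "Copilot", "Claude")


def platform_coverage(presence: dict[str, list[bool]]) -> int:
    """Number of platforms (0 to 5) where the brand appears at least once."""
    return sum(1 for platform in PLATFORMS if any(presence.get(platform, [])))


# Presence on 5 sector discovery prompts, per platform (illustrative data).
presence = {
    "ChatGPT": [True, False, True, False, False],
    "Perplexity": [False] * 5,
    "Gemini": [False, True, False, False, False],
    "Copilot": [False] * 5,
    "Claude": [True, True, False, False, False],
}
print(f"{platform_coverage(presence)}/5 platforms")  # 3/5 platforms
```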

The manual scoring method in 4 steps

Step 1 — Build your reference prompt set

Define 15 to 20 prompts representing the questions your prospects ask. Include:

5 sector discovery prompts ("which provider in [field] do you recommend?")
5 comparison prompts ("which players are recognised in [field] in [country]?")
5 prompts on your main use cases ("I'm looking for an expert for [client problem]")

Golden rule: never change these prompts between two measurements. Reproducibility is key — you're measuring evolution over time, not absolute performance.

Step 2 — Execute the scoring

Test each prompt on each platform (ChatGPT, Perplexity, Gemini, Copilot, Claude). For each response, note:

Presence: yes/no
Position: 1st, middle, last
Mention type: with link, without link, name only
Sentiment: positive, neutral, with reservations

Test each prompt 2 to 3 times to compensate for the natural variability of LLMs.
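One way to keep these notes consistent is a small record per test run. The sketch below is only a suggestion: the `Observation` fields mirror the four items above, and summarising repeated runs as the fraction of runs with a mention is an assumption of this example, not something the method prescribes.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Observation:
    prompt: str
    platform: str
    run: int                             # 1, 2 or 3 for repeated tests
    present: bool
    position: Optional[str] = None       # "first", "middle" or "last"
    mention_type: Optional[str] = None   # "with link", "without link", "name only"
    sentiment: Optional[str] = None      # "positive", "neutral", "with reservations"


def presence_by_prompt(observations: list[Observation]) -> dict[tuple[str, str], float]:
    """Fraction of runs with a mention, per (platform, prompt) pair."""
    runs: dict[tuple[str, str], int] = defaultdict(int)
    hits: dict[tuple[str, str], int] = defaultdict(int)
    for obs in observations:
        key = (obs.platform, obs.prompt)
        runs[key] += 1
        hits[key] += int(obs.present)
    return {key: hits[key] / runs[key] for key in runs}


prompt = "which provider in [field] do you recommend?"
log = [
    Observation(prompt, "ChatGPT", 1, True, "first", "without link", "positive"),
    Observation(prompt, "ChatGPT", 2, False),
    Observation(prompt, "ChatGPT", 3, True, "middle", "name only", "neutral"),
]
print(presence_by_prompt(log))  # brand present in 2 of 3 runs for this prompt
```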

Step 3 — Calculate your overall score

Aggregate results into a composite score out of 100:

| Dimension | Weight | Calculation |
|---|---|---|
| Mention rate | 35% | Average across all prompts and platforms |
| Share of Voice | 25% | Relative share vs competitors |
| Position score | 20% | Weighted average score |
| Sentiment | 10% | % positive mentions |
| Platform coverage | 10% | Number of platforms / 5 |
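The weighting can be sketched as below. Each dimension has to be on a 0 to 100 scale before it is weighted; the table does not say how the 1 to 3 position score is rescaled, so this sketch assumes 3 points maps to 100 (score ÷ 3 × 100). The input values are illustrative.

```python
# Weights in percentage points, taken from the table.
WEIGHTS = {
    "mention_rate": 35,
    "share_of_voice": 25,
    "position": 20,
    "sentiment": 10,
    "coverage": 10,
}


def composite_score(
    mention_rate: float,      # 0-100, averaged across prompts and platforms
    share_of_voice: float,    # 0-100
    avg_position: float,      # 1-3 point scale
    pct_positive: float,      # 0-100, share of positive mentions
    platforms_present: int,   # 0-5
) -> float:
    """Composite AI visibility score out of 100."""
    components = {
        "mention_rate": mention_rate,
        "share_of_voice": share_of_voice,
        "position": 100 * avg_position / 3,   # assumption: 3 points = 100
        "sentiment": pct_positive,
        "coverage": 100 * platforms_present / 5,
    }
    return sum(WEIGHTS[name] * value for name, value in components.items()) / 100


score = composite_score(
    mention_rate=40, share_of_voice=20, avg_position=1.5,
    pct_positive=50, platforms_present=3,
)
print(f"Overall score: {score:.1f}/100")  # Overall score: 40.0/100
```

With these inputs the contributions are 14 (mention rate), 5 (Share of Voice), 10 (position), 5 (sentiment) and 6 (coverage), which sum to 40.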

Step 4 — Document and compare over time

Record your score with the date, test conditions (LLM versions used), and 3 qualitative observations about what you noticed. Compare with the previous month.
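A tracking journal can be as simple as a list of dated records. The sketch below is one possible shape; the field names, dates, scores and notes are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Measurement:
    measured_on: date
    score: float
    model_versions: dict[str, str]            # platform -> version noted at test time
    observations: list[str] = field(default_factory=list)


def month_on_month_change(history: list[Measurement]) -> Optional[float]:
    """Score difference between the two most recent measurements."""
    ordered = sorted(history, key=lambda m: m.measured_on)
    if len(ordered) < 2:
        return None  # nothing to compare against yet
    return ordered[-1].score - ordered[-2].score


history = [
    Measurement(date(2026, 1, 5), 32.0, {"ChatGPT": "version noted at test time"},
                ["Absent from all comparison prompts"]),
    Measurement(date(2026, 2, 5), 40.0, {"ChatGPT": "version noted at test time"},
                ["Now cited first on two discovery prompts"]),
]
print(f"Change vs previous month: {month_on_month_change(history):+.1f}")  # +8.0
```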

Reference benchmarks in 2026

For reference, here are the performance levels observed in B2B markets:

| Overall score | Interpretation |
|---|---|
| 0-20 | Near-total invisibility — structural problems to fix as a priority |
| 20-40 | Partial presence — visible on 1-2 platforms, absent from others |
| 40-60 | Emerging visibility — present but not dominant on most platforms |
| 60-80 | Good visibility — regularly cited, clear positioning perceived |
| 80-100 | Strong visibility — recognised reference in its field by LLMs |

The majority of B2B SMEs that have never worked on their AI visibility sit between 5 and 25.

Volatility: a factor to integrate into your measurement

AI responses aren't stable. AirOps research found that only 30% of brands maintain their visibility from one response to the next on the same prompt. And Backlinko (November 2025) documented that sources cited by LLMs can change by 80% in two months during model updates.

Practical consequences:

Never conclude from a single test — test each prompt 2 to 3 times minimum
A month-on-month score drop may reflect a model update, not a degradation of your presence
Document major model updates (like Gemini 3 in January 2026) in your tracking journal
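To make the second and third points actionable, the tracking journal can be checked for model updates that fall between two measurements before a score drop is interpreted. This is a minimal sketch; the update names and dates are hypothetical placeholders, not real release dates.

```python
from datetime import date


def updates_between(previous: date, current: date,
                    model_updates: dict[str, date]) -> list[str]:
    """Names of logged model updates released after `previous`, up to `current`."""
    return sorted(
        name for name, released in model_updates.items()
        if previous < released <= current
    )


# Hypothetical journal entries: update name -> release date.
journal = {
    "Hypothetical model update A": date(2026, 1, 20),
    "Hypothetical model update B": date(2025, 11, 3),
}

suspects = updates_between(date(2026, 1, 5), date(2026, 2, 5), journal)
if suspects:
    print("Score change may reflect:", ", ".join(suspects))
```

If the list is non-empty, it is safer to re-run the prompt set before concluding that your presence has degraded.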

Where to start?

Our free scoring tool automates steps 1 and 2 of this method — you get a structured assessment of your visibility on the 5 AI engines in a few minutes, with a score per engine.

For complete scoring including competitive benchmarking and tracking of your evolution over time, our AI Diagnostic delivers a detailed report within 5 business days.

To understand how to improve your score once established, read How to optimise your content for AI engines in 2026 and The 10 mistakes making your business invisible.

Sources: BrandMentions.link data on AI citation dark visibility (2026), AirOps research on LLM citation stability (2026), Backlinko data on cited source volatility (November 2025), DerivateX AI Visibility Score framework (2026), Otterly analysis of 1 million AI citations (2026).