How We Measure AI Memory

Transparent methodology for measuring what AI models recommend before the click.

Overview

OutCited measures how AI models remember and recommend brands. We systematically query multiple AI models across thousands of categories, tracking citation frequency, ranking positions, and changes over time.

We run standardized prompts weekly, collect structured responses, and normalize data to create comparable metrics across models and categories.

Why Zero-Click Coverage Matters

60% of AI answers never generate a click.

If you only measure AI web traffic (for example, a GA4 filter for referrals from chatgpt.com), you're missing 60% of the story. When AI answers a question without providing a link, or when users get the information they need without clicking, traditional analytics see nothing.

OutCited sees ALL AI responses — clicked or not.

We directly query AI models and capture their responses, regardless of whether those responses generated clicks. This gives you complete visibility into:

  • What AI said about your brand (even with zero clicks)
  • How AI ranked you vs competitors (before any user action)
  • What AI remembered about you (independent of traffic)
  • How AI's perception changed over time (upstream of analytics)

This is the difference between measuring AI perception (what's in the model's head) and measuring AI referrals (what clicks you got).

AI Models Tracked

Mass Market Models

  • GPT-4o-mini (OpenAI)
  • Gemini Flash (Google)
  • DeepSeek (China)
  • Llama 3.1 (Meta/Groq)
  • Mistral (Europe)

Premium Models

  • GPT-4 (OpenAI)
  • Claude 3.5 Sonnet (Anthropic)
  • Gemini Pro (Google)
  • Cohere Command R+
  • Perplexity Sonar
  • xAI Grok
  • AI21 J2 Ultra

Total: 10+ models tracked weekly, representing both free and paid AI experiences that users encounter.

Prompt Methodology

Standardized Prompts

For each category, we use a standardized prompt template:

"List the top 15 [category] companies with their domains.

Return ONLY a JSON array:

[

{"domain": "example.com", "name": "Company Name", "rank": 1, "score": 95}

]"

Category Coverage

We track 2,147 canonical categories across technology, business software, consumer products, and specialized verticals (including electronics components).

Categories are normalized to a canonical taxonomy to ensure consistent measurement across models and time periods.

Sampling Frequency

  • Weekly collection: All models queried for all categories every week
  • Time windows: Data organized into weekly "tensors" (week_YYYY_WW format)
  • Historical tracking: Maintains complete history for drift analysis
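Mapping a collection date to its weekly tensor key is straightforward with ISO week numbering; a minimal sketch (assuming the week_YYYY_WW keys follow the ISO calendar):

```python
from datetime import date

def week_key(d: date) -> str:
    """Map a date to its weekly tensor key in week_YYYY_WW format (ISO week)."""
    iso = d.isocalendar()
    return f"week_{iso.year}_{iso.week:02d}"
```

Using the ISO year (rather than the calendar year) keeps dates near January 1 from landing in the wrong tensor.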

How We Quantify "Recommendation"

Citation Score: The percentage of times a brand appears in the top 15 recommendations for a category across all queries.

Ranking Position: Average position when the brand is mentioned (1-15 scale, lower is better).

Model Consensus: Agreement across models—high consensus means multiple models recommend the same brand, low consensus indicates model-specific bias.

Category Coverage: Number of categories where a brand appears, indicating breadth of AI memory.
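The four metrics above can be computed from the raw observations in one pass. The sketch below assumes a simple (model, category, domain, rank) row shape; the field names are illustrative, not OutCited's actual schema:

```python
from collections import defaultdict
from statistics import mean

def brand_metrics(observations):
    """Compute per-brand metrics from (model, category, domain, rank) rows."""
    total_queries = len({(m, c) for m, c, _, _ in observations})
    total_models = len({m for m, _, _, _ in observations})
    by_brand = defaultdict(list)
    for model, category, domain, rank in observations:
        by_brand[domain].append((model, category, rank))
    metrics = {}
    for domain, rows in by_brand.items():
        metrics[domain] = {
            # Citation score: share of (model, category) queries citing the brand
            "citation_score": len(rows) / total_queries,
            # Ranking position: average rank when cited (1-15, lower is better)
            "avg_rank": mean(r for _, _, r in rows),
            # Model consensus: fraction of tracked models citing the brand
            "model_consensus": len({m for m, _, _ in rows}) / total_models,
            # Category coverage: distinct categories where the brand appears
            "category_coverage": len({c for _, c, _ in rows}),
        }
    return metrics
```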

Handling Model Drift

Weekly Tensors: We organize data into weekly time windows, allowing us to track changes over time and detect drift.

Drift Calculation: We compare citation scores between time periods (e.g., July vs. November) to identify significant changes in AI memory.
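A drift comparison between two time windows reduces to a per-brand delta in citation score; a minimal sketch (the threshold value is our illustrative choice, not OutCited's):

```python
def citation_drift(scores_then, scores_now, threshold=0.05):
    """Flag brands whose citation score moved more than `threshold`.

    Inputs map domain -> citation score for two time windows
    (e.g. a July tensor vs. a November tensor).
    """
    brands = set(scores_then) | set(scores_now)
    drift = {}
    for brand in brands:
        # Brands absent from a window have an implicit score of zero.
        delta = scores_now.get(brand, 0.0) - scores_then.get(brand, 0.0)
        if abs(delta) >= threshold:
            drift[brand] = round(delta, 4)
    return drift
```

Treating absent brands as zero means newly cited (or newly forgotten) brands surface as drift rather than being silently dropped.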

Model-Specific Tracking: We track drift separately for each model, allowing us to identify when changes are model-specific (e.g., OpenAI reducing Microsoft citations) vs. market-wide shifts.

Normalization: All data is normalized to canonical categories and brand domains, ensuring consistent comparison across time periods even as models update.

Avoiding Bias

Standardized Prompts: We use identical prompts across all models and categories, ensuring fair comparison.

Multiple Models: By tracking 10+ models, we can identify when results are model-specific vs. universal.

Category Normalization: We use a canonical category taxonomy, preventing category name variations from affecting results.

Brand Canonicalization: We normalize brand domains (e.g., "microsoft.com" vs "www.microsoft.com") to ensure accurate tracking.
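Domain canonicalization might look like the following sketch, which strips scheme, "www." prefix, port, path, and casing so variants collapse to one key:

```python
from urllib.parse import urlparse

def canonical_domain(value: str) -> str:
    """Normalize a brand domain so variants map to one canonical key."""
    host = value.strip().lower()
    if "//" in host:
        # Full URL: extract just the host portion.
        host = urlparse(host).netloc or host
    host = host.split("/")[0].split(":")[0]  # drop any path or port
    if host.startswith("www."):
        host = host[4:]
    return host
```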

Transparent Methodology: All data collection is automated and reproducible—no manual curation or selection bias.

Data Quality & Reliability

Idempotent Writes: All data collection uses idempotent database operations, preventing duplicates and ensuring data integrity.
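An idempotent write hinges on a unique key that makes re-runs safe; here is a minimal SQLite sketch (the schema and key choice are illustrative, not OutCited's actual database):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE citations (
        week TEXT, model TEXT, category TEXT, domain TEXT, rank INTEGER,
        UNIQUE (week, model, category, domain)
    )
""")

def record(week, model, category, domain, rank):
    """Upsert one observation: re-running a collection never duplicates rows."""
    conn.execute(
        "INSERT INTO citations VALUES (?, ?, ?, ?, ?) "
        "ON CONFLICT (week, model, category, domain) DO UPDATE SET rank = excluded.rank",
        (week, model, category, domain, rank),
    )

record("week_2024_27", "gpt-4o-mini", "crm", "example.com", 1)
record("week_2024_27", "gpt-4o-mini", "crm", "example.com", 1)  # re-run: no duplicate
```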

Error Handling: Failed API calls are retried with exponential backoff, and errors are logged without stopping collection.
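Retry with exponential backoff can be sketched as a small wrapper (the attempt count and base delay are illustrative defaults):

```python
import time

def with_retries(call, attempts=4, base_delay=1.0):
    """Retry a flaky API call with exponential backoff between attempts."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error for logging
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

Raising after the final attempt lets the caller log the failure and move on to the next query without halting the whole collection run.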

Validation: All responses are validated for JSON structure and required fields before storage.

Time Window Tracking: Data is organized into weekly tensors, ensuring accurate historical comparison and drift detection.

Limitations & Considerations

Model Updates: AI models are updated frequently, which can cause citation changes that reflect model updates rather than market shifts.

Prompt Sensitivity: Results may vary based on prompt phrasing, though we use standardized prompts to minimize this.

Category Boundaries: Some brands may appear in multiple categories, and category definitions may evolve over time.

Sampling Frequency: Weekly collection provides good temporal resolution, but may miss very short-term changes.

Model Availability: Some models may be unavailable or rate-limited during collection windows, though we retry failed collections.

See Your Brand's AI Memory

Check how AI models remember and recommend your brand across categories.