How AY Rank Finds Keywords That Get Cited by AI Search
We treat keyword research as two parallel inputs: classic Google demand (Ahrefs + SERP) and AI-search demand (real prompts pulled from ChatGPT, Perplexity, and Gemini). The output is one ranked backlog scored by citation readiness, not search volume alone.
- Cadence: Weekly sweep + ad-hoc
- Output: Ranked keyword + prompt backlog
- Category: Keyword monitoring
Most agencies still build keyword lists from Google volume only. That misses 30 to 60% of the queries that drive AI-cited recommendations (long prompts, comparison questions, jurisdiction-specific asks) because those queries never cross a threshold of 50 monthly searches (MSV) in Ahrefs. We rebuild the funnel from both sides.
The workflow.
7 steps · runs as a weekly sweep + ad-hoc
- 01
Seed harvest from the brand
Pull every product, service, persona, pain point, and industry from the client site. We extract entities from page H1s, schema.org/Service blocks, and existing FAQ schema, not from a brainstorm doc that goes stale in two weeks.
- 02
Ahrefs expansion + competitor gap
For each seed, expand via Ahrefs Keywords Explorer (Matching terms, Related, Questions) and pull the top 5 competitors' organic and paid keywords. Filter by intent and by keyword difficulty (KD) relative to the client's Domain Rating (DR). Surface striking-distance keywords (positions 4–15) as a separate stream.
- 03
Search Console + GA4 first-party signals
Pull the client's own Google Search Console queries (impressions, CTR, position deltas) and GA4 landing-page conversion data. Queries the brand already ranks for but doesn't convert on, and queries with impressions but no clicks, become the highest-priority backlog candidates because the demand is already proven.
- 04
AI prompt extraction
For the top 20 seeds, run scripted prompts against ChatGPT, Perplexity, Gemini, and Google AI Overviews. Log every cited source and every follow-up question the model suggests; those follow-ups become the next round of seeds. Repeat until expansion plateaus.
- 05
Intent classification
Every keyword and prompt is auto-tagged as informational, commercial, navigational, or transactional by an Anthropic model with the brand's ICP as context. Mismatched intent (e.g. a transactional prompt routed to a blog post) is flagged before the content brief is written.
- 06
Citation readiness score
A 0–100 score per query that combines AI citation density (how many AI engines surface a result at all), source-share entropy (is one source dominating, or is the slot open), commercial intent, and the client's current presence. We ship the highest-score gaps first.
- 07
Ship to backlog
Top 20 per sprint land in Supabase tagged with target page, target schema type, and intent. The blog pipeline and programmatic SEO workflows pull from this backlog, so keyword work never becomes a static spreadsheet that nobody opens.
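Step 01 above (seed harvest from H1s, schema.org/Service blocks, and FAQ schema) can be sketched with the Python standard library alone. This is a minimal illustration, not our production crawler: the class name `SeedHarvester`, the function `harvest_seeds`, and the sample page are all hypothetical, and the sketch assumes JSON-LD blocks are flat objects or lists (it does not walk `@graph` containers).

```python
import json
from html.parser import HTMLParser


class SeedHarvester(HTMLParser):
    """Collects H1 text and parsed JSON-LD blocks from one HTML page."""

    def __init__(self):
        super().__init__()
        self.h1_texts = []
        self.jsonld_blocks = []
        self._in_h1 = False
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1, self._buffer = True, []
        elif tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld, self._buffer = True, []

    def handle_data(self, data):
        if self._in_h1 or self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        text = "".join(self._buffer).strip()
        if tag == "h1" and self._in_h1:
            self._in_h1 = False
            if text:
                self.h1_texts.append(text)
        elif tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.jsonld_blocks.append(json.loads(text))
            except json.JSONDecodeError:
                pass  # skip malformed JSON-LD rather than guess at it


def harvest_seeds(html):
    """Return de-duplicated seed phrases from H1s, Service names, and FAQ questions."""
    parser = SeedHarvester()
    parser.feed(html)
    parser.close()
    seeds = list(parser.h1_texts)
    for block in parser.jsonld_blocks:
        items = block if isinstance(block, list) else [block]
        for item in items:
            if not isinstance(item, dict):
                continue
            if item.get("@type") == "Service" and item.get("name"):
                seeds.append(item["name"])
            elif item.get("@type") == "FAQPage":
                questions = item.get("mainEntity", [])
                if isinstance(questions, dict):
                    questions = [questions]
                for q in questions:
                    if isinstance(q, dict) and q.get("name"):
                        seeds.append(q["name"])
    return list(dict.fromkeys(seeds))  # de-duplicate, keep first-seen order


# Made-up page used only to exercise the sketch.
SAMPLE_HTML = """
<html><body>
<h1>Fractional CFO Services</h1>
<script type="application/ld+json">
{"@type": "Service", "name": "Cash Flow Forecasting"}
</script>
<script type="application/ld+json">
{"@type": "FAQPage", "mainEntity": [
  {"@type": "Question", "name": "What does a fractional CFO cost?"}
]}
</script>
</body></html>
"""

seeds = harvest_seeds(SAMPLE_HTML)
```

Reading seeds from markup that is already on the site is what keeps the list in sync with the pages, which is the point of avoiding a brainstorm doc.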
A representative cadence.
- Mon · Weekly · Ahrefs sweep + competitor gap pull
- Tue · Weekly · AI prompt runs across 4 engines, dedupe + tag
- Wed · Weekly · Intent classification + readiness scoring
- Thu · Weekly · Backlog ranking, brief generation, hand-off
- Fri · Weekly · Slack digest, striking-distance alerts
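The backlog ranking step in the cadence above could look roughly like the sketch below: sort scored candidates, keep the top 20, attach the tags from step 07, and flag intent mismatches from step 05. Everything named here is an assumption for illustration (the `Candidate` fields, the `PAGE_TYPE_INTENTS` mapping, the row keys); it writes plain dicts and does not call Supabase.

```python
from dataclasses import dataclass

# Hypothetical mapping: which intents each page type is assumed to serve.
PAGE_TYPE_INTENTS = {
    "blog": {"informational"},
    "comparison": {"commercial"},
    "product": {"commercial", "transactional"},
    "home": {"navigational"},
}


@dataclass
class Candidate:
    query: str
    intent: str        # informational | commercial | navigational | transactional
    readiness: float   # 0-100 citation readiness score
    target_page: str
    page_type: str     # key into PAGE_TYPE_INTENTS
    schema_type: str   # e.g. "FAQPage", "Service"


def build_sprint_backlog(candidates, limit=20):
    """Return the top `limit` candidates by readiness as backlog rows."""
    ranked = sorted(candidates, key=lambda c: c.readiness, reverse=True)[:limit]
    return [
        {
            "query": c.query,
            "intent": c.intent,
            "readiness": c.readiness,
            "target_page": c.target_page,
            "target_schema_type": c.schema_type,
            # True when the page type is not one we expect to serve this intent.
            "intent_mismatch": c.intent not in PAGE_TYPE_INTENTS.get(c.page_type, set()),
        }
        for c in ranked
    ]


# Made-up candidates: the second one is a transactional prompt routed to a blog post.
rows = build_sprint_backlog([
    Candidate("what is fractional cfo", "informational", 62.0, "/blog/what-is", "blog", "FAQPage"),
    Candidate("hire fractional cfo", "transactional", 88.0, "/blog/hiring", "blog", "Article"),
])
```

Flagging the mismatch on the row, rather than dropping the row, leaves the decision about re-targeting to whoever writes the brief.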
What you get.
- Ranked keyword + prompt backlog in Supabase, refreshed weekly
- Striking-distance keyword alerts (positions 4–15) in Slack
- Per-query citation readiness score with source-share breakdown
- Intent-tagged content briefs ready for the writing pipeline
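The striking-distance alerts in the list above reduce to a position filter over ranking rows. A minimal sketch, assuming each row is a dict with hypothetical `query` and `position` keys (for example, average position from a Search Console or Ahrefs export); the message format is illustrative and nothing is posted to Slack here.

```python
def striking_distance(rows, low=4, high=15):
    """Return rows whose position falls within [low, high], best position first."""
    hits = [r for r in rows if low <= r["position"] <= high]
    return sorted(hits, key=lambda r: r["position"])


def format_alert(row):
    """Render one alert line; the wording is an illustrative assumption."""
    return f'Striking distance: "{row["query"]}" at position {row["position"]:g}'


# Made-up ranking rows used only to exercise the filter.
sample_rows = [
    {"query": "geo audit", "position": 3.0},
    {"query": "ai search seo agency", "position": 15.0},
    {"query": "citation readiness", "position": 4.0},
    {"query": "llm keyword research", "position": 15.1},
]

alerts = [format_alert(r) for r in striking_distance(sample_rows)]
```

Both bounds are inclusive here, so positions 4 and 15 count as striking distance and 3.0 or 15.1 do not.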
Get the keyword research playbook.
The Ahrefs pulls, striking-distance filters, intent classification rules, and outreach-grade backlink scoring we run every week.
Common questions.
Why pull keywords from ChatGPT and Perplexity directly?
Because the queries people type into LLMs do not look like the queries they type into Google. They are longer, more conversational, more comparison-driven, and frequently below classic search volume thresholds. Pulling real prompt traffic from the models themselves is the only honest input for AI-search content.
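The extraction loop behind this (run prompts, log cited sources, feed suggested follow-up questions back in as seeds, stop when expansion plateaus) can be sketched as a breadth-first expansion. This is an illustration under stated assumptions: `ask_engine` is a hypothetical callable returning `(cited_sources, follow_up_questions)` for a prompt, and `fake_engine` below is a stub with made-up data, not a real ChatGPT, Perplexity, or Gemini client.

```python
def expand_prompts(seeds, ask_engine, max_rounds=5, min_new=1):
    """Expand seeds round by round until fewer than `min_new` new prompts appear.

    Returns (prompts in discovery order, list of (prompt, cited_source) pairs).
    """
    seen, prompts, frontier = set(), [], []
    for seed in seeds:
        key = seed.strip().lower()
        if key and key not in seen:
            seen.add(key)
            prompts.append(seed.strip())
            frontier.append(seed.strip())

    citations = []
    for _ in range(max_rounds):
        if not frontier:
            break
        next_frontier = []
        for prompt in frontier:
            cited, follow_ups = ask_engine(prompt)
            citations.extend((prompt, source) for source in cited)
            for question in follow_ups:
                key = question.strip().lower()
                if key and key not in seen:  # case-insensitive de-duplication
                    seen.add(key)
                    prompts.append(question.strip())
                    next_frontier.append(question.strip())
        if len(next_frontier) < min_new:
            break  # plateau: this round surfaced too few new prompts
        frontier = next_frontier
    return prompts, citations


# Stub engine with made-up responses, for illustration only.
FAKE_RESPONSES = {
    "best crm for law firms": (["example.com/crm-guide"], ["CRM pricing for small law firms"]),
    "crm pricing for small law firms": (["example.org/pricing"], ["best crm for law firms"]),
}


def fake_engine(prompt):
    return FAKE_RESPONSES.get(prompt.lower(), ([], []))


prompts, citations = expand_prompts(["best crm for law firms"], fake_engine)
```

`max_rounds` is a safety cap so a chatty engine cannot keep the loop running; the plateau check alone decides when expansion has stopped paying off.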
How is citation readiness scored?
Four weighted inputs: AI citation density (out of the engines we query, how many cite anyone at all for this query), source-share entropy (is one source dominating or is the slot contested), commercial intent (how close to revenue), and current presence (where the client already ranks or gets cited). The score is a 0–100 number we sort the backlog by.
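One way to combine those four inputs is a weighted sum of values normalized to the 0 to 1 range. The sketch below is an assumption-heavy illustration, not the production scorer: the weights in `WEIGHTS` are made up, source-share entropy is taken to be normalized Shannon entropy over cited sources, a more contested slot is assumed to score higher, and current presence is inverted so that bigger gaps score higher.

```python
import math
from collections import Counter

# Hypothetical weights; they sum to 1 so the result lands on a 0-100 scale.
WEIGHTS = {"density": 0.3, "openness": 0.2, "intent": 0.3, "gap": 0.2}


def source_share_entropy(cited_sources):
    """Normalized Shannon entropy in [0, 1]; 0 means a single source dominates."""
    counts = Counter(cited_sources)
    if len(counts) < 2:
        return 0.0
    total = sum(counts.values())
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return entropy / math.log2(len(counts))


def readiness_score(engines_citing, engines_queried, cited_sources,
                    commercial_intent, current_presence):
    """Return a 0-100 score; commercial_intent and current_presence are in [0, 1]."""
    density = engines_citing / engines_queried if engines_queried else 0.0
    parts = {
        "density": density,                               # share of engines citing anyone
        "openness": source_share_entropy(cited_sources),  # contested slot scores higher
        "intent": commercial_intent,
        "gap": 1.0 - current_presence,                    # less presence means a bigger gap
    }
    return round(100 * sum(WEIGHTS[k] * parts[k] for k in WEIGHTS), 1)


# Made-up example: every engine cites someone, two sources split the citations
# evenly, intent is fully commercial, and the client is absent.
best_case = readiness_score(4, 4, ["a.com", "b.com"], 1.0, 0.0)
```

Normalizing entropy by the number of distinct sources keeps the openness term comparable between queries that cite two sources and queries that cite ten.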
How often does the backlog get refreshed?
Weekly for active clients. The cadence matters because AI engines re-index citation sources faster than classic search: a Reddit thread or a competitor blog post can take a slot in days, not months. A weekly refresh keeps the backlog honest.
Do you still use traditional Google search volume?
Yes, but as one input to citation readiness, not the only one. A 2,000 MSV query with zero AI citations can outrank a 50 MSV query with daily AI Overview placement, or the reverse, depending on commercial intent. The score balances both.
Want this workflow running on your brand?
Book a free GEO audit. We'll tell you whether this play moves the needle for your category and what citation share is realistic in 90 days.

