Headless CMS SEO Audit: The Vendor-Neutral Guide
Every headless CMS vendor publishes SEO guidance — and every one of them has the same problem. Contentful's guide covers Contentful. Sanity's covers Sanity. Strapi's covers Strapi. They're helpful within their scope, but they can't tell you what breaks at the seams — where the CMS hands off to the rendering framework, where metadata pipelines span two systems, where URL structures live in code rather than the CMS.
The SEO audit methodology for headless sites is fundamentally different from the one for monolithic sites, and no single vendor has an incentive to explain the full picture. This guide does.
It covers the audit process that works across all seven major headless CMS platforms — Contentful, Sanity, Payload CMS, Strapi, Storyblok, Directus, and Hygraph — without favoring any of them. Whether you're auditing your own site or inheriting a client project on an unfamiliar stack, this is the methodology.
See your entire website in one place → Start free
Table of contents
- Why headless CMS audits differ from monolithic audits
- The four-layer audit framework
- Platform-specific audit notes
- The metadata pipeline audit
- Rendering and crawlability
- Content model and structured data
- Putting it together in Evergreen
- Frequently asked questions
Why headless CMS audits differ from monolithic audits {#why-headless-cms-audits-differ}
In a monolithic CMS like WordPress, the system that stores your content is the same system that renders it. When you write a title tag in WordPress, that title tag appears in the HTML. When you set a URL slug, that slug is the URL. The relationship between content input and HTML output is one-to-one and immediately verifiable.
Headless CMS architecture breaks that relationship. The CMS stores content. A separate rendering framework — Next.js, Astro, Nuxt, SvelteKit, a custom SSR setup — retrieves that content via API and generates the HTML. Between those two systems, things go wrong in ways that don't exist in monolithic sites:
Metadata pipelines cross system boundaries. A content editor sets a meta description in Contentful. A Next.js generateMetadata function fetches it via API. A middleware layer transforms it. The rendered HTML may or may not match what the editor typed. Standard audit tools check the HTML output, not the pipeline that produced it.
URL structures live in code, not the CMS. In WordPress, the permalink settings and the slug field determine the URL. In a headless setup, the routing layer in Next.js or Astro defines URL patterns. The CMS often has no concept of the final URL. A route change in code can break every internal link without the CMS knowing.
Rendering varies per page. A single headless site might serve some pages via static generation (SSG), others via server-side rendering (SSR), and others via incremental static regeneration (ISR). Each rendering mode has different SEO implications — Googlebot's behavior, caching behavior, and metadata freshness all differ. For a detailed exploration of rendering modes and their SEO impact, see Technical SEO audit guide for headless websites.
Content delivery adds a layer. CDN caching, edge functions, and preview modes can all interfere with what search engines see. A page that works perfectly in the CMS preview might serve stale content from the CDN to Googlebot.
These aren't edge cases. They're the normal state of headless architecture. An audit methodology that doesn't account for them misses the most common and impactful issues.
The four-layer audit framework {#the-four-layer-audit-framework}
A comprehensive headless CMS SEO audit examines four layers, from outermost (what Google sees) to innermost (where content is authored):
Layer 1: The rendered output
This is where all audits start — the actual HTML that search engines receive. Crawl the site and examine what's there:
- Title tags. Present? Unique? Within 60 characters? Matching the intended content from the CMS?
- Meta descriptions. Present? Unique? Under 155 characters? Not auto-generated by the framework with placeholder text?
- H1 tags. Exactly one per page? Containing the primary keyword? Not duplicated from the title tag word-for-word?
- Canonical tags. Present? Self-referencing on canonical pages? Not pointing to wrong URLs after a migration?
- Structured data. Valid JSON-LD? Not duplicated across multiple script blocks? Matching the page content?
- HTTP status codes. No soft 404s (200 status on empty pages)? No redirect chains? No accidental 500s on pages that should render?
This layer catches the symptoms. The next three layers diagnose the causes.
Layer 2: The rendering framework
Examine how the framework generates the HTML:
- Metadata generation. How does the framework produce title tags and meta descriptions? In Next.js 15+, this is typically generateMetadata or the metadata export. In Astro, it's the <head> section of the layout. In each case, trace the path from CMS field to rendered tag.
- Routing. How are URLs defined? Are they derived from CMS slugs, hardcoded in the routing configuration, or dynamically generated? What happens when a slug changes in the CMS — does the old URL redirect or 404?
- Rendering mode. Which pages are SSG, SSR, or ISR? Are there pages that render client-side only (CSR) and therefore may not be indexed reliably? For ISR pages, what's the revalidation interval?
- Sitemap generation. Is the XML sitemap generated from the CMS content, from the routing layer, or both? Does it include only indexable pages? Does it update when content changes?
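The metadata-generation path is easiest to audit when the fallback logic is a pure function. Here is a minimal sketch of the kind of helper a generateMetadata implementation might call; the CMS field names seoTitle and seoDescription are assumptions, since every project names these differently:

```typescript
// Hypothetical CMS entry shape; field names vary by project.
interface CmsEntry {
  title: string;           // display title
  seoTitle?: string;       // dedicated title-tag field (assumed name)
  seoDescription?: string; // dedicated meta-description field (assumed name)
}

// Pure helper a generateMetadata() implementation could call.
// Fallback order: dedicated SEO field, then display title, then site name.
function buildMetadata(entry: CmsEntry, siteName = "Example Site") {
  const title = (entry.seoTitle || entry.title || siteName).slice(0, 60);
  // Empty descriptions become undefined so the framework omits the tag
  // entirely instead of rendering an empty meta description.
  const description = (entry.seoDescription ?? "").slice(0, 155) || undefined;
  return { title, description };
}
```

Truncating with slice is a blunt instrument; some teams prefer to surface over-length titles as validation errors in the CMS (Layer 3) rather than silently cutting them in the framework.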
Layer 3: The CMS content model
Examine how the CMS structures SEO-relevant content:
- SEO fields. Does the content model include dedicated fields for title tag, meta description, OG image, canonical URL override, and noindex toggle? Or are these derived from content fields (page title → title tag)?
- Validation rules. Does the CMS enforce character limits on SEO fields? Can an editor publish a page with a 200-character title tag?
- Slug management. How are slugs generated? Can editors change them? If so, does the old slug redirect? Is there a history of slug changes?
- Image handling. How are images stored and served? Does the CMS provide image transformations (resizing, format conversion to WebP/AVIF)? Are alt text fields required or optional?
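The slug-management questions above can be made concrete with a small sketch: a slug-history map that flattens redirect chains so every old slug redirects directly to the current one. This is illustrative only; in practice the history would be persisted alongside the CMS or in the routing layer.

```typescript
// Minimal slug-history tracker: records every slug a document has used
// so the routing layer can 301 old URLs instead of letting them 404.
class SlugHistory {
  private history = new Map<string, string>(); // old slug -> current slug

  rename(oldSlug: string, newSlug: string): void {
    // Re-point existing entries so redirects never hop twice
    // (redirect chains are themselves an audit finding).
    for (const [from, to] of this.history) {
      if (to === oldSlug) this.history.set(from, newSlug);
    }
    this.history.set(oldSlug, newSlug);
  }

  redirectFor(slug: string): string | undefined {
    return this.history.get(slug);
  }
}
```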
Layer 4: The integration and delivery
Examine the connective tissue:
- API performance. How fast does the CMS API respond? Slow API responses can cause build timeouts (SSG) or slow server responses (SSR), both of which affect SEO.
- CDN and caching. Is the rendered output cached? For how long? Does the cache invalidate when content changes? Can Googlebot receive stale content?
- Preview and draft modes. Is there any risk that preview or draft content leaks to production? Some frameworks expose preview modes that bypass caching — if misconfigured, Googlebot might index draft content.
- Webhook reliability. If the site uses webhooks for rebuild triggers (common with SSG), how reliable are they? Missed webhooks mean published content doesn't appear on the site.
Platform-specific audit notes {#platform-specific-audit-notes}
Each headless CMS handles SEO differently. Here are the platform-specific considerations for the seven most common platforms.
Contentful
Contentful doesn't include built-in SEO fields — they must be added to the content model manually. This means every Contentful project implements SEO differently.
What to check. Verify that the content model includes a dedicated SEO component (title, description, OG image, canonical override). Check whether the SEO fields are required or optional at the content-type level. Inspect the Contentful Compose plugin if used — it adds page-level metadata but its field naming doesn't always match what the rendering framework expects.
Common pitfall. Contentful's rich text renderer often strips semantic HTML. H2 tags in the CMS might render as <p><strong> in the output. Verify heading hierarchy in the rendered HTML, not just the CMS preview.
Sanity
Sanity uses a schema-based approach where SEO fields are defined in code (typically using the sanity-plugin-seo-pane or custom schema definitions).
What to check. Review the Sanity schema for SEO field definitions. Check the GROQ queries that the rendering framework uses to fetch SEO data — are they fetching all necessary fields? Sanity's real-time preview can show different content than the published version; verify the published state.
Common pitfall. Sanity's document-level slugs don't automatically handle URL hierarchies. A page with slug about-us and a child page with slug team won't automatically nest as /about-us/team unless the rendering framework explicitly constructs that hierarchy. Verify URL structure end-to-end.
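Here is a hedged sketch of the hierarchy construction the rendering framework has to do itself. The parentId reference field is an assumption; Sanity projects model parent/child relationships in different ways, and the lookup map would typically be built from a GROQ query result.

```typescript
// Hypothetical document shape: each page stores its own slug segment
// plus an optional parent reference resolved by your query.
interface PageDoc {
  _id: string;
  slug: string;      // e.g. "team"
  parentId?: string; // e.g. the _id of the "about-us" page
}

// Walk parent references to build the full path, e.g. /about-us/team.
function buildPath(doc: PageDoc, all: Map<string, PageDoc>): string {
  const segments: string[] = [];
  let current: PageDoc | undefined = doc;
  const seen = new Set<string>(); // guard against circular references
  while (current && !seen.has(current._id)) {
    seen.add(current._id);
    segments.unshift(current.slug);
    current = current.parentId ? all.get(current.parentId) : undefined;
  }
  return "/" + segments.join("/");
}
```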
Payload CMS
Payload includes an official SEO plugin (@payloadcms/plugin-seo) that adds title, description, and image fields to collections. It's the most opinionated SEO setup among the seven platforms.
What to check. Verify the SEO plugin is installed and enabled for all relevant collections — it's not automatic. Check the generateTitle and generateDescription functions if they're overridden (custom logic can introduce bugs). Payload's access control system can accidentally block the rendering framework from fetching SEO fields.
Common pitfall. Payload 3.x runs on Next.js natively, which means the CMS and the rendering framework share a codebase. This simplifies some things but creates confusion about which metadata API is in charge — Payload's SEO plugin or Next.js generateMetadata. Verify they're not conflicting.
Strapi
Strapi uses a plugin-based approach. The community strapi-plugin-seo adds basic SEO fields, but many teams build custom SEO components.
What to check. Identify which SEO mechanism is in use (official plugin, community plugin, or custom fields). Check the REST API or GraphQL responses to verify SEO fields are included — Strapi's default field selection can exclude fields not explicitly requested. Verify that the i18n plugin, if used, correctly localizes SEO fields.
Common pitfall. Strapi v4 and v5 have different API response structures. If the rendering framework was built for v4 and the CMS was upgraded to v5, the SEO field paths in API responses may have changed. This breaks metadata silently.
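One defensive pattern is a normalizer that accepts both response shapes, so an upgrade can't silently blank out metadata. A minimal sketch, assuming REST responses where v4 nests fields under data.attributes and v5 flattens them onto the entry:

```typescript
type StrapiEntry = Record<string, unknown>;

// Strapi v4 nests fields under attributes; v5 flattens them.
// Normalizing both shapes lets the rendering framework read SEO
// fields regardless of which major version the CMS is running.
function normalizeEntry(data: StrapiEntry): StrapiEntry {
  if (data && typeof data.attributes === "object" && data.attributes !== null) {
    const { attributes, ...rest } = data as { attributes: StrapiEntry };
    return { ...rest, ...attributes }; // v4 shape -> flattened
  }
  return data; // already v5-shaped
}
```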
Storyblok
Storyblok uses a visual editor with a component-based content model. SEO fields typically live in a dedicated "SEO" component that's added to page-level stories.
What to check. Verify the SEO component exists in the content model and is added to all page-level story types. Check the rendering framework's Storyblok bridge integration — it can re-render pages in real time, which is great for editing but can cause layout shifts if the bridge fires after initial load.
Common pitfall. Storyblok's slug system generates slugs from the story name by default, but doesn't enforce uniqueness across folders. Two stories in different folders can have the same slug, leading to URL conflicts in the rendering framework.
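A quick audit pass over exported stories surfaces these collisions before they reach the routing layer. The Story shape below is illustrative, not Storyblok's actual API response:

```typescript
// Illustrative story shape for a slug-collision audit.
interface Story {
  name: string;
  folder: string; // e.g. "blog" or "news"
  slug: string;   // e.g. "launch-announcement"
}

// Group stories by slug and keep only slugs claimed by more than one
// story; those are the URL conflicts the rendering framework inherits.
function findSlugCollisions(stories: Story[]): Map<string, Story[]> {
  const bySlug = new Map<string, Story[]>();
  for (const s of stories) {
    const group = bySlug.get(s.slug) ?? [];
    group.push(s);
    bySlug.set(s.slug, group);
  }
  return new Map([...bySlug].filter(([, group]) => group.length > 1));
}
```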
Directus
Directus is a database-first headless CMS — it wraps any SQL database with an API and admin panel. SEO fields are database columns you define yourself.
What to check. There's no SEO plugin. Verify that the database schema includes SEO columns for every content collection. Check whether the API's field permissions expose SEO fields to the rendering framework. Directus's flows system (automations) can modify content on save — verify it's not overwriting SEO fields.
Common pitfall. Directus doesn't have a concept of "page" vs "component" — everything is a collection. This means the audit needs to identify which collections represent pages (and need SEO fields) versus which represent reusable components (and don't). The rendering framework makes this distinction, not the CMS.
Hygraph (formerly GraphCMS)
Hygraph uses a schema-based GraphQL approach. SEO fields are defined in the schema and queried via GraphQL.
What to check. Review the schema for SEO field definitions. Check GraphQL queries for completeness — are they requesting all SEO fields? Hygraph's content stages (draft, published) can serve different content; verify the rendering framework requests the correct stage.
Common pitfall. Hygraph's rich text field returns AST (Abstract Syntax Tree) data, not HTML. The rendering framework must serialize this to HTML correctly. Heading hierarchy, link attributes, and image alt text can all be lost or mangled in the serialization step.
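A reduced sketch of the serialization step, showing the two things audits most often find lost: heading levels and image alt text. The node shape here is deliberately simplified; Hygraph's real AST has more node types and properties.

```typescript
// Simplified rich-text AST: text leaves plus element nodes.
type AstNode =
  | { text: string }
  | { type: string; children: AstNode[]; src?: string; altText?: string };

function serialize(nodes: AstNode[]): string {
  return nodes
    .map((node) => {
      if ("text" in node) return node.text;
      const inner = serialize(node.children);
      switch (node.type) {
        case "heading-two": return `<h2>${inner}</h2>`;
        case "heading-three": return `<h3>${inner}</h3>`;
        case "paragraph": return `<p>${inner}</p>`;
        case "image": return `<img src="${node.src ?? ""}" alt="${node.altText ?? ""}">`;
        default: return inner; // unknown node types: keep their text, drop the wrapper
      }
    })
    .join("");
}
```

The default branch is the audit-relevant detail: a serializer that silently drops unknown node types loses content, while one that keeps the text but drops the wrapper loses semantics. Either failure mode is worth checking in the rendered HTML.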
The metadata pipeline audit {#the-metadata-pipeline-audit}
The single most impactful audit step for headless sites is tracing the metadata pipeline: the path a title tag, meta description, or canonical URL takes from the CMS editor's input field to the rendered HTML output.
Here's the methodology:
Step 1 — Document the happy path. For one representative page, trace the title tag from CMS field → API response → framework data fetching → rendering template → final HTML. Write down every transformation step.
Step 2 — Test each seam. At each transformation point, introduce a deliberate variation (a long title, a title with special characters, an empty title). Verify the output at the next step handles it correctly.
Step 3 — Check fallback behavior. What happens when the CMS title field is empty? Does the framework fall back to the page title? The content title? Nothing? Each fallback path should produce valid HTML.
Step 4 — Verify at scale. Use a crawler to check every page's rendered metadata against the source CMS fields. Discrepancies indicate pipeline failures. The content audit table makes this comparison efficient — sort by metadata issues to surface the mismatches.
Step 5 — Check OG and Twitter metadata separately. Social metadata often uses a separate pipeline (different CMS fields, different rendering templates). A page can have a perfect <title> tag and a completely broken og:title.
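Step 4 reduces to a mechanical comparison once both sides are exported: what the CMS says each page's metadata should be, and what the crawler actually found in the rendered HTML. A sketch, with illustrative record shapes:

```typescript
interface PageMeta {
  url: string;
  title: string;
  description: string;
}

interface Mismatch {
  url: string;
  field: "title" | "description";
  cms: string;
  rendered: string;
}

// Compare CMS-side metadata against crawled metadata, URL by URL.
function diffMetadata(cms: PageMeta[], crawled: PageMeta[]): Mismatch[] {
  const byUrl = new Map(crawled.map((p) => [p.url, p]));
  const mismatches: Mismatch[] = [];
  for (const page of cms) {
    const rendered = byUrl.get(page.url);
    if (!rendered) continue; // missing pages are a separate (crawl) issue
    for (const field of ["title", "description"] as const) {
      if (page[field].trim() !== rendered[field].trim()) {
        mismatches.push({ url: page.url, field, cms: page[field], rendered: rendered[field] });
      }
    }
  }
  return mismatches;
}
```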
Rendering and crawlability {#rendering-and-crawlability}
Headless sites can render content in ways that search engines can't always process. This section of the audit verifies that what you build is what Googlebot indexes.
The rendering mode inventory
Create an inventory of rendering modes used across the site:
| Rendering mode | SEO implications | What to verify |
|---|---|---|
| SSG (Static Site Generation) | HTML is pre-built. Fast, crawlable, reliable. | Content freshness — how quickly do updates appear? |
| SSR (Server-Side Rendering) | HTML is generated per request. Always fresh, but slower. | Server response time under load. Error handling (does a CMS API timeout produce a 500 or a graceful fallback?). |
| ISR (Incremental Static Regeneration) | Pages are pre-built but revalidate on a schedule. | Revalidation interval — is stale content acceptable for the page type? Cache invalidation — does on-demand revalidation work? |
| CSR (Client-Side Rendering) | HTML is minimal; JavaScript builds the page. | Does the content appear in Google's rendered HTML? Use URL Inspection in Search Console to verify. |
Most headless sites use a mix of these modes. High-traffic pages might be SSG or ISR for performance. Frequently updated pages might be SSR for freshness. The audit should verify that each page uses an appropriate rendering mode for its SEO requirements.
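For reference, this is roughly what the rendering-mode declaration looks like in a Next.js App Router route segment. The interval is an example, not a recommendation; the right value depends on how stale the page type can afford to be.

```typescript
// Next.js App Router route segment config (example values only).
// ISR: serve the static page, re-generate it at most once per hour.
export const revalidate = 3600;

// Forcing SSR for an always-fresh page would look like this instead:
// export const dynamic = "force-dynamic";
```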
For a full treatment of rendering modes and their SEO tradeoffs, see Technical SEO audit guide for headless websites.
JavaScript rendering risks
If any pages rely on client-side JavaScript to render content, verify indexation explicitly:
- Use Google Search Console's URL Inspection tool to see the rendered HTML Google sees
- Compare it to the source HTML and the fully-rendered browser output
- Check that title tags, meta descriptions, H1s, and body content are all present in Google's rendered version
Googlebot renders JavaScript, but it's not instantaneous — there's a rendering queue, and complex JavaScript can time out. Content that depends on API calls after initial page load is at risk.
Robots and indexation
Headless frameworks sometimes add indexation controls that the CMS doesn't know about:
- Framework-level noindex. Some frameworks add noindex to pages that return no content (empty CMS queries). Verify this behavior.
- Preview mode leaks. If the framework has a preview mode, ensure it's not accessible to Googlebot. Preview URLs should return noindex or require authentication.
- Staging environment indexation. Headless sites often have multiple environments (development, staging, production). Verify that only production is indexable.
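These three controls can be centralized in one decision function, which makes the behavior auditable instead of scattered across templates. A sketch; the environment value and the empty-content check are assumptions to adapt to your stack:

```typescript
interface RobotsInput {
  environment: string;    // e.g. from a deploy-time environment variable
  wordCount: number;      // rendered body word count
  editorNoindex: boolean; // the CMS noindex toggle
}

// Single source of truth for the robots meta directive.
function robotsDirective(input: RobotsInput): "index,follow" | "noindex,nofollow" {
  if (input.environment !== "production") return "noindex,nofollow"; // staging guard
  if (input.editorNoindex) return "noindex,nofollow";                // editor override
  if (input.wordCount === 0) return "noindex,nofollow";              // empty CMS query guard
  return "index,follow";
}
```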
Content model and structured data {#content-model-and-structured-data}
A well-designed content model makes SEO easier. A poorly designed one makes it nearly impossible. Here's what to check.
Content model SEO requirements
Every content type that represents a page should include:
- Title tag field (separate from the display title, with a 60-character soft limit)
- Meta description field (with a 155-character soft limit)
- Slug field (with URL-safe validation)
- Canonical URL override (optional, for cross-posting or syndication)
- Noindex toggle (for pages that should exist but not be indexed)
- OG image field (with recommended dimensions noted in the field description)
If any of these are missing, flag them. The rendering framework can derive fallbacks, but explicit fields give editors control and reduce pipeline errors.
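The field list above implies validation rules the CMS should enforce at publish time. A CMS-agnostic sketch of that validation, with illustrative field names:

```typescript
interface SeoFields {
  titleTag?: string;
  metaDescription?: string;
  slug?: string;
}

// Returns a list of human-readable issues; empty means publishable.
function validateSeoFields(f: SeoFields): string[] {
  const issues: string[] = [];
  if (!f.titleTag) issues.push("title tag missing");
  else if (f.titleTag.length > 60) issues.push("title tag over 60 characters");
  if (!f.metaDescription) issues.push("meta description missing");
  else if (f.metaDescription.length > 155) issues.push("meta description over 155 characters");
  // URL-safe: lowercase alphanumeric segments joined by single hyphens.
  if (!f.slug || !/^[a-z0-9]+(?:-[a-z0-9]+)*$/.test(f.slug)) issues.push("slug missing or not URL-safe");
  return issues;
}
```

Whether these limits are hard blocks or soft warnings is a policy choice; most teams make them warnings so editors can knowingly exceed them.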
Structured data generation
Structured data in headless sites is typically generated by the rendering framework, not the CMS. Verify:
- JSON-LD is present on pages where it's expected (articles, products, FAQs, breadcrumbs)
- Data matches the page content. The structured data should reflect the actual rendered content, not stale CMS data. Validate with Google's Rich Results Test.
- No duplicate blocks. Some frameworks and plugins both inject structured data, resulting in duplicate JSON-LD blocks. Check the HTML source for duplicates.
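Here is a sketch of both sides of this check: generating an Article JSON-LD payload from CMS fields, and counting JSON-LD blocks in rendered HTML to catch duplicates. Field names are illustrative, and a real audit would also compare the @type values of the duplicate blocks.

```typescript
// Build an Article JSON-LD payload from hypothetical CMS fields.
function articleJsonLd(a: { headline: string; datePublished: string; author: string }): string {
  return JSON.stringify({
    "@context": "https://schema.org",
    "@type": "Article",
    headline: a.headline,
    datePublished: a.datePublished,
    author: { "@type": "Person", name: a.author },
  });
}

// Audit helper: count JSON-LD script blocks in an HTML document.
// More than one usually means a plugin and the framework are both injecting.
function countJsonLdBlocks(html: string): number {
  return (html.match(/<script[^>]*type="application\/ld\+json"[^>]*>/g) ?? []).length;
}
```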
Internal linking in headless CMS content
Internal links in CMS content often break in headless setups because the CMS doesn't know the final URL structure. A content editor links to /blog/my-post in the rich text field, but the rendering framework serves the page at /insights/my-post. The link works in the CMS preview but 404s on the live site.
Audit step. Crawl the site for broken internal links. For each broken link, trace whether the link was authored in the CMS or generated by the framework. CMS-authored links are the most fragile in headless architectures.
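The check reduces to comparing CMS-authored hrefs against the routes the framework actually serves. A sketch; a production audit would build the route set from a crawl rather than an exported list:

```typescript
// Flag internal links that resolve in the CMS preview but 404 in production.
function findBrokenCmsLinks(cmsLinks: string[], liveRoutes: Set<string>): string[] {
  return cmsLinks.filter((href) => {
    if (!href.startsWith("/")) return false; // only audit internal links here
    // Normalize: strip query/fragment and trailing slashes before lookup.
    const path = href.split(/[?#]/)[0].replace(/\/+$/, "") || "/";
    return !liveRoutes.has(path);
  });
}
```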
Putting it together in Evergreen {#putting-it-together-in-evergreen}
The audit methodology above covers a lot of ground. Here's how Evergreen streamlines the process.
Step 1 — Crawl the site. Add your headless site to Evergreen and run the initial crawl. Every page the crawler discovers appears in the audit table with its rendered metadata, HTTP status, internal link relationships, and indexability status. This is Layer 1 — the rendered output — captured automatically.
Step 2 — Identify metadata pipeline failures. In the audit table, filter to pages with missing or duplicate title tags, meta descriptions, and H1s. These are the pages where the metadata pipeline between CMS and rendering framework is failing. Sort by traffic to prioritize the pages that matter most.
Step 3 — Check rendering and indexability. Filter the audit table to pages marked as non-indexable. Cross-reference with search visibility data — if a noindex page has GSC impressions, something's wrong. Check for soft 404s (pages returning 200 status with minimal content) by filtering to pages with very low word counts.
Step 4 — Audit site structure. Switch to the visual sitemap view. The hierarchy reveals structural patterns immediately: orphan pages with no inbound links, sections buried too deep, content clusters that should be siblings but are scattered across the hierarchy. This is especially useful for headless sites where the URL structure is defined in code — the visual sitemap shows the actual architecture, not the intended one.
Step 5 — Connect analytics. With GA4 and GSC connected, the audit table blends traffic and search data into every row. Now you can filter to "pages with high traffic and SEO issues" — the highest-impact fixes. A page with 2,000 monthly sessions and a missing meta description is a very different priority than a page with zero traffic and the same issue.
The entire flow takes about 15 minutes for a 500-page site on the free plan. For larger sites or ongoing monitoring, the Pro plan adds daily syncs, shareable audit reports, and MCP Server access for AI-assisted analysis.
Want to run this audit on your own site? → Start free with 500 pages
Frequently asked questions {#frequently-asked-questions}
Do headless CMS sites rank worse than WordPress sites?
No. The rendering architecture doesn't determine ranking ability — content quality, technical implementation, and authority do. Headless sites can rank as well as or better than monolithic sites. The challenge is that headless architecture has more places where SEO implementation can fail silently. That's why auditing is more important, not less.
Which headless CMS is best for SEO?
None of them is inherently "best for SEO." SEO performance depends on the rendering framework, the content model design, and the implementation quality. Payload CMS has the most opinionated SEO plugin out of the box. Contentful and Sanity have the largest ecosystems with the most SEO resources. Strapi and Directus are the most flexible but require the most custom SEO work. The best CMS for SEO is the one your team implements correctly.
How often should I audit a headless CMS site?
Continuously, if possible. Headless sites change in two places — the CMS and the codebase — and either can introduce SEO issues without the other noticing. A code deployment that changes the routing structure can break every internal link. A CMS content model change can remove SEO fields from API responses. Monthly audits are a minimum; weekly or daily monitoring catches problems before they affect rankings.
Can I audit a headless site without knowing which CMS is being used?
Yes, for Layer 1 and Layer 2 (rendered output and rendering behavior). A crawler doesn't care what CMS produced the content — it checks the HTML output. Layers 3 and 4 (content model and integration) require access to the CMS, but the most impactful issues are usually visible in the rendered output.
Does Evergreen work with all seven headless CMS platforms?
Evergreen audits the rendered website, not the CMS directly. It works with any headless CMS because it crawls the production site and evaluates the HTML output. Platform-specific issues (like Contentful's rich text rendering or Sanity's slug handling) manifest as HTML-level problems that the crawler detects. You don't need to configure Evergreen differently for each platform.
What about headless ecommerce platforms like Shopify Hydrogen or Medusa?
The audit methodology is the same. Headless ecommerce platforms share the same architectural pattern — content storage separated from rendering — and the same SEO audit challenges. Product pages, category pages, and search result pages all need metadata pipeline verification, rendering mode checks, and structured data validation. The content model layer is different (products instead of articles), but the audit framework applies.
Your next step: crawl your headless site in 60 seconds → Create free account
Related Topics in Headless CMS SEO
Payload CMS SEO: The Complete Third-Party Guide
Payload CMS has excellent documentation but fragmented SEO guidance. This vendor-neutral guide covers access control, the SEO plugin, Next.js integration, structured data, and the mistakes that silently tank your rankings.
WordPress to Headless CMS: SEO Migration Playbook
Migrating from WordPress to a headless CMS without losing rankings requires a disciplined audit-redirect-validate loop. This playbook covers the full SEO migration path.
Headless CMS SEO Comparison: Contentful vs Sanity vs Strapi vs Payload
A vendor-neutral SEO comparison of four major headless CMSs. Feature matrix, metadata APIs, structured data support, and audit results — no winner declared.
Jamstack SEO Best Practices for 2026
The Jamstack SEO landscape has changed since 2016. ISR, DPR, edge rendering, and modern SSGs have rewritten the rules. Here's what actually matters now.
SSR vs CSR vs ISR: How Rendering Impacts SEO
Your rendering strategy determines what Google sees. SSR, CSR, ISR, and streaming SSR each have specific SEO implications — here's how to choose and audit.
