Technical SEO Audit Guide for Headless Websites
Headless websites decouple content storage from content presentation. That architectural decision — choosing a CMS for content and a separate framework for rendering — creates SEO audit challenges that don't exist in monolithic systems like WordPress or Squarespace. Metadata pipelines span two systems. Rendering behavior varies per page. URL structures are defined in code, not in the CMS. A standard audit checklist misses half of it.
This guide provides a structured methodology for auditing any headless site, regardless of whether it's built on Next.js, Astro, Remix, or a custom stack, and regardless of which headless CMS sits behind it. If you've been auditing WordPress sites and just inherited a headless client, start here.
See your headless site in full → Start free
Why headless audits are different
If you've audited monolithic CMS sites, you know the pattern: crawl the site, review the output, check metadata, test performance, verify indexability. The methodology is the same for headless sites, but the failure modes are different — and some of them are invisible to standard crawling tools.
The rendering layer introduces ambiguity
In a monolithic CMS, the server generates complete HTML for every request. What the crawler sees is what Google sees. In a headless architecture, the rendering layer may use static generation (SSG), server-side rendering (SSR), incremental static regeneration (ISR), or client-side rendering (CSR) — sometimes mixing strategies on the same site.
Each strategy produces different behavior for crawlers. A statically generated page works identically for every visitor. A server-rendered page may vary based on headers or cookies. A client-rendered page ships an empty HTML shell and populates it with JavaScript — which Googlebot can handle but which introduces delays and potential rendering failures.
The audit needs to identify which rendering strategy each page uses, because the strategy determines what can go wrong.
Metadata lives in two places
In WordPress, you set the title tag and meta description in Yoast or Rank Math, and they render in the HTML. In a headless setup, metadata may be defined in the CMS (as fields on the content model), in the rendering framework (as template defaults), or in both — with a merge step that can silently fail.
A content editor adds a meta description in Contentful. A developer maps that field to the `<meta>` tag in Next.js via `generateMetadata`. If the field name changes, if the API response shape changes, if the mapping function has a fallback that returns null instead of the CMS value — the meta description disappears. The CMS shows it as present. The rendered page shows it as missing. You only catch this by auditing the rendered output.
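A defensive mapping function makes these failures visible instead of silent. A minimal sketch, assuming a hypothetical content model (the `seoDescription` field name and `CmsEntry` shape are illustrative, not from any specific CMS):

```typescript
// Hypothetical CMS entry shape — adjust field names to your content model.
interface CmsEntry {
  title?: string | null;
  seoDescription?: string | null;
}

interface PageMetadata {
  title: string;
  description?: string;
  missingFields: string[]; // surfaced so audits can flag silent fallbacks
}

function mapMetadata(entry: CmsEntry, siteName: string): PageMetadata {
  const missingFields: string[] = [];
  if (!entry.title) missingFields.push("title");
  if (!entry.seoDescription) missingFields.push("seoDescription");
  return {
    // Fall back to the site name rather than rendering an empty <title>
    title: entry.title ? `${entry.title} | ${siteName}` : siteName,
    // Omit the tag entirely rather than emitting description=""
    description: entry.seoDescription ?? undefined,
    missingFields,
  };
}
```

Logging or reporting `missingFields` at build time turns a silent pipeline failure into a visible diagnostic.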
URL structures are code, not configuration
Monolithic CMSs generate URLs from content hierarchy (categories, slugs, dates). Headless sites define URL structures in the rendering framework's routing layer — file-based routing in Next.js or Astro, programmatic routing in Remix or custom Express apps. The CMS slug is an input to the routing layer, not the URL itself.
This means URL changes require code deploys, not CMS updates. It also means URL structures can diverge from content structure — a blog post in the CMS might live at a URL that suggests a different section of the site. Auditing URL consistency requires understanding both the CMS content model and the framework routing layer.
Sitemaps need explicit generation
WordPress auto-generates XML sitemaps via plugins. Headless sites require explicit sitemap generation — typically at build time for static sites or via a dedicated API route for server-rendered sites. If nobody sets this up, there's no sitemap. If it's set up but not maintained, new content types or routes get added without corresponding sitemap entries.
The seven-step headless audit methodology
This methodology works for any headless stack. The specifics differ by framework — for framework-specific checklists, see the Next.js SEO audit checklist or the Astro SEO checklist — but the structure applies universally.
Step 1: Map the rendering strategy per page type
Before auditing content, understand how each page type renders. This determines which audit checks apply.
Create a table of every page template or route pattern on the site, and identify its rendering strategy:
| Page type | Route pattern | Rendering | Notes |
|---|---|---|---|
| Homepage | / | SSG or SSR | Statically generated at build |
| Blog post | /blog/[slug] | ISR (60s) | Statically generated, revalidated |
| Product page | /products/[id] | SSR | Server-rendered per request |
| Search results | /search | CSR | Client-rendered after API call |
| Documentation | /docs/[...path] | SSG | Statically generated from MDX |
What to check: For any page using CSR, verify that Googlebot can render it. Use Google Search Console's URL Inspection tool → "Test Live URL" → "View Tested Page" to see what Google actually renders. If critical content doesn't appear, the page needs server-side rendering or pre-rendering.
For ISR pages, verify that the revalidation interval is short enough that content updates appear in search within a reasonable timeframe. An ISR page with a 24-hour revalidation window means new content won't be crawlable for up to a day.
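One way to make the rendering-strategy table actionable is to encode it as data and derive the applicable checks per strategy. A sketch under stated assumptions (the route patterns and check descriptions are illustrative, mirroring the table above):

```typescript
type Strategy = "SSG" | "SSR" | "ISR" | "CSR";

// Illustrative route manifest — mirror your own site's route patterns here.
const routeManifest: Record<string, Strategy> = {
  "/": "SSG",
  "/blog/[slug]": "ISR",
  "/products/[id]": "SSR",
  "/search": "CSR",
  "/docs/[...path]": "SSG",
};

// Map each strategy to the audit checks it requires.
function checksFor(strategy: Strategy): string[] {
  switch (strategy) {
    case "CSR":
      return ["verify Googlebot renders critical content"];
    case "ISR":
      return ["verify revalidation interval matches content freshness needs"];
    case "SSR":
      return ["check CMS API latency impact on LCP"];
    case "SSG":
      return ["confirm rebuilds are triggered by content publishes"];
  }
}
```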
Step 2: Audit metadata pipelines
This is where headless sites fail most often. Check each page type for:
Title tags. Verify that every page has a unique, descriptive title. Audit for:
- Pages inheriting a default title instead of using CMS content
- Titles truncated due to character limits in the CMS field
- Titles with template formatting errors (e.g., `{{title}} | Site Name` rendering literally)
Meta descriptions. The same pipeline issues apply. Find pages missing meta descriptions at scale using a crawl tool rather than spot-checking individual pages.
Canonical URLs. Verify that canonical tags point to the correct canonical URL — not localhost, not a staging domain, not a CMS preview URL. This is a common headless bug: the canonical URL generation reads the request URL, and if the build environment or preview environment has a different hostname, it leaks into production.
Open Graph and social tags. Same pipeline, same failure modes. Verify that `og:title`, `og:description`, `og:image`, and `og:url` render correctly. Social crawlers (Facebook, LinkedIn, Twitter) don't execute JavaScript, so these tags must be present in the initial HTML response.
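A common fix for the canonical-leak bug is to build canonicals from a hardcoded production origin instead of the request host. A minimal sketch (the `SITE_ORIGIN` constant is an assumption; store it in environment config in practice):

```typescript
// Fixed production origin — never read this from the incoming request's
// Host header, or staging/preview hostnames can leak into canonicals.
const SITE_ORIGIN = "https://www.example.com";

function canonicalFor(path: string): string {
  // Strip query strings and fragments; ensure a single leading slash.
  const clean = path.split(/[?#]/)[0];
  const normalized = clean.startsWith("/") ? clean : `/${clean}`;
  return new URL(normalized, SITE_ORIGIN).toString();
}
```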
Step 3: Crawl the rendered output
Crawl the live production site — not the CMS content, not the source code, the actual rendered output. This is what search engines see.
Check the crawl for:
- HTTP status codes. 404s from broken routes, 500s from rendering failures, 301/302 redirect chains.
- Duplicate content. Pages accessible via multiple URLs (with and without trailing slashes, with query parameters, with and without `www`).
- Thin pages. Pages with minimal rendered content — sometimes caused by rendering failures where the framework catches an error and renders a fallback shell instead of the full page.
- Orphan pages. Pages that exist in the CMS and have valid URLs but receive zero internal links. Common when content is created in the CMS but never linked from navigation or other pages.
A site-wide crawl also validates that the CMS content is actually making it through the rendering pipeline. If the CMS has 500 blog posts but the crawl only discovers 480, 20 posts have routing or indexing issues.
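The CMS-versus-crawl comparison can be scripted as a simple set difference. A sketch assuming blog posts live under `/blog/` (the path prefix is an assumption; adapt it to your routing layer):

```typescript
// Given the slug list from the CMS API and the paths discovered by the
// crawl, return the CMS entries that never made it through the pipeline.
function findUnrendered(cmsSlugs: string[], crawledPaths: string[]): string[] {
  const crawled = new Set(crawledPaths);
  return cmsSlugs
    .map((slug) => `/blog/${slug}`)
    .filter((path) => !crawled.has(path));
}
```

Any path this returns is a post with a routing, publishing, or rendering issue worth investigating individually.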
Step 4: Test site-wide performance
Headless sites often have excellent performance on static pages and terrible performance on pages that load client-side JavaScript bundles. Run bulk Lighthouse testing across the entire site to find the distribution.
Pay particular attention to:
- LCP (Largest Contentful Paint) on pages that fetch content from the CMS API at render time (SSR). The API response time directly affects LCP.
- INP (Interaction to Next Paint) on pages with heavy client-side hydration. Framework hydration bundles (React, Vue, Svelte) execute on page load and can block interaction.
- CLS (Cumulative Layout Shift) on pages with lazy-loaded images or dynamically injected content. If the CMS content model doesn't include image dimensions, the rendering layer can't reserve space, causing layout shifts.
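Once a bulk run has produced per-page scores, bucketing them shows the site-wide distribution. A sketch using Lighthouse's own scoring bands (0–49 poor, 50–89 needs improvement, 90–100 good); the score data itself would come from your bulk-testing tool:

```typescript
interface ScoreBuckets {
  poor: string[];
  needsImprovement: string[];
  good: string[];
}

// Bucket per-page Lighthouse performance scores (0–100) by URL.
function scoreDistribution(scores: Record<string, number>): ScoreBuckets {
  const buckets: ScoreBuckets = { poor: [], needsImprovement: [], good: [] };
  for (const [url, score] of Object.entries(scores)) {
    if (score < 50) buckets.poor.push(url);
    else if (score < 90) buckets.needsImprovement.push(url);
    else buckets.good.push(url);
  }
  return buckets;
}
```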
Step 5: Verify structured data
Structured data in headless sites is generated either from CMS content fields (mapped to JSON-LD in the rendering layer) or hardcoded in templates. Both approaches are valid. Both need verification.
Check for:
- Presence. Does every page type that should have structured data actually have it? Article schema on blog posts, Product schema on product pages, BreadcrumbList on pages with breadcrumb navigation.
- Accuracy. Do the structured data fields match the visible page content? A common bug: the structured data pulls from the CMS, but the rendered page content comes from a different API response or a cached version, so they diverge.
- Validity. Use Google's Rich Results Test to validate syntax. Schema.org requires specific formats for dates, URLs, and enumerated types.
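Generating JSON-LD from the same CMS fields that feed the visible page is one way to keep the two in sync. A hedged sketch with a hypothetical content model (`BlogPost` and its field names are assumptions):

```typescript
// Hypothetical CMS post shape — field names are illustrative.
interface BlogPost {
  title: string;
  publishedAt: string; // ISO 8601 date from the CMS, e.g. "2026-01-15"
  authorName: string;
  slug: string;
}

// Build an Article JSON-LD object from the same data the page renders,
// so structured data and visible content cannot silently diverge.
function articleJsonLd(post: BlogPost, origin: string) {
  return {
    "@context": "https://schema.org",
    "@type": "Article",
    headline: post.title,
    datePublished: post.publishedAt,
    author: { "@type": "Person", name: post.authorName },
    mainEntityOfPage: `${origin}/blog/${post.slug}`,
  };
}
```

Serialize the result with `JSON.stringify` into a `<script type="application/ld+json">` tag in the page template.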
Step 6: Review the XML sitemap
Check that:
- The sitemap exists and is accessible at a consistent URL (typically `/sitemap.xml` or `/sitemap-index.xml`)
- Every indexable page appears in the sitemap
- No non-indexable pages (noindex, redirected, 404) appear in the sitemap
- The sitemap is submitted in Google Search Console
- The `<lastmod>` dates reflect actual content changes, not build dates (a common headless bug where every page gets the same `lastmod` timestamp from the latest build)
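Deriving each `<lastmod>` from the page's own CMS update timestamp avoids the shared-timestamp bug. A minimal sketch (the `SitemapPage` shape and `updatedAt` field are assumptions about your CMS API):

```typescript
interface SitemapPage {
  path: string;
  updatedAt: string; // per-page ISO 8601 timestamp from the CMS, not build time
}

// Build a sitemap where each entry carries its own content-change date.
function sitemapXml(pages: SitemapPage[], origin: string): string {
  const urls = pages
    .map(
      (p) =>
        `  <url><loc>${origin}${p.path}</loc><lastmod>${p.updatedAt}</lastmod></url>`
    )
    .join("\n");
  return `<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n${urls}\n</urlset>`;
}
```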
Step 7: Check indexability signals
The final pass focuses on finding noindex pages and other indexability issues:
- Noindex tags. Check for accidental noindex directives — especially from staging environment configuration that survived into production.
- Robots.txt. Verify that the robots.txt doesn't block the rendering framework's JavaScript bundles (crawlers need these to render CSR/SSR pages).
- X-Robots-Tag headers. Some CDNs or edge functions add response headers that override on-page meta tags. Check HTTP response headers for unexpected `X-Robots-Tag: noindex` directives.
- Canonical conflicts. Pages where the canonical URL points to a different page, effectively de-indexing the original.
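The header check can be scripted against each page's HTTP response. A minimal sketch that parses the comma-separated directive list (this handles the simple directive form only, not user-agent-scoped variants like `googlebot: noindex`):

```typescript
// Return true if an X-Robots-Tag header value blocks indexing.
function headerBlocksIndexing(xRobotsTag: string | null): boolean {
  if (!xRobotsTag) return false;
  return xRobotsTag
    .split(",")
    .map((d) => d.trim().toLowerCase())
    .some((d) => d === "noindex" || d === "none");
}

// Usage with Node 18+ global fetch (network call, shown for context):
// const res = await fetch("https://example.com/page", { method: "HEAD" });
// headerBlocksIndexing(res.headers.get("x-robots-tag"));
```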
Common headless audit findings
After auditing dozens of headless sites, these are the patterns that appear most frequently.
Preview URLs leaking into production
Most headless CMSs offer content preview — a draft URL that renders unpublished content. If the preview domain is indexable and internally linked (or if preview URLs appear in the sitemap), Google may index draft content. Check for preview subdomains (`preview.example.com`, `draft.example.com`) in your crawl data.
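Flagging preview hosts in crawl data is a one-function filter. A sketch where the subdomain prefix list is an assumption — match it to the environments your team actually uses:

```typescript
// Hypothetical prefix list — extend with your real preview/staging hosts.
const PREVIEW_HOSTS = ["preview.", "draft.", "staging."];

// Return crawled URLs whose hostname starts with a preview prefix.
function findPreviewUrls(urls: string[]): string[] {
  return urls.filter((u) => {
    const host = new URL(u).hostname;
    return PREVIEW_HOSTS.some((prefix) => host.startsWith(prefix));
  });
}
```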
Missing structured data on dynamically routed pages
Static pages often have structured data hardcoded in the template. Dynamic pages — those generated from CMS content — often lack it because nobody wrote the JSON-LD generation logic for that content type. Check every content type, not just the ones that were obvious during initial development.
Framework-default metadata surviving into production
Next.js, Astro, and other frameworks ship with default metadata (title, viewport, favicon references). If a page doesn't override these defaults, it inherits the framework's default or the layout's fallback. In a 500-page site, it's common to find 20–30 pages with the default title because their content model doesn't include a title field, or because the API returned null and the template used a fallback.
JavaScript bundle blocking rendering
On SSR or CSR pages, large JavaScript bundles delay rendering. The headless architecture often means loading the rendering framework (React, Vue, Svelte), the CMS client library, analytics, and UI component libraries. If these aren't code-split and tree-shaken, the total bundle size affects both user experience and Googlebot's rendering budget.
Trailing slash inconsistency
Some headless frameworks default to trailing slashes, others don't. Some CDNs strip them, others add them. The result is often a mix — some pages accessible at both `/about` and `/about/`, creating duplicate content. Pick a convention and enforce it at the framework and CDN level.
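Whichever convention you pick, normalize paths consistently in redirects and canonical generation. A minimal sketch (enforce the same policy in your framework config and CDN rules, which this function doesn't replace):

```typescript
// Normalize a path to one trailing-slash convention. The root path "/"
// is left alone since both conventions agree on it.
function normalizePath(path: string, trailingSlash: boolean): string {
  if (path === "/") return "/";
  const stripped = path.replace(/\/+$/, "");
  return trailingSlash ? `${stripped}/` : stripped;
}
```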
Putting it together in Evergreen
Evergreen crawls your headless site's rendered output and organizes the results into an audit table and visual sitemap. The crawl captures what search engines actually see — rendered HTML, not CMS content — so pipeline failures (missing metadata, broken rendering, preview URL leaks) surface in the data.
The audit table shows every page with its title, meta description, H1, word count, internal link count, indexability status, and Lighthouse scores. Filter to pages missing meta descriptions, sort by organic traffic, or isolate pages with noindex directives. Every metadata pipeline failure becomes a filterable row rather than a page you need to inspect manually.
The visual sitemap shows your headless site's structure as a hierarchy. Color-code by content health, performance score, or indexability to identify section-level patterns. Orphaned pages — content that exists in the CMS but is unreachable from navigation — appear as disconnected nodes.
Connect GA4 and Google Search Console, and the audit table blends traffic and search performance data into each row. Pages with high impressions but low engagement rate suggest metadata or content-intent mismatches. Pages with zero impressions despite being indexed suggest deeper ranking problems.
Shareable report URLs let you send audit results to clients or stakeholders without requiring them to log in or navigate your tooling. For agencies auditing headless client sites, this replaces the slide deck and the attached CSV.
Try it on your headless site → Start free
Frequently asked questions
Does the CMS I use affect SEO?
The CMS affects SEO indirectly through its content model capabilities, preview URL handling, and API performance. A CMS with rich content modeling (Contentful, Sanity, Payload) makes it easier to define SEO fields per content type. A CMS with poor API performance adds latency to SSR pages, hurting LCP. But the rendering framework — how you turn CMS content into HTML — has a larger impact than the CMS itself.
Should I use SSR or SSG for SEO?
For content that doesn't change per request (blog posts, documentation, marketing pages), SSG is ideal — the HTML is pre-generated and served instantly. For content that changes frequently or varies per user (search results, personalized recommendations), SSR is necessary. ISR offers a middle ground — static pages that revalidate on a schedule. The right choice depends on how often your content changes, not on SEO preferences. All three strategies produce crawlable HTML.
How do I audit a headless site behind authentication?
Pages behind authentication are generally not indexed (they shouldn't be), so the standard SEO audit focuses on publicly accessible pages. If you need to audit the authenticated experience for technical correctness (structured data, performance, rendering), use a tool that supports authenticated crawling — or audit a staging environment where the auth layer is disabled.
Do I need separate audits for the CMS and the frontend?
Yes, in the sense that you should verify both that the CMS content is correct (fields populated, metadata present) and that the rendered output is correct (metadata rendering, URL structure, performance). The CMS audit checks data quality. The frontend audit checks rendering quality. Problems in either system produce the same symptom — a page with issues — but the fix lives in different places.
How often should I re-audit a headless site?
After every deployment that touches templates, routing, or the content model. Headless sites change through two channels — CMS content updates and code deployments — and either can introduce SEO issues. For agencies managing headless client sites, the website audit checklist for agencies provides the recurring audit framework. Automated daily syncs catch deployment-related regressions before they compound.
Can a content audit methodology work on headless sites?
Absolutely. The content audit without spreadsheets approach applies to any site regardless of architecture. The headless-specific addition is verifying that the rendered content matches what the CMS stores — everything else (filtering, prioritizing, acting on findings) is the same.
Your next step: crawl your headless site and see the full audit → Create free account
Related Topics in The Technical SEO Audit Guide
The Technical SEO Checklist for 2026
A practical technical SEO checklist covering crawlability, indexation, Core Web Vitals, structured data, JavaScript rendering, and AI search visibility — updated for 2026.
How to Find and Fix All Broken Links on Your Site
A practical guide to finding, prioritizing, and fixing broken links across your website to improve user experience and SEO performance.
The Complete Website Audit Checklist for Agencies (2026)
A 25-point website audit checklist built for agencies managing multiple client sites. Covers structure, content, performance, and reporting workflows.
How to Run a Bulk Lighthouse Test on Your Entire Site
Stop testing one page at a time. Run Lighthouse across your entire site to find the pages dragging down performance — and fix them systematically.
Next.js SEO Audit Checklist for 2026
An auditor's checklist for Next.js 14+ sites built on the App Router. Covers metadata, rendering strategies, dynamic routes, and the technical pitfalls that don't show up in generic SEO guides.
How to Find Noindex Pages Blocking Your Rankings
Accidental noindex tags silently remove pages from Google. Here's how to find every noindex directive on your site — and tell the intentional ones from the mistakes.
The Comprehensive Astro SEO Checklist
Astro ships fast HTML by default, but fast isn't the same as optimized. This checklist covers every SEO consideration specific to Astro 4.x+ — from Islands to View Transitions to content collections.
Lighthouse Score for Your Entire Site: Tools and Methods
Lighthouse tests one page at a time. Here are five ways to get scores for every page on your site — from free CLI tools to SaaS dashboards — and when each approach makes sense.
Automated SEO Monitoring: Set Up Daily Site Audits
One-off audits find problems after they've already cost you traffic. Continuous monitoring finds them as they happen. Here's how to set up daily automated SEO monitoring that catches regressions before rankings suffer.
Shareable SEO Reports: How to Send Audits Clients Actually Read
Most SEO reports are PDFs that clients download, glance at, and forget. Shareable URL-based reports stay current, require no login, and get acted on. Here's why and how.
JavaScript Rendering Audit Checklist
A checklist for auditing JavaScript-rendered pages: crawl accessibility, metadata after render, lazy-loaded content, and the tools to verify what Google actually sees.
