Technical SEO Audit Guide for Headless Websites
Headless websites decouple content storage from content presentation. That architectural decision — choosing a CMS for content and a separate framework for rendering — creates SEO audit challenges that don't exist in monolithic systems like WordPress or Squarespace. Metadata pipelines span two systems. Rendering behavior varies per page. URL structures are defined in code, not in the CMS. A standard audit checklist misses half of it.
This guide provides a structured methodology for auditing any headless site, regardless of whether it's built on Next.js, Astro, Remix, or a custom stack, and regardless of which headless CMS sits behind it. If you've been auditing WordPress sites and just inherited a headless client, start here.
See your headless site in full → Start free
Why headless audits are different
If you've audited monolithic CMS sites, you know the pattern: crawl the site, review the output, check metadata, test performance, verify indexability. The methodology is the same for headless sites, but the failure modes are different — and some of them are invisible to standard crawling tools.
The rendering layer introduces ambiguity
In a monolithic CMS, the server generates complete HTML for every request. What the crawler sees is what Google sees. In a headless architecture, the rendering layer may use static generation (SSG), server-side rendering (SSR), incremental static regeneration (ISR), or client-side rendering (CSR) — sometimes mixing strategies on the same site.
Each strategy produces different behavior for crawlers. A statically generated page works identically for every visitor. A server-rendered page may vary based on headers or cookies. A client-rendered page ships an empty HTML shell and populates it with JavaScript — which Googlebot can handle but which introduces delays and potential rendering failures.
The audit needs to identify which rendering strategy each page uses, because the strategy determines what can go wrong.
Metadata lives in two places
In WordPress, you set the title tag and meta description in Yoast or Rank Math, and they render in the HTML. In a headless setup, metadata may be defined in the CMS (as fields on the content model), in the rendering framework (as template defaults), or in both — with a merge step that can silently fail.
A content editor adds a meta description in Contentful. A developer maps that field to the `<meta>` tag in Next.js via `generateMetadata`. If the field name changes, if the API response shape changes, if the mapping function has a fallback that returns null instead of the CMS value — the meta description disappears. The CMS shows it as present. The rendered page shows it as missing. You only catch this by auditing the rendered output.
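A defensive mapping function makes these failures visible instead of silent. A minimal sketch, assuming a hypothetical content model (the `seoDescription` field name and `CmsEntry` shape are illustrative, not from any specific CMS):

```typescript
// Hypothetical CMS entry shape — adjust field names to your content model.
interface CmsEntry {
  title?: string | null;
  seoDescription?: string | null;
}

interface PageMetadata {
  title: string;
  description?: string;
  missingFields: string[]; // surfaced so audits can flag silent fallbacks
}

function mapMetadata(entry: CmsEntry, siteName: string): PageMetadata {
  const missingFields: string[] = [];
  if (!entry.title) missingFields.push("title");
  if (!entry.seoDescription) missingFields.push("seoDescription");
  return {
    // Fall back to the site name rather than rendering an empty <title>
    title: entry.title ? `${entry.title} | ${siteName}` : siteName,
    // Omit the tag entirely rather than emitting description=""
    description: entry.seoDescription ?? undefined,
    missingFields,
  };
}
```

Logging or reporting `missingFields` at build time turns a silent pipeline failure into a visible diagnostic.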
URL structures are code, not configuration
Monolithic CMSs generate URLs from content hierarchy (categories, slugs, dates). Headless sites define URL structures in the rendering framework's routing layer — file-based routing in Next.js or Astro, programmatic routing in Remix or custom Express apps. The CMS slug is an input to the routing layer, not the URL itself.
This means URL changes require code deploys, not CMS updates. It also means URL structures can diverge from content structure — a blog post in the CMS might live at a URL that suggests a different section of the site. Auditing URL consistency requires understanding both the CMS content model and the framework routing layer.
Sitemaps need explicit generation
WordPress auto-generates XML sitemaps via plugins. Headless sites require explicit sitemap generation — typically at build time for static sites or via a dedicated API route for server-rendered sites. If nobody sets this up, there's no sitemap. If it's set up but not maintained, new content types or routes get added without corresponding sitemap entries.
The seven-step headless audit methodology
This methodology works for any headless stack. The specifics differ by framework — for framework-specific checklists, see the Next.js SEO audit checklist or the Astro SEO checklist — but the structure applies universally.
Step 1: Map the rendering strategy per page type
Before auditing content, understand how each page type renders. This determines which audit checks apply.
Create a table of every page template or route pattern on the site, and identify its rendering strategy:
| Page type | Route pattern | Rendering | Notes |
|---|---|---|---|
| Homepage | / | SSG or SSR | Statically generated at build |
| Blog post | /blog/[slug] | ISR (60s) | Statically generated, revalidated |
| Product page | /products/[id] | SSR | Server-rendered per request |
| Search results | /search | CSR | Client-rendered after API call |
| Documentation | /docs/[...path] | SSG | Statically generated from MDX |
What to check: For any page using CSR, verify that Googlebot can render it. Use Google Search Console's URL Inspection tool → "Test Live URL" → "View Tested Page" to see what Google actually renders. If critical content doesn't appear, the page needs server-side rendering or pre-rendering.
For ISR pages, verify that the revalidation interval is short enough that content updates appear in search within a reasonable timeframe. An ISR page with a 24-hour revalidation window means new content won't be crawlable for up to a day.
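One way to make the rendering-strategy table actionable is to encode it as data and derive the applicable checks per strategy. A sketch under stated assumptions (the route patterns and check descriptions are illustrative, mirroring the table above):

```typescript
type Strategy = "SSG" | "SSR" | "ISR" | "CSR";

// Illustrative route manifest — mirror your own site's route patterns here.
const routeManifest: Record<string, Strategy> = {
  "/": "SSG",
  "/blog/[slug]": "ISR",
  "/products/[id]": "SSR",
  "/search": "CSR",
  "/docs/[...path]": "SSG",
};

// Map each strategy to the audit checks it requires.
function checksFor(strategy: Strategy): string[] {
  switch (strategy) {
    case "CSR":
      return ["verify Googlebot renders critical content"];
    case "ISR":
      return ["verify revalidation interval matches content freshness needs"];
    case "SSR":
      return ["check CMS API latency impact on LCP"];
    case "SSG":
      return ["confirm rebuilds are triggered by content publishes"];
  }
}
```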
Step 2: Audit metadata pipelines
This is where headless sites fail most often. Check each page type for:
Title tags. Verify that every page has a unique, descriptive title. Audit for:
- Pages inheriting a default title instead of using CMS content
- Titles truncated due to character limits in the CMS field
- Titles with template formatting errors (e.g., `{{title}} | Site Name` rendering literally)
Meta descriptions. The same pipeline issues apply. Find pages missing meta descriptions at scale using a crawl tool rather than spot-checking individual pages.
Canonical URLs. Verify that canonical tags point to the correct canonical URL — not localhost, not a staging domain, not a CMS preview URL. This is a common headless bug: the canonical URL generation reads the request URL, and if the build environment or preview environment has a different hostname, it leaks into production.
Open Graph and social tags. Same pipeline, same failure modes. Verify that `og:title`, `og:description`, `og:image`, and `og:url` render correctly. Social crawlers (Facebook, LinkedIn, Twitter) don't execute JavaScript, so these tags must be present in the initial HTML response.
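A common fix for the canonical-leak bug is to build canonicals from a hardcoded production origin instead of the request host. A minimal sketch (the `SITE_ORIGIN` constant is an assumption; store it in environment config in practice):

```typescript
// Fixed production origin — never read this from the incoming request's
// Host header, or staging/preview hostnames can leak into canonicals.
const SITE_ORIGIN = "https://www.example.com";

function canonicalFor(path: string): string {
  // Strip query strings and fragments; ensure a single leading slash.
  const clean = path.split(/[?#]/)[0];
  const normalized = clean.startsWith("/") ? clean : `/${clean}`;
  return new URL(normalized, SITE_ORIGIN).toString();
}
```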
Step 3: Crawl the rendered output
Crawl the live production site — not the CMS content, not the source code, the actual rendered output. This is what search engines see.
Check the crawl for:
- HTTP status codes. 404s from broken routes, 500s from rendering failures, 301/302 redirect chains.
- Duplicate content. Pages accessible via multiple URLs (with and without trailing slashes, with query parameters, with and without `www`).
- Thin pages. Pages with minimal rendered content — sometimes caused by rendering failures where the framework catches an error and renders a fallback shell instead of the full page.
- Orphan pages. Pages that exist in the CMS and have valid URLs but receive zero internal links. Common when content is created in the CMS but never linked from navigation or other pages.
A site-wide crawl also validates that the CMS content is actually making it through the rendering pipeline. If the CMS has 500 blog posts but the crawl only discovers 480, 20 posts have routing or indexing issues.
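The CMS-versus-crawl comparison can be scripted as a simple set difference. A sketch assuming blog posts live under `/blog/` (the path prefix is an assumption; adapt it to your routing layer):

```typescript
// Given the slug list from the CMS API and the paths discovered by the
// crawl, return the CMS entries that never made it through the pipeline.
function findUnrendered(cmsSlugs: string[], crawledPaths: string[]): string[] {
  const crawled = new Set(crawledPaths);
  return cmsSlugs
    .map((slug) => `/blog/${slug}`)
    .filter((path) => !crawled.has(path));
}
```

Any path this returns is a post with a routing, publishing, or rendering issue worth investigating individually.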
Step 4: Test site-wide performance
Headless sites often have excellent performance on static pages and terrible performance on pages that load client-side JavaScript bundles. Run bulk Lighthouse testing across the entire site to find the distribution.
Pay particular attention to:
- LCP (Largest Contentful Paint) on pages that fetch content from the CMS API at render time (SSR). The API response time directly affects LCP.
- INP (Interaction to Next Paint) on pages with heavy client-side hydration. Framework hydration bundles (React, Vue, Svelte) execute on page load and can block interaction.
- CLS (Cumulative Layout Shift) on pages with lazy-loaded images or dynamically injected content. If the CMS content model doesn't include image dimensions, the rendering layer can't reserve space, causing layout shifts.
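Once a bulk run has produced per-page scores, bucketing them shows the site-wide distribution. A sketch using Lighthouse's own scoring bands (0–49 poor, 50–89 needs improvement, 90–100 good); the score data itself would come from your bulk-testing tool:

```typescript
interface ScoreBuckets {
  poor: string[];
  needsImprovement: string[];
  good: string[];
}

// Bucket per-page Lighthouse performance scores (0–100) by URL.
function scoreDistribution(scores: Record<string, number>): ScoreBuckets {
  const buckets: ScoreBuckets = { poor: [], needsImprovement: [], good: [] };
  for (const [url, score] of Object.entries(scores)) {
    if (score < 50) buckets.poor.push(url);
    else if (score < 90) buckets.needsImprovement.push(url);
    else buckets.good.push(url);
  }
  return buckets;
}
```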
Step 5: Verify structured data
Structured data in headless sites is generated either from CMS content fields (mapped to JSON-LD in the rendering layer) or hardcoded in templates. Both approaches are valid. Both need verification.
Check for:
- Presence. Does every page type that should have structured data actually have it? Article schema on blog posts, Product schema on product pages, BreadcrumbList on pages with breadcrumb navigation.
- Accuracy. Do the structured data fields match the visible page content? A common bug: the structured data pulls from the CMS, but the rendered page content comes from a different API response or a cached version, so they diverge.
- Validity. Use Google's Rich Results Test to validate syntax. Schema.org requires specific formats for dates, URLs, and enumerated types.
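Generating JSON-LD from the same CMS fields that feed the visible page is one way to keep the two in sync. A hedged sketch with a hypothetical content model (`BlogPost` and its field names are assumptions):

```typescript
// Hypothetical CMS post shape — field names are illustrative.
interface BlogPost {
  title: string;
  publishedAt: string; // ISO 8601 date from the CMS, e.g. "2026-01-15"
  authorName: string;
  slug: string;
}

// Build an Article JSON-LD object from the same data the page renders,
// so structured data and visible content cannot silently diverge.
function articleJsonLd(post: BlogPost, origin: string) {
  return {
    "@context": "https://schema.org",
    "@type": "Article",
    headline: post.title,
    datePublished: post.publishedAt,
    author: { "@type": "Person", name: post.authorName },
    mainEntityOfPage: `${origin}/blog/${post.slug}`,
  };
}
```

Serialize the result with `JSON.stringify` into a `<script type="application/ld+json">` tag in the page template.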
Step 6: Review the XML sitemap
Check that:
- The sitemap exists and is accessible at a consistent URL (typically `/sitemap.xml` or `/sitemap-index.xml`)
- Every indexable page appears in the sitemap
- No non-indexable pages (noindex, redirected, 404) appear in the sitemap
- The sitemap is submitted in Google Search Console
- The `<lastmod>` dates reflect actual content changes, not build dates (a common headless bug where every page gets the same `lastmod` timestamp from the latest build)
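Deriving each `<lastmod>` from the page's own CMS update timestamp avoids the shared-timestamp bug. A minimal sketch (the `SitemapPage` shape and `updatedAt` field are assumptions about your CMS API):

```typescript
interface SitemapPage {
  path: string;
  updatedAt: string; // per-page ISO 8601 timestamp from the CMS, not build time
}

// Build a sitemap where each entry carries its own content-change date.
function sitemapXml(pages: SitemapPage[], origin: string): string {
  const urls = pages
    .map(
      (p) =>
        `  <url><loc>${origin}${p.path}</loc><lastmod>${p.updatedAt}</lastmod></url>`
    )
    .join("\n");
  return `<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n${urls}\n</urlset>`;
}
```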
Step 7: Check indexability signals
The final pass focuses on finding noindex pages and other indexability issues:
- Noindex tags. Check for accidental noindex directives — especially from staging environment configuration that survived into production.
- Robots.txt. Verify that the robots.txt doesn't block the rendering framework's JavaScript bundles (crawlers need these to render CSR/SSR pages).
- X-Robots-Tag headers. Some CDNs or edge functions add response headers that override on-page meta tags. Check HTTP response headers for unexpected `X-Robots-Tag: noindex` directives.
- Canonical conflicts. Pages where the canonical URL points to a different page, effectively de-indexing the original.
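The header check can be scripted against each page's HTTP response. A minimal sketch that parses the comma-separated directive list (this handles the simple directive form only, not user-agent-scoped variants like `googlebot: noindex`):

```typescript
// Return true if an X-Robots-Tag header value blocks indexing.
function headerBlocksIndexing(xRobotsTag: string | null): boolean {
  if (!xRobotsTag) return false;
  return xRobotsTag
    .split(",")
    .map((d) => d.trim().toLowerCase())
    .some((d) => d === "noindex" || d === "none");
}

// Usage with Node 18+ global fetch (network call, shown for context):
// const res = await fetch("https://example.com/page", { method: "HEAD" });
// headerBlocksIndexing(res.headers.get("x-robots-tag"));
```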
Common headless audit findings
After auditing dozens of headless sites, these are the patterns that appear most frequently.
Preview URLs leaking into production
Most headless CMSs offer content preview — a draft URL that renders unpublished content. If the preview domain is indexable and internally linked (or if preview URLs appear in the sitemap), Google may index draft content. Check for preview subdomains (`preview.example.com`, `draft.example.com`) in your crawl data.
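Flagging preview hosts in crawl data is a one-function filter. A sketch where the subdomain prefix list is an assumption — match it to the environments your team actually uses:

```typescript
// Hypothetical prefix list — extend with your real preview/staging hosts.
const PREVIEW_HOSTS = ["preview.", "draft.", "staging."];

// Return crawled URLs whose hostname starts with a preview prefix.
function findPreviewUrls(urls: string[]): string[] {
  return urls.filter((u) => {
    const host = new URL(u).hostname;
    return PREVIEW_HOSTS.some((prefix) => host.startsWith(prefix));
  });
}
```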
Missing structured data on dynamically routed pages
Static pages often have structured data hardcoded in the template. Dynamic pages — those generated from CMS content — often lack it because nobody wrote the JSON-LD generation logic for that content type. Check every content type, not just the ones that were obvious during initial development.
Framework-default metadata surviving into production
Next.js, Astro, and other frameworks ship with default metadata (title, viewport, favicon references). If a page doesn't override these defaults, it inherits the framework's default or the layout's fallback. In a 500-page site, it's common to find 20–30 pages with the default title because their content model doesn't include a title field, or because the API returned null and the template used a fallback.
JavaScript bundle blocking rendering
On SSR or CSR pages, large JavaScript bundles delay rendering. The headless architecture often means loading the rendering framework (React, Vue, Svelte), the CMS client library, analytics, and UI component libraries. If these aren't code-split and tree-shaken, the total bundle size affects both user experience and Googlebot's rendering budget.
Trailing slash inconsistency
Some headless frameworks default to trailing slashes, others don't. Some CDNs strip them, others add them. The result is often a mix — some pages accessible at both `/about` and `/about/`, creating duplicate content. Pick a convention and enforce it at the framework and CDN level.
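Whichever convention you pick, normalize paths consistently in redirects and canonical generation. A minimal sketch (enforce the same policy in your framework config and CDN rules, which this function doesn't replace):

```typescript
// Normalize a path to one trailing-slash convention. The root path "/"
// is left alone since both conventions agree on it.
function normalizePath(path: string, trailingSlash: boolean): string {
  if (path === "/") return "/";
  const stripped = path.replace(/\/+$/, "");
  return trailingSlash ? `${stripped}/` : stripped;
}
```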
Putting it together in Evergreen
Evergreen crawls your headless site's rendered output and organizes the results into an audit table and visual sitemap. The crawl captures what search engines actually see — rendered HTML, not CMS content — so pipeline failures (missing metadata, broken rendering, preview URL leaks) surface in the data.
The audit table shows every page with its title, meta description, H1, word count, internal link count, indexability status, and Lighthouse scores. Filter to pages missing meta descriptions, sort by organic traffic, or isolate pages with noindex directives. Every metadata pipeline failure becomes a filterable row rather than a page you need to inspect manually.
The visual sitemap shows your headless site's structure as a hierarchy. Color-code by content health, performance score, or indexability to identify section-level patterns. Orphaned pages — content that exists in the CMS but is unreachable from navigation — appear as disconnected nodes.
Connect GA4 and Google Search Console, and the audit table blends traffic and search performance data into each row. Pages with high impressions but low engagement rate suggest metadata or content-intent mismatches. Pages with zero impressions despite being indexed suggest deeper ranking problems.
Shareable report URLs let you send audit results to clients or stakeholders without requiring them to log in or navigate your tooling. For agencies auditing headless client sites, this replaces the slide deck and the attached CSV.
Try it on your headless site → Start free
Frequently asked questions
Does the CMS I use affect SEO?
The CMS affects SEO indirectly through its content model capabilities, preview URL handling, and API performance. A CMS with rich content modeling (Contentful, Sanity, Payload) makes it easier to define SEO fields per content type. A CMS with poor API performance adds latency to SSR pages, hurting LCP. But the rendering framework — how you turn CMS content into HTML — has a larger impact than the CMS itself.
Should I use SSR or SSG for SEO?
For content that doesn't change per request (blog posts, documentation, marketing pages), SSG is ideal — the HTML is pre-generated and served instantly. For content that changes frequently or varies per user (search results, personalized recommendations), SSR is necessary. ISR offers a middle ground — static pages that revalidate on a schedule. The right choice depends on how often your content changes, not on SEO preferences. All three strategies produce crawlable HTML.
How do I audit a headless site behind authentication?
Pages behind authentication are generally not indexed (they shouldn't be), so the standard SEO audit focuses on publicly accessible pages. If you need to audit the authenticated experience for technical correctness (structured data, performance, rendering), use a tool that supports authenticated crawling — or audit a staging environment where the auth layer is disabled.
Do I need separate audits for the CMS and the frontend?
Yes, in the sense that you should verify both that the CMS content is correct (fields populated, metadata present) and that the rendered output is correct (metadata rendering, URL structure, performance). The CMS audit checks data quality. The frontend audit checks rendering quality. Problems in either system produce the same symptom — a page with issues — but the fix lives in different places.
How often should I re-audit a headless site?
After every deployment that touches templates, routing, or the content model. Headless sites change through two channels — CMS content updates and code deployments — and either can introduce SEO issues. For agencies managing headless client sites, the website audit checklist for agencies provides the recurring audit framework. Automated daily syncs catch deployment-related regressions before they compound.
Can a content audit methodology work on headless sites?
Absolutely. The content audit without spreadsheets approach applies to any site regardless of architecture. The headless-specific addition is verifying that the rendered content matches what the CMS stores — everything else (filtering, prioritizing, acting on findings) is the same.
Your next step: crawl your headless site and see the full audit → Create free account
Related Topics in The Technical SEO Audit Guide
The Technical SEO Checklist for 2026
A practical technical SEO checklist covering crawlability, indexation, Core Web Vitals, structured data, JavaScript rendering, and AI search visibility — updated for 2026.
How to Find and Fix All Broken Links on Your Site
A practical guide to finding, prioritizing, and fixing broken links across your website to improve user experience and SEO performance.
The Complete Website Audit Checklist for Agencies (2026)
A 25-point website audit checklist built for agencies managing multiple client sites. Covers structure, content, performance, and reporting workflows.
How to Run a Bulk Lighthouse Test on Your Entire Site
Stop testing one page at a time. Run Lighthouse across your entire site to find the pages dragging down performance — and fix them systematically.
Next.js SEO Audit Checklist for 2026
An auditor's checklist for Next.js 14+ sites built on the App Router. Covers metadata, rendering strategies, dynamic routes, and the technical pitfalls that don't show up in generic SEO guides.
How to Find Noindex Pages Blocking Your Rankings
Accidental noindex tags silently remove pages from Google. Here's how to find every noindex directive on your site — and tell the intentional ones from the mistakes.
The Comprehensive Astro SEO Checklist
Astro ships fast HTML by default, but fast isn't the same as optimized. This checklist covers every SEO consideration specific to Astro 4.x+ — from Islands to View Transitions to content collections.
Lighthouse Score for Your Entire Site: Tools and Methods
Lighthouse tests one page at a time. Here are five ways to get scores for every page on your site — from free CLI tools to SaaS dashboards — and when each approach makes sense.
Automated SEO Monitoring: Set Up Daily Site Audits
One-off audits find problems after they've already cost you traffic. Continuous monitoring finds them as they happen. Here's how to set up daily automated SEO monitoring that catches regressions before rankings suffer.
Shareable SEO Reports: How to Send Audits Clients Actually Read
Most SEO reports are PDFs that clients download, glance at, and forget. Shareable URL-based reports stay current, require no login, and get acted on. Here's why and how.
JavaScript Rendering Audit Checklist
A checklist for auditing JavaScript-rendered pages: crawl accessibility, metadata after render, lazy-loaded content, and the tools to verify what Google actually sees.
