The big SEO disasters get all the airtime — algorithm updates, deindexing penalties, manual actions. The boring truth is that most indexing loss is a slow leak. Five common technical bugs, each one undramatic, each one capable of bleeding indexed pages over weeks until your GSC chart looks weird and you can't remember what changed.
Here are the five I see most often, what causes each one, and how to fix it before it costs you a month of recovery.
Leak 1: Canonical drift
Symptom: GSC "Pages indexed" drops gradually. Some pages show under "Duplicate, Google chose different canonical than user" or "Alternate page with proper canonical tag."
Cause: A canonical URL points somewhere it shouldn't. Common variants:
- The classic http://localhost:3000/page: an env var fallback that fired in production. Full war story here.
- The trailing-slash variant: /page canonicalizes to /page/ or vice versa, and Google can't pick a winner.
- The case-sensitivity variant: /Page canonicalizes to /page but the live URL has caps.
- The www/non-www variant: canonical strips one, the URL serves the other.
30-second test: Paste any 3 of your most important URLs into the canonical URL checker. If the canonical doesn't exactly match the URL you typed in (protocol, www, trailing slash, case), you've got a leak.
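If you'd rather script that comparison than eyeball it, here is a minimal sketch. It assumes you already have the page URL and the canonical it declares as strings; the function name and the mismatch labels are mine, not part of any tool mentioned in this post.

```typescript
type Mismatch = "protocol" | "www" | "trailing-slash" | "case" | "other";

// Compare a page URL with the canonical it declares and name what differs.
// An empty result means the two match on every check below.
function canonicalMismatches(pageUrl: string, canonical: string): Mismatch[] {
  const page = new URL(pageUrl);
  const canon = new URL(canonical);
  const found: Mismatch[] = [];

  const stripWww = (host: string): string => host.replace(/^www\./, "");
  const stripSlash = (path: string): string => path.replace(/\/+$/, "");
  const endsWithSlash = (path: string): boolean => /\/$/.test(path);

  if (page.protocol !== canon.protocol) found.push("protocol");

  // Same site once the www prefix is ignored (the port must match too).
  const sameSite =
    stripWww(page.hostname) === stripWww(canon.hostname) && page.port === canon.port;
  if (sameSite && page.hostname !== canon.hostname) found.push("www");

  if (endsWithSlash(page.pathname) !== endsWithSlash(canon.pathname)) {
    found.push("trailing-slash");
  }

  const pagePath = stripSlash(page.pathname);
  const canonPath = stripSlash(canon.pathname);
  const samePathIgnoringCase = pagePath.toLowerCase() === canonPath.toLowerCase();
  if (samePathIgnoringCase && pagePath !== canonPath) found.push("case");

  // Different host, path, or query string: the canonical points at another page
  // entirely (this is where a localhost canonical shows up).
  if (!sameSite || !samePathIgnoringCase || page.search !== canon.search) {
    found.push("other");
  }
  return found;
}
```

For example, `canonicalMismatches("https://www.example.com/Page/", "https://example.com/page")` reports the www, trailing-slash, and case variants together, which is usually a sign that several normalization rules are fighting each other.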
Fix: Centralize canonical URL generation. One helper function that builds every canonical on your site. Never use process.env.URL || 'fallback' — make it throw if the env var isn't set.
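A minimal sketch of that helper, under some assumptions: the env var name `SITE_URL` is hypothetical, and the normalization policy (lowercase path, no trailing slash, query string dropped) is just one reasonable choice. Swap in whatever policy your live URLs actually follow. The environment is passed in as an argument so the function is easy to test; in a Node app you would pass `process.env`.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical variable name; use whatever your deploy actually sets.
const SITE_URL_VAR = "SITE_URL";

// Resolve the site origin, and fail loudly instead of falling back.
function siteOrigin(env: Env): string {
  const raw = env[SITE_URL_VAR];
  if (!raw) {
    throw new Error(SITE_URL_VAR + " is not set; refusing to build canonical URLs");
  }
  // new URL() also throws on a malformed value, which is what we want here.
  return new URL(raw).origin;
}

// The one place canonicals get built.
// Assumed policy: lowercase path, no trailing slash, no query string or fragment.
function canonicalFor(path: string, env: Env): string {
  const bare = path
    .replace(/[?#].*$/, "")      // drop query string and fragment
    .replace(/^\/+|\/+$/g, "")   // trim leading and trailing slashes
    .toLowerCase();
  return siteOrigin(env) + "/" + bare;
}
```

With `SITE_URL=https://example.com`, `canonicalFor("/Page/?utm=1", process.env)` gives `https://example.com/page`. With the variable unset, the call throws at build or request time, which is a much cheaper failure than a localhost canonical in production.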
Leak 2: Robots.txt vs meta-robots conflicts
Symptom: Some pages indexed, others not, no pattern you can detect from content.
Cause: Your robots.txt says "allow," but the page has <meta name="robots" content="noindex">. Or the reverse: your robots.txt says "disallow" on a page you want indexed. A disallow also stops Google from crawling the page, so it never sees any meta-robots tag on it, including a noindex you added on purpose. Either way, the more restrictive signal tends to win, and you end up with the opposite of what you intended.
The version I see most: a CMS auto-applies noindex to draft posts. The post gets promoted to published. The noindex stays because the template didn't strip it out. Page never gets indexed.
30-second test: View source on any page you expect to be indexed. Search for noindex. If you find it, that's the leak.
Fix: Audit your meta-robots logic in your CMS template. Make the default "index, follow" — explicit opt-out only. And cross-check robots.txt against the pages you actually want indexed.
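The view-source check is easy to script once you have a page's HTML as a string. The sketch below is regex-based, so treat it as a quick audit aid rather than a real HTML parser; the function names are mine. It looks at robots and googlebot meta tags only and does not fetch anything.

```typescript
// Collect the directives from every <meta name="robots"> or
// <meta name="googlebot"> tag in an HTML string.
function robotsMetaDirectives(html: string): string[] {
  const directives: string[] = [];
  const metaTags = html.match(/<meta\b[^>]*>/gi) ?? [];
  for (const tag of metaTags) {
    const name = /\bname\s*=\s*["']?([^"'\s>]+)/i.exec(tag)?.[1]?.toLowerCase();
    const content = /\bcontent\s*=\s*["']([^"']*)["']/i.exec(tag)?.[1];
    if ((name === "robots" || name === "googlebot") && content !== undefined) {
      for (const part of content.toLowerCase().split(",")) {
        directives.push(part.trim());
      }
    }
  }
  return directives;
}

// True when the page asks not to be indexed.
// "none" is shorthand for "noindex, nofollow".
function isNoindexed(html: string): boolean {
  return robotsMetaDirectives(html).some((d) => d === "noindex" || d === "none");
}
```

Because it only reads meta tags, a page that merely mentions the word "noindex" in its body text won't trip it, which is the main false positive you get from a plain ctrl-F.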
Leak 3: Sitemap rot
Symptom: GSC sitemap report shows pages "Discovered - currently not indexed" or "Submitted URL not found (404)."
Cause: The sitemap lists URLs that don't exist anymore. Or doesn't list URLs that do. Common reasons:
- Static sitemap file that wasn't regenerated after a content change
- Dynamic sitemap that includes draft posts
- Sitemap that includes redirected URLs (still resolves, but signals confusion)
- Sitemap that includes pages with canonical URLs pointing elsewhere
Sitemap rot is the most boring of the five. It also accumulates the fastest if you don't catch it.
30-second test: Open your sitemap.xml. Pick 5 random URLs. Check that each one returns 200, isn't redirected, and has a self-referencing canonical.
Fix: Generate the sitemap dynamically from your live content database, with the same filters as your public site. Status = published, canonical = self, not redirected. Add a weekly drift check that flags unexpected size changes.
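Here is roughly what that looks like, as a sketch. The `Page` shape is hypothetical and stands in for whatever your content database returns, and the 10% drift tolerance is an arbitrary starting point, not a recommendation.

```typescript
// Hypothetical record shape; map your own CMS fields onto it.
interface Page {
  url: string;                   // the live URL
  status: "draft" | "published";
  canonical: string;             // canonical the page declares
  redirectsTo?: string;          // set when the URL redirects elsewhere
}

// Apply the same filters as the public site.
function sitemapUrls(pages: Page[]): string[] {
  return pages
    .filter((p) => p.status === "published")
    .filter((p) => p.canonical === p.url)
    .filter((p) => p.redirectsTo === undefined)
    .map((p) => p.url);
}

// Sitemap URLs must be entity-escaped.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

function toSitemapXml(urls: string[]): string {
  const entries = urls.map((u) => "  <url><loc>" + escapeXml(u) + "</loc></url>");
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
  ].join("\n");
}

// Weekly drift check: flag a URL-count change larger than `tolerance`,
// expressed as a fraction of the previous count (0.1 = 10%, an assumed default).
function sitemapDrifted(previousCount: number, currentCount: number, tolerance = 0.1): boolean {
  if (previousCount === 0) return currentCount !== 0;
  return Math.abs(currentCount - previousCount) / previousCount > tolerance;
}
```

Store last week's count somewhere cheap (a file, a key-value entry) and alert when `sitemapDrifted` returns true. A sudden drop usually means a filter broke; a sudden jump usually means drafts or redirects leaked in.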
Leak 4: Schema validation breaks
Symptom: Rich results disappear from your SERP listings. AI Overviews stop citing you. GSC "Enhancements" report shows new errors.
Cause: Schema markup that used to validate now doesn't. Common reasons:
- A new content type shipped without matching schema (the template was forgotten)
- A field changed shape (string → array, or null → object) and validation broke
- Schema.org deprecated a type you were using (rare but happens)
- An escape character broke JSON parsing — usually unescaped quotes in a string field
Schema breaks silently. The page still renders. Google still crawls it. The structured data layer just stops working.
30-second test: Paste any 3 of your most important URLs into Google's Rich Results Test. If anything shows errors, that's the leak.
Fix: Build a CI check that validates your schema output against the Schema.org spec. Fail the build if schema breaks. Combine with the schema cookbook approach for new pages.
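A full Schema.org validator is more than fits in a blog post, but the cheapest failures (broken JSON, a missing field) are easy to catch. The sketch below assumes your CI can hand it rendered HTML as a string. `REQUIRED_FIELDS` is a hypothetical table of the fields your own templates are supposed to emit, not an official requirements list, and the check only handles top-level objects or arrays with a string @type (it does not expand @graph).

```typescript
interface SchemaProblem {
  block: number;    // 1-based index of the JSON-LD <script> block on the page
  message: string;
}

// Hypothetical house rules: the fields each of your templates should always emit.
const REQUIRED_FIELDS: Record<string, string[]> = {
  Article: ["headline", "datePublished"],
  FAQPage: ["mainEntity"],
};

function checkJsonLd(html: string): SchemaProblem[] {
  const problems: SchemaProblem[] = [];
  const pattern = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  let block = 0;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    block += 1;
    let data: unknown;
    try {
      // Unescaped quotes and similar breakage surface here.
      data = JSON.parse(match[1]);
    } catch (err) {
      problems.push({ block, message: "invalid JSON: " + (err as Error).message });
      continue;
    }
    const items: unknown[] = Array.isArray(data) ? data : [data];
    for (const item of items) {
      if (typeof item !== "object" || item === null) {
        problems.push({ block, message: "top-level value is not an object" });
        continue;
      }
      const node = item as Record<string, unknown>;
      const type = node["@type"];
      if (typeof type !== "string") {
        problems.push({ block, message: "no string @type" });
        continue;
      }
      for (const field of REQUIRED_FIELDS[type] ?? []) {
        const value = node[field];
        if (value === undefined || value === null || value === "") {
          problems.push({ block, message: type + " is missing " + field });
        }
      }
    }
  }
  return problems;
}
```

In CI, run it over the rendered HTML of one page per template and fail the build when the returned array is non-empty. It won't replace the Rich Results Test, but it catches the silent breaks before they ship.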
Leak 5: Redirect chains
Symptom: Pages slow to load (CWV regression). Backlinks aren't passing equity. GSC "Redirect error" entries.
Cause: A → B → C → D when it should be A → D. Common reasons:
- HTTPS upgrade left HTTP→HTTPS redirects, then a content move added /old-url → /new-url
- Trailing-slash normalization sits on top of www→non-www redirects
- Vanity URLs that point at long-form URLs that themselves redirect to slugs
Each hop costs link equity (small amount) and time (cumulative). Three hops are a real performance hit. Googlebot's documented limit is 10 hops; past that, it gives up on the chain and GSC logs a redirect error.
30-second test: Run curl -sIL -o /dev/null -w "%{num_redirects} redirects, %{http_code} → %{url_effective}\n" {url} on a few URLs you've migrated in the last year. The first number in the output is the hop count.
Fix: Flatten chains. Every redirect should be A → final-destination, never A → B → final-destination. Audit your redirect rules quarterly — they accumulate like layers of paint.
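If your redirects live in a single source-to-target table, flattening can be automated. This is a sketch under that assumption: `RedirectRules` is a hypothetical plain map of paths, and real setups with regex or wildcard rules need more care.

```typescript
// Hypothetical rule table: source path -> target path.
type RedirectRules = Record<string, string>;

// Rewrite every rule so it points straight at its final destination.
// Throws on loops, or on chains longer than maxHops.
function flattenRedirects(rules: RedirectRules, maxHops = 10): RedirectRules {
  const flat: RedirectRules = {};
  for (const source of Object.keys(rules)) {
    let target = rules[source];
    const seen = [source];
    // Keep following while the current target is itself a redirect source.
    while (rules[target] !== undefined) {
      if (seen.indexOf(target) !== -1 || seen.length > maxHops) {
        throw new Error("redirect loop or overlong chain starting at " + source);
      }
      seen.push(target);
      target = rules[target];
    }
    flat[source] = target;
  }
  return flat;
}
```

Given `{ "/a": "/b", "/b": "/c", "/c": "/d" }`, every source ends up pointing at `/d`. Running this as part of the quarterly audit, and committing the flattened table, keeps new layers from stacking on old ones.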
Why these five
I picked these because they share three traits:
- Silent. No alert fires. No build breaks. The site keeps loading. The damage shows up in GSC weeks later.
- Common. All five appear in any site that's been live for more than 18 months without active monitoring.
- Cheap to fix. Each one is a 10-minute fix once you've spotted it. The whole cost is in not knowing.
The audit you should run quarterly
Once every three months, spend an hour on this:
- Run the canonical checker on your top 10 URLs
- View source on the same 10 URLs, ctrl-F for "noindex"
- Open your sitemap.xml, spot-check 10 random URLs for 200 + self-canonical
- Run Rich Results Test on your top 5 URLs
- Run curl -IL on any URLs you've migrated since the last audit
Total time: about 45 minutes. The cost of skipping this quarter is usually 3–6 weeks of recovery when one of the five leaks fires.
For automation of all five, see the weekly monitoring stack post.
FAQ
How do I tell if my canonical URLs are wrong?
Paste any URL into the canonical URL checker. If the canonical it returns doesn't exactly match the URL you typed, you have drift. Common drift includes localhost, trailing-slash mismatches, and protocol issues.
What's the difference between robots.txt and meta-robots?
robots.txt controls whether a crawler can access your URL. Meta-robots controls whether a crawler should index the page it accessed. Conflicts between the two (one says allow, the other says noindex) almost always result in the page not being indexed.
How often should I check my sitemap?
Weekly if you publish frequently. Quarterly minimum. Add an automated drift check if you can — flag if the URL count changes more than expected.
Does Google ignore long redirect chains?
Google's documentation says Googlebot follows up to 10 redirect hops; past that, GSC reports a redirect error and the destination doesn't get crawled through that chain. Even well inside that limit, link equity dilutes with each step. Flatten chains where possible.
What's the most common SEO leak?
By incident count: sitemap rot. By damage caused: canonical drift. Sitemap issues are constant and small; canonical issues are rare and catastrophic.
Can these leaks be monitored automatically?
Yes — see the weekly SEO monitoring stack post. Four short scripts catch four of the five (the fifth, redirect chains, needs a manual check or a custom monitor).