The internal link audit lives in the same category as flossing: everyone agrees it matters, almost nobody does it consistently. The reason: done by hand, it's tedious. SaaS tools that do it are €€€ and usually focused on enterprise sites with thousands of pages.
For a typical content site (50–500 pages), here's the workflow I'm using now: about 50 lines of Python build the link graph, Claude analyzes it, and the output is a markdown report with actual recommendations. The whole thing runs in about 4 minutes against the live site.
What an internal link audit should produce
Forget the SaaS dashboards full of vanity metrics. A useful audit produces four things:
- Orphan pages — pages with zero internal inbound links. They might still be in your sitemap, but Google probably doesn't crawl them often.
- Weak hub pages — pages with high external value (lots of words, good topical authority) but few inbound links from your own site.
- Cluster gaps — pages on related topics that don't link to each other.
- Anchor text problems — generic anchor text ("click here," "read more") instead of descriptive anchors.
Any audit that gives you a 200-row spreadsheet but doesn't tell you these four things is theater.
The Python script (the whole thing)
Save this as audit.py. It needs requests and beautifulsoup4. No other deps.
```python
import json
import re
from collections import defaultdict
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

SITE = "https://booplex.com"
SITEMAP = f"{SITE}/sitemap.xml"


def get_urls():
    """Return every <loc> URL listed in the sitemap."""
    r = requests.get(SITEMAP, timeout=20)
    return re.findall(r"<loc>(.*?)</loc>", r.text)


def analyze(url):
    """Fetch one page and collect its internal links, title, and word count."""
    r = requests.get(url, timeout=20)
    soup = BeautifulSoup(r.text, "html.parser")
    main = soup.find("main") or soup  # prefer <main>; fall back to the whole document
    links = []
    for a in main.find_all("a", href=True):
        href = urljoin(url, a["href"])
        if urlparse(href).netloc != urlparse(SITE).netloc:
            continue  # external link
        href = href.split("#")[0].rstrip("/")
        if href == url.rstrip("/"):
            continue  # link to the page itself
        anchor = a.get_text(strip=True)
        links.append({"target": href, "anchor": anchor})
    title = soup.title.string if soup.title else ""
    word_count = len(main.get_text().split())
    return {"url": url, "title": title, "word_count": word_count, "links": links}


def build_graph():
    """Crawl the sitemap URLs and index every internal link by its target."""
    urls = get_urls()
    pages = [
        analyze(u) for u in urls
        if "/blog/" in u or "/projects/" in u or u.endswith(".com/")
    ]
    inbound = defaultdict(list)
    for p in pages:
        for l in p["links"]:
            inbound[l["target"]].append({"from": p["url"], "anchor": l["anchor"]})
    return {"pages": pages, "inbound": dict(inbound)}


if __name__ == "__main__":
    graph = build_graph()
    with open("link-graph.json", "w") as f:
        json.dump(graph, f, indent=2)
    print(f"Built graph: {len(graph['pages'])} pages, "
          f"{sum(len(p['links']) for p in graph['pages'])} outbound links")
```

That's about 50 lines. Run it: python audit.py. It produces link-graph.json, a structured representation of your site's internal link graph.
For Booplex, that's about 80 pages, takes 90 seconds to crawl. Yours will scale roughly linearly with page count — 500 pages takes about 10 minutes.
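Two of the four checks are purely mechanical, so they can be read straight off that JSON before any LLM gets involved. What follows is a minimal sketch, not part of audit.py: the function names and the example.com data are made up for illustration, and the default thresholds (more than 1500 words, fewer than 3 inbound links) simply mirror the ones used in the prompt further down. One detail it has to handle: audit.py strips trailing slashes from link targets but not from page URLs, so the lookup normalizes both sides.

```python
def normalize(url):
    """Strip the fragment and trailing slash, the same way audit.py stores link targets."""
    return url.split("#")[0].rstrip("/")


def inbound_counts(graph):
    """Map each crawled page URL to the number of inbound internal links it has."""
    inbound = {normalize(target): sources for target, sources in graph["inbound"].items()}
    return {p["url"]: len(inbound.get(normalize(p["url"]), [])) for p in graph["pages"]}


def find_orphans(graph):
    """Pages with zero inbound internal links."""
    return sorted(url for url, n in inbound_counts(graph).items() if n == 0)


def find_weak_hubs(graph, min_words=1500, max_inbound=3):
    """Long pages (word_count > min_words) with fewer than max_inbound inbound links."""
    counts = inbound_counts(graph)
    return sorted(
        p["url"] for p in graph["pages"]
        if p["word_count"] > min_words and counts[p["url"]] < max_inbound
    )


# Hypothetical three-page graph in the same shape as link-graph.json.
demo = {
    "pages": [
        {"url": "https://example.com/", "title": "Home", "word_count": 300,
         "links": [{"target": "https://example.com/blog/a", "anchor": "post A"}]},
        {"url": "https://example.com/blog/a", "title": "A", "word_count": 2000,
         "links": [{"target": "https://example.com", "anchor": "home"}]},
        {"url": "https://example.com/blog/b", "title": "B", "word_count": 900,
         "links": []},
    ],
    "inbound": {
        "https://example.com": [{"from": "https://example.com/blog/a", "anchor": "home"}],
        "https://example.com/blog/a": [{"from": "https://example.com/", "anchor": "post A"}],
    },
}

print(find_orphans(demo))    # ['https://example.com/blog/b']
print(find_weak_hubs(demo))  # ['https://example.com/blog/a']
```

To run it on a real crawl, load the file with json.load and pass the resulting dict in place of demo. A page with zero inbound links that is also long will show up in both lists, which is probably what you want.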
The Claude prompt that turns the JSON into an audit
Open Claude Code in the same directory so it can read the JSON file, then give it this prompt:
```
Read link-graph.json.

Produce an internal link audit with these sections:

1. ORPHAN PAGES
   Any page in pages[] that has no entries in inbound{}.
   List by URL with word count. If word_count > 800, flag as "high-value orphan."

2. WEAK HUBS
   Pages with word_count > 1500 but fewer than 3 inbound links.
   These are pages that deserve more internal authority than they get.
   For each, list the top 5 candidate pages that COULD link to them,
   based on title topical overlap.

3. CLUSTER GAPS
   Group pages by URL path prefix (/blog/ai-automation/, /blog/technical-seo/, etc).
   Within each cluster, find pairs of pages that DON'T link to each other
   but probably should based on title topical overlap.
   List up to 10 cluster gaps total.

4. ANCHOR TEXT ISSUES
   Find any link where the anchor text is one of:
   "click here", "here", "read more", "this article", "this post", "link",
   or just a URL string.
   List the source page, target page, and current anchor.
   Suggest a better anchor based on the target page's title.

Write output to audit-report.md.
No summary. No prose explanation. Just the report.
```
Claude runs it in 2-3 minutes for an 80-page graph. Output is a focused markdown document.
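The cluster-gap section mixes a mechanical step (which same-cluster pairs have no link in either direction) with a judgment call (which of those pairs should link). The mechanical half can be precomputed, which shrinks what Claude has to reason about. This is a sketch under my own assumptions: cluster_key and cluster_gap_candidates are hypothetical helpers, the cluster is taken to be the path minus the final slug, and the example.com pages are invented.

```python
from itertools import combinations
from urllib.parse import urlparse


def normalize(url):
    """Strip the fragment and trailing slash so page URLs and link targets compare equal."""
    return url.split("#")[0].rstrip("/")


def cluster_key(url, depth=2):
    """Path prefix without the page slug, capped at `depth` segments (e.g. /blog/seo)."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return "/" + "/".join(segments[:-1][:depth])


def cluster_gap_candidates(graph, depth=2):
    """Same-cluster page pairs where neither page links to the other."""
    linked = {
        (normalize(p["url"]), normalize(l["target"]))
        for p in graph["pages"] for l in p["links"]
    }
    clusters = {}
    for p in graph["pages"]:
        clusters.setdefault(cluster_key(p["url"], depth), []).append(normalize(p["url"]))
    gaps = []
    for key, urls in sorted(clusters.items()):
        for a, b in combinations(sorted(urls), 2):
            if (a, b) not in linked and (b, a) not in linked:
                gaps.append((key, a, b))
    return gaps


# Hypothetical pages: a links to b, c links to nothing, x sits in another cluster.
demo = {
    "pages": [
        {"url": "https://example.com/blog/seo/a", "title": "A", "word_count": 900,
         "links": [{"target": "https://example.com/blog/seo/b", "anchor": "post B"}]},
        {"url": "https://example.com/blog/seo/b", "title": "B", "word_count": 700, "links": []},
        {"url": "https://example.com/blog/seo/c", "title": "C", "word_count": 1200, "links": []},
        {"url": "https://example.com/blog/ai/x", "title": "X", "word_count": 800, "links": []},
    ],
    "inbound": {},
}

for key, a, b in cluster_gap_candidates(demo):
    print(f"{key}: {a} <-> {b}")
```

On a site with flat URLs like /blog/some-slug, every post lands in one big /blog cluster and the pair count grows quadratically, so the topical filtering still has to come from the titles. This only narrows the candidates; it doesn't replace the prompt.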
What the output actually looks like
From an actual run on Booplex's current state (May 2026):
```markdown
# Internal Link Audit — booplex.com
2026-05-17

## Orphan Pages
- https://booplex.com/now — 412 words — not high-value but should link from /about
- https://booplex.com/work-with-me — 380 words — should link from /contact

## Weak Hubs
- /blog/how-i-fixed-canonical-urls-pointing-to-localhost-in-next-js
  Word count: 2840 | Inbound: 1
  Candidates to add inbound link:
  - /blog/gsc-url-inspection-canonical-lies (direct topical sibling)
  - /blog/never-done-learning-forever-tinkering (mentions canonical work)
  - /tools/canonical-checker (already linked outbound from the post)

## Cluster Gaps
- /blog/how-i-fixed-canonical-urls-pointing-to-localhost-in-next-js
  ↔ /blog/gsc-url-inspection-canonical-lies
  Both technical-seo cluster. Neither links to the other.
  Suggested anchor (former → latter): "GSC's canonical inspection has its own lies"
  Suggested anchor (latter → former): "the localhost canonical disaster"

## Anchor Text Issues
- /blog/never-done-learning-forever-tinkering → /tools/canonical-checker
  Current anchor: "the tool"
  Suggested anchor: "the canonical URL checker"
- /blog/never-done-learning-forever-tinkering → /about
  Current anchor: "here"
  Suggested anchor: "more about me" or "my full background"
```

This is actionable. I can fix every item in 30 minutes.
Most internal link audit tools I've used would have given me a 60-page PDF with the same information buried.
What it costs to run
| Step | Time | Cost |
|---|---|---|
| Crawl (Python) | 90 sec for 80 pages | €0 (local) |
| Audit (Claude) | 2–3 min | ~€0.05 in API tokens |
| Manual review + fixes | 30–45 min | your time |
Total: under €0.10 and roughly 35 to 50 minutes per run. Compare that to Ahrefs' internal link tool (part of the €449 plan) or paying an audit consultant €500.
What this doesn't do
Three honest limits:
1. JavaScript-rendered links. The script uses requests + BeautifulSoup, which doesn't run JS. If your site renders links client-side (some headless CMS setups, certain React patterns), the script misses them. Fix: swap in Playwright, which costs about 30 extra lines and ~5x the runtime.
2. Anchor text quality scoring. The script + prompt detect obviously bad anchors. They don't detect mediocre anchors like "this guide" or "my previous post." Those are technically descriptive but generic. You'd need a longer prompt with examples of good anchor patterns to catch those.
3. Authority weighting. The audit treats every inbound link equally. In reality, a link from your homepage carries more weight than a link from a deep blog archive. If you want to factor that in, add PageRank-style weighting to the graph build step. For a personal-scale site (under 500 pages), that's overkill.
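For anyone who does want the weighting from the third point, here is roughly what it could look like: a plain power-iteration PageRank over the pages list in link-graph.json. It's a sketch, not something the script does today. The function name, the 0.85 damping factor, and the fixed 50 iterations are my assumptions, duplicate links between the same two pages are counted once, and the example.com graph is invented.

```python
def normalize(url):
    """Strip the fragment and trailing slash so page URLs and link targets compare equal."""
    return url.split("#")[0].rstrip("/")


def pagerank(graph, damping=0.85, iterations=50):
    """Return {normalized_url: score}; scores sum to 1 across crawled pages."""
    nodes = sorted({normalize(p["url"]) for p in graph["pages"]})
    if not nodes:
        return {}
    known = set(nodes)
    # Outbound edges, restricted to crawled pages and deduplicated per pair.
    out = {u: set() for u in nodes}
    for p in graph["pages"]:
        src = normalize(p["url"])
        for l in p["links"]:
            tgt = normalize(l["target"])
            if tgt in known and tgt != src:
                out[src].add(tgt)

    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Pages with no outbound links spread their score evenly over all pages.
        dangling = sum(rank[u] for u in nodes if not out[u])
        new = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            for v in out[u]:
                new[v] += damping * rank[u] / len(out[u])
        rank = new
    return rank


# Hypothetical graph: the homepage links to both posts, each post links back home.
demo = {
    "pages": [
        {"url": "https://example.com/", "links": [
            {"target": "https://example.com/blog/a", "anchor": "post A"},
            {"target": "https://example.com/blog/b", "anchor": "post B"},
        ]},
        {"url": "https://example.com/blog/a",
         "links": [{"target": "https://example.com", "anchor": "home"}]},
        {"url": "https://example.com/blog/b",
         "links": [{"target": "https://example.com", "anchor": "home"}]},
    ],
}

for url, score in sorted(pagerank(demo).items(), key=lambda kv: -kv[1]):
    print(f"{score:.3f}  {url}")
```

One way to use the scores would be to rank the weak hubs by how strong their existing inbound sources are, so a long post linked only from low-scoring archive pages floats to the top.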
Run it on a schedule
I run this monthly. The script and the Claude prompt live in the same folder, and I kick them off with a cron job or a manual ./run.sh. Each run produces a dated audit-report-YYYY-MM-DD.md, and the diff between months tells me whether my linking is getting better or worse over time.
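The dated filename and the month-over-month diff need nothing beyond the standard library. A small sketch, with hypothetical helper names (report_name, diff_reports) and made-up report snippets; it assumes the reports are plain markdown text like the sample above.

```python
import difflib
from datetime import date


def report_name(day=None):
    """Build the dated filename, e.g. audit-report-2026-05-17.md."""
    day = day or date.today()
    return f"audit-report-{day.isoformat()}.md"


def diff_reports(old_text, new_text, old_name="previous", new_name="current"):
    """Unified diff between two report bodies, returned as one string."""
    lines = difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile=old_name, tofile=new_name, lineterm="",
    )
    return "\n".join(lines)


# Hypothetical excerpts from two consecutive monthly reports.
april = "## Orphan Pages\n- /now\n- /work-with-me\n"
may = "## Orphan Pages\n- /now\n"

print(report_name(date(2026, 5, 17)))  # audit-report-2026-05-17.md
print(diff_reports(april, may, "audit-report-2026-04-17.md", "audit-report-2026-05-17.md"))
```

Lines that start with a minus sign are findings that disappeared since the previous run (fixed), and lines that start with a plus sign are new ones. Because Claude's wording can drift between runs, expect some noise in the diff that isn't a real change.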
Pair this with the canonical URL checker and you've got two of the four pieces of a weekly automated SEO monitoring stack. The other two (schema drift + sitemap drift) are coming in later posts in this pillar.
FAQ
How do I find orphan pages on my site?
Crawl your site (or pull from your sitemap), build a graph of internal links, then find pages with zero inbound links. The Python script above builds that graph in about 50 lines; the orphans are the pages with no entry in its inbound map.
What's a good internal link count per post?
3–8 internal outbound links per long-form post is a healthy range. Fewer than 3 and you're not pulling weight in the cluster. More than 8 and the links start to feel like obligation rather than recommendation.
Do internal links still matter for SEO?
Yes. Internal linking shapes how Google crawls and how it weights pages. It's also one of the few SEO levers fully under your control — unlike backlinks, which require external action.
What's wrong with "click here" anchor text?
Two things. First, it tells Google nothing about the linked page's topic. Second, it's bad for accessibility — screen reader users navigating by link list hear "click here, click here, click here" with no context. Descriptive anchors fix both.
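The obviously generic anchors are easy to spot mechanically. Here is a minimal sketch that flags them in a link graph shaped like link-graph.json. The phrase list is the one from the audit prompt; the helper names, the punctuation stripping, and the URL pattern are my own assumptions, and the sample data is invented.

```python
import re

# Same generic phrases the audit prompt lists.
GENERIC_ANCHORS = {"click here", "here", "read more", "this article", "this post", "link"}
# Assumption: an anchor "is just a URL" if it starts with a scheme or www. and has no spaces.
URL_LIKE = re.compile(r"^(https?://|www\.)\S+$", re.IGNORECASE)


def is_generic_anchor(anchor):
    """True for anchors that say nothing about the target page."""
    text = " ".join(anchor.split()).strip(" .:→»").lower()
    return text in GENERIC_ANCHORS or bool(URL_LIKE.match(text))


def generic_anchor_links(graph):
    """Return (source_url, target_url, anchor) for every generic-anchor link."""
    return [
        (p["url"], l["target"], l["anchor"])
        for p in graph["pages"] for l in p["links"]
        if is_generic_anchor(l["anchor"])
    ]


# Hypothetical page with one generic and one descriptive anchor.
demo = {
    "pages": [
        {"url": "https://example.com/blog/a", "links": [
            {"target": "https://example.com/about", "anchor": "here"},
            {"target": "https://example.com/tools/checker", "anchor": "the canonical URL checker"},
        ]},
    ],
}

print(generic_anchor_links(demo))
```

This catches only exact matches against the list. Vaguer anchors such as "the tool" in the sample report above still need a human or an LLM to judge.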
Should I link from old posts to new posts, or the other way?
Both, but the old-to-new direction is more valuable. New posts inherit topical relevance from the old posts that link to them. New-to-old also helps, but typically the new post already has the freshness signal.
How often should I run an internal link audit?
Monthly for active sites (publishing weekly). Quarterly for slower sites. The audit is most useful when paired with a content cadence — every new post should trigger a small audit pass to integrate it into the existing graph.