Here's a thing nobody tells you about AI coding agents: they're not bad at UI because they lack capability. They're bad at UI because nobody taught them how to think like a designer. I watched Claude Code produce the same gray-box, blue-button disaster screens over and over again, and at some point I stopped blaming the model and started asking a different question — what would it look like if I actually taught it?
That question became stitch-kit: a 35-skill modular library that bridges Google's Stitch design generation tool and AI coding agents like Claude Code and Codex CLI. It's not a plugin. It's not a wrapper. It's closer to a curriculum — structured workflow knowledge encoded as skill definitions, with real working examples baked in so agents copy patterns instead of hallucinating boilerplate.
The Problem Was More Embarrassing Than I Expected
Google's Stitch MCP can generate genuinely beautiful UI designs from text descriptions. That part works. The part that doesn't work is getting an AI agent to use it properly. Left to their own devices, agents send half-baked prompts ('make a dashboard'), mishandle the API response formats, generate a single screen when they should be generating five, and then hand back raw HTML like that's a finished product. The result is a pipeline that's broken at almost every step — which is impressive, in a depressing sort of way.
But the thing that really broke the whole system — the thing I only found by actually debugging the failures — was an ID format problem. Stitch uses different identifier formats across different API endpoints. Some endpoints expect a plain numeric ID. Others expect a projects/ID prefix. Agents fail silently when they get this wrong, and they get it wrong constantly, because there's nothing in the API surface that makes the distinction obvious. I spent longer than I'd like to admit staring at error responses before I realized this wasn't a model failure — it was a domain knowledge gap dressed up as a capability problem.
That distinction matters more than it sounds.
When Agents Fail, It's Usually Not Their Fault
There's a comfortable narrative in the AI tooling space that goes: 'models are getting smarter, soon they'll figure everything out.' I think that's mostly wrong, or at least incomplete. Agents fail not because they're dumb — they fail because they're operating in systems with implicit knowledge baked into the API design, the workflow sequence, the domain conventions. Nobody wrote that knowledge down. Nobody handed it to the agent. And the agent, being an agent, improvises — which is a polite word for hallucinating something plausible that happens to be wrong.
stitch-kit is my answer to that problem, at least for the Stitch design pipeline. The 14 MCP wrapper skills I built don't add new capabilities — they normalize the chaos. Every ID format inconsistency is handled in the wrapper layer. Every API contract quirk is abstracted away. The agent never has to know that projects/42 and 42 are different things depending on which endpoint you're calling. It just works, because I encoded that knowledge once and now every agent using the library gets it for free.
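To make the ID normalization concrete, here is a minimal sketch of what a wrapper-layer fix like this might look like. The function name and shape are my illustration, not the actual stitch-kit code: it accepts either identifier form and emits whichever form a given endpoint needs.

```typescript
// Illustrative sketch, not the real stitch-kit API. Some Stitch endpoints
// want a bare numeric ID ("42"), others the resource-name form
// ("projects/42"). Normalizing once in the wrapper layer means the agent
// never has to know the difference.

type IdStyle = "bare" | "resource";

function normalizeProjectId(id: string, style: IdStyle): string {
  // Accept both "42" and "projects/42" by stripping any existing prefix.
  const bare = id.startsWith("projects/") ? id.slice("projects/".length) : id;
  if (!/^\d+$/.test(bare)) {
    throw new Error(`Unrecognized Stitch project id: ${id}`);
  }
  return style === "bare" ? bare : `projects/${bare}`;
}

console.log(normalizeProjectId("projects/42", "bare"));  // "42"
console.log(normalizeProjectId("42", "resource"));       // "projects/42"
```

The point of centralizing this is that the silent-failure mode disappears: an unrecognized ID throws loudly at the wrapper boundary instead of producing a confusing API error three calls later.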
Teaching Agents to Think Before They Generate
The ID problem was the blocker, but fixing it only got agents to first base. The deeper issue was workflow structure. Agents don't naturally know when to brainstorm versus when to generate directly versus when to iterate on something that already exists. They treat every request the same way, which means a vague prompt like 'build me a SaaS dashboard' goes straight to generation with zero ideation, and you get something technically functional and aesthetically catastrophic.
So I built an orchestrator skill that scores the specificity of each incoming request and routes accordingly. Vague request? It triggers ideation first. Specific request with clear requirements? Direct generation. Existing design that needs refinement? Iteration loop. Agents don't have to make that judgment call — the skill makes it for them, which means the output quality improves before a single pixel is generated.
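A routing skill like that can be sketched in a few lines. The scoring heuristic and route names below are my assumptions for illustration, not the actual orchestrator logic:

```typescript
// Hypothetical sketch of specificity-based routing; heuristics are invented.

type Route = "ideate" | "generate" | "iterate";

interface DesignRequest {
  prompt: string;
  existingDesignId?: string; // set when refining an existing design
}

// Crude specificity score: longer prompts that mention concrete design
// constraints (screens, colors, fonts, components) score higher.
function scoreSpecificity(prompt: string): number {
  const signals = ["screen", "color", "font", "layout", "component", "dark mode"];
  const hits = signals.filter((s) => prompt.toLowerCase().includes(s)).length;
  return Math.min(1, prompt.split(/\s+/).length / 30 + hits * 0.2);
}

function route(req: DesignRequest): Route {
  if (req.existingDesignId) return "iterate";          // refine what exists
  return scoreSpecificity(req.prompt) < 0.5 ? "ideate" : "generate";
}

console.log(route({ prompt: "build me a SaaS dashboard" })); // "ideate"
```

Under this sketch, "build me a SaaS dashboard" scores low and gets routed to ideation first, which is exactly the behavior the orchestrator is meant to enforce.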
The ideation skill (stitch-ideate) is the one I'm most pleased with, honestly. It fetches current web design trends, generates three distinct design directions with hex color palettes and typography pairings, and presents options before committing to generation. That's a research step agents genuinely cannot do alone — not because they lack the intelligence, but because they lack the structured workflow to know it's a step worth taking.
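For a sense of what a structured ideation output might look like, here is a hypothetical shape for one design direction. The field names and example values are mine, not the real stitch-ideate schema:

```typescript
// Invented schema for illustration — not stitch-ideate's actual output format.

interface DesignDirection {
  name: string;
  palette: string[];                          // hex color codes
  typography: { heading: string; body: string }; // font pairing
  rationale: string;
}

const directions: DesignDirection[] = [
  {
    name: "Calm Fintech",
    palette: ["#0F172A", "#38BDF8", "#F8FAFC"],
    typography: { heading: "Space Grotesk", body: "Inter" },
    rationale: "High-contrast dark surface with a single cool accent.",
  },
  // ...plus two more distinct directions, presented before any generation runs
];

console.log(directions[0].palette.length); // 3
```

The value of forcing this structure is that the agent has to commit to concrete palette and typography choices up front, where a human can veto them cheaply, rather than discovering them baked into generated screens.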
The Examples Folder Is the Secret Weapon
Every skill in stitch-kit includes real working examples. Not documentation. Not descriptions of what the skill does. Actual examples of inputs and outputs that agents can observe and copy. This solves a problem I've hit repeatedly in agent system design: when agents lack examples, they fill the gap with confident-sounding invention. I've watched agents confidently generate a broken Tailwind config fifteen times in a row because they were pattern-matching to something adjacent rather than copying something correct.
The examples folder is the library's actual intellectual property. The code is mostly straightforward. The examples are the accumulated product of debugging what agents get wrong, understanding why, and encoding the correct version in a format agents can learn from immediately.
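To make that concrete, a skill with baked-in examples might be laid out something like this. This tree is illustrative of the pattern, not stitch-kit's exact file structure:

```
skills/
  stitch-ideate/
    SKILL.md            # what the skill does and when to invoke it
    examples/
      input-vague.md    # a real prompt the agent received
      output.md         # the exact ideation output it should produce
```

The pairing of a literal input with its literal correct output is what lets an agent copy rather than invent.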
From Raw HTML to Production-Ready Code
Even after you fix ideation, generation, and API friction, there's one more gap: Stitch outputs raw HTML. That's not what most teams ship. The conversion layer in stitch-kit covers nine frameworks — including Next.js, Svelte, React, React Native, and SwiftUI — with dark mode, TypeScript, and ARIA accessibility built into every output. The goal was to close the last mile between 'beautiful generated design' and 'code I can actually put in a pull request.'
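The contract for a conversion layer like that can be sketched as follows. The option names and function signature are assumptions I'm making to illustrate the shape of the problem, not stitch-kit's API:

```typescript
// Hypothetical conversion contract; a real implementation would parse the
// HTML into a component tree and emit framework-idiomatic code.

interface ConvertOptions {
  framework: "nextjs" | "svelte" | "react" | "react-native" | "swiftui";
  typescript: boolean;
  darkMode: boolean;   // emit dark-mode variants alongside light styles
  aria: boolean;       // emit ARIA roles/labels on interactive elements
}

function convert(html: string, opts: ConvertOptions): string {
  // Stub: shows only the shape of the contract, not real conversion.
  const ext =
    opts.framework === "swiftui" ? "swift" : opts.typescript ? "tsx" : "jsx";
  return `// ${opts.framework} output (.${ext}), darkMode=${opts.darkMode}, aria=${opts.aria}\n`;
}

console.log(convert("<button>Save</button>", {
  framework: "nextjs", typescript: true, darkMode: true, aria: true,
}));
```

Keeping dark mode, TypeScript, and accessibility as always-on defaults rather than opt-in flags is the design choice that makes the output pull-request-ready instead of demo-ready.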
The release pipeline runs on conventional commits and release-please with OIDC trusted publishing to npm. No stored tokens, no manual version bumping, no 'oops I forgot to update the changelog' moments. That part took maybe two hours to set up and has saved me disproportionate amounts of irritation since.
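A workflow in that style looks roughly like the sketch below. Action versions and job layout are illustrative, not copied from the repo; the key details are that release-please derives versions and the changelog from conventional commits, and the `id-token: write` permission enables npm's OIDC trusted publishing so no token is stored in secrets (trusted publishing requires a recent npm CLI):

```yaml
# Sketch of a release-please + OIDC publishing workflow (illustrative).
name: release
on:
  push:
    branches: [main]
permissions:
  contents: write
  pull-requests: write
  id-token: write        # required for npm OIDC trusted publishing
jobs:
  release:
    runs-on: ubuntu-latest
    steps:
      - uses: googleapis/release-please-action@v4
        id: release
      - uses: actions/checkout@v4
        if: ${{ steps.release.outputs.release_created }}
      - uses: actions/setup-node@v4
        if: ${{ steps.release.outputs.release_created }}
        with:
          node-version: 22
          registry-url: https://registry.npmjs.org
      - run: npm ci && npm publish
        if: ${{ steps.release.outputs.release_created }}
```

release-please opens and maintains a release PR; merging it creates the tag and triggers the publish step, so the human's only job is reviewing and merging.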
What This Is Really About
stitch-kit started as frustration with a specific tool gap and evolved into something I think is more generally useful: a working example of how to make AI agents domain-competent. Not by making the model smarter. Not by prompt-engineering your way around systemic gaps. But by encoding domain expertise — ideation, workflow sequencing, API normalization, conversion patterns — as structured, learnable skill definitions that agents can actually use.
Those gray-box, blue-button screens Claude Code kept generating — the ones that looked like a government form from 2004? They're gone. Not because the model changed. Because I finally gave it something to learn from.
