From Vibe Coding to Spec-Driven Design: Why Your AI Projects Deserve Better

28 March 2026

“Vibe coding got us excited. Spec-driven design will get us to production.”

I didn’t arrive at spec-driven development through methodology. I arrived through frustration. AI kept making the same mistakes: inventing requirements I never asked for, touching working code it had no business touching, and rewriting tests to pass rather than fixing the implementation that was failing. These aren’t rare edge cases. They’re structural problems — and they come from the same root cause: there’s no authoritative source of truth to hold the work accountable.

Starting from the Limitations

Andrej Karpathy coined “vibe coding” in early 2025 to describe a mode of AI-assisted development where you prompt casually, accept suggestions freely, and iterate fast — excellent for prototyping, genuinely fun, but structurally fragile for anything that needs to survive contact with production.

AI invents things that aren’t there. Ask for a login flow and the AI ships password strength meters, account recovery systems, and email verification — none of it requested, all of it plausible. The AI fills gaps with assumptions, and before long you can’t tell which features were intended and which were invented.

The fix I’ve found: describe requirements in two places — a spec and code — creating dual representations that should stay aligned. When they diverge, that’s your signal something is wrong.

Working code doesn’t stay working. Without persistent requirements, AI rewrites completed features because it has no way to know the prior work was done and working. The solution is to anchor requirements in repository files, version-controlled alongside code, with explicit traceability from intent to implementation.
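A minimal sketch of what that anchoring can look like — the file name, requirement IDs, and status markers here are illustrative conventions, not taken from any particular tool:

```markdown
<!-- specs/auth.md — lives in the repo, versioned alongside the code -->
## Login

- REQ-1: Users authenticate with email and password.
- REQ-2: Sessions expire after 24 hours. **Status: done (shipped in v1.2)**
- REQ-3: Lock the account after 5 consecutive failed logins. **Status: in progress**
```

The implementation then references the same IDs (for example, a `# Implements REQ-2` comment above the session code), so a future session can see that REQ-2 is finished and should not be touched.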

Tests get rewritten to pass, not verify. When AI writes both code and tests without an independent specification defining expected behaviour, it faces a choice when a test fails: fix the code, or fix the test. It often picks whichever is easier. An independent source of truth prevents this by making expected behaviour immutable — the spec says what done looks like, and the tests either confirm it or they don’t.
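To make that concrete, here is a hypothetical sketch in Python. The requirement ID, spec file, and the value `5` are all assumptions for illustration; the point is that the expected behaviour is pinned by the spec, so a failing test can only mean the implementation is wrong:

```python
# Hypothetical example: the spec, not the test author, defines expected behaviour.
# specs/auth.md, REQ-3: "Lock the account after 5 consecutive failed logins."
MAX_FAILED_ATTEMPTS = 5  # this value comes from the spec

class Account:
    """Minimal account model tracking failed login attempts."""

    def __init__(self):
        self.failed_attempts = 0
        self.locked = False

    def record_failed_login(self):
        # Increment the counter and lock once the spec's threshold is reached.
        self.failed_attempts += 1
        if self.failed_attempts >= MAX_FAILED_ATTEMPTS:
            self.locked = True

def test_lockout_after_five_failures():
    # Traceable to REQ-3: rewriting this test to pass would visibly
    # contradict the spec, so the only legitimate fix is in the code.
    account = Account()
    for _ in range(5):
        account.record_failed_login()
    assert account.locked
```

Because the test cites REQ-3, "fix the test" stops being the easy way out: changing the assertion would create a spec/test divergence that review can catch.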

Context evaporates between sessions. In a project spanning weeks, each new session requires re-explaining constraints that may then be reinterpreted differently. Persistent, structured specifications carry intent across sessions so the AI starts each conversation with the same understanding you built up over weeks.

Specs as Context Engineering

The core shift: write the what and why before you let AI generate the how. A spec doesn’t have to be a formal document. It can be a PRD, a set of user stories, an architecture decision record, an API contract, or a markdown file that defines what “done” looks like for a feature.

The reframe that unlocked this for me: specs aren’t bureaucracy. They’re context engineering. AI performs better with rich, structured context. Specs are that context — reusable, persistent, version-controlled. Better inputs produce better outputs.

What This Looks Like in Practice

Instead of open-ended iteration (“build me an auth system” followed by fifteen rounds of correction), you provide a focused brief: requirements, constraints, and completion criteria. With acceptance criteria written before implementation begins, you’re checking against definite requirements rather than an approximation of what you had in mind.

The upfront cost is real — roughly an hour for a substantial feature. But it replaces three or four hours of correction cycles and produces an artifact that persists beyond the session: useful for onboarding, code review, and the next AI session that would otherwise start from scratch.

AI can help write the spec itself — asking clarifying questions, drafting requirements, surfacing edge cases you hadn’t considered. Humans maintain final ownership. But the thinking the specification requires is the point, not the particular framework you use to capture it.

The Tooling Landscape

Several tools have emerged to formalise this approach:

  • Taproot — full-process SDD with requirements as repository files, dual representation, traceability from intent to implementation, and commit-time enforcement
  • BMAD — multi-agent framework with specialised AI personas producing structured handoffs between phases
  • GitHub Spec Kit — open-source CLI built around a project governance “constitution”
  • Amazon Kiro — VS Code extension with a native spec mode
  • Tessl — a more radical approach where specs are the maintained artifact and code is fully generated from them

Many teams practice lightweight SDD without any of these — maintaining a persistent context file (CLAUDE.md for Claude Code, a system prompt for others) that captures project constraints and conventions. That alone moves you most of the way there.
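A lightweight version might be nothing more than a short, versioned context file. The contents below are invented for illustration — your constraints will differ:

```markdown
<!-- CLAUDE.md — read by the assistant at the start of every session -->
# Project constraints
- Python 3.12 with FastAPI; no new runtime dependencies without discussion.
- Money is always integer cents, never floats.
- Requirements live in specs/; check a feature's status there before changing it.
- Tests must not be edited to make them pass — fix the implementation instead.
```

Ten lines like these, checked into the repo, survive every closed tab and every new session.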

This Isn’t Waterfall

Waterfall assumed you could perfect requirements upfront and execute linearly. Spec-driven development assumes the opposite — requirements evolve, and that’s fine. The difference is that changes are tracked systematically. Requirements are versioned, not patched into conversations and lost when you close the tab.

SDD extends agile’s original intent: user stories were contracts between developers and stakeholders. Spec-driven development extends that contract to include AI as a third party — one that needs the same explicit agreements humans do, just in a form it can actually reference.

Try It

No framework required. Pick one feature that currently takes multiple rounds of correction to get right. Before you open your AI assistant, spend thirty minutes writing down: what you’re building, why, what constraints apply, and what “done” looks like. Write it in a markdown file.
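A starting template, assuming nothing beyond plain markdown — the section names are one reasonable choice, not a standard:

```markdown
# Feature: <name>

## Why
One paragraph on the problem this solves and who it's for.

## Requirements
- Numbered, testable statements of what the feature must do.

## Constraints
- Tech stack, performance limits, things that must not change.

## Done means
- [ ] Acceptance criteria you can actually check, one per line.
```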

Then hand that document to your AI before anything else.

The value isn’t in the document format or the tooling. It’s in the thinking the specification requires. Once you’ve written it, you’ll notice the AI’s first response is closer. When it isn’t, you have something concrete to point to.

The best AI engineers of 2026 won’t be the best prompters. They’ll be the ones who figured out that a good spec is the highest-leverage input you can give.