TL;DR

Spec-driven development replaces prompt-iterate-fix loops with a structured workflow: write a spec, generate a plan, break it into tasks, then implement each one. I used GitHub Spec Kit and Claude Code to build a Python CLI expense tracker from scratch in under 30 minutes. The first-pass code worked correctly because Claude Code had a complete requirements document to work from, not a moving target of conversational prompts. Here’s the full walkthrough with every file and command.

The Vibe Coding Problem

I spent three weeks last month building a small internal tool with Claude Code using the normal vibe coding approach: prompt, review the code, prompt again, fix something, prompt a third time. The tool worked, but by the end I had 40+ conversation turns and a codebase that reflected every mid-stream change of mind.

My input quality was the bottleneck. I was figuring out requirements while generating code, which meant the AI was chasing a moving target. Every new “oh wait, it also needs to…” prompt made the context longer and the code more tangled.

Then I tried spec-driven development on my next project and the difference was immediate. Twenty minutes writing requirements upfront saved two hours of back-and-forth prompt iteration. Here’s how it works, step by step, building a real tool you can run.

What Spec-Driven Development Gets Right

Spec-driven development (SDD) flips the workflow: you write a complete specification before touching code. The spec defines what the system does, what it doesn’t do, how it handles edge cases, and what success looks like. The AI agent reads this spec and produces code that matches it, instead of guessing at requirements from a one-line prompt.

The approach gained serious traction in early 2026. GitHub released Spec Kit (now at ~99K stars), a CLI toolkit that structures the workflow into four phases: specification, plan, tasks, implementation. Birgitta Böckeler analyzed the methodology on Martin Fowler’s site. DeepLearning.AI shipped a course on it with JetBrains. Every major AI coding tool (Claude Code, Cursor, Copilot, Gemini CLI) supports some version of the flow.

The core insight: a 200-word requirements document gives an AI agent more useful context than a 20-message conversation. Requirements stay consistent; conversations drift and contradict themselves over 20+ turns.

Setting Up Spec Kit and Claude Code

You need two things installed: GitHub Spec Kit and Claude Code.

pipx install git+https://github.com/github/spec-kit.git

(You can also use uvx --from git+https://github.com/github/spec-kit.git if you prefer uv. Don’t install from PyPI — the official package only lives on GitHub.)

If you already have Claude Code installed, you’re ready. Create a fresh project directory and initialize it:

specify init expense-tracker
cd expense-tracker

The specify init command creates a .specify/ directory with templates and workflows:

.specify/
├── memory/
│   └── constitution.md    # Project constitution and context
├── templates/
│   ├── spec-template.md   # Template for writing specs
│   ├── plan-template.md   # Template for implementation plans
│   └── tasks-template.md  # Template for task breakdowns
├── scripts/
└── workflows/

The templates guide the spec → plan → tasks workflow. For this tutorial, I’ll create the spec files manually to keep the focus on the methodology rather than the CLI scaffolding.

Phase 1: Writing the Specification

Create .specify/requirements.md and write your actual requirements into it. I’m building a CLI expense tracker. It’s small enough for a tutorial but complex enough to have real edge cases.

# Expense Tracker CLI

## Overview
A Python CLI tool for tracking personal expenses with categories,
monthly summaries, and CSV export. Uses SQLite for persistence.

## Functional Requirements

### Commands
- `add <amount> <category> [--note "description"]` — record an expense
- `list [--month YYYY-MM] [--category NAME]` — show expenses, optionally filtered
- `summary [--month YYYY-MM]` — show totals by category for a given month
- `export [--month YYYY-MM] [--output FILE]` — export to CSV
- `delete <id>` — remove an expense by ID

### Data Model
- Each expense has: id (auto-increment), amount (decimal, 2 places),
  category (string), note (optional string), date (auto-set to today)
- Categories are freeform strings, not a fixed enum
- Amounts must be positive numbers

### Behavior
- Default month is the current month for all commands
- `list` output: table format with columns [ID, Date, Amount, Category, Note]
- `summary` output: table with [Category, Total, Count] sorted by total descending
- `export` defaults to stdout if no --output flag
- `delete` confirms the expense details before removing

### Edge Cases
- Adding an expense with amount 0 or negative: reject with error message
- Listing an empty month: show "No expenses found for YYYY-MM"
- Category names: case-insensitive for filtering, stored as-entered
- CSV export with special characters in notes: properly escaped

## Non-Functional Requirements
- Python 3.10+, no external dependencies beyond stdlib
- Single file (expenses.py) for simplicity
- Database stored at ~/.expenses.db
- All output to stdout, errors to stderr

## Out of Scope
- Multi-currency support
- Recurring expenses
- Web interface
- Budget limits or alerts

A few things to notice here. The spec is explicit about what’s not included (the “Out of Scope” section). Without this, Claude Code might add budget alerts or currency conversion because those are common features in expense trackers. The edge cases section prevents the kind of bugs that usually surface in round three of vibe coding prompts. And the data model section locks down the schema so the AI doesn’t have to guess at types.
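The edge cases section earns its keep because each line maps to a concrete check in code. As an illustration, the positive-amount rule could land as a helper like this (a sketch of my own, not code from the generated tool; the name `parse_amount` is a placeholder):

```python
from decimal import Decimal, InvalidOperation

def parse_amount(raw: str) -> Decimal:
    """Validate a user-entered amount per the spec's edge cases.

    Rejects non-numeric input and anything that is zero or negative,
    and normalizes the result to two decimal places.
    """
    try:
        amount = Decimal(raw)
    except InvalidOperation:
        raise ValueError(f"not a number: {raw!r}")
    if amount <= 0:
        raise ValueError("amount must be positive")
    return amount.quantize(Decimal("0.01"))
```

With the rule written down in the spec, this check exists from the first pass instead of appearing after a bug report.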

The whole spec is 45 lines. Writing it took about 12 minutes.

Phase 2: Generating the Plan

With the spec written, ask Claude Code to generate an implementation plan:

claude "Read .specify/requirements.md and create a detailed implementation
plan in .specify/plan.md. Break the project into logical modules and
define the implementation order. Don't write any code yet."

Claude Code reads the spec and produces something like this in .specify/plan.md:

# Implementation Plan

## Architecture
Single-file CLI application using argparse for command parsing
and sqlite3 for persistence. No external dependencies.

## Implementation Order

1. **Database layer** — init_db(), create table, connection helper
2. **Add command** — argument parsing, validation, INSERT
3. **List command** — SELECT with optional filters, table formatting
4. **Summary command** — GROUP BY category aggregation
5. **Delete command** — lookup by ID, confirm, DELETE
6. **Export command** — CSV writer to file or stdout
7. **CLI entry point** — argparse subcommands, main()

## Key Decisions
- Use argparse subcommands (not click/typer) per no-deps requirement
- Table formatting with str.format() and calculated column widths
- Decimal amounts stored as INTEGER cents in SQLite, formatted to two decimal places for display
- Connection opened per-command, not held globally

The plan is a sanity check. Read it before moving on. I caught a good decision here: storing amounts as integer cents avoids floating-point rounding issues that plague naive expense trackers. If the plan had chosen REAL for the amount column, I’d fix it now in the spec rather than debugging it later in code.
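The integer-cents decision is worth a ten-second check of your own. The drift it avoids (a standalone snippet, not project code):

```python
# Binary floats can't represent 0.10 exactly, so repeated
# addition drifts away from the value you expect.
total_float = 0.10 + 0.10 + 0.10
print(total_float)            # 0.30000000000000004
print(total_float == 0.30)    # False

# The same amounts as integer cents stay exact.
total_cents = 10 + 10 + 10
print(total_cents == 30)      # True
print(f"€{total_cents / 100:.2f}")
```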

Phase 3: Breaking Down Tasks

Next, generate atomic tasks:

claude "Read .specify/requirements.md and .specify/plan.md. Create a
task list in .specify/tasks.md. Each task should be small enough
to implement and verify independently."

The output breaks the plan into concrete work items:

# Tasks

- [ ] Task 1: Create expenses.py with database initialization
- [ ] Task 2: Implement `add` command with validation
- [ ] Task 3: Implement `list` command with filtering and table output
- [ ] Task 4: Implement `summary` command with category aggregation
- [ ] Task 5: Implement `delete` command with confirmation prompt
- [ ] Task 6: Implement `export` command with CSV output
- [ ] Task 7: Wire up argparse entry point with all subcommands
- [ ] Task 8: Add error handling for edge cases from spec

Eight tasks. Each one maps to a section of the spec and a step in the plan. No ambiguity about what “done” means for any of them.
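To make Task 7 concrete: the argparse wiring the plan calls for looks roughly like this. A minimal sketch with only the add subcommand wired up (handler names like cmd_add are my own placeholders; the other commands follow the same pattern):

```python
import argparse

def cmd_add(args):
    # Placeholder handler; the real one validates and INSERTs.
    print(f"Added: €{args.amount} in {args.category}")

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="expenses.py")
    sub = parser.add_subparsers(dest="command", required=True)

    add_p = sub.add_parser("add", help="record an expense")
    add_p.add_argument("amount")
    add_p.add_argument("category")
    add_p.add_argument("--note", default=None)
    add_p.set_defaults(func=cmd_add)

    # list, summary, export, delete are wired the same way.
    return parser

def main(argv=None):
    args = build_parser().parse_args(argv)
    args.func(args)

if __name__ == "__main__":
    main(["add", "12.50", "lunch", "--note", "Sandwich"])
```

Each subcommand registers its handler via set_defaults(func=...), so main() stays a two-liner no matter how many commands exist.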

Phase 4: Implementation

Now the coding starts. Instead of one giant prompt, I implement task by task:

claude "Read .specify/requirements.md, .specify/plan.md, and .specify/tasks.md.
Implement Task 1: Create expenses.py with the database initialization
function. Follow the spec exactly — store amounts as integer cents,
use ~/.expenses.db, Python 3.10+ stdlib only."

Claude Code creates expenses.py with the database layer. I review it, run it, and move on:

claude "Task 1 is complete. Now implement Task 2: the add command.
Read the spec for validation rules (positive amounts only, freeform
categories). Include the argparse subcommand setup for 'add'."

Each task builds on the last. By Task 4, the tool can already add expenses and show summaries:

$ python expenses.py add 12.50 lunch --note "Sandwich at Kalo's"
Added: €12.50 in lunch

$ python expenses.py add 45.00 groceries --note "Weekly shop"
Added: €45.00 in groceries

$ python expenses.py add 3.20 coffee
Added: €3.20 in coffee

$ python expenses.py summary
Expenses for 2026-05:

Category     Total    Count
-----------  -------  -----
groceries    €45.00       1
lunch        €12.50       1
coffee        €3.20       1
-----------  -------  -----
Total        €60.70       3

The output format matches the spec’s requirements exactly: table with Category, Total, Count, sorted by total descending. No post-hoc tweaking needed.
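The query behind that summary is a single GROUP BY. A self-contained sketch with an in-memory database (the schema and column names are my assumptions, following the plan's integer-cents decision):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE expenses (id INTEGER PRIMARY KEY, amount_cents INTEGER, "
    "category TEXT, date TEXT)"
)
conn.executemany(
    "INSERT INTO expenses (amount_cents, category, date) VALUES (?, ?, ?)",
    [(1250, "lunch", "2026-05-14"),
     (4500, "groceries", "2026-05-14"),
     (320, "coffee", "2026-05-14")],
)

# Totals by category for one month, sorted by total descending,
# matching the spec's summary requirement.
summary = conn.execute(
    """SELECT category, SUM(amount_cents), COUNT(*)
       FROM expenses
       WHERE date LIKE ? || '%'
       GROUP BY category
       ORDER BY SUM(amount_cents) DESC""",
    ("2026-05",),
).fetchall()

for category, total_cents, count in summary:
    print(f"{category:<12} €{total_cents / 100:>6.2f} {count:>5}")
```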

After all eight tasks, the full CLI works:

$ python expenses.py list
ID  Date        Amount   Category    Note
--  ----------  -------  ----------  ----------------------
 1  2026-05-14  €12.50   lunch       Sandwich at Kalo's
 2  2026-05-14  €45.00   groceries   Weekly shop
 3  2026-05-14   €3.20   coffee

$ python expenses.py export --output may.csv
Exported 3 expenses to may.csv

$ python expenses.py delete 3
Delete expense #3: €3.20 in coffee on 2026-05-14? [y/N] y
Deleted.

The delete command confirms before removing, as the spec required. The export command defaults to stdout unless --output is specified. Every edge case from the spec (negative amounts, empty months, special characters in CSV) was handled on the first pass.
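The CSV edge case is worth seeing concretely: Python's csv module handles the escaping as long as the export uses it instead of joining strings by hand. A sketch of the round-trip (my own illustration of the behavior, not the generated export code):

```python
import csv
import io

rows = [
    (1, "2026-05-14", "12.50", "lunch", "Sandwich at \"Kalo's\", extra cheese"),
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["ID", "Date", "Amount", "Category", "Note"])
writer.writerows(rows)

# The note's comma and quotes are escaped on write,
# so the row parses back to the original string.
parsed = list(csv.reader(io.StringIO(buf.getvalue())))
print(parsed[1][4])
```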

The numbers for this build:

  • Spec writing time: 12 min
  • Implementation tasks: 8
  • Bug-fix prompts needed: 0
  • Total time, spec to working CLI: ~28 min

When SDD Beats Vibe Coding (and When It Doesn’t)

After using both approaches for a month, here’s when each one makes sense:

Scenario                               Vibe Coding   Spec-Driven
-------------------------------------  -----------   -----------
Quick prototype / throwaway script     Better        Overkill
CLI tool with defined inputs/outputs   Possible      Better
Multi-file project with API contracts  Frustrating   Much better
Exploring an unfamiliar library        Better        Overkill
Team project with handoff to others    Risky         Better
Fixing a bug in existing code          Better        Overkill

SDD adds overhead. The spec and planning phases take 15-25 minutes that vibe coding skips entirely. For a 20-line script or a quick throwaway prototype, that overhead isn’t worth it. For anything with more than one data model and more than one user-facing command, the upfront investment pays off by the third or fourth task.

The real benefit shows up later. When I came back to the expense tracker a week after building it to add a budget command, I read the spec and immediately understood every design decision. With a vibe-coded project, that context lives in a conversation history that’s hard to revisit.

Tools That Support Spec-Driven Development

The tooling grew fast in early 2026. Here are the main options:

  • GitHub Spec Kit — open-source CLI, the most popular option. Works with any AI agent that reads files.
  • AWS Kiro — Amazon’s IDE built around SDD. Generates specs, plans, and tasks from natural language. Tight AWS integration.
  • Tessl — generates specs from plain-English descriptions and wires them to test suites. Focused on the testing angle.
  • Claude Code — no built-in SDD mode, but you can point it at your .specify/ directory and it follows multi-phase workflows well. Pair it with Spec Kit for the full flow. (For a head-to-head with the competition, see my Claude Code vs Codex CLI comparison.)
  • Cursor — supports custom docs as context. Point it at your .specify/ directory and it’ll use the files as implementation guidance.

I’ve been using Spec Kit + Claude Code because Spec Kit is the lightest option (just a CLI and templates) and Claude Code is what I use daily. The workflow transfers to any agent that can read markdown files, so you’re not locked in.

Three Tips From a Month of Spec-First Development

Write the “Out of Scope” section first. It’s easier to define what you’re not building than what you are. The out-of-scope list forces you to make decisions early that would otherwise surface as scope creep during implementation.

Keep specs under 80 lines. I’ve written 200-line specs and they hurt more than they help. The AI agent treats every line as a requirement, so a verbose spec produces verbose code. Be specific where it counts (data model, edge cases, output format) and leave implementation details to the plan phase.

Don’t skip the plan review. I almost did on my second SDD project. Reading a 20-line plan takes 60 seconds; debugging a bad architecture in code takes an hour. I once caught a plan that proposed storing expenses in a JSON file instead of SQLite: fine for 10 records, broken at 10,000. Fixed it in the plan, never hit the bug.

FAQ

What is spec-driven development?

Spec-driven development is a workflow where you write a complete, structured specification before generating any code with an AI agent. The spec covers requirements, data models, edge cases, and what’s out of scope. The AI reads the spec and produces code that matches it, replacing the iterate-and-fix loop of conversational coding.

How is spec-driven development different from vibe coding?

Vibe coding starts with a prompt and iterates toward a solution through conversation. SDD starts with a complete requirements document and implements it in structured phases (spec → plan → tasks → code). In my experience, SDD produces more consistent results for projects with clear requirements, but vibe coding is faster when I’m exploring a new library or hacking on a throwaway script.

What tools work with spec-driven development?

GitHub Spec Kit is the most popular open-source option (~99K GitHub stars). AWS Kiro, Tessl, and the BMAD method are alternatives. Any AI coding agent that reads files (Claude Code, Cursor, Gemini CLI, Copilot) can follow a spec-driven workflow if you structure the spec files yourself.

Does spec-driven development work for large projects?

Yes, but the specs need to be modular. I’ve used SDD on a project with 6 modules by writing one top-level spec for the system architecture and separate specs for each module. Spec Kit supports this with nested spec directories. The 80-line guideline applies per-spec, not per-project.

When should I use vibe coding instead of spec-driven development?

Use vibe coding for throwaway scripts, quick prototypes, bug fixes, and exploring unfamiliar APIs. Use spec-driven development for anything with defined inputs and outputs that you plan to maintain, especially CLI tools, APIs, and multi-file projects.

Bottom Line

Spec-driven development isn’t going to replace vibe coding — I still use conversational prompting for quick scripts and exploratory work. But for any project where I know the requirements upfront, SDD with Spec Kit and Claude Code produces better code in less total time. The upfront cost of 12-15 minutes writing a spec is a trade I’ll make every time when the alternative is 45 minutes of prompt-iterate-debug.

The expense tracker I built in this tutorial took 28 minutes from blank directory to working CLI. A vibe-coded version would’ve taken the same time to generate — but I’d have spent another 20 minutes fixing edge cases and reformatting output. The spec caught those problems before they became bugs.

If you’re spending more than 3 prompts to get code right, try writing a spec instead.