Dual-loop BDD is the new red-green TDD
This one goes out to all the testing neophytes who only recently realized that it's useful to have an automated means of verifying their code does what it claims to do.
For the last month, I've been working on prove_it, a framework for building quality harnesses for Claude Code—primarily via its hooks system. In a recent release, I added TDD enforcement to its default configuration. First, it injects a test-first development approach into every plan Claude generates. Then, a PreToolUse hook follows up with permissionDecisionReason reminders whenever the agent deviates from the one true path (e.g., repeatedly edits source files without touching any tests, never runs a test to see it fail, etc.).
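If you haven't played with hooks yet, here's roughly what that kind of nag looks like. This is a hypothetical sketch, not prove_it's actual logic: a PreToolUse hook receives the tool event as JSON on stdin and can emit a `permissionDecisionReason` in its JSON response. The file-path heuristic below is invented for illustration:

```python
import json
import re


def evaluate(tool_name, file_path):
    """Hypothetical nag logic: remind the agent whenever it edits a
    production file rather than a test file. (Invented heuristic, not
    what prove_it actually ships.)"""
    if tool_name in ("Edit", "Write") and not re.search(r"(^|/)(test|spec)s?/", file_path):
        return {
            "hookSpecificOutput": {
                "hookEventName": "PreToolUse",
                "permissionDecision": "allow",
                "permissionDecisionReason": (
                    "Reminder: you're editing production code. Did you "
                    "write a failing test for this change first?"
                ),
            }
        }
    return {}


# As an actual hook script, you'd read the event from stdin instead:
#   event = json.load(sys.stdin)
#   print(json.dumps(evaluate(event["tool_name"],
#                             event.get("tool_input", {}).get("file_path", ""))))
print(json.dumps(evaluate("Edit", "src/app.py"), indent=2))
```

The trick is that an `allow` decision with a reason attached doesn't block the agent; it just keeps whispering in its ear.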
And just like real life, nagging developers works. My current side project is having no problem maintaining 100% (not 99%, not 99.9%, but 100%) code coverage.
My initial prompt simply told the agent to practice "red-green TDD" (a phrase I had never heard of until it was discovered that LLMs apparently interpret it as "real TDD"). This approach turned out to be woefully insufficient. Why? Because agents follow the path of least resistance and will invariably write a shitload of unit tests chasing the local maximum of code coverage without any regard for the global maximum of making sure shit actually works. After finishing each feature, all the code would have real unit tests with real assertions, but each time I tried running the fully-integrated app, the agent would find a novel way to miss the forest for the trees.
So yesterday I updated the prompt with the more sophisticated dual-loop approach developed by folks like Dan North and other adherents of behavior-driven development in the late aughts. It is best illustrated by two concentric circles: you begin each feature with a failing integration test, then dive into an inner loop for numerous red-green-refactor iterations of unit tests, then pop back out again once the outer loop's integration test passes.
Honestly, I hadn't taken this practice off the shelf in a while, but since updating the prompt that gets injected into every Claude Code plan, it's been working out great:
```
## Development approach

Follow BDD dual-loop TDD. Every feature increment starts from a failing integration
test and is driven inward through unit-level red-green-refactor cycles.

### Outer loop (integration)

1. Red (integration) — Write one integration/acceptance test that describes the
   next observable behavior from the outside in. Run it. Confirm it fails for the
   reason you expect. Do not proceed until the failure message matches your intent.
2. Inner loop (unit) — repeat until the integration test can pass:
   - Red — Write the smallest unit test that expresses the next missing piece of
     implementation the integration test needs.
   - Green — Write the minimum production code to make that unit test pass.
     Run it in isolation and confirm. No speculative code.
   - Refactor — Clean up the code you just wrote (duplication, naming, structure)
     while all unit tests stay green. Only touch code covered by passing tests.
3. Green (integration) — When enough unit-level pieces exist, re-run the
   integration test. If it still fails, diagnose which piece is missing and drop
   back into the inner loop. Do not add code without a failing test driving it.
4. Refactor (integration) — With the integration test green, refactor across
   module boundaries if needed. All tests — unit and integration — must stay green.
5. Repeat from step 1 with the next slice of behavior until the task is complete.

### Discipline rules

- Never skip the red step. If you cannot articulate why a test fails, you do not
  yet understand the requirement.
- One logical change per cycle. If you are changing more than one behavior at a
  time, split it.
- Run only the relevant test after each green step, then the full suite before
  each commit-worthy checkpoint.
- If a refactor breaks a test, revert the refactor — do not fix forward.
- Treat a surprise failure (wrong message, wrong location) as information: re-read
  it, adjust your understanding, then proceed.
```
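To make the loop order concrete, here's an entirely hypothetical micro-feature (a little `slugify` function; all names invented) showing the artifacts one outer-loop pass leaves behind. The comments mark which loop produced each piece:

```python
import re


# Outer loop, Red: this integration test is written first, before any
# production code exists, and fails until every piece below is in place.
def test_slugify_integration():
    assert slugify("Hello, World!") == "hello-world"


# Inner loop, cycle 1: the smallest unit the integration test needs.
def test_strip_punctuation():
    assert strip_punctuation("Hello, World!") == "Hello World"


def strip_punctuation(text):
    return re.sub(r"[^\w\s-]", "", text)


# Inner loop, cycle 2: the next missing piece.
def test_dasherize():
    assert dasherize("Hello World") == "hello-world"


def dasherize(text):
    return re.sub(r"\s+", "-", text.strip()).lower()


# Outer loop, Green: wire the units together and re-run the integration test.
def slugify(title):
    return dasherize(strip_punctuation(title))


if __name__ == "__main__":
    for test in (test_strip_punctuation, test_dasherize, test_slugify_integration):
        test()
    print("all green")
```

The point isn't the code, it's the sequencing: the integration test at the top existed (and failed) before either unit did, which is exactly the behavior the prompt is trying to coerce out of the agent.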
I imagine most people reading this will just copy-paste it into an AGENTS.md file or something, but if you're actually interested in learning more about this topic, my favorite articulation of this concept can be found in the edition of The RSpec Book that Zach Dennis worked on.
Now for just a touch of resentment
Not for nothing, but a friend of mine asked why so many programmers who had previously rejected test-driven development and related practices are suddenly embracing them. I genuinely believe some people interpret the suggestion that their code isn't good enough as a personal affront. To them, being told to write tests, much less to orient their whole workflow around verifying the quality of their work product, is somehow an indictment of their programming skills. So fragile is the ego of many programmers. I witnessed this defensive reaction firsthand on countless training and coaching engagements, so I'm speaking from experience here.
Of course, in 2026, those same people are suddenly huge fans of the practices they once dismissed, because now we're talking about verifying some dipshit AI agent's work. The key difference is that the tests exist not as a condemnation of their own abilities, but of the code written by some external thing.
What's funny about this, of course, is that nothing has really changed. If you zoom out, it's still just some doofus staring at a computer screen in silence all day. But yeah, a lot more people are suddenly really interested in TDD than there used to be.