Red-green rally
I'm still iterating on my experimental Claude Code verification harness, prove_it. This week my focus has been on nudging agents to practice test-driven development. Traditionally, we called this "TDD", but which has recently been renamed to "red-green TDD" as it has been discovered that this is what LLMs interpret as "real TDD".
Anyway, so that I could watch it steer an agent in a fresh codebase in real time, I opened a new directory and asked Claude Code to one-shot a terminal-based tennis game, replete with scoring and an AI that I couldn't beat. In OCaml. And it worked! It actually test-drove everything. Neat!