justin.searls.co

Using MacBook Neo for "real" development work and it's simultaneously juggling:

• Running Claude Code in three tabs at once
• Compiling multi-package Xcode builds
• Automating two iOS Simulators and a Mac build of my app

Hasn't missed a beat. Not bad for a phone chip.


Claude sure goes down a lot for being a product targeting businesses.

Maybe Anthropic is just doing humanity a solid by helping us understand how much it will suck when the era of subsidized pricing for LLM-based products ends.


freeCodeCamp: Which Devs Are Screwed?

Merge Commits

Video of this episode is up on YouTube:

Quincy Larson over at freeCodeCamp had me on their podcast to discuss how the rapidly changing software industry is impacting junior developers and what they can do about it. I don't normally spend time talking about this stuff, because I started programming in the 90s and can't claim to know anything about what it's like to just be starting now. I've also always discouraged people from getting into software development unless they're super passionate about it, which has been out of step with the "learn to code" hype train that gathered steam over the last fifteen years only to run into a brick wall recently.

His audience is way larger and composed of quite different people than those who follow my stuff, so I strongly recommend you read every last YouTube comment. Some pretty crazy shit, if I'm being honest. Makes me glad I don't have 11 million subscribers.

Appearing on: freeCodeCamp Podcast
Published on: 2026-03-17
Original URL: https://www.freecodecamp.org/news/there-are-2-kinds-of-devs-one-of-them-is-screwed-justin-searls-interview-podcast-210/

Comments? Questions? Suggestion of a podcast I should guest on? podcast@searls.co

We blew past this milestone without much fanfare, but it bears repeating: building awareness & goodwill by releasing open source no longer makes strategic sense for many companies. Agents increasingly consume & adapt OSS—often without users' knowledge—and cut out the creator.


Ben Thompson's latest rests on a single load-bearing assumption: that the harness and the model are tightly coupled, the way Apple's hardware and software are.

It follows, then, that if agents require integration between model and harness, that the companies building that integration—specifically Anthropic and OpenAI—are actually poised to be significantly more profitable than it might have seemed as recently as late last year.

This assumption is incorrect. To date, agents have not clearly benefited from proprietary integration with their favored model. I thought this was obvious, but I sometimes forget that not everybody else has made it their full-time hobby to mix and match various coding agents with various large language models for the past year.

Here's how I see it.

The API surface of a frontier model is text in, text out. Images in, images out. Audio in, audio out. AI models are among the most immediately pluggable and therefore commoditizable innovations in the history of software—right up there with UNIX pipes and their simple promise of text in, text out. Countless forks and TUIs have already demonstrated that swapping the underlying model is a marginal concern, not a structural one. The harness doesn't integrate with the model. It integrates with the user's world—files, tools, workflows, intent—and calls the model as one resource among many.
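The pluggability argument above can be sketched in a few lines. This is an illustrative toy, not any real agent's code — the class and method names are invented — but it shows why swapping the model is a marginal concern: the harness depends only on the "text in, text out" surface, so any model that honors it slots in.

```python
from typing import Protocol


class Model(Protocol):
    """A frontier model reduced to its actual API surface: text in, text out."""

    def complete(self, prompt: str) -> str: ...


class StubClaude:
    def complete(self, prompt: str) -> str:
        return f"claude: {prompt}"


class StubGPT:
    def complete(self, prompt: str) -> str:
        return f"gpt: {prompt}"


class Harness:
    """The harness integrates with the user's world (files, tools, intent);
    the model is just one injected resource among many."""

    def __init__(self, model: Model):
        self.model = model

    def run(self, task: str) -> str:
        # A real harness would gather file contents, tool output, and
        # conversation history here before ever prompting the model.
        return self.model.complete(task)


# Swapping the underlying model is a one-argument change; the harness
# neither knows nor cares which lab trained it.
print(Harness(StubClaude()).run("fix the failing test"))
print(Harness(StubGPT()).run("fix the failing test"))
```

The dependency arrow only points one way: the harness is coupled to the user's environment, while the model sits behind an interface narrow enough to commoditize.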

This distinction matters enormously. A high-quality harness paired with a mediocre model can accomplish incredible things. A frontier model paired with a poorly-considered harness can barely get out of the gate. If model-harness integration were truly the differentiator, the model would indeed be irreplaceable and invaluable. But in practice, models have turned out to be the most replaceable part of the stack. And every recent advance in tool use, verification, and orchestration has only made this clearer—the harness gets more valuable while the specific model behind it gets more interchangeable. As long as multiple companies are credibly in the frontier model game, there will be cutthroat competition and relentless downward pressure on price. And while it may not happen soon, models and hardware will eventually advance to the point that a local open model is good enough—and then the frontier labs won't just be commoditized, they'll be cut out entirely.

Thompson brings up Microsoft's new Copilot Cowork bundling initiative as evidence that model-harness integration is where value accrues:

Fast forward to last week, however, when Microsoft revealed how they will handle the potential business impact of AI reducing seats, which is a bit of a problem for their seat-based business model: the company is going to bundle AI into a new higher-tiered enterprise offering, E7, which is going to cost twice as much — $99 per seat per month — as the formerly top-of-the-line E5. That's a big increase, which Microsoft needs to justify with AI that actually makes those seats more productive, and the product they launched with the new bundle was Copilot Cowork.

Microsoft is indeed in a tight spot, but not because the model makers have a monopoly on harnesses. Instead, blame their outlandish investments in somebody else's model, their piss-poor track record for delivering a single worthwhile Copilot-branded piece of software outside the GitHub org, and their over-reliance on per-seat licensing that is existentially at odds with a strategy that, if successful, would drastically reduce the number of seats its customers need. None of that tells us anything about whether models and harnesses are structurally coupled. It only tells us what we already know: that Microsoft has put itself in a strategically weak position—a strategy that presupposed they would somehow stop sucking at building software people actually want to use.

Apple's situation is a cleaner test of the thesis. Dediu is probably right that Apple has a tremendous opportunity precisely because they can build a harness and swap the underlying model. Will they build an incredible harness? No—they'll build one that's just good enough to keep customers buying their hardware. But the strategic logic is sound: focus on the human interface and the real-world problems it can solve, and don't cede the narrative to implementation details like which model is running underneath. The truly great harnesses will probably come from smaller, more nimble startups who understand that the opportunity isn't in training the next frontier model—it's in connecting the horsepower of any capable model to the thousand different domains waiting to be served.

The frontier labs attracted an outsized share of attention and funding because OpenAI happened to hit one out of the park with ChatGPT as a product, and this conflation continues to produce analyses that make the same mistake Thompson is making. ChatGPT went so mainstream that it's treated as a blockbuster product, but I suspect we'll look back on it merely as a (historically impactful) proof of concept. The work of building a chatbot and the work of training a frontier model are only incidentally related. The thousands of chatbot systems that have sprung up since ChatGPT's release should be all the evidence we need that building a great chatbot never required building a great model—OpenAI just happened to do both.

At the end of the day, the utility of models is owed to the fact that they are black boxes with simple interfaces. Text in, text out. Audio in, audio out. Increasingly, video in, video out. But that simplicity is also what makes them commodities. And leads don't last long enough for anyone to establish a moat—the underlying research unlocks improvements in the open, and ubiquitous access to everyone's models facilitates distillation of competing ones.

Harnesses couldn't be more different. Harnesses are the gooey, ever-shifting integration of human-computer interface. They connect people's needs to implementations designed to meet them. With a little cleverness and elbow grease, they can be dispatched to verify that those implementations work. They can be assigned to orchestrate other agents across systems in separate geographies, and over sessions conducted at different times.

The model is the engine. The harness is the car. And history has shown—over and over—that you don't need to manufacture your own engine to build a great car.

Marathon is the first PVP-heavy game to get its hooks in me since… Unreal Tournament in 2001?

I fucking suck, but even playing solo it can feel incredible when you do manage to come out on top. If you want to roll with me, I'm Searls#2430 (or just email me at justin@searls.co).


Dual-loop BDD is the new Red-green TDD

This one goes out to all the testing neophytes who only recently realized that it's useful to have an automated means of verifying their code does what it claims to do.

For the last month, I've been working on prove_it, a framework for building quality harnesses for Claude Code—primarily via its hooks system. In a recent release, I added TDD enforcement to its default configuration. First, it injects a test-first development approach into every plan Claude generates. Then, a PreToolUse hook follows up with permissionDecisionReason reminders whenever the agent deviates from the one true path (e.g., repeatedly edits source files without touching any tests, never runs a test to see it fail, etc.).

Let's dive in and find out…

Pro-tip to any devs who only discovered TDD thanks to coding agents: refactoring is inherently directional. It's more like prefactoring—you rearrange code to make the next change easy. That means you (and your agent) should know the next planned change before you refactor!


Red-green rally

I'm still iterating on my experimental Claude Code verification harness, prove_it. This week my focus has been on nudging agents to practice test-driven development. Traditionally, we just called this "TDD," but it has recently been rebranded "red-green TDD," since that's apparently what it takes for LLMs to interpret it as "real TDD."

Anyway, so that I could watch it steer an agent in a fresh codebase in real time, I opened a new directory and asked Claude Code to one-shot a terminal-based tennis game, replete with scoring and an AI that I couldn't beat. In OCaml. And it worked! It actually test-drove everything. Neat!


v52.0.1 - Len Testa: Bring back the Starcruiser

Hotfix

Video of this episode is up on YouTube:

Today, we're joined by a very special guest, Len Testa! You might know him from The Disney Dish podcast or from his excellent theme park travel planning app Touring Plans. Or you might not know him at all! No wrong answers.

This episode is all about Disney's ill-fated Star Wars: Galactic Starcruiser live action role-playing hotel—which we both had the opportunity to experience right when it first launched. It was a life-changing experience for both of us, so why did it fail? Is it true that the top Disney brass learned all the wrong lessons from that failure? And will the CEO-in-waiting Josh D'Amaro ever have the courage to attempt something so ambitious again? Listen to this episode, in which we speak with an unearned confidence that suggests we have all of these answers!

You can reach out to Len at len@touringplans.com and you can write into the show at podcast@searls.co. He'll read your e-mail and reply to it, but I may only skim it.