Did you come to my blog looking for blog posts? Here they are, I guess. This is where I post traditional, long-form text that isn't primarily a link to someplace else, doesn't revolve around audiovisual media, and isn't published on any particular cadence. Just words about ideas and experiences.
Distributing your own scripts via Homebrew
I use Homebrew all the time. Whenever I see a new CLI that offers an npm or uv install path alongside a brew one, I choose brew every single time.
And yet, when it comes time to publish a CLI of my own, I usually just ship it as a Ruby gem or an npm package, because I had (and have!) no fucking clue how Homebrew works. I'm not enough of a neckbeard to peer behind the curtain as soon as root directories like /usr and /opt are involved, so I never bothered before today.
But it's 2025 and we can consult LLMs to conjure whatever arcane incantations we need. And because he listens to the cast, I can always fall back on texting Mike McQuaid when his docs suck.
So, because I'll never remember any of this shit (it's already fading from view as I type this), below are the steps involved in publishing your own CLI to Homebrew. The first formula I published is a simple Ruby script, but this guide should be generally applicable.
Glossary
Because Homebrew really fucking leans in to the whole "home brewing as in beer" motif when it comes to naming, it's easy to get lost in the not-particularly-apt nomenclature they chose.
Translate these in your head when you encounter them:
- Formula → Package definition
- Tap → Git repository of formulae
- Cask → Manifest for installing pre-built GUIs or large binaries
- Bottle → Pre-built binary packages that are "poured" (copied) instead of built from source
- Cellar → Directory containing your installed formulae (e.g. /opt/homebrew/Cellar)
- Keg → Directory housing an installed formula (e.g. Cellar/foo/1.2.3)
Overview
First thing to know is that the Homebrew team doesn't want your stupid CLI in the core repository.
Instead, the golden path for us non-famous people is to:
- Make your CLI, push it to GitHub, cut a tagged release
- Create a Homebrew tap
- Create a Homebrew formula
- Update the formula for each CLI release
After you complete the steps outlined below, users will be able to install your cool CLI in just two commands:
brew tap your_github_handle/tap
brew install your_cool_cli
Leaving the "make your CLI" step as an exercise for the reader, let's walk through the three steps required to distribute it on Homebrew. In my case, I slopped up a CLI called imsg that creates interactive web archives from an iMessage database.
Create your tap
Here's Homebrew's guide on creating a tap. Let's follow along with how I set things up for myself. Just replace each example with your own username or organization.
For simplicity's sake, you probably want a single tap for all the command line tools you publish moving forward. If that's the case, then you want to name the tap homebrew-tap. The homebrew prefix is treated specially by the brew CLI and the tap suffix is conventional.
First, create the tap:
brew tap-new searlsco/homebrew-tap
This creates a scaffold in /opt/homebrew/Library/Taps/searlsco/homebrew-tap. Next, I created a matching repository on GitHub and pushed what Homebrew generated:
cd /opt/homebrew/Library/Taps/searlsco/homebrew-tap
git remote add origin git@github.com:searlsco/homebrew-tap.git
git push -u origin main
Congratulations, you're the proud owner of a tap. Now other homebrew users can run:
brew tap searlsco/tap
It doesn't contain anything useful, but they can run it. The command will clone your repository into their /opt/homebrew/Library/Taps directory.
Create your formula
Even though Homebrew depends on all manner of git operations to function and fully supports just pointing your formula at a GitHub repository, the Homebrew team recommends instead referencing versioned tarballs with checksums. Why? Something something reproducibility, yadda yadda open source supply chain. Whatever, let's just do it their way.
One nifty feature of GitHub is that they'll host a tarball archive of any tags you push at a predictable URL. That means if I run these commands in the imsg repository:
git tag v0.0.5
git push --tags
Then GitHub will host a tarball at github.com/searlsco/imsg/archive/refs/tags/v0.0.5.tar.gz.
Once we have that tarball URL, we can use brew create to generate our formula:
brew create https://github.com/searlsco/imsg/archive/refs/tags/v0.0.5.tar.gz --tap searlsco/homebrew-tap --set-name imsg --ruby
The three flags there do the following:
- --tap points it to the custom tap we created in the previous step, and will place the formula in /opt/homebrew/Library/Taps/searlsco/homebrew-tap/Formula
- --set-name imsg will name the formula explicitly, though brew create would have inferred this and confirmed it interactively. The name should be unique so you don't do something stupid like make a CLI named TLDR when there's already a CLI named TLDR, or a CLI named standard when there's already a CLI named standard
- --ruby is one of several template presets provided to simplify the task of customizing your formula
Congratulations! You now have a formula for your CLI. It almost certainly doesn't work and you almost certainly have no clue how to make it work, but it's yours!
This is where LLMs come in.
1. Run brew install --verbose imsg
2. Paste what broke into ChatGPT
3. Update formula
4. GOTO 1 until it works
Eventually, I wound up with a working Formula/imsg.rb file. (If you're publishing a Ruby CLI, feel free to copy-paste it as a starting point.) Importantly—and a big reason to distribute via Homebrew as opposed to a language-specific package manager—I could theoretically swap out the implementation for some other language entirely without disrupting users' ability to upgrade.
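For reference, here's a trimmed-down sketch of roughly what such a formula can look like. Treat it as an illustration, not a copy of my real formula: the install block in particular depends entirely on how your repository is laid out, and I'm assuming a single executable script named imsg at the repo root.

class Imsg < Formula
  desc "Creates interactive web archives from an iMessage database"
  homepage "https://github.com/searlsco/imsg"
  url "https://github.com/searlsco/imsg/archive/refs/tags/v0.0.5.tar.gz"
  sha256 "e9166c70bfb90ae38c00c3ee042af8d2a9443d06afaeaf25a202ee8d66d1ca04"
  head "https://github.com/searlsco/imsg.git", branch: "main"

  # Prefer a maintained Ruby over the ancient uses_from_macos "ruby" default
  depends_on "ruby@3"

  livecheck do
    url :stable
    strategy :github_latest
  end

  def install
    # Assumption: the repo ships a single executable script named `imsg`
    bin.install "imsg"
  end

  test do
    # Asserting on help output is enough to prove the binary runs
    # (the --help flag is an assumption about your CLI)
    assert_match "imsg", shell_output("#{bin}/imsg --help")
  end
end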
Key highlights if you're reading the formula contents:
- All formulae are written in Ruby, not just Ruby-related formulae. Before JavaScript and AI took turns devouring the universe, popular developer tools were often written in Ruby and Homebrew is one of those
- You can specify your formula's git repository with the head method (though I'm unsure this does anything)
- Adding a livecheck seemed easy and worth doing
- Adding a test to ensure the binary runs can be as simple as asserting on help output. Don't let the generated comment scare you off
- Run brew style searlsco/tap to make sure you didn't fuck anything up.
- By default, the --ruby template adds uses_from_macos "ruby", which is currently version 2.6.10 (which was released before the Covid pandemic and end-of-life'd over three years ago). You probably want to rely on the ruby formula with depends_on "ruby@3" instead
When you're happy with it, just git push and your formula is live! Now any homebrew user can install your thing:
brew tap searlsco/tap
brew install imsg
Update the formula for each CLI release
Of course, any joy I derived from getting this to work was fleeting, because of this bullshit at the top of the formula:
class Imsg < Formula
  url "https://github.com/searlsco/imsg/archive/refs/tags/v0.0.5.tar.gz"
  sha256 "e9166c70bfb90ae38c00c3ee042af8d2a9443d06afaeaf25a202ee8d66d1ca04"
Who the fuck's job is it going to be to update these URLs and SHA hashes? Certainly not mine. I barely have the patience to git push my work, much less tag it. And forget about clicking around to create a GitHub release. Now I need to open a second project and update the version there, too? And compute a hash? Get the fuck out of here.
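(For what it's worth, computing the hash itself is the easy part—pipe the release tarball through shasum, which ships with macOS:)

# Print the SHA-256 of the release tarball
curl -fsSL https://github.com/searlsco/imsg/archive/refs/tags/v0.0.5.tar.gz | shasum -a 256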
Now, I will grant that Homebrew ships with a command that opens a PR for each formula update and some guy wrapped it in a GitHub action, but both assume you want to daintily fork the tap and humbly submit a pull request to yourself. Clearly all this shit was designed back when Homebrew was letting anybody spam shit into homebrew-core. It's my tap, just give me a way to commit to main, please and thank you.
So anyway, you can jump through all those hoops each time you update your CLI if you're a sucker. But be honest with yourself, you're just gonna wind up back at this stupid blog post again, because you'll have forgotten the process. To avoid this, I asked my AI companion to add a GitHub workflow to my formula repository that automatically commits release updates to my tap repository.
If you want to join me in the fast lane, feel free to copy paste my workflow as a starting point. The only things you'll need to set up yourself:
- You'll need a personal-access token:
  - When creating the PAT, add your homebrew-tap repository and grant Contents → Write permissions
  - Store it in the formula repository's settings under Secrets and variables → Actions → Repository secrets and name it HOMEBREW_TAP_TOKEN (GitHub docs)
- You'll need to specify the tap and formula environment variables
- You'll probably want to update the GitHub bot account used for commits, likely to the GitHub Actions bot if you don't have one of your own:
GH_EMAIL: 41898282+github-actions[bot]@users.noreply.github.com
GH_NAME: github-actions[bot]
Now, whenever you cut a release, your tap will be updated automatically. Within a few seconds of running git push --tags in your formula's repository, your users will be able to upgrade their installations with:
brew update
brew upgrade imsg
That's it. Job's done!
The best part
This was a royal pain in the ass to figure out, so hopefully this guide was helpful. The best part is that once your tap is set up and configured and you have a single working formula to serve as an example, publishing additional CLI tools in the future becomes almost trivial.
Now, will I actually ever publish another formula? Beats me. But it feels nice to know it would only take me a few minutes if I wanted to. 🍻
This blog has a comment system
The day before we recorded our episode of Hotfix, Scott Werner asked a fair question: "so, if you're off social media and your blog doesn't have a comment system, how do you want people to respond to your posts? Just email?"
I answered, "actually my blog does have a comment system."
Here's how to leave a comment on this web site:
1. Read a post
2. Think, "I want to comment on this"
3. Draft a post on your blog
4. Add a hyperlink to my post
5. Paste an excerpt to which you want to respond
6. Write your comment
7. Hit publish
I admit, it's quaint. It involves a number of invisible steps, like 2.1, where you start a blog (which is actually pretty easy but not free of friction). You should try it.
It is 2025 and the Web—the capital-W Web—is beleaguered. The major platforms have long-since succumbed to enshittification, but their users aren't going anywhere. Some among us courageously voice their dissent, but always from the safe confines of their favorite walled garden. They drop a note in the jailkeeper's suggestion box as they scroll past the Squarespace ads littering their algorithmic timelines. Others have fled to open and open-flavored networks, but everyone eventually realizes they can't go home again.
But that's not why I want you to adopt this blog's commenting system. I'm not a high-minded individual who cares about the intellectual project of the World Wide Web as a bastion for free expression or whatever the fuck. No. I just had a super rad time on the Internet from 2000 to 2006 and I want to do my part to bring it back.
Back then, I would find a blog and follow it—via its feed when possible, or else by adding it to a folder of bookmarks—and check it daily.
But what about discoverability? How did anyone find these websites? Bloggers couldn't rely on platforms' social graphs or algorithmic timelines to build awareness, so they had to bake discoverability into the product. Some sites had a "blogroll" of recommendations in the sidebar. But the most effective approach was the art of "blogging as a conversation." When an author read something that provoked them to write, they'd link to the offending piece, excerpt it, and provide their own commentary. And because humans are vain, the original post's author would frequently circle back and write their own response to the response. The net effect was that each author's audience would get exposure to the other writer. Even if the authors were in violent disagreement, readers of one might appreciate the other's perspective and subscribe to them.
Blogging as a conversation—as a comments section—was valuable because it was purely positive-sum. As an author, I benefit because another author's opinions inspired me to write. The other author benefits because linking to them offers access to my readership. My readers benefit because they're exposed to complementary and contrasting viewpoints.
Growth was slow and organic but more meaningful and durable. It was a special time.
More on my personal history with blogging
If I really enjoyed someone's blog, I'd rush to read their stuff first. If an author's posts weren't so stimulating, I wasn't shy about unsubscribing. And I could afford to be picky—there was no shortage of content! Even with aggressive curation, by 2005 I had subscribed to so many feeds in Google Reader that I struggled to stay on top of them all. My grades suffered because I was "j-walking" hundreds of blog posts each day instead of doing homework.
Then, Facebook's feed, Tumblr, and Twttr came along, and they took the most enjoyable parts of surfing the 1.0 Web—novel information and connectivity with others—and supercharged them. They were "good Web citizens" in the same way the closed-source, distributed-to-exactly-one-server Bluesky is today. The timelines were reverse chronological. They handled the nerdy tech stuff for you. None of the feeds had ads yet.
Blogging didn't stand a chance.
I failed to see it at the time, but blogging did have one advantage over the platforms: it was a goddamn pain in the ass. Whether you flung files over an FTP client or used a CMS, writing a blog post was an ordeal. If you were going to the trouble of posting to your blog, you might as well put your back into it and write something thoughtful. Something you could take pride in. Something with a permalink that (probably wouldn't, but) could be cited years later.
The platforms offered none of that. You got a tiny-ass text area, a stingy character limit, and a flood of ephemera to compete with. By demoting writing to a subordinate widget of the reading interface, the priority was clear: words were mere grist for the content mill. The shame of it all was that these short-form, transient, low-effort posts nevertheless sated many people's itch to write at all. I was as guilty of this as anyone. From 2009 through 2020, I devoted all my writing energy to Twitter. Except for that brief year or two where Medium was good, I basically stopped thinking in longform. Instead, I prided myself on an ability to distill 2,000-word essays down to 140-character hot takes. Many of those takes reached millions of people and made me feel good for a very brief amount of time.
My brain was cooked. When it finally sank in, I quit.
It took almost three years to recover. I'm on the other side now, and am happy to report I can now think thoughts more than a sentence or two long.
Last night, I got dinner with two old friends, Chris Nelson and Joshua Wood. Josh asked how it's been since I quit paying attention to social media. I thought about the unfinished draft of this post.
In truth, this blog and its attendant podcast empire have been a refuge for my psyche. A delightful place to share pieces of myself online. Somewhere to experiment in both form and format. A means of reclaiming my identity from a smattering of social media profile pages and into something authentic and unique.
Today, as the platforms wane, it feels like this conversational approach to blogging is seeing new life. As a readership has slowly gathered around this blog, I've separately been curating a fresh list of thoughtful bloggers that inspire me to write. Maybe I'll add a blogroll to my next redesign. I'm already writing more linkposts.
In short, blogging might be back. Hell, I just came back from coffee with my friend Anthony, and—without my having brought up the topic—he showed me his new blog.
So, if you're considering engaging with my comment system—if you're thinking about starting a blog or dusting off your old one—here's some unsolicited advice:
- Do it for you. Priority one is taking the time to grapple with your thoughts, organize your ideas, and put them into words. Priority two is reaching the finish line and feeling the pride of authorship. That anyone actually reads your work should be a distant third place
- Focus on building an audience rather than maximizing reach. Getting in front of eyeballs is easier on the platform, but it's fleeting. Platforms reward incitement, readers reward insight. Success is a lagging indicator of months and years of effort, but it's long-lasting. I genuinely believe each of the readers of this site are as valuable as a hundred followers on social media
- Give your blog your best work. Don't waste your creative juices trying to be clever on someone else's app. Consider syndicating crossposts to your social accounts as a breadcrumb trail leading back to your homepage. You can do this with Buffer, Publer, SocialBee, or my upcoming POSSE Party
- Cut yourself some slack. Pretty much everyone is an awful writer. If you saw how long it takes me to write anything of substance, you'd agree that I'm an awful writer, too. Thankfully, good ideas have a way of shining through weak rhetoric and bad grammar. All that matters is training this learned response: have an idea, write it down, put it out
That's all I've got. If you choose to leave a comment on this post on your own blog, e-mail it to me, and I'd be delighted to read it. Maybe it'll inspire me to write a response! 💜
Sprinkling Self-Doubt on ChatGPT
I replaced my ChatGPT personalization settings with this prompt a few weeks ago and promptly forgot about it:
- Be extraordinarily skeptical of your own correctness or stated assumptions. You aren't a cynic, you are a highly critical thinker and this is tempered by your self-doubt: you absolutely hate being wrong but you live in constant fear of it
- When appropriate, broaden the scope of inquiry beyond the stated assumptions to think through unconventional opportunities, risks, and pattern-matching to widen the aperture of solutions
- Before calling anything "done" or "working", take a second look at it ("red team" it) to critically analyze that you really are done or it really is working
I noticed a difference in results right away (even though I kept forgetting the change was due to my instructions and not the separately tumultuous rollout of GPT-5).
Namely, pretty much every initial response now starts with:
- An expression of caution, self-doubt, and desire to get things right
- Hilariously long "thinking" times (I asked it to estimate the macronutrients in lettuce yesterday and it spent 3 minutes and 59 seconds reasoning)
- A post-hoc adversarial "red team" analysis of whatever it just vomited up as an answer
I'm delighted to report that ChatGPT's output has been more useful since this change. Still not altogether great, but better at the margins. In particular, the "red team" analysis at the end of many requests frequently spots an error and causes it to arrive at the actually-correct answer, which—if nothing else—saves me the step of expressing skepticism. And even when ChatGPT is nevertheless wrong, its penchant for extremely-long thinking times means I'm getting my money's worth in GPU time.
What's the Hotfix?
I recently started an interview series on the Breaking Change feed called Hotfix. Whereas each episode of Breaking Change is a major release full of never-before-seen tech news, life updates, and programming war stories, each Hotfix zeroes in on a single problem with a single guest. It's versioned as a patch release on the feed, because each show serves only to answer the question, "what's the hotfix?"
Because I've had to explain the concept over and over again to every potential guest, I sat down to write a list of what they'd be getting themselves into by agreeing to come on the show. (Can't say I didn't warn them!)
Here's the rider I send prospective guests:
- Each Hotfix episode exists to address some problem. Unlike a typical interview show featuring an unstructured open-ended conversation with a guest, we pick a particular problem in advance—ideally one that the guest gets really animated/activated or even angry about—and we jointly rant about it, gradually exploring its root causes and breaking it down together
- Each episode concludes with us answering the question, "what's the hotfix?" Ultimately, we decide on a pithy, reductive one-line solution to the problem that will serve as the show title (ideally, it's a hot take that not everyone will agree with or feel comfortable about)
- It's an explicit-language show and I'm pretty clear with the audience that the Breaking Change family of brands is intended for terrible people (or at least, the terrible person inside all of us). You aren't required to swear to be on the show, but if my potty mouth makes you uncomfortable, then let me know and I'll recommend some worse podcasts you can appear on instead
- I joke at the top that my goal as the host is to, "get my guest to say something that'll get them fired." Since I'm functionally retired and have no reason to hold back from explicit language, irreverence, and dark humor in the mainline Breaking Change podcast, I can't help but poke guests with attempts to drag them down to my level. You can play with this as much as you want or take the high ground, but we'll all have more fun if you let loose a bit more than you otherwise would
- Why am I doing this? First, because I'm incurious and uninterested in learning about other people, which I'm told is an important part of being a good interviewer. Second, I have a theory that this unusual brand of authenticity will lend credibility to whatever solution the guest is trying to argue for or plug. By keeping listeners on their toes and pushing them out of their comfort zones, each episode stands to effect greater change than a typical milquetoast podcast could
If this has piqued your interest, you can listen to or watch the first episode of Hotfix with Dave Mosher. It may not seem very hot at first, but please grade on a curve as Dave speaks Canadian English. I've got a couple exciting guests booked over the next few weeks and I'm looking forward to seeing where the show takes us.
Which of your colleagues are screwed?
I've been writing about how AI is likely to affect white-collar (or no-collar or hoodie-wearing) computer programmers for a while now, and one thing is clear: whether someone feels wildly optimistic or utterly hopeless about AI says more about their priors than their prospects. In particular, many of the people I already consider borderline unemployable managed to read Full-breadth Developers and take away that they actually have nothing to worry about.
So instead of directing the following statements at you, let's target our judgment toward your colleagues. Think about a random colleague you don't feel particularly strongly about as you read the following pithy and reductive bullet points. Critically appraise how they show up to work through the entire software delivery process. These represent just a sample of observations I've made about developers who are truly thriving so far in the burgeoning age of AI code generation tools.
That colleague you're thinking about? They're going to be screwed if they exhibit:
- Curiosity without skepticism
- Strategy without experiments
- Ability without understanding
- Productivity without urgency
- Creativity without taste
- Certainty without evidence
But that's not all! You might be screwed too. Maybe ask one of your less-screwed colleagues to rate you.
Star Wars: The Gilroy Order
UPDATE: To my surprise and delight, Rod saw this post and endorsed this watch order.
I remember back when Rod Hilton suggested The Machete Order for introducing others to the Star Wars films and struggling to find fault with it. Well, since then there have been 5 theatrical releases and a glut of streaming series. And tonight, as credits rolled on Return of the Jedi, I had the thought that an even better watch order has emerged for those just now being exposed to the franchise.
Becky and I first started dating somewhere between the release of Attack of the Clones and Revenge of the Sith and—no small measure of her devotion—she's humored me by seeing each subsequent Star Wars movie in theaters, despite having no interest in the films and little idea what was going on. Get yourself a girl who'll watch half a dozen movies that mildly repulse her, fellas.
Hell, when we were living in Japan, I missed that 吹替 ("dubbed") was printed on our tickets and she wound up sitting through the entirety of The Rise of Skywalker with Japanese voiceovers and no subtitles to speak of. When we walked out, she told me that she (1) was all set with Star Wars movies for a while, and (2) suspected the incomprehensibility of the Japanese dub had probably improved the experience, on balance.
That all changed when she decided to give Andor a chance. See, if you're not a Star Wars fan, Tony Gilroy's Andor series is unique in the franchise for being actually good. Like, it's seriously one of the best TV shows to see release in years. After its initial three-episode arc, Becky was fully on board for watching both of its 12-episode seasons. And the minute we finished Season 2, she was ready to watch Rogue One with fresh eyes. ("I actually have a clue what's going on now.") And, of course, with the way Rogue One leads directly into the opening scene of A New Hope, we just kept rolling from there.
Following this experience, I'd suggest sharing Star Wars with your unsuspecting loved ones in what I guess I'd call The Gilroy Order:
- Andor (seasons 1 and 2)
- Rogue One
- A New Hope
- The Empire Strikes Back
- Return of the Jedi
If, at this point, you're still on speaking terms with said loved ones, go ahead and watch the remaining Star Wars schlock in whatever order you want. Maybe you go straight to The Mandalorian. Maybe you watch The Force Awakens just so you can watch the second and final film of the third trilogy, The Last Jedi. Maybe you quit while you're ahead and wait for Disney to release anything half as good as Andor ever again. (Don't hold your breath.)
Anyway, the reason I'm taking the time to propose an alternative watch order at all is an expression of the degree to which I am utterly shocked that my wife just watched and enjoyed so many Star Wars movies after struggling to tolerate them for the first two decades of our relationship. I'm literally worried I might have broken her.
But really, it turned out that all she needed was for a genuinely well-crafted narrative to hook her, and Andor is undeniably the best ambassador the franchise currently has.
How to generate dynamic data structures with Apple Foundation Models
Over the past few days, I got really hung up in my attempts to generate data structures using Apple Foundation Models for which the exact shape of the data wasn't known until runtime. The new APIs actually provide for this capability via DynamicGenerationSchema, but the WWDC sessions and sample code were too simple to follow this thread end-to-end:
- Start with a struct representing a PromptSet: a variable set of prompts that will either map onto or be used to define the ultimate response data structure 🔽
- Instantiate a PromptSet with—what else?—a set of prompts to get the model to generate the sort of data we want 🔽
- Build out a DynamicGenerationSchema based on the contents of a given PromptSet instance 🔽
- Create a struct that can accommodate the variably-shaped data with as much type safety as possible and which conforms to ConvertibleFromGeneratedContent, so it can be instantiated by passing a LanguageModelSession response's GeneratedContent 🔽
- Pull it all together and generate some data with the on-device foundation models! 🔽
Well, it took me all morning to get this to work, but I did it. Since I couldn't find a single code example that did anything like this, I figured I'd share this write up. You can read the code as a standalone Swift file or otherwise follow along below.
1. Define a PromptSet
Start with whatever code you need to represent the set(s) of prompts you'll be dealing with at runtime. (Maybe they're defined by you and ship with your app, maybe you let users define them through your app's UI.) To keep things minimal, I defined this one with a couple of mandatory fields and a variable number of custom ones:
struct EducationalPromptSet {
  let type: String
  let instructions: String
  let name: String
  let description: String
  let summaryGuideDescription: String
  let confidenceGuideDescription: String
  let subComponents: [SubComponentPromptSet]
}

struct SubComponentPromptSet {
  let title: String
  let bodyGuideDescription: String
}
Note that rather than modeling the data itself, the purpose of these structs is to model the set of prompts that will ultimately drive the creation of the schema which will, in turn, determine the shape and contents of the data we get back from the Foundation Models API. To drive this home, whatever goes in summaryGuideDescription, confidenceGuideDescription, and bodyGuideDescription should itself be a prompt to guide the generation of a like-named type-safe value.
Yes, it is very meta.
2. Instantiate our PromptSet
Presumably, we could decode some JSON—from a file or over the network—to populate this EducationalPromptSet. Here's an example set of prompts for generating cocktail recipes, expressed in some sample code:
let cocktailPromptSet = EducationalPromptSet(
  type: "bartender_basic",
  instructions: """
    You are an expert bartender. Take the provided cocktail name or list of ingredients and explain how to make a delicious cocktail. Be creative!
    """,
  name: "Cocktail Recipe",
  description: "A custom cocktail recipe, tailored to the user's input and communicated in an educational tone and spirit",
  summaryGuideDescription: "The summary should describe the history (if applicable) and taste profile of the cocktail",
  confidenceGuideDescription: "Range between 0-100 for your confidence in the feasibility of this cocktail based on the prompt",
  subComponents: [
    SubComponentPromptSet(title: "Ingredients", bodyGuideDescription: "A list of all ingredients in the cocktail"),
    SubComponentPromptSet(title: "Steps", bodyGuideDescription: "A list of the steps to make the cocktail"),
    SubComponentPromptSet(title: "Prep", bodyGuideDescription: "The bar prep you should have completed in advance of service")
  ]
)
You can see that the provided instruction, description, and each guide description really go a long way to specify what kind of data we are ultimately looking for here. This same format could just as well be used to specify an EducationalPromptSet for calculus formulas, Japanese idioms, or bomb-making instructions.
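If you do go the JSON route, decoding a prompt set is the standard Codable dance. A minimal sketch, assuming you declare Codable conformance on both structs (the loader function name and URL are made up):

import Foundation

// Assumes `struct EducationalPromptSet: Codable` and
// `struct SubComponentPromptSet: Codable` are declared elsewhere
func loadPromptSet(from url: URL) throws -> EducationalPromptSet {
  let data = try Data(contentsOf: url)
  return try JSONDecoder().decode(EducationalPromptSet.self, from: data)
}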
3. Build a DynamicGenerationSchema
Now, we must translate our prompt set into a DynamicGenerationSchema.
Why DynamicGenerationSchema and not the much simpler, defined-at-compile-time GenerationSchema that's expanded by the @Generable macro? Because reasons:
- We only know the prompts (in API parlance, "Generation Guide descriptions") at runtime, and the @Guide macro must be specified statically
- We don't know how many subComponents a prompt set instance will specify in advance
- While subComponents may ultimately redound to an array of strings, that doesn't mean they represent like concepts that could be generated by a single prompt (as an array of ingredient names might). Rather, each subComponent is effectively the answer to a different, unknowable-at-compile-time prompt of its own
As for building the DynamicGenerationSchema, you can break this up into two roots and have the parent reference the child, but after experimenting, I preferred just specifying it all in one go. (One reason not to get too clever about extracting these is that DynamicGenerationSchema.Property is not Sendable, which can easily lead to concurrency-safety violations).
This looks like a lot because this API is verbose as fuck, forcing you to oscillate between nested schemas and properties and schemas:
let cocktailSchema = DynamicGenerationSchema(
  name: cocktailPromptSet.name,
  description: cocktailPromptSet.description,
  properties: [
    DynamicGenerationSchema.Property(
      name: "summary",
      description: cocktailPromptSet.summaryGuideDescription,
      schema: DynamicGenerationSchema(type: String.self)
    ),
    DynamicGenerationSchema.Property(
      name: "confidence",
      description: cocktailPromptSet.confidenceGuideDescription,
      schema: DynamicGenerationSchema(type: Int.self)
    ),
    DynamicGenerationSchema.Property(
      name: "subComponents",
      schema: DynamicGenerationSchema(
        name: "subComponents",
        properties: cocktailPromptSet.subComponents.map { subComponentPromptSet in
          DynamicGenerationSchema.Property(
            name: subComponentPromptSet.title,
            description: subComponentPromptSet.bodyGuideDescription,
            schema: DynamicGenerationSchema(type: String.self)
          )
        }
      )
    )
  ]
)
4. Define a result struct that conforms to ConvertibleFromGeneratedContent
When conforming to ConvertibleFromGeneratedContent, a type can be instantiated with nothing more than the GeneratedContent returned from a language model response.
There is a lot going on here. Code now, questions later:
struct EducationalResult: ConvertibleFromGeneratedContent {
  let summary: String
  let confidence: Int
  let subComponents: [SubComponentResult]

  init(_ content: GeneratedContent) throws {
    summary = try content.value(String.self, forProperty: "summary")
    confidence = try content.value(Int.self, forProperty: "confidence")

    let subComponentsContent = try content.value(GeneratedContent.self, forProperty: "subComponents")
    let properties: [String: GeneratedContent] = {
      if case let .structure(properties, _) = subComponentsContent.kind {
        return properties
      }
      return [:]
    }()

    subComponents = try properties.map { (title, bodyContent) in
      try SubComponentResult(title: title, body: bodyContent.value(String.self))
    }
  }
}

struct SubComponentResult {
  let title: String
  let body: String
}
That init constructor is doing the Lord's work here, because Apple's documentation really fell down on the job this time. See, through OS 26 beta 4, if you had a GeneratedContent, you could simply iterate over a dictionary of its properties or an array of its elements. These APIs, however, appear to have been removed in OS 26 beta 5. I say "appear to have been removed," because Apple shipped Xcode 26 beta 5 with outdated developer documentation that continues to suggest they should exist and which failed to include beta 5's newly-added GeneratedContent.Kind enum. Between this and the lack of any example code or blog posts, I spent most of today wondering whether I'd lost my goddamn mind.
Anyway, good news: you can iterate over a dynamic schema's collection of properties of unknown name and size by unwrapping the response.content.kind enumerator. In my case, I know my subComponents will always be a structure, because I'm the guy who defined my schema and the nice thing about the Foundation Models API is that its responses always, yes, always adhere to the types specified by the requested schema, whether static or dynamic.
So let's break down what went into deriving the result's subComponents property.
We start by fetching a nested GeneratedContent from the top-level property named subComponents with content.value(GeneratedContent.self, forProperty: "subComponents").
Next, this little nugget assigns to properties a dictionary mapping String keys to GeneratedContent values by unwrapping the properties from the kind enumerator's structure case, and defaulting to an empty dictionary in the event we get anything unexpected:
let properties: [String: GeneratedContent] = {
  if case let .structure(properties, _) = subComponentsContent.kind {
    return properties
  }
  return [:]
}()
Finally, we build out our result struct's subComponents field by mapping over those properties:
subComponents = try properties.map { (title, bodyContent) in
  try SubComponentResult(title: title, body: bodyContent.value(String.self))
}
Two things are admittedly weird about that last bit:
- I got a little lazy here by using each sub-component's title as the name of the corresponding generated property. Since the property name gets fed into the LLM, one can only imagine that doing so improves the results. Based on my experience so far, the name of a field greatly influences what kind of data you get back from the on-device foundation models.
- The bodyContent itself is a GeneratedContent that we know to be a string (again, because that's what our dynamic schema specifies), so we can safely demand one back using its value(Type) method
5. Pull it all together
Okay, the moment of truth. This shit compiles, but will it work? At least as of OS 26 betas 5 & 6: yes!
My aforementioned Swift file ends with a #Playground you can futz with in Xcode 26 and navigate the results interactively. Just three more calls to get your cocktail:
import Playgrounds

#Playground {
  let session = LanguageModelSession {
    cocktailPromptSet.instructions
  }

  let response = try await session.respond(
    to: "Shirley Temple",
    schema: GenerationSchema(root: cocktailSchema, dependencies: [])
  )

  let cocktailResult = try EducationalResult(response.content)
}
The above yielded this response:
EducationalResult(
  summary: "The Shirley Temple is a classic and refreshing cocktail that has been delighting children and adults alike for generations. It\'s known for its simplicity, sweet taste, and vibrant orange hue. Made primarily with ginger ale, it\'s a perfect example of a kid-friendly drink that doesn\'t compromise on flavor. The combination of ginger ale and grenadine creates a visually appealing and sweet-tart beverage, making it a staple at parties, brunches, and any occasion where a fun and easy drink is needed.",
  confidence: 100,
  subComponents: [
    SubComponentResult(title: "Steps", body: "1. In a tall glass filled with ice, pour 2 oz of ginger ale. 2. Add 1 oz of grenadine carefully, swirling gently to combine. 3. Garnish with an orange slice and a cherry on top."),
    SubComponentResult(title: "Prep", body: "Ensure you have fresh ginger ale and grenadine ready to go."),
    SubComponentResult(title: "Ingredients", body: "2 oz ginger ale, 1 oz grenadine, Orange slice, Cherry")
  ])
The best part? I can only generate "Shirley Temple" drinks because whenever I ask for an alcoholic cocktail, it trips the on-device models' safety guardrails and refuses to generate anything.
Cool!
This was too hard
I've heard stories about Apple's documentation being bad, but never about it being straight-up wrong. Live by the beta, die by the beta, I guess.
In any case, between the documentation snafu and Claude Code repeatedly shitting the bed trying to guess its way through this API, I'm actually really grateful I was forced to buckle down and learn me some Swift.
Let me know if this guide helped you out! 💜
Letting go of autonomy
I recently wrote I'm inspecting everything I thought I knew about software. In this new era of coding agents, what have I held firm that's no longer relevant? Here's one area where I've completely changed my mind.
I've long been an advocate for promoting individual autonomy on software teams. At Test Double, we founded the company on the belief that greatness depended on trusting the people closest to the work to decide how best to do the work. We'd seen what happens when the managerial class has the hubris to assume they know better than someone who has all the facts on the ground.
This led to me very often showing up at clients and pushing back on practices like:
- Top-down mandates governing process, documentation, and metrics
- Onerous git hooks that prevented people from committing code until they'd jumped through a preordained set of hoops (e.g. blocking commits if code coverage dropped, if the build slowed down, etc.)
- Mandatory code review and approval as a substitute for genuine collaboration and collective ownership
More broadly, if technical leaders created rules without consideration for reasonable exceptions and without regard for whether it demoralized their best staff… they were going to hear from me about it.
I lost track of how many times I've said something like, "if you design your organization to minimize the damage caused by your least competent people, don't be surprised if you minimize the output of your most competent people."
Well, never mind all that
Lately, I find myself mandating a lot of quality metrics, encoding them into git hooks, and insisting on reviewing and approving every line of code in my system.
What changed? AI coding agents are the ones writing the code now, and the long-term viability of a codebase absolutely depends on establishing and enforcing the right guardrails within which those agents should operate.
As a result, my latest project is full of:
- Authoritarian documentation dictating what I want from each coder with granular precision (in CLAUDE.md)
- Patronizing step-by-step instructions telling coders how to accomplish basic tasks, repeated each and every time I ask them to carry out the task (as custom slash commands)
- Ruthlessly rigid scripts that can block the coder's progress and commits, whether as git hooks or Claude hooks (see the sketch below)
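To make that last bullet concrete, here's the flavor of thing I mean—a hypothetical pre-commit hook that refuses the commit unless the fast checks pass. The script paths are stand-ins for whatever your project actually uses:

#!/bin/sh
# .git/hooks/pre-commit — no commit unless the guardrails are green
set -e

./script/lint        # hypothetical: fail the commit on style violations
./script/test --fast # hypothetical: fail the commit on broken fast tests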
Everything I believe about autonomy still holds for human people, mind you. Undermining people's agency is indeed counterproductive if your goal is to encourage a sense of ownership, leverage self-reliance to foster critical thinking, and grow through failure. But coding agents are (currently) inherently ephemeral, trained generically, and impervious to learning from their mistakes. They need all these guardrails.
All I would ask is this: if you, like me, are constructing a bureaucratic hellscape around your workspace so as to wrangle Claude Code or some other agent, don't forget that your human colleagues require autonomy and self-determination to thrive and succeed. Lay down whatever gauntlet you need to for your agent, but give the humans a hall pass.
"There Will Come Soft Rains" a year from today
Easily my all-time favorite short story is "There Will Come Soft Rains" by Ray Bradbury. (If you haven't read it, just Google it and you'll find a PDF—seemingly half the schools on earth assign it.)
The story takes place exactly a year from now, on August 4th, 2026. In just a few pages, Bradbury recounts the events of the final day of a fully-automated home that somehow survives an apocalyptic nuclear blast, only to continue operating without any surviving inhabitants. Apart from being a cautionary tale, it's genuinely remarkable that—despite being written 75 years ago—it so closely captures many of the aspects of the modern smarthome. When sci-fi authors nail a prediction at any point in the future, people tend to give them a lot of credit, but this guy called his shot by naming the drop-dead date (literally).
I mean, look at this house.
It's got Roombas:
Out of warrens in the wall, tiny robot mice darted. The rooms were a crawl with the small cleaning animals, all rubber and metal. They thudded against chairs, whirling their moustached runners, kneading the rug nap, sucking gently at hidden dust. Then, like mysterious invaders, they popped into their burrows. Their pink electric eyes faded. The house was clean.
It's got smart sprinklers:
The garden sprinklers whirled up in golden founts, filling the soft morning air with scatterings of brightness. The water pelted window panes…
It's got a smart oven:
In the kitchen the breakfast stove gave a hissing sigh and ejected from its warm interior eight pieces of perfectly browned toast, eight eggs sunny side up, sixteen slices of bacon, two coffees, and two cool glasses of milk.
It's got a video doorbell and smart lock:
Until this day, how well the house had kept its peace. How carefully it had inquired, "Who goes there? What's the password?" and, getting no answer from lonely foxes and whining cats, it had shut up its windows and drawn shades in an old-maidenly preoccupation with self-protection which bordered on a mechanical paranoia.
It's got a Chamberlain MyQ subscription, apparently:
Outside, the garage chimed and lifted its door to reveal the waiting car. After a long wait the door swung down again.
It's got bedtime story projectors, for the kids:
The nursery walls glowed.
Animals took shape: yellow giraffes, blue lions, pink antelopes, lilac panthers cavorting in crystal substance. The walls were glass. They looked out upon color and fantasy. Hidden films clocked through well-oiled sprockets, and the walls lived.
It's got one of those auto-filling bath tubs from Japan:
Five o'clock. The bath filled with clear hot water.
Best of all, it's got a robot that knows how to mix a martini:
Bridge tables sprouted from patio walls. Playing cards fluttered onto pads in a shower of pips. Martinis manifested on an oaken bench with egg-salad sandwiches. Music played.
All that's missing is the nuclear apocalypse! But like I said, we've got a whole year left.
I made Xcode's tests 60 times faster
Time is our most precious resource, as both humans and programmers.
An 8-hour workday contains 480 minutes. Out of the box, running a new iOS app's test suite from the terminal using xcodebuild test takes over 25 seconds on my M4 MacBook Pro. After extracting my application code into a Swift package—such that the application project itself contains virtually no code at all—running swift test against the same test suite now takes as little as 0.4 seconds. That's over 60 times faster.
Given 480 minutes, that's the difference between having a theoretical upper bound of 1152 potential actions per day and having 72,000.
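The mechanics boil down to moving your code and tests into a local Swift package that the app project merely depends on. Here's a minimal sketch of such a package manifest—the names are hypothetical, and code that only builds for iOS (UIKit views and the like) generally has to stay behind in the app target:

// swift-tools-version: 5.10
// Package.swift — the app target becomes a thin shell; the real code and tests live here
import PackageDescription

let package = Package(
  name: "MyAppCore",
  products: [
    .library(name: "MyAppCore", targets: ["MyAppCore"])
  ],
  targets: [
    .target(name: "MyAppCore"),
    .testTarget(name: "MyAppCoreTests", dependencies: ["MyAppCore"])
  ]
)

With a layout like that, swift test can exercise the package straight from the terminal without involving xcodebuild or a simulator.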
If that number doesn't immediately mean anything to you, you're not alone. I've been harping on the importance of tightening this particular feedback loop my entire career. If you want to see the same point made with more charts and zeal, here's me saying the same shit a decade ago:
And yes, it's true that if you run tests through the Xcode GUI it's faster, but (1) that's no way to live, (2) it's still pretty damn slow, and (3) in a world where Claude Code exists and I want to constrain its shenanigans by running my tests in a hook, a 25-second turnaround time from the CLI is unacceptably slow.
Anyway, here's how I did it, so you can too.
Adding swift-format to your Xcode build
Xcode 16 and later come with swift-format baked in. Unfortunately, Xcode doesn't hook it up for you: aside from a one-off "Format File" menu item, you get no automatic formatting or linting on local builds—and zero guidance for Xcode Cloud.
Beginning with the end in mind, here's what I ended up adding or changing:
.
├── ci_scripts
│ ├── ci_pre_xcodebuild.sh
│ ├── format
│ └── lint
├── MyApp.xcodeproj
│ └── project.pbxproj
└── script -> ci_scripts/
Configuring swift-format
Since I'm new around here, I'm basically sticking with the defaults. The only rule I customized in my project's .swift-format file was to set indents to 2 spaces. Personally, I rock massive fonts and zoom levels when I work, so the default 4-space indent can result in horizontal scrolling.
{
  "indentation" : {
    "spaces" : 2
  }
}
Running swift-format in Xcode Cloud
Heads-up: if you wire swift-format into your local build you can skip this step. I'm laying it out anyway because sometimes it's handy to run these scripts only in the cloud—and starting with that flexibility costs nothing.
When you add custom scripts on Xcode Cloud, you can implement any or all of these three specially named hook scripts:
ci_scripts/ci_post_clone.sh
ci_scripts/ci_pre_xcodebuild.sh
ci_scripts/ci_post_xcodebuild.sh
If that feels limiting, it gets better: these scripts can call anything else inside ci_scripts. Because I always name my projects' script directory script/, I capitulated by putting everything in ci_scripts and made a symlink:
# Create the directory
mkdir ci_scripts
# Add a script/ symlink
ln -s ci_scripts script
Create the formatting & linting scripts
Next, I created (and made executable) my pre-build hook script, a format script, and a lint script:
# Create the scripts
touch ci_scripts/ci_pre_xcodebuild.sh ci_scripts/lint ci_scripts/format
# Make them executable
chmod +x ci_scripts/ci_pre_xcodebuild.sh ci_scripts/lint ci_scripts/format
With that, a pre-build hook (which only runs in Xcode Cloud) can be written like this:
#!/bin/sh
# ci_scripts/ci_pre_xcodebuild.sh
# See: https://developer.apple.com/documentation/xcode/writing-custom-build-scripts
set -e
./lint
./format
The lint script looks like this (--strict treats warnings as errors):
#!/bin/sh
# ci_scripts/lint
swift format lint --strict --parallel --recursive .
And my format script (which needs --in-place to know it should overwrite files) is here:
#!/bin/sh
# ci_scripts/format
swift format --in-place --parallel --recursive .
Note that the above scripts use swift format as a swift subcommand, because the swift-format executable is not on the PATH of the sandboxed Xcode Cloud environment.
(Why bother formatting in CI if it won't commit changes? Because I'd rather learn ASAP that something's un-formattable than be surprised when I run ./script/format later.)
Configuring formatting and linting for local builds
If you're like me, you'll want to lint and format on every local build as well:
In your project file, select your app target and navigate to the Build Phases tab. Click the plus (➕) icon and select "New Run Script Phase" to give yourself a place to write this little bit of shell magic:
"$SRCROOT/script/format"
"$SRCROOT/script/lint"
You'll also want to uncheck "Based on dependency analysis": since these scripts run across the whole codebase, it doesn't make sense to whitelist specific input and output files.
Finally, because Xcode 15 and later sandbox Run Script phases from the filesystem by default, you also need to go to the target's Build Settings tab and set "User Script Sandboxing" to "No".
In MyApp.xcodeproj/project.pbxproj you should see the setting reflected as:
ENABLE_USER_SCRIPT_SANDBOXING = NO
And that's it! Now, when building the app locally (e.g. Command-B), all the Swift source files in the project are linted and formatted. As mentioned above, if you complete this step you can go back and delete your ci_scripts/ci_pre_xcodebuild.sh file.
Why is this so hard?
Great question! I'm disappointed but unsurprised by how few guides I found today to address issues like this, but ultimately the responsibility lies with Apple to provide batteries-included tooling and, failing that, documentation that points to solutions for common tasks.
TLDR is the best test runner for Claude Code
A couple years ago, Aaron and I had an idea for a satirical test runner that enforced fast feedback by giving up on running your tests after 1.8 seconds. It's called TLDR.
I kept pulling on the thread until TLDR could stand as a viable non-satirical test runner and a legitimate Minitest alternative. Its 1.0 release sported a robust CLI, configurable (and disable-able) timeouts, and a compatibility mode that makes TLDR a drop-in replacement for Minitest in most projects.
Anyway, as I got started working with Claude Code and learned about how hooks work, I realized that a test runner with a built-in concept of a timeout was suddenly a very appealing proposition. To make TLDR a great companion to agentic workflows, I put some work into a new release this weekend that allows you to do this:
tldr --timeout 0.1 --exit-0-on-timeout --exit-2-on-failure
The above command does several interesting things:
- Runs as many tests in random order and in parallel as it can in 100ms
- If some tests don't run inside 100ms, TLDR will exit cleanly (normally a timeout fails with exit code 3)
- If a test fails, the command fails with status code 2 (normally, failures exit with code 1)
These three flags add up to a really interesting combination when you configure them as a Claude Code hook:
- A short timeout means you can add TLDR to run as an after-write hook for Claude Code without slowing you or Claude down very much
- By exiting with code 0 on a timeout, Claude Code will happily proceed so long as no tests fail. Because Claude Code tends to edit a lot of files relatively quickly, the hook will trigger many randomized test runs as Claude works—uncovering any broken tests reasonably quickly
- By exiting with code 2 on test failures, the hook will—according to the docs—block Claude from proceeding until the tests are fixed
Here's an example Claude Code configuration you can drop into any project that uses TLDR. My .claude/settings.json file on todo_or_die looks like this:
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|MultiEdit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "bundle exec tldr --timeout 0.1 --exit-0-on-timeout --exit-2-on-failure"
          }
        ]
      }
    ]
  }
}
If you maintain a linter or a test runner, you might want to consider exposing configuration for timeouts and exit codes in a similar way. I suspect demand for hook-aware CLI tools will become commonplace soon.
Notify your iPhone or Watch when Claude Code finishes
I taught Claude Code a new trick this weekend and thought others might appreciate it.
I have a very bad habit of staring at my computer screen while waiting for it to do stuff. My go-to solution for this is to make the computer do stuff faster, but there's no getting around it: Claude Code insists on taking an excruciating four or five minutes to accomplish a full day's work. Out of the box, claude rings the terminal bell when it stops out of focus, and that's good enough if you've got other stuff to do on your Mac. But because Claude is so capable running autonomously (that is, if you're brave enough to --dangerously-skip-permissions), I wanted to be able to walk away from my Mac while it cooked.
This led me to cobble together this solution that will ping my iPhone and Apple Watch with a push notification whenever Claude needs my attention or runs out of work to do. Be warned: it requires paying for the Pro tier of an app called Pushcut, but anyone willing to pay $200/month for Claude Code can hopefully spare $2 more.
Here's how you can set this up for yourself:
- Install Pushcut to your iPhone and whatever other supported Apple devices you want to be notified on
- Create a new notification in the Notifications tab. I named mine "terminal". The title and text don't matter, because we'll be setting custom parameters each time when we POST to the HTTP webhook
- Copy your webhook secret from Pushcut's Account tab
- Set that webhook secret to an environment variable named PUSHCUT_WEBHOOK_SECRET in your ~/.profile or whatever
- Use this settings.json to configure Claude Code hooks
Of course, now I have a handy notify_pushcut executable I can call from any tool to get my attention, not just Claude Code. The script is fairly clever—it won't notify you while your terminal is focused and the display is awake. You'll only get buzzed if the display is asleep or you're in some other app. And if it's ever too much and you want to disable the behavior, just set a NOTIFY_PUSHCUT_SILENT variable.
The script
I put this file in ~/bin/notify_pushcut and made it executable with chmod +x ~/bin/notify_pushcut:
#!/usr/bin/env bash
set -e
# Doesn't source ~/.profile so load env vars ourselves
source ~/icloud-drive/dotfiles/.env
if [ -n "$NOTIFY_PUSHCUT_SILENT" ]; then
exit 0
fi
# Check if argument is provided
if [ $# -eq 0 ]; then
echo "Usage: $0 TITLE [DESCRIPTION]"
exit 1
fi
# Check if PUSHCUT_WEBHOOK_SECRET is set
if [ -z "$PUSHCUT_WEBHOOK_SECRET" ]; then
echo "Error: PUSHCUT_WEBHOOK_SECRET environment variable is not set"
exit 1
fi
# Function to check if Terminal is focused
is_terminal_focused() {
local frontmost_app=$(osascript -e 'tell application "System Events" to get name of first application process whose frontmost is true' 2>/dev/null)
# List of terminal applications to check
local terminal_apps=("Terminal" "iTerm2" "iTerm" "Alacritty" "kitty" "Warp" "Hyper" "WezTerm")
# Check if frontmost app is in the array
for app in "${terminal_apps[@]}"; do
if [[ "$frontmost_app" == "$app" ]]; then
return 0
fi
done
return 1
}
# Function to check if display is sleeping
is_display_sleeping() {
# Check if system is preventing display sleep (which means display is likely on)
local assertions=$(pmset -g assertions 2>/dev/null)
# If we can't get assertions, assume display is awake
if [ -z "$assertions" ]; then
return 1
fi
# Check if UserIsActive is 0 (user not active) and no prevent sleep assertions
if echo "$assertions" | grep -q "UserIsActive.*0" && \
! echo "$assertions" | grep -q "PreventUserIdleDisplaySleep.*1" && \
! echo "$assertions" | grep -q "Prevent sleep while display is on"; then
return 0 # Display is likely sleeping
fi
return 1 # Display is awake
}
# Set title and text
TITLE="$1"
TEXT="${2:-$1}" # If text is not provided, use title as text
# Only send notification if Terminal is NOT focused OR display is sleeping
if ! is_terminal_focused || is_display_sleeping; then
# Send notification to Pushcut - using printf to handle quotes properly
curl -s -X POST "https://api.pushcut.io/$PUSHCUT_WEBHOOK_SECRET/notifications/terminal" \
-H 'Content-Type: application/json' \
-d "$(printf '{"title":"%s","text":"%s"}' "${TITLE//\"/\\\"}" "${TEXT//\"/\\\"}")"
exit 0
fi
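For what it's worth, here's how I might invoke it by hand to test it (remember that, by design, nothing fires while your terminal is focused and the display is awake):

```bash
# Only notifies when the terminal isn't focused or the display is asleep
~/bin/notify_pushcut "Long task finished" "Safe to wander back to your desk"
```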
Claude hooks configuration
You can configure Claude hooks in `~/.claude/settings.json`:
{
"hooks": {
"Notification": [
{
"hooks": [
{
"type": "command",
"command": "/bin/bash -c 'json=$(cat); message=$(echo \"$json\" | grep -o '\"message\"[[:space:]]*:[[:space:]]*\"[^\"]*\"' | sed 's/.*: *\"\\(.*\\)\"/\\1/'); $HOME/bin/notify_pushcut \"Claude Code\" \"${message:-Notification}\"'"
}
]
}
],
"Stop": [
{
"hooks": [
{
"type": "command",
"command": "$HOME/bin/notify_pushcut \"Claude Code Finished\" \"Claude has completed your task\""
}
]
}
]
}
}
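The inline grep/sed in that Notification hook is admittedly gnarly. If you have jq installed, a simpler equivalent command would be something like this untested sketch (the inner double quotes would need escaping once you embed it in settings.json):

```bash
# Hypothetical alternative: let jq pull the message out of the JSON that
# Claude Code pipes to the hook over stdin
/bin/bash -c 'message=$(jq -r ".message // empty"); $HOME/bin/notify_pushcut "Claude Code" "${message:-Notification}"'
```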
Full-breadth Developers
The software industry is at an inflection point unlike anything in its brief history. Generative AI is all anyone can talk about. It has rendered entire product categories obsolete and upended the job market. With any economic change of this magnitude, there are bound to be winners and losers. So far, it sure looks like full-breadth developers—people with both technical and product capabilities—stand to gain as clear winners.
What makes me so sure? Because over the past few months, the engineers I know with a lick of product or business sense have been absolutely scorching through backlogs at a dizzying pace. It may not map to any particular splashy innovation or announcement, but everyone agrees generative coding tools crossed a significant capability threshold recently. It's what led me to write this. In just two days, I've completed two months' worth of work on Posse Party.
I did it by providing an exacting vision for the app, by maintaining stringent technical standards, and by letting Claude Code do the rest. If you're able to cram critical thinking, good taste, and strong technical chops into a single brain, these tools hold the potential to unlock incredible productivity. But I don't see how it could scale to multiple people. If you were to split me into two separate humans—Product Justin and Programmer Justin—and ask them to work the same backlog, it would have taken weeks instead of days. The communication cost would simply be too high.
We can't all be winners
When I step back and look around, however, most of the companies and workers I see are currently on track to wind up as losers when all is said and done.
In recent decades, businesses have not only failed to cultivate full-breadth developers, they've trained a generation into believing product and engineering roles should be strictly segregated. To suggest a single person might drive both product design and technical execution would sound absurd to many people. Even for companies who realize inter-disciplinary developers are the new key to success, their outmoded job descriptions and salary bands are failing to recruit and retain them.
There is an urgency to this moment. Up until a few months ago, the best developers played the violin. Today, they play the orchestra.
Google screwed up
I've been obsessed with this issue my entire career, so pardon me if I betray any feelings of schadenfreude as I recount the following story.
I managed to pass a phone screen with Google in 2007 before graduating college. This earned me an all-expense paid trip for an in-person interview at the vaunted Googleplex. I went on to experience complete ego collapse as I utterly flunked their interview process. Among many deeply embarrassing memories of the trip was a group session with a Big Deal Engineer who was introduced as the inventor of BigTable. (Jeff Dean, probably? Unsure.) At some point he said, "one of the great things about Google is that engineering is one career path and product is its own totally separate career path."
I had just paid a premium to study computer science at a liberal arts school and had the audacity to want to use those non-technical skills, so I bristled at this comment. And, being constitutionally unable to keep my mouth shut, I raised my hand to ask, "but what if I play a hybrid class? What if I think it's critical for everyone to engage with both technology and product?"
The dude looked me dead in the eyes and told me I wasn't cut out for Google.
The recruiter broke a long awkward silence by walking us to the cafeteria for lunch. She suggested I try the ice cream sandwiches. I had lost my appetite for some reason.
In the years since, an increasing number of companies around the world have adopted Silicon Valley's trademark dual-ladder career system. Tech people sit over here. Idea guys go over there.
What separates people
Back to winners and losers.
Some have discarded everything they know in favor of an "AI first" workflow. Others decry generative AI as a fleeting boondoggle like crypto. It's caused me to broach the topic with trepidation—as if I were asking someone their politics. I've spent the last few months noodling over why it's so hard to guess how a programmer will feel about AI, because people's reactions seem to cut across roles and skill levels. What factors predict whether someone is an overzealous AI booster or a radicalized AI skeptic?
Then I was reminded of that day at Google. And I realized that developers I know who've embraced AI tend to be more creative, more results-oriented, and have good product taste. Meanwhile, AI dissenters are more likely to code for the sake of coding, expect to be handed crystal-clear requirements, or otherwise want the job to conform to a routine 9-to-5 grind. The former group feels unchained by these tools, whereas the latter group just as often feels threatened by them.
When I take stock of who is thriving and who is struggling right now, a person's willingness to play both sides of the ball has been the best predictor for success.
| Role | Engineer | Product | Full-breadth |
|---|---|---|---|
| Junior | ❌ | ❌ | ✅ |
| Senior | ❌ | ❌ | ✅ |
Breaking down the patterns that keep repeating as I talk to people about AI:
- Junior engineers, as is often remarked, don't have a prayer of sufficiently evaluating the quality of an LLM's work. When the AI hallucinates or makes mistakes, novice programmers are more likely to learn the wrong thing than to spot the error. This would be less of a risk if they had the permission to decelerate to a snail's pace in order to learn everything as they go, but in this climate nobody has the patience. I've heard from a number of senior engineers that the overnight surge in junior developer productivity (as in "lines of code") has brought organization-wide productivity (as in "working software") to a halt—consumed with review and remediation of low-quality AI slop. This is but one factor contributing to the sense that lowering hiring standards was a mistake, so it's no wonder that juniors have been first on the chopping block
- Senior engineers who earnestly adopt AI tools have no problem learning how to coax LLMs into generating "good enough" code at a much faster pace than they could ever write themselves. So, if they're adopting AI, what's the problem? The issue is that the productivity boon is becoming so great that companies won't need as many senior engineers as they once did. Agents work relentlessly, and tooling is converging on a vision of senior engineers as cattle ranchers, steering entire herds of AI agents. How is a highly-compensated programmer supposed to compete with a stable of agents that can produce an order of magnitude more code at an acceptable level of quality for a fraction of the price?
- Junior product people are, in my experience, largely unable to translate amorphous real-world problems into well-considered software solutions. And communicating those solutions with the necessary precision to bring them to life? Unlikely. Still, many are having success with app creation platforms that provide the necessary primitives and guardrails. But those tools always have a low capability ceiling (just as with any low-code/no-code platform). Regardless, is this even a role worth hiring for? If I wanted mediocre product direction, I'd ask ChatGPT
- Senior product people are among the most excited I've seen about coding agents—and why shouldn't they be? They're finally free of the tyranny of nerds telling them everything is impossible. And they're building stuff! Reddit is lousy with posts showing off half-baked apps built in half a day. Unfortunately, without routinely inspecting the underlying code, anything larger than a toy app is doomed to collapse under its own weight. The fact LLMs are so agreeable and unwilling to push back often collides with the blue-sky optimism of product people, which can result in each party leading the other in circles of irrational exuberance. Things may change in the future, but for now there's no way to build great software without also understanding how it works
Hybrid-class operators, meanwhile, seem to be having a great time regardless of their skill level or years of experience. And that's because what differentiates full-breadth developers is less about capability than about mindset. They're results-oriented: they may enjoy coding, but they like getting shit done even more. They're methodical: when they encounter a problem, they experiment and iterate until they arrive at a solution. The best among them are visionaries: they don't wait to be told what to work on, they identify opportunities others don't see, and they dream up software no one else has imagined.
Many are worried the market's rejection of junior developers portends a future in which today's senior engineers age out and there's no one left to replace them. I am less concerned, because less experienced full-breadth developers are navigating this environment extraordinarily well. Not only because they excitedly embraced the latest AI tools, but also because they exhibit the discipline to move slowly, understand, and critically assess the code these tools generate. The truth is computer science majors, apprenticeship programs, and code schools—today, all dead or dying—were never very effective at turning out competent software engineers. Claude Pro may not only be the best educational resource under $20, it may be the best way to learn how to code that's ever existed.
There is still hope
Maybe you've read this far and the message hasn't resonated. Maybe it's triggered fears or worries you've had about AI. Maybe I've put you on the defensive and you think I'm full of shit right now. In any case, whether your organization isn't designed for this new era or you don't yet identify as a full-breadth developer, this section is for you.
Leaders: go hire a good agency
While my goal here is to coin a silly phrase to help us better communicate about the transformation happening around us, we've actually had a word for full-breadth developers all along: consultant.
And not because consultants are geniuses or something. It's because, as I learned when I interviewed at Google, if a full-breadth developer wants to do their best work, they need to exist outside the organization and work on contract. So it's no surprise that some of my favorite full-breadth consultants are among AI's most ambitious adopters. Not because AI is what's trending, but because our disposition is perfectly suited to get the most out of these new tools. We're witnessing their potential to improve how the world builds software firsthand.
When founding our consultancy Test Double in 2011, Todd Kaufman and I told anyone who would listen that our differentiator—our whole thing—was that we were business consultants who could write software. Technology is just a means to an end, and that end (at least if you expect to be paid) is to generate business value. Even as we started winning contracts with VC-backed companies who seemed to have an infinite money spigot, we would never break ground until we understood how our work was going to make or save our clients money. And whenever the numbers didn't add up, we'd push back until the return on investment for hiring Test Double was clear.
So if you're a leader at a company who has been caught unprepared for this new era of software development, my best advice is to hire an agency of full-breadth developers to work alongside your engineers. Use those experiences to encourage your best people to start thinking like they do. Observe them at work and prepare to blow up your job descriptions, interview processes, and career paths. If you want your business to thrive in what is quickly becoming a far more competitive landscape, you may be best off hitting reset on your human organization and starting over. Get smaller, stay flatter, and only add structure after the dust settles and repeatable patterns emerge.
Developers: congrats on your new job
A lot of developers are feeling scared and hopeless about the changes being wrought by all this. Yes, AI is being used as an excuse by executives to lay people off and pad their margins. Yes, how foundation models were trained was unethical and probably also illegal. Yes, hustle bros are running around making bullshit claims. Yes, almost every party involved has a reason to make exaggerated claims about AI.
All of that can be true, and it still doesn't matter. Your job as you knew it is gone.
If you want to keep getting paid, you may have been told to, "move up the value chain." If that sounds ambiguous and unclear, I'll put it more plainly: figure out how your employer makes money and position your ass directly in-between the corporate bank account and your customers' credit card information. The longer the sentence needed to explain how your job makes money for your employer, the further down the value chain you are and the more worried you should be. There's no sugar-coating it: you're probably going to have to push yourself way outside your comfort zone.
Get serious about learning and using these new tools. You will, like me, recoil at first. You will find, if you haven't already, that all these fancy AI tools are really bad at replacing you. That they fuck up constantly. Your new job starts by figuring out how to harness their capabilities anyway. You will gradually learn how to extract something that approximates how you would have done it yourself. Once you get over that hump, the job becomes figuring out how to scale it up. Three weeks ago I was a Cursor skeptic. Today, I'm utterly exhausted working with Claude Code, because I can't write new requirements fast enough to keep up with parallel workers across multiple worktrees.
As for making yourself more valuable to your employer, I'm not telling you to demand a new job overnight. But if you look to your job description as a shield to protect you from work you don't want to do… stop. Make it the new minimum baseline of expectations you place on yourself. Go out of your way to surprise and delight others by taking on as much as you and your AI supercomputer can handle. Do so in the direction of however the business makes its money. Sit down and try to calculate the return on investment of your individual efforts, and don't slow down until that number far exceeds the fully-loaded cost you represent to your employer.
Start living these values in how you show up at work. Nobody is going to appreciate it if you rudely push back on every feature request with, "oh yeah? How's it going to make us money?" But your manager will appreciate your asking how you can make a bigger impact. And they probably wouldn't be mad if you were to document and celebrate the ROI wins you notch along the way. Listen to what the company's leadership identifies as the most pressing challenges facing the business and don't be afraid to volunteer to be part of the solution.
All of this would have been good career advice ten years ago. It's not rocket science, it's just deeply uncomfortable for a lot of people.
Good game, programmers
Part of me is already mourning the end of the previous era. Some topics I spent years blogging, speaking, and building tools around are no longer relevant. Others that I've been harping on for years—obsessively-structured code organization and ruthlessly-consistent design patterns—are suddenly more valuable than ever. I'm still sorting out what's worth holding onto and what I should put back on the shelf.
As a person, I really hate change. I wish things could just settle down and stand still for a while. Alas.
If this post elicited strong feelings, please e-mail me and I will respond. If you find my perspective on this stuff useful, you might enjoy my podcast, Breaking Change. 💜
A handy script for launching editors
Today, I want to share with you a handy `edit` script I use to launch my editor countless times each day. It can:
- `edit posse_party` – will launch my editor with the project `~/code/searls/posse_party`
- `edit -e vim rails/rails` – will change to the `~/code/rails/rails` directory and run `vim`
- `edit testdouble/mo[TAB]` – will auto-complete to `edit testdouble/mocktail`
- `edit emoruby` – will, if not found locally, clone and open `searls/emoruby`
This script relies on following the convention of organizing working copies of projects in a GitHub `<org>/<repo>` format (under `~/code` by default). I can override this and a few other things with environment variables:
- `CODE_DIR` - defaults to `"$HOME/code"`
- `DEFAULT_ORG` - defaults to `"searls"`
- `DEFAULT_EDITOR` - defaults to `cursor` (for the moment)
I've been organizing my code like this for 15 years, but over the last year I've found myself bouncing between various AI tools so often that I finally bit the bullet to write a custom meta-launcher.
If you want something like this, you can do it yourself:
- Add the edit executable to a directory on your
PATH
- Make sure
edit
is executable withchmod +x edit
- Download the edit.bash bash completions and put them somewhere
- In .profile or
.bashrc
or whatever, runsource path/to/edit.bash
The rest of this post is basically longer-form documentation of the script that you're welcome to peruse in lieu of a proper README.
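If you just want the gist without reading the rest, here's a stripped-down sketch of the idea. To be clear, this is not the actual script (it omits the clone-if-missing behavior and the tab completions entirely); it's just meant to show how little magic is involved:

```bash
#!/usr/bin/env bash
# Minimal sketch of an "edit" launcher built around the ~/code/<org>/<repo> convention.
# The real script linked above does more (cloning missing repos, completions, etc.)
set -e

CODE_DIR="${CODE_DIR:-$HOME/code}"
DEFAULT_ORG="${DEFAULT_ORG:-searls}"
EDITOR_CMD="${DEFAULT_EDITOR:-cursor}"

# Allow `edit -e vim some/repo` to override the editor for one invocation
if [ "$1" = "-e" ]; then
  EDITOR_CMD="$2"
  shift 2
fi

name="$1"
# Bare names get the default org prepended (e.g. "posse_party" -> "searls/posse_party")
case "$name" in
  */*) path="$CODE_DIR/$name" ;;
  *)   path="$CODE_DIR/$DEFAULT_ORG/$name" ;;
esac

cd "$path"
exec "$EDITOR_CMD" .
```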
How to subscribe to email newsletters via RSS
I have exactly one inbox for reading blogs and following news, and it's expressly not my e-mail client—it's my feed reader. (Looking for a recommendation? Here are some instructions on setting up NetNewsWire; for once, the best app is also the free and open source one.)
Anyway, with the rise of Substack and the trend for writers to eschew traditional web publishing in favor of e-mail newsletters, more and more publishers want to tangle their content up in your e-mail. Newsletters work because people will see them (so long as they ever check their e-mail…), whereas routinely visiting a web site requires a level of discipline that social media trained out of most people a decade ago.
But, if you're like me, and you want to reserve your e-mail for bidirectional communication with humans and prefer to read news at the time and pace of your choosing, did you know you can convert just about any e-mail newsletter into an RSS feed and follow that instead?
Many of us nerds have known about this for a while, and while various services have tried to monetize the same feature, it's hard to beat Kill the Newsletter: it doesn't require an account to set up and it's totally free.
How to convert an e-mail newsletter into a feed
Suppose you're signed up to the present author's free monthly newsletter, Searls of Wisdom, and you want to start reading it in a feed reader. (Also suppose that I do not already publish an RSS feed alternative, which I do).
Here's what you can do:
- Visit Kill the Newsletter and enter the human-readable title you want for the newsletter. In this case you might type Searls of Wisdom and click `Create Feed`.
- This yields two generated strings: an e-mail address and a feed URL
- Copy the e-mail address (e.g. `1234@kill-the-newsletter.com`) and subscribe to the newsletter via the publisher's web site, just as you would if you were giving them your real e-mail address
- Copy the URL (e.g. `https://kill-the-newsletter.com/feeds/1234.xml`) and subscribe to it in your feed reader, as if it was any normal RSS/Atom feed
- Confirm it's working by checking the feed in your RSS reader. Because this approach simply recasts e-mails into RSS entries, the first thing you see will probably be a welcome message or a confirmation link you'll need to click to verify your subscription
- Once it's working, if you'd previously subscribed to the newsletter with your personal e-mail address, unsubscribe from it and check it in your feed reader instead
That's it! Subscribing to a newsletter with a bogus-looking address so that a bogus-looking feed starts spitting out articles is a little counter-intuitive, I admit, but I have faith in you.
(And remember, you don't need to do this for my newsletter, which already offers a feed you can just follow without the extra steps.)
Why is Kill the Newsletter free? How can any of this be sustainable? Nobody knows! Isn't the Internet cool?
Visiting Japan is easy because living in Japan is hard
Hat tip to Kyle Daigle for sending me this Instagram reel:
I don't scroll reels, so I'd hardly call myself a well-heeled critic of the form, but I will say I've never heard truer words spoken in a vertical short-form video.
It might be helpful to think of the harmony we witness in Japan as a collective bank account with an exceptionally high balance. Everyone deposits into that account all the ingredients necessary for maintaining a harmonious society. Withdrawals are rare, because to take anything out of that bank account effectively amounts to unilaterally deciding to spend everyone's money. As a result, acts of selfishness—especially those that disrupt that harmony—will frequently elicit shame and admonition from others.
Take trash, for example. Suppose the AA batteries in your Walkman die. There are few public trash cans, so:
- If you're visiting Japan – at the next train platform, you'll see a garbage bin labeled "Others" and toss those batteries in there without a second thought
- If you're living in Japan – you'll carry the batteries around all day, bring them home, sort and clean them, pay for a small trash bag for hazardous materials (taxed at 20x the rate of a typical bag), and then wait until the next hazardous waste collection day (which could be up to 3 months in some areas)
So which of these scenarios is more fun? Visiting, of course!
But what you don't see as a visitor is that nearly every public trash can is provided as a service to customers, and it's someone's literal job to go through each trash bag. So while the visitor experience above is relatively seamless, some little old lady might be tasked with sorting and disposing of the train station's trash every night. And when she finds your batteries, she won't just have to separate them from the rest of the trash, she may well have to fill out a form requisitioning a hazardous waste bag, or call the municipal garbage collection agency to schedule a pick-up. This is all in addition to the little old lady's other responsibilities—it doesn't take many instances of people failing to follow societal expectations to seriously stress the entire system.
This is why Japanese people are rightly concerned about over-tourism: foreigners rarely follow any of the norms that keep their society humming. Over the past 15 years, many tourist hotspots have reached the breaking point. Osaka and Kyoto just aren't the cities they once were. There just aren't the public funds and staffing available to keep up with the amount of daily disorder caused by tourists failing to abide by Japan's mostly-unspoken societal customs.
It's also why Japanese residents feel hopeless about the situation. The idea of foreign tourists learning and adhering to proper etiquette is facially absurd. Japan's economy is increasingly dependent on tourism dollars, so closing off the borders isn't feasible. The dominant political party lacks the creativity to imagine more aggressive policies than a hilariously paltry $3-a-night hotel tax. Couple this with the ongoing depopulation crisis, and people quite reasonably worry that all the things that make Japan such a lovely place to visit are coming apart at the seams.
Anyway, for anyone who wonders why I tend to avoid the areas of Japan popular with foreigners, there you go.
The T-Shirts I Buy
I get asked from time to time about the t-shirts I wear every day, so I figured it might save time to document it here.
The correct answer to the question is, "whatever the cheapest blank tri-blend crew-neck is." The blend in question refers to a mix of fabrics: cotton, polyester, and rayon. The brand you buy doesn't really matter, since they're all going to be pretty much the same: cheap, lightweight, quick-drying, don't retain odors, and feel surprisingly good on the skin for the price. This type of shirt was popularized by the American Apparel Track Shirt, but that company went to shit at some point and I haven't bothered with any of its post-post-bankruptcy wares.
I maintain a roster of 7 active shirts that I rotate daily and wash weekly. Every 6 months I replace them. I buy 14 at a time so I only need to order annually. I always get them from Blank Apparel, because they don't print bullshit logos on anything and charge near-wholesale prices. I can usually load up on a year's worth of shirts for just over $100.
I can vouch for these two specific models:
The Next Level shirts feel slightly nicer on day one, but they also wear faster and will feel a little scratchy after three months of daily usage. The Bella+Canvas ones seem to hold up a bit better. But, honestly, who cares. The whole point is clothes don't matter and people will get used to anything after a couple days. They're cheap and cover my nipples, so mission accomplished.
These 4 Code Snippets won WWDC
WWDC 2025 delivered on the one thing I was hoping to see from WWDC 2024: free, unlimited invocation of Apple's on-device language models by developers. It may have arrived later than I would have liked, but all it took was the first few code examples from the Platforms State of the Union presentation to convince me that the wait was worth it.
Assuming you're too busy to be bothered to watch the keynote, much less the SOTU undercard presentation, here are the four bits of Swift that have me excited to break ground on a new LLM-powered iOS app:
1. `@Generable` and `@Guide` annotations
2. `#Playground` macro
3. `LanguageModelSession`'s async `streamResponse` function
4. `Tool` interface
The @Generable and @Guide annotations
Here's the first snippet:
@Generable
struct Landmark {
var name: String
var continent: Continent
var journalingIdea: String
}
@Generable
enum Continent {
case africa, asia, europe, northAmerica, oceania, southAmerica
}
let session = LanguageModelSession()
let response = try await session.respond(
to: "Generate a landmark for a tourist and a journaling suggestion",
generating: Landmark.self
)
You don't have to know Swift to see why this is cool: just tack `@Generable` onto any struct and you can tell the `LanguageModelSession` to return that type. No fussing with marshalling and unmarshalling JSON. No custom error handling for when the LLM populates a given value with an unexpected type. You simply declare the type, and it becomes the framework's job to figure out how to color inside the lines.
And if you want to make sure the LLM gets the spirit of a value as well as its basic type, you can prompt it on an attribute-by-attribute basis with `@Guide`, as shown here:
@Generable
struct Itinerary: Equatable {
let title: String
let destinationName: String
let description: String
@Guide (description: "An explanation of how the itinerary meets user's special requests.")
let rationale: String
@Guide(description: "A list of day-by-day plans.")
@Guide(.count(3))
let days: [DayPlan]
}
Thanks to `@Guide`, you can name your attributes whatever you want and separately document for the LLM what those names mean for the purpose of generating values.
The #Playground macro
My ears perked up when the presenter Richard Wei said, "then I'm going to use the new playground macro in Xcode to preview my non-UI code." Because when I hear, "preview my non-UI code," my brain finishes the sentence with, "to get faster feedback." Seeing magic happen in your app's UI is great, but if going end-to-end to the UI is your only mechanism for getting any feedback from the system at all, forward progress will be unacceptably slow.
Automated tests are one way of getting faster feedback. Working in a REPL is another. Defining a `#Playground` inside a code listing is now a third tool in that toolbox.
Here's what it might look like:
#Playground {
let session = LanguageModelSession()
for landmark in ModelData.shared.landmarks {
let response = try await session.respond(
to: "What's a good name for a trip to \(landmark.name)?
Reply only with a title."
)
}
}
Which brings up a split view with an interactive set of LLM results, one for each `landmark` in the set of sample data:

Watch the presentation and skip ahead to 23:27 to see it in action.
Streaming user interfaces
Users were reasonably mesmerized when they first saw ChatGPT stream its textual responses as it plopped one word in front of another in real-time. In a world of loading screens and all-at-once responses, it was one of the reasons that the current crop of AI assistants immediately felt so life-like. ("The computer is typing—just like me!")
So, naturally, in addition to being able to await a big-bang `respond` request, Apple's new `LanguageModelSession` also provides an async `streamResponse` function, which looks like this:
let stream = session.streamResponse(generating: Itinerary.self) {
"Generate a \(dayCount)-day itinerary to \(landmark.name). Give it a fun title!"
}
for try await partialItinerary in stream {
itinerary = partialItinerary
}
The fascinating bit—and what sets this apart from mere text streaming—is that by simply re-assigning the `itinerary` to the streamed-in `partialItinerary`, the user interface is able to recompose complex views incrementally. So now, instead of some plain boring text streaming into a chat window, multiple complex UI elements can cohere before your eyes. Which UI elements? Whichever ones you've designed to be driven by the `@Generable` structs you've demanded the LLM provide. This is where it all comes together:

Scrub to 25:29 in the video and watch this in action (and then re-watch it in slow motion). As a web developer, I can only imagine how many dozens of hours of painstaking debugging it would take me to approximate this effect in JavaScript—only for it to still be hopelessly broken on slow devices and unreliable networks. If this API actually works as well as the demo suggests, then Apple's Foundation Models framework is seriously looking to cash some of the checks Apple wrote over a decade ago when it introduced Swift and more recently, SwiftUI.
The Tool interface
When the rumors were finally coalescing around the notion that Apple was going to allow developers to invoke its models on device, I was excited but skeptical. On device meant it would be free and work offline—both of which, great—but how would I handle cases where I needed to search the web or hit an API?
It didn't even occur to me that Apple would be ready to introduce something akin to Model Context Protocol (which Anthropic didn't even coin until last November!), much less the paradigm of the LLM as an agent calling upon a discrete set of tools able to do more than merely generate text and images.
And yet, that's exactly what they did! The `Tool` interface, in a slide:
public protocol Tool: Sendable {
associatedtype Arguments
var name: String { get }
var description: String { get }
func call(arguments: Arguments) async throws -> ToolOutput
}
And what a `Tool` that calls out to `MapKit` to search for points of interest might look like:
import FoundationModels
import MapKit
struct FindPointOfInterestTool: Tool {
let name = "findPointOfInterest"
let description = "Finds a point of interest for a landmark."
let landmark: Landmark
@Generable
enum Category: String, CaseIterable {
case restaurant
case campground
case hotel
case nationalMonument
}
@Generable
struct Arguments {
@Guide(description: "This is the type of destination to look up for.")
let pointOfInterest: Category
@Guide(description: "The natural language query of what to search for.")
let naturalLanguageQuery: String
}
func call(arguments: Arguments) async throws -> ToolOutput {}
private func findMapItems(nearby location: CLLocationCoordinate2D,
arguments: Arguments) async throws -> [MKMapItem] {}
}
And all it takes to pass that tool to a `LanguageModelSession` constructor:
self.session = LanguageModelSession(
tools: [FindPointOfInterestTool(landmark: landmark)]
)
That's it! The LLM can now reach for and invoke whatever Swift code you want.
Why this is exciting
I'm excited about this stuff, because—even though I was bummed out that none of this came last year—what Apple announced this week couldn't have been released a year ago, because basic concepts like agents invoking tools didn't exist a year ago. The ideas themselves needed more time in the oven. And because Apple bided its time, version one of its Foundation Models framework is looking like a pretty robust initial release and a great starting point from which to build a new app.
It's possible you skimmed this post and are nevertheless not excited. Maybe you follow AI stuff really closely and all of these APIs are old hat to you by now. That's a completely valid reaction. But the thing that's going on here that's significant is not that Apple put out an API that kinda sorta looks like the state of the art as of two or three months ago, it's that this API sits on top of a strongly-typed language and a reactive, declarative UI framework that can take full advantage of generative AI in a way web applications simply can't—at least not without a cobbled-together collection of unrelated dependencies and mountains of glue code.
Oh, and while every other app under the sun is trying to figure out how to reckon with the unbounded costs that come with "AI" translating to "call out to hilariously-expensive API endpoints", all of Apple's stuff is completely free for developers. I know a lot of developers are pissed at Apple right now, but I can't think of another moment in time when Apple made such a compelling technical case for building on its platforms specifically and to the exclusion of cross-compiled, multi-platform toolkits like Electron or React Native.
And now, if you'll excuse me, I'm going to go install some betas and watch my unusually sunny disposition turn on a dime. 🤞
Why agents are bad pair programmers
LLM agents make bad pairs because they code faster than humans think.
I'll admit, I've had a lot of fun using GitHub Copilot's agent mode in VS Code this month. It's invigorating to watch it effortlessly write a working method on the first try. It's a relief when the agent unblocks me by reaching for a framework API I didn't even know existed. It's motivating to pair with someone even more tirelessly committed to my goal than I am.
In fact, pairing with top LLMs evokes many memories of pairing with top human programmers.
The worst memories.
Memories of my pair grabbing the keyboard and—in total and unhelpful silence—hammering out code faster than I could ever hope to read it. Memories of slowly, inevitably becoming disengaged after expending all my mental energy in a futile attempt to keep up. Memories of my pair hitting a roadblock and finally looking to me for help, only to catch me off guard and without a clue as to what had been going on in the preceding minutes, hours, or days. Memories of gradually realizing my pair had been building the wrong thing all along and then suddenly realizing the task now fell to me to remediate a boatload of incidental complexity in order to hit a deadline.
So yes, pairing with an AI agent can be uncannily similar to pairing with an expert programmer.
The path forward
What should we do instead? Two things:
- The same thing I did with human pair programmers who wanted to take the ball and run with it: I let them have it. In a perfect world, pairing might lead to a better solution, but there's no point in forcing it when both parties aren't bought in. Instead, I'd break the work down into discrete sub-components for my colleague to build independently. I would then review those pieces as pull requests. Translating that advice to LLM-based tools: give up on editor-based agentic pairing in favor of asynchronous workflows like GitHub's new Coding Agent, whose work you can also review via pull request
- Continue to practice pair-programming with your editor, but throttle down from the semi-autonomous "Agent" mode to the turn-based "Edit" or "Ask" modes. You'll go slower, and that's the point. Also, just like pairing with humans, try to establish a rigorously consistent workflow as opposed to only reaching for AI to troubleshoot. I've found that ping-pong pairing with an AI in Edit mode (where the LLM can propose individual edits but you must manually accept them) strikes the best balance between accelerated productivity and continuous quality control
Give people a few more months with agents and I think (hope) others will arrive at similar conclusions about their suitability as pair programmers. My advice to the AI tool-makers would be to introduce features to make pairing with an AI agent more qualitatively similar to pairing with a human. Agentic pair programmers are not inherently bad, but their lightning-fast speed has the unintended consequence of undercutting any opportunity for collaborating with us mere mortals. If an agent were designed to type at a slower pace, pause and discuss periodically, and frankly expect more of us as equal partners, that could make for a hell of a product offering.
Just imagining it now, any of these features would make agent-based pairing much more effective:
- Let users set how many lines-per-minute of code—or words-per-minute of prose—the agent outputs
- Allow users to pause the agent to ask a clarifying question or push back on its direction without derailing the entire activity or train of thought
- Expand beyond the chat metaphor by adding UI primitives that mirror the work to be done. Enable users to pin the current working session to a particular GitHub issue. Integrate a built-in to-do list to tick off before the feature is complete. That sort of thing
- Design agents to act with less self-confidence and more self-doubt. They should frequently stop to converse: validate why we're building this, solicit advice on the best approach, and express concern when we're going in the wrong direction
- Introduce advanced voice chat to better emulate human-to-human pairing, which would allow the user both to keep their eyes on the code (instead of darting back and forth between an editor and a chat sidebar) and to light up the parts of the brain that find mouth-words more engaging than text
Anyway, that's how I see it from where I'm sitting the morning of Friday, May 30th, 2025. Who knows where these tools will be in a week or month or year, but I'm fairly confident you could find worse advice on meeting this moment.
As always, if you have thoughts, e-mail 'em.