Did you come to my blog looking for blog posts? Here they are, I guess. This is where I post traditional, long-form text that isn't primarily a link to someplace else, doesn't revolve around audiovisual media, and isn't published on any particular cadence. Just words about ideas and experiences.
Distributing your own scripts via Homebrew
I use Homebrew all the time. Whenever I see a new CLI that offers an npm or uv install path alongside a brew one, I choose brew every single time.
And yet, when it comes time to publish a CLI of my own, I usually just ship it as a Ruby gem or an npm package, because I had (and have!) no fucking clue how Homebrew works. I'm not enough of a neckbeard to peer behind the curtain as soon as root directories like /usr and /opt are involved, so I never bothered before today.
But it's 2025 and we can consult LLMs to conjure whatever arcane incantations we need. And because he listens to the cast, I can always fall back on texting Mike McQuaid when his docs suck.
So, because I'll never remember any of this shit (it's already fading from view as I type this), below are the steps involved in publishing your own CLI to Homebrew. The first formula I published is a simple Ruby script, but this guide should be generally applicable.
Glossary
Because Homebrew really fucking leans in to the whole "home brewing as in beer" motif when it comes to naming, it's easy to get lost in the not-particularly-apt nomenclature they chose.
Translate these in your head when you encounter them:
- Formula → Package definition
- Tap → Git repository of formulae
- Cask → Manifest for installing pre-built GUIs or large binaries
- Bottle → Pre-built binary packages that are "poured" (copied) instead of built from source
- Cellar → Directory containing your installed formulae (e.g. /opt/homebrew/Cellar)
- Keg → Directory housing an installed formula (e.g. Cellar/foo/1.2.3)
Overview
First thing to know is that the Homebrew team doesn't want your stupid CLI in the core repository.
Instead, the golden path for us non-famous people is to:
- Make your CLI, push it to GitHub, cut a tagged release
- Create a Homebrew tap
- Create a Homebrew formula
- Update the formula for each CLI release
After you complete the steps outlined below, users will be able to install your cool CLI in just two commands:
brew tap your_github_handle/tap
brew install your_cool_cli
Leaving the "make your CLI" step as an exercise for the reader, let's walk through the three steps required to distribute it on Homebrew. In my case, I slopped up a CLI called imsg that creates interactive web archives from an iMessage database.
Create your tap
Here's Homebrew's guide on creating a tap. Let's follow along with how I set things up for myself. Just replace each example with your own username or organization.
For simplicity's sake, you probably want a single tap for all the command line tools you publish moving forward. If that's the case, then you want to name the tap homebrew-tap. The homebrew prefix is treated specially by the brew CLI and the tap suffix is conventional.
First, create the tap:
brew tap-new searlsco/homebrew-tap
This creates a scaffold in /opt/homebrew/Library/Taps/searlsco/homebrew-tap. Next, I created a matching repository on GitHub and pushed what Homebrew generated:
cd /opt/homebrew/Library/Taps/searlsco/homebrew-tap
git remote add origin git@github.com:searlsco/homebrew-tap.git
git push -u origin main
Congratulations, you're the proud owner of a tap. Now other homebrew users can run:
brew tap searlsco/tap
It doesn't contain anything useful, but they can run it. The command will clone your repository into their /opt/homebrew/Library/Taps directory.
Create your formula
Even though Homebrew depends on all manner of git operations to function and fully supports just pointing your formula at a GitHub repository, the Homebrew team recommends instead referencing versioned tarballs with checksums. Why? Something something reproducibility, yadda yadda open source supply chain. Whatever, let's just do it their way.
One nifty feature of GitHub is that they'll host a tarball archive of any tags you push at a predictable URL. That means if I run these commands in the imsg repository:
git tag v0.0.5
git push --tags
Then GitHub will host a tarball at github.com/searlsco/imsg/archive/refs/tags/v0.0.5.tar.gz.
Once we have that tarball URL, we can use brew create to generate our formula:
brew create https://github.com/searlsco/imsg/archive/refs/tags/v0.0.5.tar.gz --tap searlsco/homebrew-tap --set-name imsg --ruby
The three flags there do the following:
- --tap points it to the custom tap we created in the previous step, and will place the formula in /opt/homebrew/Library/Taps/searlsco/homebrew-tap/Formula
- --set-name imsg will name the formula explicitly, though brew create would have inferred this and confirmed it interactively. The name should be unique so you don't do something stupid like make a CLI named TLDR when there's already a CLI named TLDR, or a CLI named standard when there's already a CLI named standard
- --ruby is one of several template presets provided to simplify the task of customizing your formula
Congratulations! You now have a formula for your CLI. It almost certainly doesn't work and you almost certainly have no clue how to make it work, but it's yours!
This is where LLMs come in.
1. Run brew install --verbose imsg
2. Paste what broke into ChatGPT
3. Update formula
4. GOTO 1 until it works
Eventually, I wound up with a working Formula/imsg.rb file. (If you're publishing a Ruby CLI, feel free to copy-paste it as a starting point.) Importantly—and a big reason to distribute via Homebrew as opposed to a language-specific package manager—I could theoretically swap out the implementation for some other language entirely without disrupting users' ability to upgrade.
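For reference, here's a trimmed-down sketch of roughly what such a formula can look like. Treat it as an illustration, not a copy of my real formula: the install block in particular depends entirely on how your repository is laid out, and I'm assuming a single executable script named imsg at the repo root.

class Imsg < Formula
  desc "Creates interactive web archives from an iMessage database"
  homepage "https://github.com/searlsco/imsg"
  url "https://github.com/searlsco/imsg/archive/refs/tags/v0.0.5.tar.gz"
  sha256 "e9166c70bfb90ae38c00c3ee042af8d2a9443d06afaeaf25a202ee8d66d1ca04"
  head "https://github.com/searlsco/imsg.git", branch: "main"

  # Prefer a maintained Ruby over the ancient uses_from_macos "ruby" default
  depends_on "ruby@3"

  livecheck do
    url :stable
    strategy :github_latest
  end

  def install
    # Assumption: the repo ships a single executable script named `imsg`
    bin.install "imsg"
  end

  test do
    # Asserting on help output is enough to prove the binary runs
    # (the --help flag is an assumption about your CLI)
    assert_match "imsg", shell_output("#{bin}/imsg --help")
  end
end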
Key highlights if you're reading the formula contents:
- All formulae are written in Ruby, not just Ruby-related formulae. Before JavaScript and AI took turns devouring the universe, popular developer tools were often written in Ruby and Homebrew is one of those
- You can specify your formula's git repository with the head method (though I'm unsure this does anything)
- Adding a livecheck seemed easy and worth doing
- Adding a test to ensure the binary runs can be as simple as asserting on help output. Don't let the generated comment scare you off
- Run brew style searlsco/tap to make sure you didn't fuck anything up.
- By default, the --ruby template adds uses_from_macos "ruby", which is currently version 2.6.10 (which was released before the Covid pandemic and end-of-life'd over three years ago). You probably want to rely on the ruby formula with depends_on "ruby@3" instead
When you're happy with it, just git push and your formula is live! Now any homebrew user can install your thing:
brew tap searlsco/tap
brew install imsg
Update the formula for each CLI release
Of course, any joy I derived from getting this to work was fleeting, because of this bullshit at the top of the formula:
class Imsg < Formula
  url "https://github.com/searlsco/imsg/archive/refs/tags/v0.0.5.tar.gz"
  sha256 "e9166c70bfb90ae38c00c3ee042af8d2a9443d06afaeaf25a202ee8d66d1ca04"
Who the fuck's job is it going to be to update these URLs and SHA hashes? Certainly not mine. I barely have the patience to git push my work, much less tag it. And forget about clicking around to create a GitHub release. Now I need to open a second project and update the version there, too? And compute a hash? Get the fuck out of here.
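(For what it's worth, computing the hash itself is the easy part—pipe the release tarball through shasum, which ships with macOS:)

# Print the SHA-256 of the release tarball
curl -fsSL https://github.com/searlsco/imsg/archive/refs/tags/v0.0.5.tar.gz | shasum -a 256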
Now, I will grant that Homebrew ships with a command that opens a PR for each formula update and some guy wrapped it in a GitHub action, but both assume you want to daintily fork the tap and humbly submit a pull request to yourself. Clearly all this shit was designed back when Homebrew was letting anybody spam shit into homebrew-core. It's my tap, just give me a way to commit to main, please and thank you.
So anyway, you can jump through all those hoops each time you update your CLI if you're a sucker. But be honest with yourself, you're just gonna wind up back at this stupid blog post again, because you'll have forgotten the process. To avoid this, I asked my AI companion to add a GitHub workflow to my formula repository that automatically commits release updates to my tap repository.
If you want to join me in the fast lane, feel free to copy paste my workflow as a starting point. The only things you'll need to set up yourself:
- You'll need a personal-access token:
  - When creating the PAT, add your homebrew-tap repository and grant Contents → Write permissions
  - Store it in the formula repository's settings under Secrets and variables → Actions → Repository secrets and name it HOMEBREW_TAP_TOKEN (GitHub docs)
- You'll need to specify the tap and formula environment variables
- You'll probably want to update the GitHub bot account used for commits, likely to the GitHub Actions bot if you don't have one of your own:
GH_EMAIL: 41898282+github-actions[bot]@users.noreply.github.com
GH_NAME: github-actions[bot]
Now, whenever you cut a release, your tap will be updated automatically. Within a few seconds of running git push --tags in your formula's repository, your users will be able to upgrade their installations with:
brew update
brew upgrade imsg
That's it. Job's done!
The best part
This was a royal pain in the ass to figure out, so hopefully this guide was helpful. The best part is that once your tap is set up and configured and you have a single working formula to serve as an example, publishing additional CLI tools in the future becomes almost trivial.
Now, will I actually ever publish another formula? Beats me. But it feels nice to know it would only take me a few minutes if I wanted to. 🍻
This blog has a comment system
The day before we recorded our episode of Hotfix, Scott Werner asked a fair question: "so, if you're off social media and your blog doesn't have a comment system, how do you want people to respond to your posts? Just email?"
I answered, "actually my blog does have a comment system."
Here's how to leave a comment on this web site:
1. Read a post
2. Think, "I want to comment on this"
3. Draft a post on your blog
4. Add a hyperlink to my post
5. Paste an excerpt to which you want to respond
6. Write your comment
7. Hit publish
I admit, it's quaint. It involves a number of invisible steps, like 2.1, where you start a blog (which is actually pretty easy but not free of friction). You should try it.
It is 2025 and the Web—the capital-W Web—is beleaguered. The major platforms have long-since succumbed to enshittification, but their users aren't going anywhere. Some among us courageously voice their dissent, but always from the safe confines of their favorite walled garden. They drop a note in the jailkeeper's suggestion box as they scroll past the Squarespace ads littering their algorithmic timelines. Others have fled to open and open-flavored networks, but everyone eventually realizes they can't go home again.
But that's not why I want you to adopt this blog's commenting system. I'm not a high-minded individual who cares about the intellectual project of the World Wide Web as a bastion for free expression or whatever the fuck. No. I just had a super rad time on the Internet from 2000 to 2006 and I want to do my part to bring it back.
Back then, I would find a blog and follow it—via its feed when possible, or else by adding it to a folder of bookmarks—and check it daily.
But what about discoverability? How did anyone find these websites? Bloggers couldn't rely on platforms' social graphs or algorithmic timelines to build awareness, so they had to bake discoverability into the product. Some sites had a "blogroll" of recommendations in the sidebar. But the most effective approach was the art of "blogging as a conversation." When an author read something that provoked them to write, they'd link to the offending piece, excerpt it, and provide their own commentary. And because humans are vain, the original post's author would frequently circle back and write their own response to the response. The net effect was that each author's audience would get exposure to the other writer. Even if the authors were in violent disagreement, readers of one might appreciate the other's perspective and subscribe to them.
Blogging as a conversation—as a comments section—was valuable because it was purely positive-sum. As an author, I benefit because another author's opinions inspired me to write. The other author benefits because linking to them offers access to my readership. My readers benefit because they're exposed to complementary and contrasting viewpoints.
Growth was slow and organic but more meaningful and durable. It was a special time.
More on my personal history with blogging
If I really enjoyed someone's blog, I'd rush to read their stuff first. If an author's posts weren't so stimulating, I wasn't shy about unsubscribing. And I could afford to be picky—there was no shortage of content! Even with aggressive curation, by 2005 I had subscribed to so many feeds in Google Reader that I struggled to stay on top of them all. My grades suffered because I was "j-walking" hundreds of blog posts each day instead of doing homework.
Then, Facebook's feed, Tumblr, and Twttr came along, and they took the most enjoyable parts of surfing the 1.0 Web—novel information and connectivity with others—and supercharged them. They were "good Web citizens" in the same way the closed-source, distributed-to-exactly-one-server Bluesky is today. The timelines were reverse chronological. They handled the nerdy tech stuff for you. None of the feeds had ads yet.
Blogging didn't stand a chance.
I failed to see it at the time, but blogging did have one advantage over the platforms: it was a goddamn pain in the ass. Whether you flung files over an FTP client or used a CMS, writing a blog post was an ordeal. If you were going to the trouble of posting to your blog, you might as well put your back into it and write something thoughtful. Something you could take pride in. Something with a permalink that (probably wouldn't, but) could be cited years later.
The platforms offered none of that. You got a tiny-ass text area, a stingy character limit, and a flood of ephemera to compete with. By demoting writing to a subordinate widget of the reading interface, the priority was clear: words were mere grist for the content mill. The shame of it all was that these short-form, transient, low-effort posts nevertheless sated many people's itch to write at all. I was as guilty of this as anyone. From 2009 through 2020, I devoted all my writing energy to Twitter. Except for that brief year or two where Medium was good, I basically stopped thinking in longform. Instead, I prided myself on an ability to distill 2,000-word essays down to 140-character hot takes. Many of those takes reached millions of people and made me feel good for a very brief amount of time.
My brain was cooked. When it finally sank in, I quit.
It took almost three years to recover. I'm on the other side now, and am happy to report I can now think thoughts more than a sentence or two long.
Last night, I got dinner with two old friends, Chris Nelson and Joshua Wood. Josh asked how it's been since I quit paying attention to social media. I thought about the unfinished draft of this post.
In truth, this blog and its attendant podcast empire have been a refuge for my psyche. A delightful place to share pieces of myself online. Somewhere to experiment in both form and format. A means of reclaiming my identity from a smattering of social media profile pages and into something authentic and unique.
Today, as the platforms wane, it feels like this conversational approach to blogging is seeing new life. As a readership has slowly gathered around this blog, I've separately been curating a fresh list of thoughtful bloggers that inspire me to write. Maybe I'll add a blogroll to my next redesign. I'm already writing more linkposts.
In short, blogging might be back. Hell, I just came back from coffee with my friend Anthony, and—without my having brought up the topic—he showed me his new blog.
So, if you're considering engaging with my comment system—if you're thinking about starting a blog or dusting off your old one—here's some unsolicited advice:
- Do it for you. Priority one is taking the time to grapple with your thoughts, organize your ideas, and put them into words. Priority two is reaching the finish line and feeling the pride of authorship. That anyone actually reads your work should be a distant third place
- Focus on building an audience rather than maximizing reach. Getting in front of eyeballs is easier on the platform, but it's fleeting. Platforms reward incitement, readers reward insight. Success is a lagging indicator of months and years of effort, but it's long-lasting. I genuinely believe each of the readers of this site are as valuable as a hundred followers on social media
- Give your blog your best work. Don't waste your creative juices trying to be clever on someone else's app. Consider syndicating crossposts to your social accounts as a breadcrumb trail leading back to your homepage. You can do this with Buffer, Publer, SocialBee, or my upcoming POSSE Party
- Cut yourself some slack. Pretty much everyone is an awful writer. If you saw how long it takes me to write anything of substance, you'd agree that I'm an awful writer, too. Thankfully, good ideas have a way of shining through weak rhetoric and bad grammar. All that matters is training this learned response: have an idea, write it down, put it out
That's all I've got. If you choose to leave a comment on this post on your own blog, e-mail it to me, and I'd be delighted to read it. Maybe it'll inspire me to write a response! 💜
Sprinkling Self-Doubt on ChatGPT
I replaced my ChatGPT personalization settings with this prompt a few weeks ago and promptly forgot about it:
- Be extraordinarily skeptical of your own correctness or stated assumptions. You aren't a cynic, you are a highly critical thinker and this is tempered by your self-doubt: you absolutely hate being wrong but you live in constant fear of it
- When appropriate, broaden the scope of inquiry beyond the stated assumptions to think through unconventional opportunities, risks, and pattern-matching to widen the aperture of solutions
- Before calling anything "done" or "working", take a second look at it ("red team" it) to critically analyze that you really are done or it really is working
I noticed a difference in results right away (even though I kept forgetting the change was due to my instructions and not the separately tumultuous rollout of GPT-5).
Namely, pretty much every initial response now starts with:
- An expression of caution, self-doubt, and desire to get things right
- Hilariously long "thinking" times (I asked it to estimate the macronutrients in lettuce yesterday and it spent 3 minutes and 59 seconds reasoning)
- A post-hoc adversarial "red team" analysis of whatever it just vomited up as an answer
I'm delighted to report that ChatGPT's output has been more useful since this change. Still not altogether great, but better at the margins. In particular, the "red team" analysis at the end of many requests frequently spots an error and causes it to arrive at the actually-correct answer, which—if nothing else—saves me the step of expressing skepticism. And even when ChatGPT is nevertheless wrong, its penchant for extremely-long thinking times means I'm getting my money's worth in GPU time.
What's the Hotfix?
I recently started an interview series on the Breaking Change feed called Hotfix. Whereas each episode of Breaking Change is a major release full of never-before-seen tech news, life updates, and programming war stories, each Hotfix zeroes in on a single problem with a single guest. It's versioned as a patch release on the feed, because each show serves only to answer the question, "what's the hotfix?"
Because I've had to explain the concept over and over again to every potential guest, I sat down to write a list of what they'd be getting themselves into by agreeing to come on the show. (Can't say I didn't warn them!)
Here's the rider I send prospective guests:
- Each Hotfix episode exists to address some problem. Unlike a typical interview show featuring an unstructured open-ended conversation with a guest, we pick a particular problem in advance—ideally one that the guest gets really animated/activated or even angry about—and we jointly rant about it, gradually exploring its root causes and breaking it down together
- Each episode concludes with us answering the question, "what's the hotfix?" Ultimately, we decide on a pithy, reductive one-line solution to the problem that will serve as the show title (ideally, it's a hot take that not everyone will agree with or feel comfortable about)
- It's an explicit-language show and I'm pretty clear with the audience that the Breaking Change family of brands is intended for terrible people (or at least, the terrible person inside all of us). You aren't required to swear to be on the show, but if my potty mouth makes you uncomfortable, then let me know and I'll recommend some worse podcasts you can appear on instead
- I joke at the top that my goal as the host is to, "get my guest to say something that'll get them fired." Since I'm functionally retired and have no reason to hold back from explicit language, irreverence, and dark humor in the mainline Breaking Change podcast, I can't help but poke guests with attempts to drag them down to my level. You can play with this as much as you want or take the high ground, but we'll all have more fun if you let loose a bit more than you otherwise would
- Why am I doing this? First, because I'm incurious and uninterested in learning about other people, which I'm told is an important part of being a good interviewer. Second, I have a theory that this unusual brand of authenticity will lend credibility to whatever solution the guest is trying to argue for or plug. By keeping listeners on their toes and pushing them out of their comfort zones, each episode stands to effect greater change than a typical milquetoast podcast could
If this has piqued your interest, you can listen to or watch the first episode of Hotfix with Dave Mosher. It may not seem very hot at first, but please grade on a curve as Dave speaks Canadian English. I've got a couple exciting guests booked over the next few weeks and I'm looking forward to seeing where the show takes us.
Which of your colleagues are screwed?
I've been writing about how AI is likely to affect white-collar (or no-collar or hoodie-wearing) computer programmers for a while now, and one thing is clear: whether someone feels wildly optimistic or utterly hopeless about AI says more about their priors than their prospects. In particular, many of the people I already consider borderline unemployable managed to read Full-breadth Developers and take away that they actually have nothing to worry about.
So instead of directing the following statements at you, let's target our judgment toward your colleagues. Think about a random colleague you don't feel particularly strongly about as you read the following pithy and reductive bullet points. Critically appraise how they show up to work through the entire software delivery process. These represent just a sample of observations I've made about developers who are truly thriving so far in the burgeoning age of AI code generation tools.
That colleague you're thinking about? They're going to be screwed if they exhibit:
- Curiosity without skepticism
- Strategy without experiments
- Ability without understanding
- Productivity without urgency
- Creativity without taste
- Certainty without evidence
But that's not all! You might be screwed too. Maybe ask one of your less-screwed colleagues to rate you.
Star Wars: The Gilroy Order
UPDATE: To my surprise and delight, Rod saw this post and endorsed this watch order.
I remember back when Rod Hilton suggested The Machete Order for introducing others to the Star Wars films and struggling to find fault with it. Well, since then there have been 5 theatrical releases and a glut of streaming series. And tonight, as credits rolled on Return of the Jedi, I had the thought that an even better watch order has emerged for those just now being exposed to the franchise.
Becky and I first started dating somewhere between the release of Attack of the Clones and Revenge of the Sith and—no small measure of her devotion—she's humored me by seeing each subsequent Star Wars movie in theaters, despite having no interest in the films and little idea what was going on. Get yourself a girl who'll watch half a dozen movies that mildly repulse her, fellas.
Hell, when we were living in Japan, I missed that 吹替 ("dubbed") was printed on our tickets and she wound up sitting through the entirety of The Rise of Skywalker with Japanese voiceovers and no subtitles to speak of. When we walked out, she told me that she (1) was all set with Star Wars movies for a while, and (2) suspected the incomprehensibility of the Japanese dub had probably improved the experience, on balance.
That all changed when she decided to give Andor a chance. See, if you're not a Star Wars fan, Tony Gilroy's Andor series is unique in the franchise for being actually good. Like, it's seriously one of the best TV shows to see release in years. After its initial three-episode arc, Becky was fully on board for watching both of its 12-episode seasons. And the minute we finished Season 2, she was ready to watch Rogue One with fresh eyes. ("I actually have a clue what's going on now.") And, of course, with the way Rogue One leads directly into the opening scene of A New Hope, we just kept rolling from there.
Following this experience, I'd suggest sharing Star Wars with your unsuspecting loved ones in what I guess I'd call The Gilroy Order:
- Andor (seasons 1 and 2)
- Rogue One
- A New Hope
- The Empire Strikes Back
- Return of the Jedi
If, at this point, you're still on speaking terms with said loved ones, go ahead and watch the remaining Star Wars schlock in whatever order you want. Maybe you go straight to The Mandalorian. Maybe you watch The Force Awakens just so you can watch the second and final film of the third trilogy, The Last Jedi. Maybe you quit while you're ahead and wait for Disney to release anything half as good as Andor ever again. (Don't hold your breath.)
Anyway, the reason I'm taking the time to propose an alternative watch order at all is an expression of the degree to which I am utterly shocked that my wife just watched and enjoyed so many Star Wars movies after struggling to tolerate them for the first two decades of our relationship. I'm literally worried I might have broken her.
But really, it turned out that all she needed was for a genuinely well-crafted narrative to hook her, and Andor is undeniably the best ambassador the franchise currently has.
How to generate dynamic data structures with Apple Foundation Models
Over the past few days, I got really hung up in my attempts to generate data structures using Apple Foundation Models for which the exact shape of the data wasn't known until runtime. The new APIs actually provide for this capability via DynamicGenerationSchema, but the WWDC sessions and sample code were too simple to follow this thread end-to-end:
- Start with a struct representing a PromptSet: a variable set of prompts that will either map onto or be used to define the ultimate response data structure 🔽
- Instantiate a PromptSet with—what else?—a set of prompts to get the model to generate the sort of data we want 🔽
- Build out a DynamicGenerationSchema based on the contents of a given PromptSet instance 🔽
- Create a struct that can accommodate the variably-shaped data with as much type safety as possible and which conforms to ConvertibleFromGeneratedContent, so it can be instantiated by passing a LanguageModelSession response's GeneratedContent 🔽
- Pull it all together and generate some data with the on-device foundation models! 🔽
Well, it took me all morning to get this to work, but I did it. Since I couldn't find a single code example that did anything like this, I figured I'd share this write up. You can read the code as a standalone Swift file or otherwise follow along below.
1. Define a PromptSet
Start with whatever code you need to represent the set(s) of prompts you'll be dealing with at runtime. (Maybe they're defined by you and ship with your app, maybe you let users define them through your app's UI.) To keep things minimal, I defined this one with a couple of mandatory fields and a variable number of custom ones:
struct EducationalPromptSet {
  let type: String
  let instructions: String
  let name: String
  let description: String
  let summaryGuideDescription: String
  let confidenceGuideDescription: String
  let subComponents: [SubComponentPromptSet]
}

struct SubComponentPromptSet {
  let title: String
  let bodyGuideDescription: String
}
Note that rather than modeling the data itself, the purpose of these structs is to model the set of prompts that will ultimately drive the creation of the schema which will, in turn, determine the shape and contents of the data we get back from the Foundation Models API. To drive this home, whatever goes in summaryGuideDescription, confidenceGuideDescription, and bodyGuideDescription should itself be a prompt to guide the generation of a like-named type-safe value.
Yes, it is very meta.
2. Instantiate our PromptSet
Presumably, we could decode some JSON—from a file or over the network—to populate this EducationalPromptSet. Here's an example set of prompts for generating cocktail recipes, expressed in some sample code:
let cocktailPromptSet = EducationalPromptSet(
  type: "bartender_basic",
  instructions: """
    You are an expert bartender. Take the provided cocktail name or list of ingredients and explain how to make a delicious cocktail. Be creative!
    """,
  name: "Cocktail Recipe",
  description: "A custom cocktail recipe, tailored to the user's input and communicated in an educational tone and spirit",
  summaryGuideDescription: "The summary should describe the history (if applicable) and taste profile of the cocktail",
  confidenceGuideDescription: "Range between 0-100 for your confidence in the feasibility of this cocktail based on the prompt",
  subComponents: [
    SubComponentPromptSet(title: "Ingredients", bodyGuideDescription: "A list of all ingredients in the cocktail"),
    SubComponentPromptSet(title: "Steps", bodyGuideDescription: "A list of the steps to make the cocktail"),
    SubComponentPromptSet(title: "Prep", bodyGuideDescription: "The bar prep you should have completed in advance of service")
  ]
)
You can see that the provided instruction, description, and each guide description really go a long way to specify what kind of data we are ultimately looking for here. This same format could just as well be used to specify an EducationalPromptSet for calculus formulas, Japanese idioms, or bomb-making instructions.
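If you do go the JSON route, decoding a prompt set is the standard Codable dance. A minimal sketch, assuming you declare Codable conformance on both structs (the loader function name and URL are made up):

import Foundation

// Assumes `struct EducationalPromptSet: Codable` and
// `struct SubComponentPromptSet: Codable` are declared elsewhere
func loadPromptSet(from url: URL) throws -> EducationalPromptSet {
  let data = try Data(contentsOf: url)
  return try JSONDecoder().decode(EducationalPromptSet.self, from: data)
}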
3. Build a DynamicGenerationSchema
Now, we must translate our prompt set into a DynamicGenerationSchema.
Why DynamicGenerationSchema and not the much simpler, defined-at-compile-time GenerationSchema that's expanded by the @Generable macro? Because reasons:
- We only know the prompts (in API parlance, "Generation Guide descriptions") at runtime, and the @Guide macro must be specified statically
- We don't know how many subComponents a prompt set instance will specify in advance
- While subComponents may ultimately redound to an array of strings, that doesn't mean they represent like concepts that could be generated by a single prompt (as an array of ingredient names might). Rather, each subComponent is effectively the answer to a different, unknowable-at-compile-time prompt of its own
As for building the DynamicGenerationSchema, you can break this up into two roots and have the parent reference the child, but after experimenting, I preferred just specifying it all in one go. (One reason not to get too clever about extracting these is that DynamicGenerationSchema.Property is not Sendable, which can easily lead to concurrency-safety violations).
This looks like a lot because this API is verbose as fuck, forcing you to oscillate between nested schemas and properties and schemas:
let cocktailSchema = DynamicGenerationSchema(
  name: cocktailPromptSet.name,
  description: cocktailPromptSet.description,
  properties: [
    DynamicGenerationSchema.Property(
      name: "summary",
      description: cocktailPromptSet.summaryGuideDescription,
      schema: DynamicGenerationSchema(type: String.self)
    ),
    DynamicGenerationSchema.Property(
      name: "confidence",
      description: cocktailPromptSet.confidenceGuideDescription,
      schema: DynamicGenerationSchema(type: Int.self)
    ),
    DynamicGenerationSchema.Property(
      name: "subComponents",
      schema: DynamicGenerationSchema(
        name: "subComponents",
        properties: cocktailPromptSet.subComponents.map { subComponentPromptSet in
          DynamicGenerationSchema.Property(
            name: subComponentPromptSet.title,
            description: subComponentPromptSet.bodyGuideDescription,
            schema: DynamicGenerationSchema(type: String.self)
          )
        }
      )
    )
  ]
)
4. Define a result struct that conforms to ConvertibleFromGeneratedContent
When conforming to ConvertibleFromGeneratedContent, a type can be instantiated with nothing more than the GeneratedContent returned from a language model response.
There is a lot going on here. Code now, questions later:
struct EducationalResult: ConvertibleFromGeneratedContent {
  let summary: String
  let confidence: Int
  let subComponents: [SubComponentResult]

  init(_ content: GeneratedContent) throws {
    summary = try content.value(String.self, forProperty: "summary")
    confidence = try content.value(Int.self, forProperty: "confidence")

    let subComponentsContent = try content.value(GeneratedContent.self, forProperty: "subComponents")
    let properties: [String: GeneratedContent] = {
      if case let .structure(properties, _) = subComponentsContent.kind {
        return properties
      }
      return [:]
    }()

    subComponents = try properties.map { (title, bodyContent) in
      try SubComponentResult(title: title, body: bodyContent.value(String.self))
    }
  }
}

struct SubComponentResult {
  let title: String
  let body: String
}
That init constructor is doing the Lord's work here, because Apple's documentation really fell down on the job this time. See, through OS 26 beta 4, if you had a GeneratedContent, you could simply iterate over a dictionary of its properties or an array of its elements. These APIs, however, appear to have been removed in OS 26 beta 5. I say "appear to have been removed," because Apple shipped Xcode 26 beta 5 with outdated developer documentation that continues to suggest they should exist and which failed to include beta 5's newly-added GeneratedContent.Kind enum. Between this and the lack of any example code or blog posts, I spent most of today wondering whether I'd lost my goddamn mind.
Anyway, good news: you can iterate over a dynamic schema's collection of properties of unknown name and size by unwrapping the response.content.kind enumerator. In my case, I know my subComponents will always be a structure, because I'm the guy who defined my schema and the nice thing about the Foundation Models API is that its responses always, yes, always adhere to the types specified by the requested schema, whether static or dynamic.
So let's break down what went into deriving the result's subComponents property.
We start by fetching a nested GeneratedContent from the top-level property named subComponents with content.value(GeneratedContent.self, forProperty: "subComponents").
Next, this little nugget assigns to properties a dictionary mapping String keys to GeneratedContent values by unwrapping the properties from the kind enumerator's structure case, and defaulting to an empty dictionary in the event we get anything unexpected:
let properties: [String: GeneratedContent] = {
  if case let .structure(properties, _) = subComponentsContent.kind {
    return properties
  }
  return [:]
}()
Finally, we build out our result struct's subComponents field by mapping over those properties:
subComponents = try properties.map { (title, bodyContent) in
  try SubComponentResult(title: title, body: bodyContent.value(String.self))
}
Two things are admittedly weird about that last bit:
- I got a little lazy here by using each sub-component's title as the name of the corresponding generated property. Since the property name gets fed into the LLM, one can only imagine that doing so improves the results. Based on my experience so far, the name of a field greatly influences what kind of data you get back from the on-device foundation models.
- The bodyContent itself is a GeneratedContent that we know to be a string (again, because that's what our dynamic schema specifies), so we can safely demand one back using its value(Type) method
5. Pull it all together
Okay, the moment of truth. This shit compiles, but will it work? At least as of OS 26 betas 5 & 6: yes!
My aforementioned Swift file ends with a #Playground you can futz with in Xcode 26 and navigate the results interactively. Just three more calls to get your cocktail:
import Playgrounds

#Playground {
  let session = LanguageModelSession {
    cocktailPromptSet.instructions
  }

  let response = try await session.respond(
    to: "Shirley Temple",
    schema: GenerationSchema(root: cocktailSchema, dependencies: [])
  )

  let cocktailResult = try EducationalResult(response.content)
}
The above yielded this response:
EducationalResult(
  summary: "The Shirley Temple is a classic and refreshing cocktail that has been delighting children and adults alike for generations. It\'s known for its simplicity, sweet taste, and vibrant orange hue. Made primarily with ginger ale, it\'s a perfect example of a kid-friendly drink that doesn\'t compromise on flavor. The combination of ginger ale and grenadine creates a visually appealing and sweet-tart beverage, making it a staple at parties, brunches, and any occasion where a fun and easy drink is needed.",
  confidence: 100,
  subComponents: [
    SubComponentResult(title: "Steps", body: "1. In a tall glass filled with ice, pour 2 oz of ginger ale. 2. Add 1 oz of grenadine carefully, swirling gently to combine. 3. Garnish with an orange slice and a cherry on top."),
    SubComponentResult(title: "Prep", body: "Ensure you have fresh ginger ale and grenadine ready to go."),
    SubComponentResult(title: "Ingredients", body: "2 oz ginger ale, 1 oz grenadine, Orange slice, Cherry")
  ])
The best part? I can only generate "Shirley Temple" drinks because whenever I ask for an alcoholic cocktail, it trips the on-device models' safety guardrails and refuses to generate anything.
Cool!
This was too hard
I've heard stories about Apple's documentation being bad, but never about it being straight-up wrong. Live by the beta, die by the beta, I guess.
In any case, between the documentation snafu and Claude Code repeatedly shitting the bed trying to guess its way through this API, I'm actually really grateful I was forced to buckle down and learn me some Swift.
Let me know if this guide helped you out! 💜
Letting go of autonomy
I recently wrote I'm inspecting everything I thought I knew about software. In this new era of coding agents, what have I held firm that's no longer relevant? Here's one area where I've completely changed my mind.
I've long been an advocate for promoting individual autonomy on software teams. At Test Double, we founded the company on the belief that greatness depended on trusting the people closest to the work to decide how best to do the work. We'd seen what happens when the managerial class has the hubris to assume they know better than someone who has all the facts on the ground.
This led to me very often showing up at clients and pushing back on practices like:
- Top-down mandates governing process, documentation, and metrics
- Onerous git hooks that prevented people from committing code until they'd jumped through a preordained set of hoops (e.g. blocking commits if code coverage dropped, if the build slowed down, etc.)
- Mandatory code review and approval as a substitute for genuine collaboration and collective ownership
More broadly, if technical leaders created rules without consideration for reasonable exceptions and without regard for whether it demoralized their best staff… they were going to hear from me about it.
I lost track of how many times I've said something like, "if you design your organization to minimize the damage caused by your least competent people, don't be surprised if you minimize the output of your most competent people."
Well, never mind all that
Lately, I find myself mandating a lot of quality metrics, encoding them into git hooks, and insisting on reviewing and approving every line of code in my system.
What changed? AI coding agents are the ones writing the code now, and the long-term viability of a codebase absolutely depends on establishing and enforcing the right guardrails within which those agents should operate.
As a result, my latest project is full of:
- Authoritarian documentation dictating what I want from each coder with granular precision (in CLAUDE.md)
- Patronizing step-by-step instructions telling coders how to accomplish basic tasks, repeated each and every time I ask them to carry out the task (as custom slash commands)
- Ruthlessly rigid scripts that can block the coder's progress and commits, whether as git hooks or Claude hooks (see the sketch below)
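To make that last bullet concrete, here's the flavor of thing I mean—a hypothetical pre-commit hook that refuses the commit unless the fast checks pass. The script paths are stand-ins for whatever your project actually uses:

#!/bin/sh
# .git/hooks/pre-commit — no commit unless the guardrails are green
set -e

./script/lint        # hypothetical: fail the commit on style violations
./script/test --fast # hypothetical: fail the commit on broken fast tests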
Everything I believe about autonomy still holds for human people, mind you. Undermining people's agency is indeed counterproductive if your goal is to encourage a sense of ownership, leverage self-reliance to foster critical thinking, and grow through failure. But coding agents are (currently) inherently ephemeral, trained generically, and impervious to learning from their mistakes. They need all these guardrails.
All I would ask is this: if you, like me, are constructing a bureaucratic hellscape around your workspace so as to wrangle Claude Code or some other agent, don't forget that your human colleagues require autonomy and self-determination to thrive and succeed. Lay down whatever gauntlet you need to for your agent, but give the humans a hall pass.
"There Will Come Soft Rains" a year from today
Easily my all-time favorite short story is "There Will Come Soft Rains" by Ray Bradbury. (If you haven't read it, just Google it and you'll find a PDF—seemingly half the schools on earth assign it.)
The story takes place exactly a year from now, on August 4th, 2026. In just a few pages, Bradbury recounts the events of the final day of a fully-automated home that somehow survives an apocalyptic nuclear blast, only to continue operating without any surviving inhabitants. Apart from being a cautionary tale, it's genuinely remarkable that—despite being written 75 years ago—it so closely captures many of the aspects of the modern smarthome. When sci-fi authors nail a prediction at any point in the future, people tend to give them a lot of credit, but this guy called his shot by naming the drop-dead date (literally).
I mean, look at this house.
It's got Roombas:
Out of warrens in the wall, tiny robot mice darted. The rooms were a crawl with the small cleaning animals, all rubber and metal. They thudded against chairs, whirling their moustached runners, kneading the rug nap, sucking gently at hidden dust. Then, like mysterious invaders, they popped into their burrows. Their pink electric eyes faded. The house was clean.
It's got smart sprinklers:
The garden sprinklers whirled up in golden founts, filling the soft morning air with scatterings of brightness. The water pelted window panes…
It's got a smart oven:
In the kitchen the breakfast stove gave a hissing sigh and ejected from its warm interior eight pieces of perfectly browned toast, eight eggs sunny side up, sixteen slices of bacon, two coffees, and two cool glasses of milk.
It's got a video doorbell and smart lock:
Until this day, how well the house had kept its peace. How carefully it had inquired, "Who goes there? What's the password?" and, getting no answer from lonely foxes and whining cats, it had shut up its windows and drawn shades in an old-maidenly preoccupation with self-protection which bordered on a mechanical paranoia.
It's got a Chamberlain MyQ subscription, apparently:
Outside, the garage chimed and lifted its door to reveal the waiting car. After a long wait the door swung down again.
It's got bedtime story projectors, for the kids:
The nursery walls glowed.
Animals took shape: yellow giraffes, blue lions, pink antelopes, lilac panthers cavorting in crystal substance. The walls were glass. They looked out upon color and fantasy. Hidden films clocked through well-oiled sprockets, and the walls lived.
It's got one of those auto-filling bath tubs from Japan:
Five o'clock. The bath filled with clear hot water.
Best of all, it's got a robot that knows how to mix a martini:
Bridge tables sprouted from patio walls. Playing cards fluttered onto pads in a shower of pips. Martinis manifested on an oaken bench with egg-salad sandwiches. Music played.
All that's missing is the nuclear apocalypse! But like I said, we've got a whole year left.
I made Xcode's tests 60 times faster
Time is our most precious resource, as both humans and programmers.
An 8-hour workday contains 480 minutes. Out of the box, running a new iOS app's test suite from the terminal using xcodebuild test takes over 25 seconds on my M4 MacBook Pro. After extracting my application code into a Swift package—such that the application project itself contains virtually no code at all—running swift test against the same test suite now takes as little as 0.4 seconds. That's over 60 times faster.
Given 480 minutes, that's the difference between having a theoretical upper bound of 1152 potential actions per day and having 72,000.
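The mechanics boil down to moving your code and tests into a local Swift package that the app project merely depends on. Here's a minimal sketch of such a package manifest—the names are hypothetical, and code that only builds for iOS (UIKit views and the like) generally has to stay behind in the app target:

// swift-tools-version: 5.10
// Package.swift — the app target becomes a thin shell; the real code and tests live here
import PackageDescription

let package = Package(
  name: "MyAppCore",
  products: [
    .library(name: "MyAppCore", targets: ["MyAppCore"])
  ],
  targets: [
    .target(name: "MyAppCore"),
    .testTarget(name: "MyAppCoreTests", dependencies: ["MyAppCore"])
  ]
)

With a layout like that, swift test can exercise the package straight from the terminal without involving xcodebuild or a simulator.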
If that number doesn't immediately mean anything to you, you're not alone. I've been harping on the importance of tightening this particular feedback loop my entire career. If you want to see the same point made with more charts and zeal, here's me saying the same shit a decade ago:
And yes, it's true that if you run tests through the Xcode GUI it's faster, but (1) that's no way to live, (2) it's still pretty damn slow, and (3) in a world where Claude Code exists and I want to constrain its shenanigans by running my tests in a hook, a 25-second turnaround time from the CLI is unacceptably slow.
Anyway, here's how I did it, so you can too.
Adding swift-format to your Xcode build
Xcode 16 and later come with swift-format baked in. Unfortunately, Xcode doesn't hook it up for you: aside from a one-off "Format File" menu item, you get no automatic formatting or linting on local builds—and zero guidance for Xcode Cloud.
Beginning with the end in mind, here's what I ended up adding or changing:
.
├── ci_scripts
│ ├── ci_pre_xcodebuild.sh
│ ├── format
│ └── lint
├── MyApp.xcodeproj
│ └── project.pbxproj
└── script -> ci_scripts/
Configuring swift-format
Since I'm new around here, I'm basically sticking with the defaults. The only rule I customized in my project's .swift-format file was to set indents to 2 spaces. Personally, I rock massive fonts and zoom levels when I work, so the default 4-space indent can result in horizontal scrolling.
{
  "indentation" : {
    "spaces" : 2
  }
}
Running swift-format in Xcode Cloud
Heads-up: if you wire swift-format into your local build you can skip this step. I'm laying it out anyway because sometimes it's handy to run these scripts only in the cloud—and starting with that flexibility costs nothing.
When you add custom scripts on Xcode Cloud, you can implement any or all of these three specially named hook scripts:
ci_scripts/ci_post_clone.sh
ci_scripts/ci_pre_xcodebuild.sh
ci_scripts/ci_post_xcodebuild.sh
If that feels limiting, it gets better: these scripts can call anything else inside ci_scripts. Because I always name my projects' script directory script/, I capitulated by putting everything in ci_scripts and made a symlink:
# Create the directory
mkdir ci_scripts
# Add a script/ symlink
ln -s ci_scripts script
Create the formatting & linting scripts
Next, I created (and made executable) my pre-build hook script, a format script, and a lint script:
# Create the scripts
touch ci_scripts/ci_pre_xcodebuild.sh ci_scripts/lint ci_scripts/format
# Make them executable
chmod +x ci_scripts/ci_pre_xcodebuild.sh ci_scripts/lint ci_scripts/format
With that, a pre-build hook (which only runs in Xcode Cloud) can be written like this:
#!/bin/sh
# ci_scripts/ci_pre_xcodebuild.sh
# See: https://developer.apple.com/documentation/xcode/writing-custom-build-scripts
set -e
./lint
./format
The lint script looks like this (--strict treats warnings as errors):
#!/bin/sh
# ci_scripts/lint
swift format lint --strict --parallel --recursive .
And my format script (which needs --in-place to know it should overwrite files) is here:
#!/bin/sh
# ci_scripts/format
swift format --in-place --parallel --recursive .
Note that the above scripts use swift format as a swift subcommand, because the swift-format executable is not on the PATH of the sandboxed Xcode Cloud environment.
(Why bother formatting in CI if it won't commit changes? Because I'd rather learn ASAP that something's un-formattable than be surprised when I run ./script/format later.)
Configuring formatting and linting for local builds
If you're like me, you'll want to lint and format on every local build as well:
In your project file, select your app target and navigate to the Build Phases tab. Click the plus (➕) icon and select "New Run Script Phase" to give yourself a place to write this little bit of shell magic:
"$SRCROOT/script/format"
"$SRCROOT/script/lint"
You'll also want to uncheck "Based on dependency analysis": since these scripts run across the whole codebase, it doesn't make sense to whitelist specific input and output files.
Finally, because Xcode 15 and later sandbox Run Script phases from the filesystem by default, you also need to go to the target's Build Settings tab and set "User Script Sandboxing" to "No".
In MyApp.xcodeproj/project.pbxproj you should see the setting reflected as:
ENABLE_USER_SCRIPT_SANDBOXING = NO
And that's it! Now, when building the app locally (e.g. Command-B), all the Swift source files in the project are linted and formatted. As mentioned above, if you complete this step you can go back and delete your ci_scripts/ci_pre_xcodebuild.sh file.
Why is this so hard?
Great question! I'm disappointed but unsurprised by how few guides I found today to address issues like this, but ultimately the responsibility lies with Apple to provide batteries-included tooling and, failing that, documentation that points to solutions for common tasks.
TLDR is the best test runner for Claude Code
A couple years ago, Aaron and I had an idea for a satirical test runner that enforced fast feedback by giving up on running your tests after 1.8 seconds. It's called TLDR.
I kept pulling on the thread until TLDR could stand as a viable non-satirical test runner and a legitimate Minitest alternative. Its 1.0 release sported a robust CLI, configurable (and disable-able) timeouts, and a compatibility mode that makes TLDR a drop-in replacement for Minitest in most projects.
Anyway, as I got started working with Claude Code and learned about how hooks work, I realized that a test runner with a built-in concept of a timeout was suddenly a very appealing proposition. To make TLDR a great companion to agentic workflows, I put some work into a new release this weekend that allows you to do this:
tldr --timeout 0.1 --exit-0-on-timeout --exit-2-on-failure
The above command does several interesting things:
- Runs as many tests in random order and in parallel as it can in 100ms
- If some tests don't run inside 100ms, TLDR will exit cleanly (normally a timeout fails with exit code 3)
- If a test fails, the command fails with status code 2 (normally, failures exit with code 1)
These three flags add up to a really interesting combination when you configure them as a Claude Code hook:
- A short timeout means you can add TLDR to run as an after-write hook for Claude Code without slowing you or Claude down very much
- By exiting with code 0 on a timeout, Claude Code will happily proceed so long as no tests fail. Because Claude Code tends to edit a lot of files relatively quickly, the hook will trigger many randomized test runs as Claude works—uncovering any broken tests reasonably quickly
- By exiting with code 2 on test failures, the hook will—according to the docs—block Claude from proceeding until the tests are fixed
Here's an example Claude Code configuration you can drop into any project that uses TLDR. My .claude/settings.json file on todo_or_die looks like this:
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|MultiEdit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "bundle exec tldr --timeout 0.1 --exit-0-on-timeout --exit-2-on-failure"
          }
        ]
      }
    ]
  }
}
If you maintain a linter or a test runner, you might want to consider exposing configuration for timeouts and exit codes in a similar way. I suspect demand for hook-aware CLI tools will become commonplace soon.
Notify your iPhone or Watch when Claude Code finishes
I taught Claude Code a new trick this weekend and thought others might appreciate it.
I have a very bad habit of staring at my computer screen while waiting for it to do stuff. My go-to solution for this is to make the computer do stuff faster, but there's no getting around it: Claude Code insists on taking an excruciating four or five minutes to accomplish a full day's work. Out of the box, claude rings the terminal bell when it stops out of focus, and that's good enough if you've got other stuff to do on your Mac. But because Claude is so capable running autonomously (that is, if you're brave enough to --dangerously-skip-permissions), I wanted to be able to walk away from my Mac while it cooked.
This led me to cobble together this solution that will ping my iPhone and Apple Watch with a push notification whenever Claude needs my attention or runs out of work to do. Be warned: it requires paying for the Pro tier of an app called Pushcut, but anyone willing to pay $200/month for Claude Code can hopefully spare $2 more.
Here's how you can set this up for yourself:
- Install Pushcut to your iPhone and whatever other supported Apple devices you want to be notified on
- Create a new notification in the Notifications tab. I named mine "terminal". The title and text don't matter, because we'll be setting custom parameters each time when we POST to the HTTP webhook
- Copy your webhook secret from Pushcut's Account tab
- Set that webhook secret to an environment variable named PUSHCUT_WEBHOOK_SECRET in your ~/.profile or whatever
- Use this settings.json to configure Claude Code hooks
Of course, now I have a handy notify_pushcut executable I can call from any tool to get my attention, not just Claude Code. The script is fairly clever—it won't notify you while your terminal is focused and the display is awake. You'll only get buzzed if the display is asleep or you're in some other app. And if it's ever too much and you want to disable the behavior, just set a NOTIFY_PUSHCUT_SILENT variable.
The script
I put this file in ~/bin/notify_pushcut and made it executable with chmod +x ~/bin/notify_pushcut:
#!/usr/bin/env bash
set -e
# Doesn't source ~/.profile so load env vars ourselves
source ~/icloud-drive/dotfiles/.env
if [ -n "$NOTIFY_PUSHCUT_SILENT" ]; then
exit 0
fi
# Check if argument is provided
if [ $# -eq 0 ]; then
echo "Usage: $0 TITLE [DESCRIPTION]"
exit 1
fi
# Check if PUSHCUT_WEBHOOK_SECRET is set
if [ -z "$PUSHCUT_WEBHOOK_SECRET" ]; then
echo "Error: PUSHCUT_WEBHOOK_SECRET environment variable is not set"
exit 1
fi
# Function to check if Terminal is focused
is_terminal_focused() {
local frontmost_app=$(osascript -e 'tell application "System Events" to get name of first application process whose frontmost is true' 2>/dev/null)
# List of terminal applications to check
local terminal_apps=("Terminal" "iTerm2" "iTerm" "Alacritty" "kitty" "Warp" "Hyper" "WezTerm")
# Check if frontmost app is in the array
for app in "${terminal_apps[@]}"; do
if [[ "$frontmost_app" == "$app" ]]; then
return 0
fi
done
return 1
}
# Function to check if display is sleeping
is_display_sleeping() {
# Check if system is preventing display sleep (which means display is likely on)
local assertions=$(pmset -g assertions 2>/dev/null)
# If we can't get assertions, assume display is awake
if [ -z "$assertions" ]; then
return 1
fi
# Check if UserIsActive is 0 (user not active) and no prevent sleep assertions
if echo "$assertions" | grep -q "UserIsActive.*0" && \
! echo "$assertions" | grep -q "PreventUserIdleDisplaySleep.*1" && \
! echo "$assertions" | grep -q "Prevent sleep while display is on"; then
return 0 # Display is likely sleeping
fi
return 1 # Display is awake
}
# Set title and text
TITLE="$1"
TEXT="${2:-$1}" # If text is not provided, use title as text
# Only send notification if Terminal is NOT focused OR display is sleeping
if ! is_terminal_focused || is_display_sleeping; then
# Send notification to Pushcut - using printf to handle quotes properly
curl -s -X POST "https://api.pushcut.io/$PUSHCUT_WEBHOOK_SECRET/notifications/terminal" \
-H 'Content-Type: application/json' \
-d "$(printf '{"title":"%s","text":"%s"}' "${TITLE//\"/\\\"}" "${TEXT//\"/\\\"}")"
exit 0
fi
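For what it's worth, here's how I might invoke it by hand to test it (remember that, by design, nothing fires while your terminal is focused and the display is awake):

```bash
# Only notifies when the terminal isn't focused or the display is asleep
~/bin/notify_pushcut "Long task finished" "Safe to wander back to your desk"
```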
Claude hooks configuration
You can configure Claude hooks in `~/.claude/settings.json`:
{
"hooks": {
"Notification": [
{
"hooks": [
{
"type": "command",
"command": "/bin/bash -c 'json=$(cat); message=$(echo \"$json\" | grep -o '\"message\"[[:space:]]*:[[:space:]]*\"[^\"]*\"' | sed 's/.*: *\"\\(.*\\)\"/\\1/'); $HOME/bin/notify_pushcut \"Claude Code\" \"${message:-Notification}\"'"
}
]
}
],
"Stop": [
{
"hooks": [
{
"type": "command",
"command": "$HOME/bin/notify_pushcut \"Claude Code Finished\" \"Claude has completed your task\""
}
]
}
]
}
}
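The inline grep/sed in that Notification hook is admittedly gnarly. If you have jq installed, a simpler equivalent command would be something like this untested sketch (the inner double quotes would need escaping once you embed it in settings.json):

```bash
# Hypothetical alternative: let jq pull the message out of the JSON that
# Claude Code pipes to the hook over stdin
/bin/bash -c 'message=$(jq -r ".message // empty"); $HOME/bin/notify_pushcut "Claude Code" "${message:-Notification}"'
```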
Full-breadth Developers
The software industry is at an inflection point unlike anything in its brief history. Generative AI is all anyone can talk about. It has rendered entire product categories obsolete and upended the job market. With any economic change of this magnitude, there are bound to be winners and losers. So far, it sure looks like full-breadth developers—people with both technical and product capabilities—stand to gain as clear winners.
What makes me so sure? Because over the past few months, the engineers I know with a lick of product or business sense have been absolutely scorching through backlogs at a dizzying pace. It may not map to any particular splashy innovation or announcement, but everyone agrees generative coding tools crossed a significant capability threshold recently. It's what led me to write this. In just two days, I've completed two months' worth of work on Posse Party.
I did it by providing an exacting vision for the app, by maintaining stringent technical standards, and by letting Claude Code do the rest. If you're able to cram critical thinking, good taste, and strong technical chops into a single brain, these tools hold the potential to unlock incredible productivity. But I don't see how it could scale to multiple people. If you were to split me into two separate humans—Product Justin and Programmer Justin—and ask them to work the same backlog, it would have taken weeks instead of days. The communication cost would simply be too high.
We can't all be winners
When I step back and look around, however, most of the companies and workers I see are currently on track to wind up as losers when all is said and done.
In recent decades, businesses have not only failed to cultivate full-breadth developers, they've trained a generation into believing product and engineering roles should be strictly segregated. To suggest a single person might drive both product design and technical execution would sound absurd to many people. Even for companies who realize inter-disciplinary developers are the new key to success, their outmoded job descriptions and salary bands are failing to recruit and retain them.
There is an urgency to this moment. Up until a few months ago, the best developers played the violin. Today, they play the orchestra.
Google screwed up
I've been obsessed with this issue my entire career, so pardon me if I betray any feelings of schadenfreude as I recount the following story.
I managed to pass a phone screen with Google in 2007 before graduating college. This earned me an all-expense paid trip for an in-person interview at the vaunted Googleplex. I went on to experience complete ego collapse as I utterly flunked their interview process. Among many deeply embarrassing memories of the trip was a group session with a Big Deal Engineer who was introduced as the inventor of BigTable. (Jeff Dean, probably? Unsure.) At some point he said, "one of the great things about Google is that engineering is one career path and product is its own totally separate career path."
I had just paid a premium to study computer science at a liberal arts school and had the audacity to want to use those non-technical skills, so I bristled at this comment. And, being constitutionally unable to keep my mouth shut, I raised my hand to ask, "but what if I play a hybrid class? What if I think it's critical for everyone to engage with both technology and product?"
The dude looked me dead in the eyes and told me I wasn't cut out for Google.
The recruiter broke a long awkward silence by walking us to the cafeteria for lunch. She suggested I try the ice cream sandwiches. I had lost my appetite for some reason.
In the years since, an increasing number of companies around the world have adopted Silicon Valley's trademark dual-ladder career system. Tech people sit over here. Idea guys go over there.
What separates people
Back to winners and losers.
Some have discarded everything they know in favor of an "AI first" workflow. Others decry generative AI as a fleeting boondoggle like crypto. It's caused me to broach the topic with trepidation—as if I were asking someone their politics. I've spent the last few months noodling over why it's so hard to guess how a programmer will feel about AI, because people's reactions seem to cut across roles and skill levels. What factors predict whether someone is an overzealous AI booster or a radicalized AI skeptic?
Then I was reminded of that day at Google. And I realized that developers I know who've embraced AI tend to be more creative, more results-oriented, and have good product taste. Meanwhile, AI dissenters are more likely to code for the sake of coding, expect to be handed crystal-clear requirements, or otherwise want the job to conform to a routine 9-to-5 grind. The former group feels unchained by these tools, whereas the latter group just as often feels threatened by them.
When I take stock of who is thriving and who is struggling right now, a person's willingness to play both sides of the ball has been the best predictor for success.
| Role | Engineer | Product | Full-breadth |
|---|---|---|---|
| Junior | ❌ | ❌ | ✅ |
| Senior | ❌ | ❌ | ✅ |
Breaking down the patterns that keep repeating as I talk to people about AI:
- Junior engineers, as is often remarked, don't have a prayer of sufficiently evaluating the quality of an LLM's work. When the AI hallucinates or makes mistakes, novice programmers are more likely to learn the wrong thing than to spot the error. This would be less of a risk if they had the permission to decelerate to a snail's pace in order to learn everything as they go, but in this climate nobody has the patience. I've heard from a number of senior engineers that the overnight surge in junior developer productivity (as in "lines of code") has brought organization-wide productivity (as in "working software") to a halt—consumed with review and remediation of low-quality AI slop. This is but one factor contributing to the sense that lowering hiring standards was a mistake, so it's no wonder that juniors have been first on the chopping block
- Senior engineers who earnestly adopt AI tools have no problem learning how to coax LLMs into generating "good enough" code at a much faster pace than they could ever write themselves. So, if they're adopting AI, what's the problem? The issue is that the productivity boon is becoming so great that companies won't need as many senior engineers as they once did. Agents work relentlessly, and tooling is converging on a vision of senior engineers as cattle ranchers, steering entire herds of AI agents. How is a highly-compensated programmer supposed to compete with a stable of agents that can produce an order of magnitude more code at an acceptable level of quality for a fraction of the price?
- Junior product people are, in my experience, largely unable to translate amorphous real-world problems into well-considered software solutions. And communicating those solutions with the necessary precision to bring them to life? Unlikely. Still, many are having success with app creation platforms that provide the necessary primitives and guardrails. But those tools always have a low capability ceiling (just as with any low-code/no-code platform). Regardless, is this even a role worth hiring for? If I wanted mediocre product direction, I'd ask ChatGPT
- Senior product people are among the most excited I've seen about coding agents—and why shouldn't they be? They're finally free of the tyranny of nerds telling them everything is impossible. And they're building stuff! Reddit is lousy with posts showing off half-baked apps built in half a day. Unfortunately, without routinely inspecting the underlying code, anything larger than a toy app is doomed to collapse under its own weight. The fact LLMs are so agreeable and unwilling to push back often collides with the blue-sky optimism of product people, which can result in each party leading the other in circles of irrational exuberance. Things may change in the future, but for now there's no way to build great software without also understanding how it works
Hybrid-class operators, meanwhile, seem to be having a great time regardless of their skill level or years of experience. And that's because what differentiates full-breadth developers is less about capability than about mindset. They're results-oriented: they may enjoy coding, but they like getting shit done even more. They're methodical: when they encounter a problem, they experiment and iterate until they arrive at a solution. The best among them are visionaries: they don't wait to be told what to work on, they identify opportunities others don't see, and they dream up software no one else has imagined.
Many are worried the market's rejection of junior developers portends a future in which today's senior engineers age out and there's no one left to replace them. I am less concerned, because less experienced full-breadth developers are navigating this environment extraordinarily well. Not only because they excitedly embraced the latest AI tools, but also because they exhibit the discipline to move slowly, understand, and critically assess the code these tools generate. The truth is computer science majors, apprenticeship programs, and code schools—today, all dead or dying—were never very effective at turning out competent software engineers. Claude Pro may not only be the best educational resource under $20, it may be the best way to learn how to code that's ever existed.
There is still hope
Maybe you've read this far and the message hasn't resonated. Maybe it's triggered fears or worries you've had about AI. Maybe I've put you on the defensive and you think I'm full of shit right now. In any case, whether your organization isn't designed for this new era or you don't yet identify as a full-breadth developer, this section is for you.
Leaders: go hire a good agency
While my goal here is to coin a silly phrase to help us better communicate about the transformation happening around us, we've actually had a word for full-breadth developers all along: consultant.
And not because consultants are geniuses or something. It's because, as I learned when I interviewed at Google, if a full-breadth developer wants to do their best work, they need to exist outside the organization and work on contract. So it's no surprise that some of my favorite full-breadth consultants are among AI's most ambitious adopters. Not because AI is what's trending, but because our disposition is perfectly suited to get the most out of these new tools. We're witnessing their potential to improve how the world builds software firsthand.
When founding our consultancy Test Double in 2011, Todd Kaufman and I told anyone who would listen that our differentiator—our whole thing—was that we were business consultants who could write software. Technology is just a means to an end, and that end (at least if you expect to be paid) is to generate business value. Even as we started winning contracts with VC-backed companies who seemed to have an infinite money spigot, we would never break ground until we understood how our work was going to make or save our clients money. And whenever the numbers didn't add up, we'd push back until the return on investment for hiring Test Double was clear.
So if you're a leader at a company who has been caught unprepared for this new era of software development, my best advice is to hire an agency of full-breadth developers to work alongside your engineers. Use those experiences to encourage your best people to start thinking like they do. Observe them at work and prepare to blow up your job descriptions, interview processes, and career paths. If you want your business to thrive in what is quickly becoming a far more competitive landscape, you may be best off hitting reset on your human organization and starting over. Get smaller, stay flatter, and only add structure after the dust settles and repeatable patterns emerge.
Developers: congrats on your new job
A lot of developers are feeling scared and hopeless about the changes being wrought by all this. Yes, AI is being used as an excuse by executives to lay people off and pad their margins. Yes, how foundation models were trained was unethical and probably also illegal. Yes, hustle bros are running around making bullshit claims. Yes, almost every party involved has a reason to make exaggerated claims about AI.
All of that can be true, and it still doesn't matter. Your job as you knew it is gone.
If you want to keep getting paid, you may have been told to, "move up the value chain." If that sounds ambiguous and unclear, I'll put it more plainly: figure out how your employer makes money and position your ass directly in-between the corporate bank account and your customers' credit card information. The longer the sentence needed to explain how your job makes money for your employer, the further down the value chain you are and the more worried you should be. There's no sugar-coating it: you're probably going to have to push yourself way outside your comfort zone.
Get serious about learning and using these new tools. You will, like me, recoil at first. You will find, if you haven't already, that all these fancy AI tools are really bad at replacing you. That they fuck up constantly. Your new job starts by figuring out how to harness their capabilities anyway. You will gradually learn how to extract something that approximates how you would have done it yourself. Once you get over that hump, the job becomes figuring out how to scale it up. Three weeks ago I was a Cursor skeptic. Today, I'm utterly exhausted working with Claude Code, because I can't write new requirements fast enough to keep up with parallel workers across multiple worktrees.
As for making yourself more valuable to your employer, I'm not telling you to demand a new job overnight. But if you look to your job description as a shield to protect you from work you don't want to do… stop. Make it the new minimum baseline of expectations you place on yourself. Go out of your way to surprise and delight others by taking on as much as you and your AI supercomputer can handle. Do so in the direction of however the business makes its money. Sit down and try to calculate the return on investment of your individual efforts, and don't slow down until that number far exceeds the fully-loaded cost you represent to your employer.
Start living these values in how you show up at work. Nobody is going to appreciate it if you rudely push back on every feature request with, "oh yeah? How's it going to make us money?" But your manager will appreciate your asking how you can make a bigger impact. And they probably wouldn't be mad if you were to document and celebrate the ROI wins you notch along the way. Listen to what the company's leadership identifies as the most pressing challenges facing the business and don't be afraid to volunteer to be part of the solution.
All of this would have been good career advice ten years ago. It's not rocket science, it's just deeply uncomfortable for a lot of people.
Good game, programmers
Part of me is already mourning the end of the previous era. Some topics I spent years blogging, speaking, and building tools around are no longer relevant. Others that I've been harping on for years—obsessively-structured code organization and ruthlessly-consistent design patterns—are suddenly more valuable than ever. I'm still sorting out what's worth holding onto and what I should put back on the shelf.
As a person, I really hate change. I wish things could just settle down and stand still for a while. Alas.
If this post elicited strong feelings, please e-mail me and I will respond. If you find my perspective on this stuff useful, you might enjoy my podcast, Breaking Change. 💜
A handy script for launching editors
Today, I want to share with you a handy `edit` script I use to launch my editor countless times each day. It can:
- `edit posse_party` – will launch my editor with the project `~/code/searls/posse_party`
- `edit -e vim rails/rails` – will change to the `~/code/rails/rails` directory and run `vim`
- `edit testdouble/mo[TAB]` – will auto-complete to `edit testdouble/mocktail`
- `edit emoruby` – will, if not found locally, clone and open `searls/emoruby`
This script relies on following the convention of organizing working copies of projects in a GitHub `<org>/<repo>` format (under `~/code` by default). I can override this and a few other things with environment variables:
- `CODE_DIR` - defaults to `"$HOME/code"`
- `DEFAULT_ORG` - defaults to `"searls"`
- `DEFAULT_EDITOR` - defaults to `cursor` (for the moment)
I've been organizing my code like this for 15 years, but over the last year I've found myself bouncing between various AI tools so often that I finally bit the bullet to write a custom meta-launcher.
If you want something like this, you can do it yourself:
- Add the edit executable to a directory on your
PATH
- Make sure
edit
is executable withchmod +x edit
- Download the edit.bash bash completions and put them somewhere
- In .profile or
.bashrc
or whatever, runsource path/to/edit.bash
The rest of this post is basically longer-form documentation of the script that you're welcome to peruse in lieu of a proper README.
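If you just want the gist without reading the rest, here's a stripped-down sketch of the idea. To be clear, this is not the actual script (it omits the clone-if-missing behavior and the tab completions entirely); it's just meant to show how little magic is involved:

```bash
#!/usr/bin/env bash
# Minimal sketch of an "edit" launcher built around the ~/code/<org>/<repo> convention.
# The real script linked above does more (cloning missing repos, completions, etc.)
set -e

CODE_DIR="${CODE_DIR:-$HOME/code}"
DEFAULT_ORG="${DEFAULT_ORG:-searls}"
EDITOR_CMD="${DEFAULT_EDITOR:-cursor}"

# Allow `edit -e vim some/repo` to override the editor for one invocation
if [ "$1" = "-e" ]; then
  EDITOR_CMD="$2"
  shift 2
fi

name="$1"
# Bare names get the default org prepended (e.g. "posse_party" -> "searls/posse_party")
case "$name" in
  */*) path="$CODE_DIR/$name" ;;
  *)   path="$CODE_DIR/$DEFAULT_ORG/$name" ;;
esac

cd "$path"
exec "$EDITOR_CMD" .
```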
How to subscribe to email newsletters via RSS
I have exactly one inbox for reading blogs and following news, and it's expressly not my e-mail client—it's my feed reader. (Looking for a recommendation? Here are some instructions on setting up NetNewsWire; for once, the best app is also the free and open source one.)
Anyway, with the rise of Substack and the trend for writers to eschew traditional web publishing in favor of e-mail newsletters, more and more publishers want to tangle their content up in your e-mail. Newsletters work because people will see them (so long as they ever check their e-mail…), whereas routinely visiting a web site requires a level of discipline that social media trained out of most people a decade ago.
But, if you're like me, and you want to reserve your e-mail for bidirectional communication with humans and prefer to read news at the time and pace of your choosing, did you know you can convert just about any e-mail newsletter into an RSS feed and follow that instead?
Many of us nerds have known about this for a while, and while various services have tried to monetize the same feature, it's hard to beat Kill the Newsletter: it doesn't require an account to set up and it's totally free.
How to convert an e-mail newsletter into a feed
Suppose you're signed up to the present author's free monthly newsletter, Searls of Wisdom, and you want to start reading it in a feed reader. (Also suppose that I do not already publish an RSS feed alternative, which I do).
Here's what you can do:
- Visit Kill the Newsletter and enter the human-readable title you want for the newsletter. In this case you might type Searls of Wisdom and click `Create Feed`.
- This yields two generated strings: an e-mail address and a feed URL
- Copy the e-mail address (e.g. `1234@kill-the-newsletter.com`) and subscribe to the newsletter via the publisher's web site, just as you would if you were giving them your real e-mail address
- Copy the URL (e.g. `https://kill-the-newsletter.com/feeds/1234.xml`) and subscribe to it in your feed reader, as if it was any normal RSS/Atom feed
- Confirm it's working by checking the feed in your RSS reader. Because this approach simply recasts e-mails into RSS entries, the first thing you see will probably be a welcome message or a confirmation link you'll need to click to verify your subscription
- Once it's working, if you'd previously subscribed to the newsletter with your personal e-mail address, unsubscribe from it and check it in your feed reader instead
That's it! Subscribing to a newsletter with a bogus-looking address so that a bogus-looking feed starts spitting out articles is a little counter-intuitive, I admit, but I have faith in you.
(And remember, you don't need to do this for my newsletter, which already offers a feed you can just follow without the extra steps.)
Why is Kill the Newsletter free? How can any of this be sustainable? Nobody knows! Isn't the Internet cool?
Visiting Japan is easy because living in Japan is hard
Hat tip to Kyle Daigle for sending me this Instagram reel:
I don't scroll reels, so I'd hardly call myself a well-heeled critic of the form, but I will say I've never heard truer words spoken in a vertical short-form video.
It might be helpful to think of the harmony we witness in Japan as a collective bank account with an exceptionally high balance. Everyone deposits into that account all the ingredients necessary for maintaining a harmonious society. Withdrawals are rare, because to take anything out of that bank account effectively amounts to unilaterally deciding to spend everyone's money. As a result, acts of selfishness—especially those that disrupt that harmony—will frequently elicit shame and admonition from others.
Take trash, for example. Suppose the AA batteries in your Walkman die. There are few public trash cans, so:
- If you're visiting Japan – at the next train platform, you'll see a garbage bin labeled "Others" and toss those batteries in there without a second thought
- If you're living in Japan – you'll carry the batteries around all day, bring them home, sort and clean them, pay for a small trash bag for hazardous materials (taxed at 20x the rate of a typical bag), and then wait until the next hazardous waste collection day (which could be up to 3 months in some areas)
So which of these scenarios is more fun? Visiting, of course!
But what you don't see as a visitor is that nearly every public trash can is provided as a service to customers, and it's someone's literal job to go through each trash bag. So while the visitor experience above is relatively seamless, some little old lady might be tasked with sorting and disposing of the train station's trash every night. And when she finds your batteries, she won't just have to separate them from the rest of the trash, she may well have to fill out a form requisitioning a hazardous waste bag, or call the municipal garbage collection agency to schedule a pick-up. This is all in addition to the little old lady's other responsibilities—it doesn't take many instances of people failing to follow societal expectations to seriously stress the entire system.
This is why Japanese people are rightly concerned about over-tourism: foreigners rarely follow any of the norms that keep their society humming. Over the past 15 years, many tourist hotspots have reached the breaking point. Osaka and Kyoto just aren't the cities they once were. There just aren't the public funds and staffing available to keep up with the amount of daily disorder caused by tourists failing to abide by Japan's mostly-unspoken societal customs.
It's also why Japanese residents feel hopeless about the situation. The idea of foreign tourists learning and adhering to proper etiquette is facially absurd. Japan's economy is increasingly dependent on tourism dollars, so closing off the borders isn't feasible. The dominant political party lacks the creativity to imagine more aggressive policies than a hilariously paltry $3-a-night hotel tax. Couple this with the ongoing depopulation crisis, and people quite reasonably worry that all the things that make Japan such a lovely place to visit are coming apart at the seams.
Anyway, for anyone who wonders why I tend to avoid the areas of Japan popular with foreigners, there you go.
The T-Shirts I Buy
I get asked from time to time about the t-shirts I wear every day, so I figured it might save time to document it here.
The correct answer to the question is, "whatever the cheapest blank tri-blend crew-neck is." The blend in question refers to a mix of fabrics: cotton, polyester, and rayon. The brand you buy doesn't really matter, since they're all going to be pretty much the same: cheap, lightweight, quick-drying, don't retain odors, and feel surprisingly good on the skin for the price. This type of shirt was popularized by the American Apparel Track Shirt, but that company went to shit at some point and I haven't bothered with any of its post-post-bankruptcy wares.
I maintain a roster of 7 active shirts that I rotate daily and wash weekly. Every 6 months I replace them. I buy 14 at a time so I only need to order annually. I always get them from Blank Apparel, because they don't print bullshit logos on anything and charge near-wholesale prices. I can usually load up on a year's worth of shirts for just over $100.
I can vouch for these two specific models:
The Next Level shirts feel slightly nicer on day one, but they also wear faster and will feel a little scratchy after three months of daily usage. The Bella+Canvas ones seem to hold up a bit better. But, honestly, who cares. The whole point is clothes don't matter and people will get used to anything after a couple days. They're cheap and cover my nipples, so mission accomplished.
These 4 Code Snippets won WWDC
WWDC 2025 delivered on the one thing I was hoping to see from WWDC 2024: free, unlimited invocation of Apple's on-device language models by developers. It may have arrived later than I would have liked, but all it took was the first few code examples from the Platforms State of the Union presentation to convince me that the wait was worth it.
Assuming you're too busy to be bothered to watch the keynote, much less the SOTU undercard presentation, here are the four bits of Swift that have me excited to break ground on a new LLM-powered iOS app:
1. `@Generable` and `@Guide` annotations
2. `#Playground` macro
3. `LanguageModelSession`'s async `streamResponse` function
4. `Tool` interface
The @Generable and @Guide annotations
Here's the first snippet:
@Generable
struct Landmark {
var name: String
var continent: Continent
var journalingIdea: String
}
@Generable
enum Continent {
case africa, asia, europe, northAmerica, oceania, southAmerica
}
let session = LanguageModelSession()
let response = try await session.respond(
to: "Generate a landmark for a tourist and a journaling suggestion",
generating: Landmark.self
)
You don't have to know Swift to see why this is cool: just tack `@Generable` onto any struct and you can tell the `LanguageModelSession` to return that type. No fussing with marshalling and unmarshalling JSON. No custom error handling for when the LLM populates a given value with an unexpected type. You simply declare the type, and it becomes the framework's job to figure out how to color inside the lines.
And if you want to make sure the LLM gets the spirit of a value as well as its basic type, you can prompt it on an attribute-by-attribute basis with `@Guide`, as shown here:
@Generable
struct Itinerary: Equatable {
let title: String
let destinationName: String
let description: String
@Guide (description: "An explanation of how the itinerary meets user's special requests.")
let rationale: String
@Guide(description: "A list of day-by-day plans.")
@Guide(.count(3))
let days: [DayPlan]
}
Thanks to `@Guide`, you can name your attributes whatever you want and separately document for the LLM what those names mean for the purpose of generating values.
The #Playground macro
My ears perked up when the presenter Richard Wei said, "then I'm going to use the new playground macro in Xcode to preview my non-UI code." Because when I hear, "preview my non-UI code," my brain finishes the sentence with, "to get faster feedback." Seeing magic happen in your app's UI is great, but if going end-to-end to the UI is your only mechanism for getting any feedback from the system at all, forward progress will be unacceptably slow.
Automated tests are one way of getting faster feedback. Working in a REPL is another. Defining a `#Playground` inside a code listing is now a third tool in that toolbox.
Here's what it might look like:
#Playground {
let session = LanguageModelSession()
for landmark in ModelData.shared.landmarks {
let response = try await session.respond(
to: "What's a good name for a trip to \(landmark.name)?
Reply only with a title."
)
}
}
Which brings up a split view with an interactive set of LLM results, one for each `landmark` in the set of sample data:

Watch the presentation and skip ahead to 23:27 to see it in action.
Streaming user interfaces
Users were reasonably mesmerized when they first saw ChatGPT stream its textual responses as it plopped one word in front of another in real-time. In a world of loading screens and all-at-once responses, it was one of the reasons that the current crop of AI assistants immediately felt so life-like. ("The computer is typing—just like me!")
So, naturally, in addition to being able to await a big-bang `respond` request, Apple's new `LanguageModelSession` also provides an async `streamResponse` function, which looks like this:
let stream = session.streamResponse(generating: Itinerary.self) {
"Generate a \(dayCount)-day itinerary to \(landmark.name). Give it a fun title!"
}
for try await partialItinerary in stream {
itinerary = partialItinerary
}
The fascinating bit—and what sets this apart from mere text streaming—is that by simply re-assigning the `itinerary` to the streamed-in `partialItinerary`, the user interface is able to recompose complex views incrementally. So now, instead of some plain boring text streaming into a chat window, multiple complex UI elements can cohere before your eyes. Which UI elements? Whichever ones you've designed to be driven by the `@Generable` structs you've demanded the LLM provide. This is where it all comes together:

Scrub to 25:29 in the video and watch this in action (and then re-watch it in slow motion). As a web developer, I can only imagine how many dozens of hours of painstaking debugging it would take me to approximate this effect in JavaScript—only for it to still be hopelessly broken on slow devices and unreliable networks. If this API actually works as well as the demo suggests, then Apple's Foundation Models framework is seriously looking to cash some of the checks Apple wrote over a decade ago when it introduced Swift and more recently, SwiftUI.
The Tool interface
When the rumors were finally coalescing around the notion that Apple was going to allow developers to invoke its models on device, I was excited but skeptical. On device meant it would be free and work offline—both of which, great—but how would I handle cases where I needed to search the web or hit an API?
It didn't even occur to me that Apple would be ready to introduce something akin to Model Context Protocol (which Anthropic didn't even coin until last November!), much less the paradigm of the LLM as an agent calling upon a discrete set of tools able to do more than merely generate text and images.
And yet, that's exactly what they did! The `Tool` interface, in a slide:
public protocol Tool: Sendable {
associatedtype Arguments
var name: String { get }
var description: String { get }
func call(arguments: Arguments) async throws -> ToolOutput
}
And what a `Tool` that calls out to `MapKit` to search for points of interest might look like:
import FoundationModels
import MapKit
struct FindPointOfInterestTool: Tool {
let name = "findPointOfInterest"
let description = "Finds a point of interest for a landmark."
let landmark: Landmark
@Generable
enum Category: String, CaseIterable {
case restaurant
case campground
case hotel
case nationalMonument
}
@Generable
struct Arguments {
@Guide(description: "This is the type of destination to look up for.")
let pointOfInterest: Category
@Guide(description: "The natural language query of what to search for.")
let naturalLanguageQuery: String
}
func call(arguments: Arguments) async throws -> ToolOutput {}
private func findMapItems(nearby location: CLLocationCoordinate2D,
arguments: Arguments) async throws -> [MKMapItem] {}
}
And all it takes to pass that tool to a `LanguageModelSession` constructor:
self.session = LanguageModelSession(
tools: [FindPointOfInterestTool(landmark: landmark)]
)
That's it! The LLM can now reach for and invoke whatever Swift code you want.
Why this is exciting
I'm excited about this stuff, because—even though I was bummed out that none of this came last year—what Apple announced this week couldn't have been released a year ago, because basic concepts like agents invoking tools didn't exist a year ago. The ideas themselves needed more time in the oven. And because Apple bided its time, version one of its Foundation Models framework is looking like a pretty robust initial release and a great starting point from which to build a new app.
It's possible you skimmed this post and are nevertheless not excited. Maybe you follow AI stuff really closely and all of these APIs are old hat to you by now. That's a completely valid reaction. But the thing that's going on here that's significant is not that Apple put out an API that kinda sorta looks like the state of the art as of two or three months ago, it's that this API sits on top of a strongly-typed language and a reactive, declarative UI framework that can take full advantage of generative AI in a way web applications simply can't—at least not without a cobbled-together collection of unrelated dependencies and mountains of glue code.
Oh, and while every other app under the sun is trying to figure out how to reckon with the unbounded costs that come with "AI" translating to "call out to hilariously-expensive API endpoints", all of Apple's stuff is completely free for developers. I know a lot of developers are pissed at Apple right now, but I can't think of another moment in time when Apple made such a compelling technical case for building on its platforms specifically and to the exclusion of cross-compiled, multi-platform toolkits like Electron or React Native.
And now, if you'll excuse me, I'm going to go install some betas and watch my unusually sunny disposition turn on a dime. 🤞
Why agents are bad pair programmers
LLM agents make bad pairs because they code faster than humans think.
I'll admit, I've had a lot of fun using GitHub Copilot's agent mode in VS Code this month. It's invigorating to watch it effortlessly write a working method on the first try. It's a relief when the agent unblocks me by reaching for a framework API I didn't even know existed. It's motivating to pair with someone even more tirelessly committed to my goal than I am.
In fact, pairing with top LLMs evokes many memories of pairing with top human programmers.
The worst memories.
Memories of my pair grabbing the keyboard and—in total and unhelpful silence—hammering out code faster than I could ever hope to read it. Memories of slowly, inevitably becoming disengaged after expending all my mental energy in a futile attempt to keep up. Memories of my pair hitting a roadblock and finally looking to me for help, only to catch me off guard and without a clue as to what had been going on in the preceding minutes, hours, or days. Memories of gradually realizing my pair had been building the wrong thing all along and then suddenly realizing the task now fell to me to remediate a boatload of incidental complexity in order to hit a deadline.
So yes, pairing with an AI agent can be uncannily similar to pairing with an expert programmer.
The path forward
What should we do instead? Two things:
- The same thing I did with human pair programmers who wanted to take the ball and run with it: I let them have it. In a perfect world, pairing might lead to a better solution, but there's no point in forcing it when both parties aren't bought in. Instead, I'd break the work down into discrete sub-components for my colleague to build independently. I would then review those pieces as pull requests. Translating that advice to LLM-based tools: give up on editor-based agentic pairing in favor of asynchronous workflows like GitHub's new Coding Agent, whose work you can also review via pull request
- Continue to practice pair-programming with your editor, but throttle down from the semi-autonomous "Agent" mode to the turn-based "Edit" or "Ask" modes. You'll go slower, and that's the point. Also, just like pairing with humans, try to establish a rigorously consistent workflow as opposed to only reaching for AI to troubleshoot. I've found that ping-pong pairing with an AI in Edit mode (where the LLM can propose individual edits but you must manually accept them) strikes the best balance between accelerated productivity and continuous quality control
Give people a few more months with agents and I think (hope) others will arrive at similar conclusions about their suitability as pair programmers. My advice to the AI tool-makers would be to introduce features to make pairing with an AI agent more qualitatively similar to pairing with a human. Agentic pair programmers are not inherently bad, but their lightning-fast speed has the unintended consequence of undercutting any opportunity for collaborating with us mere mortals. If an agent were designed to type at a slower pace, pause and discuss periodically, and frankly expect more of us as equal partners, that could make for a hell of a product offering.
Just imagining it now, any of these features would make agent-based pairing much more effective:
- Let users set how many lines-per-minute of code—or words-per-minute of prose—the agent outputs
- Allow users to pause the agent to ask a clarifying question or push back on its direction without derailing the entire activity or train of thought
- Expand beyond the chat metaphor by adding UI primitives that mirror the work to be done. Enable users to pin the current working session to a particular GitHub issue. Integrate a built-in to-do list to tick off before the feature is complete. That sort of thing
- Design agents to act with less self-confidence and more self-doubt. They should frequently stop to converse: validate why we're building this, solicit advice on the best approach, and express concern when we're going in the wrong direction
- Introduce advanced voice chat to better emulate human-to-human pairing, which would allow the user both to keep their eyes on the code (instead of darting back and forth between an editor and a chat sidebar) and to light up the parts of the brain that find mouth-words more engaging than text
Anyway, that's how I see it from where I'm sitting the morning of Friday, May 30th, 2025. Who knows where these tools will be in a week or month or year, but I'm fairly confident you could find worse advice on meeting this moment.
As always, if you have thoughts, e-mail 'em.