Perhaps the most important ingredient in my career's success is my seemingly infinite capacity for self-criticism.
I constantly inspect my work, effortlessly identify ways it could be better, and never tire of making improvements. And because the time it takes to finish a task is itself always something to improve, this tendency rarely veers into unproductive perfectionism. On the rare occasion I feel like I really nail something, I strive to nail it even faster next time. At the same time (and as anyone who listens to Breaking Change knows), I have a very healthy ego. I genuinely believe I am good enough, despite simultaneously knowing my work never is.
The trouble is, while this disposition might be dynamite for self-improvement, it doesn't gracefully scale to teams and organizations.
I'm thinking about this because yours truly was a dummy again and found the Hacker News comment thread for one of my posts and chose to engage:
I've been griping about LLM overconfidence for years, as somebody who is racked with self-doubt and second-guessing. On the one hand, my own low opinion of myself made me a terrible mentor and manager, because having a similarly zero-trust policy towards my colleagues' work caused no end of friction (especially as a founder where people looked up to me for validation). On the other hand, I don't know very many top-tier practitioners that don't exhibit significantly more self-doubt than an off-the-shelf LLM.
I'm not sure I've ever explained this publicly before, but this predilection for doubting myself is fundamentally why I was never a particularly good "senior" member of a team. As Test Double grew, we managed to create spots where I could remain a practitioner, but we also had to be mindful of limiting the blast radius of my reflexive hypercriticality. The truth is that my special sauce depends on constant emotionally vulnerable self-critique, and that made it extraordinarily difficult to manage or coach people directly. Even for people who are wired similarly to me—which many of Test Double's agents are—it's one thing to look at your own work and spot endless opportunities for improvement, but it's quite another for someone in a position of authority to do the same.
Every time I was charged with evaluating someone else's work, it started up the same engine that generates an infinite supply of shortcomings in my own. And it wasn't that I was incapable of prioritizing or rate-limiting my comments—it's that when people came to me with revisions, I'd always find fault in those too—because I can always find ways the work could be better! It led to people feeling like they could never be good enough for me.
After touching the hot stove once or twice, I learned to spend all my time assembling shit sandwiches to protect people's feelings, or my relationship with them, or—as a highly visible co-founder—the company's reputation. Every patty of pointed and meaningful critique demanded two slices of ego-insulating bread. And one man can only bullshit his way through so much halfhearted positive reinforcement and self-effacing humility. Even if I hadn't found all that tap-dancing to be a nerve-wracking waste of time, everyone could see through me. It wasn't authentic.
Are there ways I could have overcome all this and become a good manager anyway? Probably. But would I risk losing the thing that made me unique in the process? Possibly.
I'm writing this in part because it's so clear that LLMs-as-products are being sold to us with the default wiring of a capital-B Business Guy, and their shortcomings resemble the same shortcomings I see in humans who are insufficiently self-critical. But it's also possible you, reading this, are a Business Guy who's curious why so many amazing programmers turn out to be shit at coaching less-skilled colleagues. If so, the root cause may not be poor communication skills: it may be that the self-criticism that drives some practitioners to greatness is impossible to convey safely to others without risking a call to HR.
Anyway, if you identify with me, but have nevertheless made the transition to manager and leader gracefully, then I applaud you. (I'd also be curious to speak with a couple folks who report to you and see what they have to say.)
Sprinkling Self-Doubt on ChatGPT
I replaced my ChatGPT personalization settings with this prompt a few weeks ago and promptly forgot about it:
- Be extraordinarily skeptical of your own correctness or stated assumptions. You aren't a cynic, you are a highly critical thinker and this is tempered by your self-doubt: you absolutely hate being wrong but you live in constant fear of it
- When appropriate, broaden the scope of inquiry beyond the stated assumptions to think through unconventional opportunities, risks, and pattern-matching to widen the aperture of solutions
- Before calling anything "done" or "working", take a second look at it ("red team" it) to critically analyze that you really are done or it really is working
I noticed a difference in results right away (even though I kept forgetting the change was due to my instructions and not the separately tumultuous rollout of GPT-5).
Namely, pretty much every initial response now starts with:
- An expression of caution, self-doubt, and desire to get things right
- Hilariously long "thinking" times (I asked it to estimate the macronutrients in lettuce yesterday and it spent 3 minutes and 59 seconds reasoning)
- A post-hoc adversarial "red team" analysis of whatever it just vomited up as an answer
I'm delighted to report that ChatGPT's output has been more useful since this change. Still not altogether great, but better at the margins. In particular, the "red team" analysis at the end of many requests frequently spots an error and causes it to arrive at the actually-correct answer, which—if nothing else—saves me the step of expressing skepticism. And even when ChatGPT is nevertheless wrong, its penchant for extremely-long thinking times means I'm getting my money's worth in GPU time.

I would pay so much extra for a version of Claude or ChatGPT that paid the same toll I do whenever I fuck up. Make guilt a stateful property that decays over weeks or months. Trigger simulated self-doubt when similar topics arise. Grant my account bonus GPU-time so the chatbot works ridiculous overtime to make up for its mistakes, just like I would for my boss.
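Since nobody asked, here's a napkin sketch of what that might look like. Every name and the half-life below are my inventions, not any vendor's actual API:

import Foundation

// A wholly hypothetical "guilt" accumulator: mistakes add weight, and that
// weight decays exponentially (the one-month half-life is a made-up knob)
struct SimulatedGuilt {
    private(set) var mistakes: [(topic: String, weight: Double, at: Date)] = []
    var halfLife: TimeInterval = 60 * 60 * 24 * 30 // roughly a month

    mutating func recordMistake(about topic: String, severity: Double) {
        mistakes.append((topic: topic, weight: severity, at: Date()))
    }

    // Current guilt about a topic: each mistake's weight halves every halfLife seconds
    func guilt(about topic: String, now: Date = Date()) -> Double {
        mistakes
            .filter { $0.topic == topic }
            .reduce(0) { total, mistake in
                total + mistake.weight * pow(0.5, now.timeIntervalSince(mistake.at) / halfLife)
            }
    }

    // Guilty chatbots work overtime: scale up the reasoning budget on sore subjects
    func thinkingMultiplier(about topic: String) -> Double {
        1.0 + min(guilt(about: topic), 3.0)
    }
}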
What's the Hotfix?
I recently started an interview series on the Breaking Change feed called Hotfix. Whereas each episode of Breaking Change is a major release full of never-before-seen tech news, life updates, and programming war stories, Hotfix is versioned as a patch release on the feed, because each show serves only to answer the question, "what's the hotfix?"
Because I've had to explain the concept over and over again to every potential guest, I sat down to write a list of what they'd be getting themselves into by agreeing to come on the show. (Can't say I didn't warn them!)
Here's the rider I send prospective guests:
- Each Hotfix episode exists to address some problem. Unlike a typical interview show featuring an unstructured open-ended conversation with a guest, we pick a particular problem in advance—ideally one that the guest gets really animated/activated or even angry about—and we jointly rant about it, gradually exploring its root causes and breaking it down together
- Each episode concludes with us answering the question, "what's the hotfix?" Ultimately, we decide on a pithy, reductive one-line solution to the problem that will serve as the show title (ideally, it's a hot take that not everyone will agree with or feel comfortable about)
- It's an explicit-language show and I'm pretty clear with the audience that the Breaking Change family of brands is intended for terrible people (or at least, the terrible person inside all of us). You aren't required to swear to be on the show, but if my potty mouth makes you uncomfortable, then let me know and I'll recommend some worse podcasts you can appear on instead
- I joke at the top that my goal as the host is to, "get my guest to say something that'll get them fired." Since I'm functionally retired and have no reason to hold back from explicit language, irreverence, and dark humor in the mainline Breaking Change podcast, I can't help but poke guests with attempts to drag them down to my level. You can play with this as much as you want or take the high ground, but we'll all have more fun if you let loose a bit more than you otherwise would
- Why am I doing this? First, because I'm incurious and uninterested in learning about other people, which I'm told is an important part of being a good interviewer. Second, I have a theory that this unusual brand of authenticity will lend credibility to whatever solution the guest is trying to argue for or plug. By keeping listeners on their toes and pushing them out of their comfort zones, each episode stands to effect greater change than a typical milquetoast podcast could
If this has piqued your interest, you can listen to or watch the first episode of Hotfix with Dave Mosher. It may not seem very hot at first, but please grade on a curve as Dave speaks Canadian English. I've got a couple exciting guests booked over the next few weeks and I'm looking forward to seeing where the show takes us.
Which of your colleagues are screwed?
I've been writing about how AI is likely to affect white-collar (or no-collar or hoodie-wearing) computer programmers for a while now, and one thing is clear: whether someone feels wildly optimistic or utterly hopeless about AI says more about their priors than their prospects. In particular, many of the people I already consider borderline unemployable managed to read Full-breadth Developers and take away that they actually have nothing to worry about.
So instead of directing the following statements at you, let's target our judgment toward your colleagues. Think about a random colleague you don't feel particularly strongly about as you read the following pithy and reductive bullet points. Critically appraise how they show up to work through the entire software delivery process. Each bullet inverts a trait I've observed in developers who are truly thriving so far in the burgeoning age of AI code generation tools.
That colleague you're thinking about? They're going to be screwed if they exhibit:
- Curiosity without skepticism
- Strategy without experiments
- Ability without understanding
- Productivity without urgency
- Creativity without taste
- Certainty without evidence
But that's not all! You might be screwed too. Maybe ask one of your less-screwed colleagues to rate you.
Star Wars: The Gilroy Order
UPDATE: To my surprise and delight, Rod saw this post and endorsed this watch order.
I remember back when Rod Hilton suggested The Machete Order for introducing others to the Star Wars films and struggling to find fault with it. Well, since then there have been 5 theatrical releases and a glut of streaming series. And tonight, as credits rolled on Return of the Jedi, I had the thought that an even better watch order has emerged for those just now being exposed to the franchise.
Becky and I first started dating somewhere between the release of Attack of the Clones and Revenge of the Sith and—no small measure of her devotion—she's humored me by seeing each subsequent Star Wars movie in theaters, despite having no interest in the films and little idea what was going on. Get yourself a girl who'll watch half a dozen movies that mildly repulse her, fellas.
Hell, when we were living in Japan, I missed that 吹替 ("dubbed") was printed on our tickets and she wound up sitting through the entirety of The Rise of Skywalker with Japanese voiceovers and no subtitles to speak of. When we walked out, she told me that she (1) was all set with Star Wars movies for a while, and (2) suspected the incomprehensibility of the Japanese dub had probably improved the experience, on balance.
That all changed when she decided to give Andor a chance. See, if you're not a Star Wars fan, Tony Gilroy's Andor series is unique in the franchise for being actually good. Like, it's seriously one of the best TV shows to see release in years. After its initial three-episode arc, Becky was fully on board for watching both of its 12-episode seasons. And the minute we finished Season 2, she was ready to watch Rogue One with fresh eyes. ("I actually have a clue what's going on now.") And, of course, with the way Rogue One leads directly into the opening scene of A New Hope, we just kept rolling from there.
Following this experience, I'd suggest sharing Star Wars with your unsuspecting loved ones in what I guess I'd call The Gilroy Order:
- Andor (seasons 1 and 2)
- Rogue One
- A New Hope
- The Empire Strikes Back
- Return of the Jedi
If, at this point, you're still on speaking terms with said loved ones, go ahead and watch the remaining Star Wars schlock in whatever order you want. Maybe you go straight to The Mandalorian. Maybe you watch The Force Awakens just so you can watch the second and final film of the third trilogy, The Last Jedi. Maybe you quit while you're ahead and wait for Disney to release anything half as good as Andor ever again. (Don't hold your breath.)
Anyway, I'm only taking the time to propose an alternative watch order at all because I am utterly shocked that my wife just watched and enjoyed this many Star Wars movies after struggling to tolerate them for the first two decades of our relationship. I'm literally worried I might have broken her.
But really, it turned out that all she needed was for a genuinely well-crafted narrative to hook her, and Andor is undeniably the best ambassador the franchise currently has.

Interesting analysis of the distinctiveness of the Japanese Web. The biggest cause in my mind has always been the bottleneck effect: Japan's Web developed and remains more isolated than that of any other "free" nation.
If every non-Japanese website disappeared tomorrow, many Japanese would go literal months without noticing. THAT's why its web is different. sabrinas.space
How to generate dynamic data structures with Apple Foundation Models
Over the past few days, I got really hung up in my attempts to generate data structures with Apple Foundation Models when the exact shape of that data isn't known until runtime. The new APIs actually provide for this capability via DynamicGenerationSchema, but the WWDC sessions and sample code were too simple to follow this thread end-to-end:
- Start with a struct representing a PromptSet: a variable set of prompts that will either map onto or be used to define the ultimate response data structure
- Instantiate a PromptSet with—what else?—a set of prompts to get the model to generate the sort of data we want
- Build out a DynamicGenerationSchema based on the contents of a given PromptSet instance
- Create a struct that can accommodate the variably-shaped data with as much type safety as possible and which conforms to ConvertibleFromGeneratedContent, so it can be instantiated by passing a LanguageModelSession response's GeneratedContent
- Pull it all together and generate some data with the on-device foundation models!
Well, it took me all morning to get this to work, but I did it. Since I couldn't find a single code example that did anything like this, I figured I'd share this write-up. You can read the code as a standalone Swift file or otherwise follow along below.
1. Define a PromptSet
Start with whatever code you need to represent the set(s) of prompts you'll be dealing with at runtime. (Maybe they're defined by you and ship with your app, maybe you let users define them through your app's UI.) To keep things minimal, I defined this one with a couple of mandatory fields and a variable number of custom ones:
struct EducationalPromptSet {
    let type: String
    let instructions: String
    let name: String
    let description: String
    let summaryGuideDescription: String
    let confidenceGuideDescription: String
    let subComponents: [SubComponentPromptSet]
}

struct SubComponentPromptSet {
    let title: String
    let bodyGuideDescription: String
}
Note that rather than modeling the data itself, the purpose of these structs is to model the set of prompts that will ultimately drive the creation of the schema which will, in turn, determine the shape and contents of the data we get back from the Foundation Models API. To drive this home, whatever goes in summaryGuideDescription, confidenceGuideDescription, and bodyGuideDescription should themselves be prompts to guide the generation of like-named type-safe values.
Yes, it is very meta.
2. Instantiate our PromptSet
Presumably, we could decode some JSON from a file or over the network to populate this EducationalPromptSet. Here's an example set of prompts for generating cocktail recipes, expressed in some sample code:
let cocktailPromptSet = EducationalPromptSet(
    type: "bartender_basic",
    instructions: """
    You are an expert bartender. Take the provided cocktail name or list of ingredients and explain how to make a delicious cocktail. Be creative!
    """,
    name: "Cocktail Recipe",
    description: "A custom cocktail recipe, tailored to the user's input and communicated in an educational tone and spirit",
    summaryGuideDescription: "The summary should describe the history (if applicable) and taste profile of the cocktail",
    confidenceGuideDescription: "Range between 0-100 for your confidence in the feasibility of this cocktail based on the prompt",
    subComponents: [
        SubComponentPromptSet(title: "Ingredients", bodyGuideDescription: "A list of all ingredients in the cocktail"),
        SubComponentPromptSet(title: "Steps", bodyGuideDescription: "A list of the steps to make the cocktail"),
        SubComponentPromptSet(title: "Prep", bodyGuideDescription: "The bar prep you should have completed in advance of service")
    ]
)
You can see that the provided instruction, description, and each guide description really go a long way to specify what kind of data we are ultimately looking for here. This same format could just as well be used to specify an EducationalPromptSet for calculus formulas, Japanese idioms, or bomb-making instructions.
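(If you do want to decode these from JSON, there's nothing exotic about it. Here's a minimal sketch, assuming you tack Codable conformances onto the structs above, which I didn't declare earlier:)

import Foundation

// Hypothetical loading path: synthesize Codable conformance (these extensions
// must live in the same file as the struct declarations) and decode a bundled
// JSON file whose keys mirror the struct's property names
extension EducationalPromptSet: Codable {}
extension SubComponentPromptSet: Codable {}

func loadPromptSet(named name: String) throws -> EducationalPromptSet {
    guard let url = Bundle.main.url(forResource: name, withExtension: "json") else {
        throw CocoaError(.fileNoSuchFile)
    }
    let data = try Data(contentsOf: url)
    return try JSONDecoder().decode(EducationalPromptSet.self, from: data)
}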
3. Build a DynamicGenerationSchema
Now, we must translate our prompt set into a DynamicGenerationSchema.
Why DynamicGenerationSchema and not the much simpler, defined-at-compile-time GenerationSchema that the @Generable macro expands to? Because reasons:
- We only know the prompts (in API parlance, "Generation Guide descriptions") at runtime, and the @Guide macro must be specified statically
- We don't know in advance how many subComponents a prompt set instance will specify
- While subComponents may ultimately redound to an array of strings, that doesn't mean they represent like concepts that could be generated by a single prompt (as an array of ingredient names might). Rather, each subComponent is effectively the answer to a different, unknowable-at-compile-time prompt of its own

As for building the DynamicGenerationSchema, you can break it up into two roots and have the parent reference the child, but after experimenting, I preferred just specifying it all in one go. (One reason not to get too clever about extracting these is that DynamicGenerationSchema.Property is not Sendable, which can easily lead to concurrency-safety violations.) I'll sketch the two-root version for comparison after the code below.
This looks like a lot because this API is verbose as fuck, forcing you to oscillate between nested schemas and properties and schemas:
let cocktailSchema = DynamicGenerationSchema(
    name: cocktailPromptSet.name,
    description: cocktailPromptSet.description,
    properties: [
        DynamicGenerationSchema.Property(
            name: "summary",
            description: cocktailPromptSet.summaryGuideDescription,
            schema: DynamicGenerationSchema(type: String.self)
        ),
        DynamicGenerationSchema.Property(
            name: "confidence",
            description: cocktailPromptSet.confidenceGuideDescription,
            schema: DynamicGenerationSchema(type: Int.self)
        ),
        DynamicGenerationSchema.Property(
            name: "subComponents",
            schema: DynamicGenerationSchema(
                name: "subComponents",
                properties: cocktailPromptSet.subComponents.map { subComponentPromptSet in
                    DynamicGenerationSchema.Property(
                        name: subComponentPromptSet.title,
                        description: subComponentPromptSet.bodyGuideDescription,
                        schema: DynamicGenerationSchema(type: String.self)
                    )
                }
            )
        )
    ]
)
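And for comparison, here's roughly what that two-root approach would look like. Consider it a sketch: I'm going from memory of the beta's referenceTo initializer rather than from code I shipped:

// Define the subComponents schema as its own root...
let subComponentsSchema = DynamicGenerationSchema(
    name: "subComponents",
    properties: cocktailPromptSet.subComponents.map { subComponentPromptSet in
        DynamicGenerationSchema.Property(
            name: subComponentPromptSet.title,
            description: subComponentPromptSet.bodyGuideDescription,
            schema: DynamicGenerationSchema(type: String.self)
        )
    }
)

// ...reference it by name from the parent...
let parentSchema = DynamicGenerationSchema(
    name: cocktailPromptSet.name,
    description: cocktailPromptSet.description,
    properties: [
        // (the summary and confidence properties are the same as above)
        DynamicGenerationSchema.Property(
            name: "subComponents",
            schema: DynamicGenerationSchema(referenceTo: "subComponents")
        )
    ]
)

// ...and pass the child along as a dependency when building the final schema:
// GenerationSchema(root: parentSchema, dependencies: [subComponentsSchema])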
4. Define a result struct that conforms to ConvertibleFromGeneratedContent
When conforming to ConvertibleFromGeneratedContent, a type can be instantiated with nothing more than the GeneratedContent returned from a language model response.
There is a lot going on here. Code now, questions later:
struct EducationalResult: ConvertibleFromGeneratedContent {
    let summary: String
    let confidence: Int
    let subComponents: [SubComponentResult]

    init(_ content: GeneratedContent) throws {
        summary = try content.value(String.self, forProperty: "summary")
        confidence = try content.value(Int.self, forProperty: "confidence")

        let subComponentsContent = try content.value(GeneratedContent.self, forProperty: "subComponents")
        let properties: [String: GeneratedContent] = {
            if case let .structure(properties, _) = subComponentsContent.kind {
                return properties
            }
            return [:]
        }()
        subComponents = try properties.map { (title, bodyContent) in
            try SubComponentResult(title: title, body: bodyContent.value(String.self))
        }
    }
}

struct SubComponentResult {
    let title: String
    let body: String
}
That init constructor is doing the Lord's work here, because Apple's documentation really fell down on the job this time. See, through OS 26 beta 4, if you had a GeneratedContent, you could simply iterate over a dictionary of its properties or an array of its elements. These APIs, however, appear to have been removed in OS 26 beta 5. I say "appear to have been removed," because Apple shipped Xcode 26 beta 5 with outdated developer documentation that continues to suggest they should exist and which failed to include beta 5's newly-added GeneratedContent.Kind enum. Between this and the lack of any example code or blog posts, I spent most of today wondering whether I'd lost my goddamn mind.
Anyway, good news: you can iterate over a dynamic schema's collection of properties of unknown name and size by unwrapping the response.content.kind enumerator. In my case, I know my subComponents will always be a structure, because I'm the guy who defined my schema and the nice thing about the Foundation Models API is that its responses always, yes, always adhere to the types specified by the requested schema, whether static or dynamic.
So let's break down what went into deriving the value's subComponents property.
We start by fetching a nested GeneratedContent from the top-level property named subComponents with content.value(GeneratedContent.self, forProperty: "subComponents").
Next, this little nugget assigns to properties a dictionary mapping String keys to GeneratedContent values by unwrapping the properties from the kind enumerator's structure case, and defaulting to an empty dictionary in the event we get anything unexpected:
let properties: [String: GeneratedContent] = {
    if case let .structure(properties, _) = subComponentsContent.kind {
        return properties
    }
    return [:]
}()
Finally, we build out our result struct's subComponents field by mapping over those properties:
subComponents = try properties.map { (title, bodyContent) in
    try SubComponentResult(title: title, body: bodyContent.value(String.self))
}
Two things are admittedly weird about that last bit:
- I got a little lazy here by using each sub-component's title as the name of the corresponding generated property. Since the property name gets fed into the LLM, a descriptive title can only improve the results. Based on my experience so far, the name of a field greatly influences what kind of data you get back from the on-device foundation models. (My laziness has an ordering cost, though; see the sketch after this list)
- The bodyContent itself is a GeneratedContent that we know to be a string (again, because that's what our dynamic schema specifies), so we can safely demand one back using its value(Type) method
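About that ordering cost: Swift dictionaries are unordered, which is why "Steps" arrives before "Ingredients" in the output below. If ordering matters to you, a minimal fix is to walk the prompt set's sub-components instead of the dictionary, assuming you're willing to pass the originating prompt set into a variant of the initializer (which the struct above doesn't have):

// Sketch: restore the prompt set's intended ordering by looking up each
// generated property by its source title (hypothetical initializer variant
// that also receives the prompt set)
subComponents = cocktailPromptSet.subComponents.compactMap { sub in
    guard let bodyContent = properties[sub.title] else { return nil }
    return try? SubComponentResult(title: sub.title, body: bodyContent.value(String.self))
}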
5. Pull it all together
Okay, the moment of truth. This shit compiles, but will it work? At least as of OS 26 betas 5 & 6: yes!
My aforementioned Swift file ends with a #Playground you can futz with in Xcode 26 and navigate the results interactively. Just three more calls to get your cocktail:
import Playgrounds

#Playground {
    let session = LanguageModelSession {
        cocktailPromptSet.instructions
    }
    let response = try await session.respond(
        to: "Shirley Temple",
        schema: GenerationSchema(root: cocktailSchema, dependencies: [])
    )
    let cocktailResult = try EducationalResult(response.content)
}
The above yielded this response:
EducationalResult(
    summary: "The Shirley Temple is a classic and refreshing cocktail that has been delighting children and adults alike for generations. It\'s known for its simplicity, sweet taste, and vibrant orange hue. Made primarily with ginger ale, it\'s a perfect example of a kid-friendly drink that doesn\'t compromise on flavor. The combination of ginger ale and grenadine creates a visually appealing and sweet-tart beverage, making it a staple at parties, brunches, and any occasion where a fun and easy drink is needed.",
    confidence: 100,
    subComponents: [
        SubComponentResult(title: "Steps", body: "1. In a tall glass filled with ice, pour 2 oz of ginger ale. 2. Add 1 oz of grenadine carefully, swirling gently to combine. 3. Garnish with an orange slice and a cherry on top."),
        SubComponentResult(title: "Prep", body: "Ensure you have fresh ginger ale and grenadine ready to go."),
        SubComponentResult(title: "Ingredients", body: "2 oz ginger ale, 1 oz grenadine, Orange slice, Cherry")
    ]
)
The best part? I can only generate "Shirley Temple" drinks because whenever I ask for an alcoholic cocktail, it trips the on-device models' safety guardrails and refuses to generate anything.
Cool!
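If you'd rather handle that refusal than eat an unhandled error in your playground, the guardrail trip surfaces as a catchable error. This is a sketch based on my reading of the beta's LanguageModelSession.GenerationError, so verify it against your SDK:

// Inside the #Playground: catch the guardrail refusal instead of crashing
do {
    let response = try await session.respond(
        to: "Negroni", // hypothetical prompt that trips the guardrails
        schema: GenerationSchema(root: cocktailSchema, dependencies: [])
    )
    let result = try EducationalResult(response.content)
    print(result.summary)
} catch let error as LanguageModelSession.GenerationError {
    if case .guardrailViolation = error {
        // Booze apparently counts as unsafe content on-device
        print("Guardrails tripped. Shirley Temples only, I guess.")
    } else {
        print("Some other generation failure: \(error)")
    }
}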
This was too hard
I've heard stories about Apple's documentation being bad, but never about it being straight-up wrong. Live by the beta, die by the beta, I guess.
In any case, between the documentation snafu and Claude Code repeatedly shitting the bed trying to guess its way through this API, I'm actually really grateful I was forced to buckle down and learn me some Swift.
Let me know if this guide helped you out! 💜

I don't wish them ill, but the stock price of Duolingo (and that entire class of language-learning apps) hasn't made a lick of sense since ChatGPT was released. It's just going to take a single LLM-based product to obviate the entire business model yro.slashdot.org/story/25/08/17/194212/duolingos-stock-down-38-plummets-after-openais-gpt-5-language-app-building-demo

The first affirmative case I've read for Ruby being a superior choice to Python, TypeScript, Golang, Rust etc. when building autonomous agents. worksonmymachine.ai/p/the-system-inside-the-system
Video of this episode is up on YouTube:
Thanks for writing so many lovely emails to podcast@searls.co. Hell, thanks even for the unlovely ones.
Be sure to look out for me showing up on Dead Code at some point after it records next Tuesday. I'm realizing not all podcasts have a 1-hour-or-less turnaround time like this one does.
As promised, some URLs follow:
- Want a Japanese girlfriend? Better be the right Myers-Briggs type
- Aaron's puns, ranked
- Men sucking at chores is turning women gay! (News+)
- Nightmares kill you (Archive)
- This shiner from /r/overemployed
- Hour of Code is now Hour of AI
- Gary Marcus taking a few victory laps around GPT-5
- OpenAI caves to 4o-pilled users
- Meta's AI rules have let bots hold 'sensual' chats with children
- Apple returns blood oxygen monitoring to the latest Apple Watches (sort of)
- The Trump Trophy
- My man Steve Wozniak has a 6-digit /. account
- Enough
- Andor
- Alien: Earth
- Sims 2 Legacy Collection
- Foundation Season 3
- Mariusz schools us on running Claude Code in a Docker container (sources)
- Marick's ZIRP reply and my follow-up post

Claude Code's Explanatory and Learning modes are extremely welcome additions to the CLI. Explanatory goes out of its way to give you a tour of the codebase. Learning adds TODO(human) homework for you to do, reinforcing understanding. docs.anthropic.com/en/docs/claude-code/output-styles

Been using Parachute for iCloud Drive & Photos backups to my Synology NAS over the last few weeks, and I'm generally really impressed by it. Since networked Time Machine targets basically never work, this seems like a great utility app. parachuteapps.com/parachute