justin.searls.co

3 Simple Rules for Using my Large Language Model

When it comes to AI, it seems like the vast majority of people I talk to believe large language models (LLMs) are either going to surpass human intelligence any day now or are a crypto-scale boondoggle with zero real-world utility. Few people seem to land in-between.

Not a ton of nuance out there.

The truth is, there are tasks for which LLMs are already phenomenally helpful, and tasks for which today's LLMs will invariably waste your time and energy. I've been using ChatGPT, GitHub Copilot, and a dozen other generative AI tools since they launched and I've had to learn the hard way—unlike with web search engines, perhaps—that falling into the habit of immediately reaching for an LLM every single time I'm stuck is a recipe for frustratingly inconsistent results.

As B.F. Skinner taught us, if a tool is tremendously valuable 30% of the time and utterly useless the other 70%, we'll nevertheless keep coming back to it even if we know we're probably going to get nothing out of it. Fortunately, I've been able to drastically increase my success rate by developing a set of heuristics to determine whether an LLM is the right tool for the job before I start typing into a chat window. They're based on the grand unifying theory that language models produce fluent bullshit, which makes them the right tool for the job when you desire fluent output and don't mind inaccurate bullshit.

Generative AI is perhaps the fastest-moving innovation in the history of computing, so it goes without saying that everything I suggest here may be very useful on June 9th, 2024, but will read as a total farce in the distant future of November 30th, 2024. That said, if you've been sleeping on using LLMs in your daily life up to this point and are looking to improve your mental model of how to best relate to them (as opposed to one-off pro-tips on how to accomplish specific tasks), I hope you'll find this post useful.

So here they are, three simple rules to live by.

Use an LLM when:

  • You know less than the average person in whatever domain you're asking about (e.g. I know way less Japanese than the average Japanese user, so even a mediocre response would be an improvement on my current understanding)
  • An imperfect response is still perfectly acceptable (e.g. if I'm asking about retro Japanese cocktails and the LLM tells me Akadama Punch came out in 1978 instead of 1977… no harm no foul)
  • To whatever extent you desire accuracy, you'd be able to independently validate the LLM's output yourself without the aid of some third tool (e.g. if I ask it for a Ruby script and it makes several boneheaded decisions, I'm very likely to immediately identify and correct them myself without having to reach for Google or StackOverflow)

Alternatively, if you flip all three rules on their heads, you can reframe them negatively to determine when an LLM would be the wrong tool for the job.

Do not use an LLM when:

  • You know more than the average person in whatever domain you're asking about (e.g. I'm much more experienced at Ruby than most Rubyists, so my refined taste and nuanced understanding means I'm unlikely to be satisfied by an LLM's middle-of-the-road output)
  • The utility of a response depends on its being highly precise (e.g. if I ask for executable code for a complicated task, the presence of a single flaw could result in more time wasted poring over that code than would've been spent writing it by hand)
  • To whatever extent you desire accuracy, you'd be unable to independently validate the LLM's output yourself without the aid of some third tool (e.g. if I want the driving distance to a location to see if it's within my EV's range, the only way to validate an LLM's answer would be to do the work all over again myself in a map application)

How this looks in practice

Reading the above, it should make sense that "Justin, the intermediate Japanese language student" derives value from LLMs more easily than "Justin, the experienced and opinionated software developer".

As a software developer, I need to be very mindful of whether I'm better off solving my problem some other way before I find myself 40 messages deep in an hour-long back-and-forth with a chatbot trying to make it understand what it did wrong. If I want to save time typing by having an LLM "write code like me," I will hate what it creates 100% of the time—this makes sense, because I'm the world's foremost expert on me, and today's LLMs are at best an approximation of the median person posting about something on the internet. (And before you e-mail me, yes, I've tried creating custom GPTs and uploading an entire corpus of Searls-style code and it still can't come close.) That said, having an LLM available as a research assistant has made me much more comfortable trying out new technologies outside my comfort zone, because in the areas where I'm a total neophyte I can be highly confident the LLM will pull me up to at least a middling level of competency—I don't need to worry about slogging through shitty documentation, sifting contradictory StackOverflow posts, or searching PDFs of outdated tech books. Overall, it's a mixed bag, but because I know when to open ChatGPT and when to phone a friend, I find that LLMs usually do what I want.

As a Japanese language learner, today's LLMs—particularly OpenAI's latest GPT-4o model—are fucking lights-out. Like, it's "throw away your dictionary and translation apps" levels of good. It can read handwriting and comprehend audio that many native speakers would struggle to parse. It can answer nuanced questions about grammar and etymology and honorifics that none of my Japanese friends can, and it can do so effortlessly—without the intense, drawn-out struggle of first figuring out how to convey my question in the first place. It's so good that it's seriously got me questioning how much more I should invest in language study (a topic I ruminated on in my latest newsletter). Honestly, the most important thing to avoid with respect to LLMs for second-language education is over-reliance on them—several times while traveling, I caught myself lazily reaching for GPT-4o's stellar handwriting recognition as a crutch to avoid reading inscrutable izakaya menus… or consulting ChatGPT so often I would catch myself failing to be present in the current moment. Overall, though, today's best models are the worst that they will ever be, which suggests the foreign language education business will never be the same again.

Some parting advice

Not for nothing, here are some other tips I've gathered:

  • If you tend to have very high expectations of other humans—exacting attention to detail, impatient with imperfection, hate having your time wasted—don't be surprised when you're disappointed by LLMs, too. In fact, LLMs' loudest critics often seem to be people who aren't happy with anyone else's work, either
  • If you don't know what you want, don't be shocked when you don't get what you want. Clearly and precisely communicating what you want—again, just like when dealing with humans—yields more predictable results
  • If it doesn't get it right the first time, you're usually better off giving up rather than trying to extract the right answer via follow-ups. Once an LLM has given you a wrong answer, the likelihood it will get it right after you ask it to try again is very low. And once you've repeated yourself twice, the likelihood of getting a different, satisfactory outcome quickly approaches zero. (It's exactly because a long volley of messages typically indicates failure that I think chat is fundamentally the wrong UI abstraction for these capabilities)
  • Relatedly, if you realize your initial message led the LLM down the wrong path, you're better off starting over with an improved message than clarifying yourself in a reply—the combined context of both messages will usually just muddy the water
  • If your question is about a country or culture that speaks a different language, send your first message in the native language of that target group. This works because it causes the pachinko board that is an LLM's brain to drop the ball in the right place, and you'll get much better answers (if you need to ask the LLM to first translate your request for you, be sure to do so in a separate chat first so as not to anchor it)
  • When available, leverage GPT custom instructions and persistent memory to get what you want. Every time you get a response that's technically correct but still requires tweaks, use these to "teach" the LLM to do it your way in the future
  • Never become complacent when an LLM demonstrates a flash of brilliance by assuming its subsequent messages will also be correct. If anything, receiving the perfect response often encourages users to delve into more speculative territory in subsequent requests which in turn can result in spectacularly wrong answers
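For developers driving a model over an API rather than a chat window, the "separate chat for translation" tip above has a concrete shape: keep the translation step and the real question in entirely separate conversations, so the English phrasing never enters the context of the question you actually care about. Here's a minimal Ruby sketch—the request body shape (`model` plus a `messages` array) is OpenAI's chat completions format, but the helper name and the example prompts are my own:

```ruby
# Build a standalone chat-completions request body. Each call to this helper
# represents a brand-new conversation with no shared message history.
def fresh_chat(prompt, model: "gpt-4o")
  { model: model, messages: [{ role: "user", content: prompt }] }
end

# Chat 1: a throwaway conversation just to get the Japanese translation.
translation_request = fresh_chat(
  "Translate to Japanese: What year did Akadama Punch launch?"
)

# Chat 2: ask the translated question in a completely separate conversation,
# so the original English request can't anchor the model's answer.
japanese_question = "赤玉パンチは何年に発売されましたか？"
real_request = fresh_chat(japanese_question)

# The two request bodies share no message history:
puts (translation_request[:messages] & real_request[:messages]).empty? # => true
```

The point of the helper is that there's no accumulating `messages` array between the two requests—the anti-anchoring advice, expressed as code.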

Anyway, happy prompting! I hope you find this perspective useful in dealing with our newfound stochastic overlords. If you've got ideas of your own you'd like to share, I'd love to hear them—email me!

