Multifaceted: the linguistic echo chambers of LLMs

James Padolsey noticed that the phrase "complex and multifaceted" was cropping up more often than usual, and he makes a compelling case that the meme is actually driven by LLMs overindexing on it:

As we see, from 2021 onwards, just around the time when GPT and other LLMs started to take the world by storm, the prevalence of our word 'multifaceted' increased significantly, from being in only 0.05% of PDFs to 0.23%.

This is really fascinating for a couple of reasons.

First, I suspect that if we have any hope of fingerprinting AI-generated text, it will probably be by cross-referencing the date of publication with the emergence of contemporaneous LLM memes like this one.
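To make that idea concrete, here is a minimal sketch of what such a cross-reference might look like. Everything in it is an assumption for illustration: the `LLM_MEME_ONSET` table, the `meme_hits` function, and the 2021 cutoff (loosely taken from the "from 2021 onwards" observation above) are hypothetical, not anything Padolsey describes.

```python
from datetime import date

# Hypothetical table: phrase -> date it is assumed to have become an LLM tic.
# The onset date is an illustrative placeholder, not a measured value.
LLM_MEME_ONSET = {
    "complex and multifaceted": date(2021, 1, 1),
}


def meme_hits(text: str, published: date) -> list[str]:
    """Return the listed phrases found in `text`, counting only those
    whose assumed onset date is on or before the publication date."""
    lowered = text.lower()
    return [
        phrase
        for phrase, onset in LLM_MEME_ONSET.items()
        if phrase in lowered and published >= onset
    ]


# Same sentence, two publication dates: only the later one is flagged.
sentence = "This is a complex and multifaceted issue."
print(meme_hits(sentence, date(2023, 5, 1)))
print(meme_hits(sentence, date(2015, 5, 1)))
```

A hit here would be weak evidence at best, since people were writing "complex and multifaceted" long before LLMs were. Presumably a real fingerprint would need many such phrases and some notion of how much each one's frequency shifted.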

Second, I'm not an LLM expert by any stretch, but I wouldn't be surprised if this were due not to a bottleneck in the training data per se, but rather to the way LLMs are rewarded during training. It could be that a definitive, intellectual-seeming statement that can be applied to literally any genre of content occupies a wider slot on the AI Plinko board than a phrase that hews closely to a specific cluster of topics.

Of course, the fun part of discussing LLMs in the early 2020s is that the correct answer is always, "who the hell knows!"

