
'A.I.' is just a search engine

A zero-tech explainer on the hype bubble roiling markets and inflating your electricity bills


You're probably going to hear a whole lot about the "artificial intelligence bubble" in coming weeks and months, if you haven't already. I regret to report that it is not what it sounds like. When you think "artificial intelligence bubble," you probably imagine a massive soap bubble that won't shut up about the duality of man or whatever—that'd be pretty exciting, albeit of limited utility—but unfortunately we mean the bad kind of bubble.

Whenever you hear bubble on the news it is always, always, always the bad kind of bubble. That's how news works.

What drives me mad, however, is that there have still been precious few non-technical explanations of just what this thing markets are calling artificial intelligence actually is. You would think that part would be really important! You can't have entire governments agreeing to burn the planet to cinders while just mumbling their way through even the most general explanation of what it is we're roiling electricity markets, reengineering tech markets, and ostensibly reshaping every aspect of our sorry meat-confined existence in order to manifest.

No, that part is always skipped. There's always room for some industry pseudo-philosopher to belt out another take about how close we are to finally, at long last, being ruled by robot overlords like we always dreamed of. But journalists don't spend even a sentence explaining What. The Product. Does.

If you're non-technical, here's what you need to know about "Artificial Intelligence" in its current incarnation. By "artificial intelligence," the markets almost exclusively mean "Large Language Models," the algorithms that ChatGPT, Grok, and all the other companies' flagship products are built on.

So what's a Large Language Model?

A Large Language Model is a form of search engine.

That's it. It isn't anything else. It isn't "intelligence" in any sense, not artificial or otherwise. It's a kind of search algorithm, not all that different from the one that runs Google. Every time somebody on television says "artificial intelligence," they mean "a search engine."

Okay, you're all up to speed now. I think we can wrap this up, so thanks for join—what's that? You want a bit more detail than that? Fine. I suppose that's only fair.

The details.

As I said, the "artificial intelligence" products currently being stuffed into your every orifice by panicking corporate executives are, at heart, a not-particularly-complicated form of search engine. Like any search engine, LLMs operate by sucking up vast quantities of data and stuffing it in an archive.

When you type a phrase into the LLM's search box, or when you "engineer your prompt" if you're going to be a pretentious Silicon Valley jackass about it, the LLM searches its archives for the phrases you've used and responds by showing you the phrases that real-life humans in its archive have been most likely to reply with. That's the key difference: While Google and other dominant current search engines tend to respond by showing you a list of internet sites they think are the best matches for what you've typed, LLMs comb through their archives and assemble responses phrase-by-phrase and paragraph-by-paragraph.

So you don't get a list of internet pages when you ask a question of ChatGPT or one of the other LLMs. Instead, you get whatever was on those pages, all chewed up, mixed together, and spit back out in well-formatted text. The algorithm takes the pieces it finds and reassembles them as if you'd asked it to deliver a phrase-by-phrase "average response" to your question. I don't mean the best response, because there's no such thing. The algorithm works by finding the most strongly associated phrases to use in its own response.
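If it helps to see the idea in miniature, here's a deliberately tiny toy of what "most strongly associated" means. This is nothing like the real machinery (actual models work over learned statistical weights across billions of fragments, not a literal phrase table), and every sentence in the toy archive is made up for illustration:

```python
from collections import Counter, defaultdict

# A toy "archive" of sentences scraped from pretend humans.
archive = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "the dog sat on the mat",
]

# Count which word most often follows each word across the archive.
next_word_counts = defaultdict(Counter)
for sentence in archive:
    words = sentence.split()
    for current, following in zip(words, words[1:]):
        next_word_counts[current][following] += 1

def average_response(prompt_word, length=5):
    """Build a reply by repeatedly grabbing the most common next word."""
    reply = [prompt_word]
    for _ in range(length):
        options = next_word_counts.get(reply[-1])
        if not options:
            break
        reply.append(options.most_common(1)[0][0])
    return " ".join(reply)

print(average_response("the"))  # something like: "the cat sat on the cat"
```

Scale that counting trick up to the whole internet, with far blurrier associations and far more context, and you get text that reads a lot more convincingly than "the cat sat on the cat."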

Once you understand that, most of the other details are self-explanatory. That's why the pseudo-"AI" of large language models tends to be very good at answering questions for which the internet has hundreds and hundreds of answers already. It's also why "AI" tends to fail badly when you ask a question that hasn't been asked very often. If it doesn't already have dozens or hundreds of examples of how human beings answered the same question, it has to get increasingly "fuzzy" in its decisions about what to show you instead.

If you understand this, the real-world limitations of this particular incarnation of "AI" begin to become obvious—and it's easy to see those limitations when you're actually using the products.

Since computer programming has become one of the most-hyped professional use cases for LLMs, we'll use that as an example. LLMs do great when it comes to answering beginner and slightly-better-than-beginner questions that have already been asked on Reddit, Stack Overflow, or similar community-driven sites—because both the questions and the answers tend to be both very specific and very repetitive. If your current problem is prominent enough to have inspired entire YouTube videos explaining the answer, your chatbot will have you covered.

Ask a question about a problem that almost nobody else has had, though, and things fall apart quickly. That's when so-called "hallucinations" kick in; the algorithm can't find enough data to know what the "average" answer looks like, so it has to rely on the "average" answers for questions that are sort of like yours, or questions that are sort of like ones that are sort of like yours.

Not only are LLMs not magical, they literally can't answer any question whose answer you couldn't already find with a regular search engine. That's because it's all the same data, in the end. Google archives the whole damn internet in order to present a list of pages that are linguistically "similar" to what you typed; LLMs archive the whole damn internet in order to build their maps of linguistically "similar" phrases and paragraphs.

It also explains why LLM "products" trained to prioritize certain sources while de-prioritizing others might, say, start randomly spewing the same racist language that those "certain sources" commonly use. If you've told it to prioritize phrases that people like that most frequently use themselves, looking in your general direction, Elon Musk, then that's what it does. If you want to act surprised by that, you do you.

As a side note, "AI" backers tend to get ticked off when you assert that their hyperintelligent imaginary friend is just a fancy search engine. It's not a search engine, they say. It may have been trained by searching and indexing all the digital content it could possibly find, but all of that content was abstracted away again in the training step; the algorithms don't actually keep it themselves or deliver it to users.

Except the algorithms actually do just that.

Qian added, “Perhaps what was surprising is that we found that OpenAI’s GPT-4, which is arguably the most powerful model that’s being used by a lot of companies and also individual developers, produced copyrighted content on 44% of prompts that we constructed.”

Wait, so what's the breakthrough here?

The breakthrough here is in what large language models are so far unmatched at, and the hint is in the name. Large language models are very, very good at modeling written language. They're quite good at coming up with text that's properly formatted, uses the right grammar, and deploys the same rhetorical tricks as human authors, because that's the thing they have the most data on, by far.

LLMs have the whole of the internet—plus whatever else their human minders can scrape up, which is resulting in a hell of a lot of litigation about copyrights right now—to come up with "average human sentences." They have many millions of essays, of paragraphs, of sentences, and of phrases to show them what sorts of sentences go where; the entire data set is about where words go.

It's not coming up with anything new, but it's blending all of that information into a milquetoast Average Written Response with so few syntactic errors that our meat-based pattern-seeking brains can't tell the difference.

The loneliness to psychosis pipeline is going to be overwhelmingly flooded

— benjamin (@benjamino.bsky.social) 2025-08-19T22:35:44.152Z

Yeah ... sorry. There's no magic there. "It asked me to frame the doorway into that dream space to enter" is not a deep thought. It's a common metaphor both in and about dreams, and that's almost certainly where ChatGPT ripped it from. The notion that there are hidden conversations inside our conversations is also a relentlessly explored concept.

There's no question that Silicon Valley has indeed invented something truly significant here. It's invented the world's first Artificial Stoner. It's invented a machine that can confidently parrot back some of the most time-worn tropes and questions human beings have ever pondered, come to the most trope-y and generic conclusions, and insist that it has discovered a deeper truth that nobody has yet pondered.

A truth that will be forgotten again an hour later—but in the moment, it was deep indeed. There ya go, my little electronic buddy, you were just one more toke away from solving physics forever only to lose it again when you saw a cat outside. Better luck next time.

It's the similarity to written human conversation that's the LLM's secret sauce, and it's one that indeed has plenty of real-world utility. There are plenty of people for whom a cobbled-together "average human answer" to a question is, as we've seen, easier to grasp than clicking through several pages of enshittified search results.

That's a valid use case. Returning search results in a format that's more comfortable for casual questioners has been the whole point behind voice assistants like Siri and Alexa. It's the reason your smartphone has a bunch of little pictures on it instead of requiring you to thumb-type "C colon backslash weather dot app" to find out if it'll be cloudy tomorrow. Mimicking the way humans respond to humans is a significant technology, and one that every decade sees new advances in. Some flop. Some prove indispensable. Most of them develop cult followers who insist that the machine isn't just responding to prompts, it's demonstrating intelligence, because as it turns out our meat brains are very, very susceptible to that particular conceit.

Anything we don't immediately understand is considered a form of magic, and when that happens we tend to do one of three things.

1) We set it on fire.

2) We declare it a miracle and form a new religion around it.

3) We try to figure out how to have sex with it.

I don't make the rules, I just report them.

There are, however, a couple of very specific problems with LLM-based search engines.

It's still just a search engine. A very, very expensive search engine.

Just because a given computer algorithm can effectively mimic human writing by sheer brute-forcing its way through the collected output of the whole human species until it figures out how writing tends to be structured does not mean it can think. It's still only returning search results, and as we've noted, it only returns plausible results if it has a large chunk of actual human-written answers to work with.

You can see where all of this breaks down whenever you ask a question that has a simple, definitive answer but which does not tend to be asked in normal conversation. Everything goes belly-up.

I had to try the “blueberry” thing myself with GPT5. I merely report the results.

— Kieran Healy (@kjhealy.co) 2025-08-08T00:03:59.716Z

The problem arises because the algorithm isn't thinking. It's returning the average result for an average question, and people do not tend to go on the internet to ask how many R's are in blueberry.

What the algorithm might know, however, is that about a year ago there were a few thousand memes unleashed after users discovered that ChatGPT couldn't answer the question "how many R's are in the word strawberry." It said two. The correct answer is three.

That was a hell of a news cycle, so at this point the algorithm has been nudged towards thinking that when people talk about R's and berries, they expect to see the answer Three no matter what sort of berry is involved. Google's own AI results show the problem:

Ask it how many R's are in the word "buffaloberry", and it responds "three." Nope!

Ask it how many R's are in "bullberry", and it will churn for a while before responding "one." Well, I suppose it all averages out in the end.

It doesn't know the answer. It can't "learn" the answer. The search algorithm doesn't know what the hell you're talking about, and doesn't care. If no humans have ever asked and received answers to a question, at least not in the spaces the algorithm's human minders were able to dump into it, there's no answer for it to "find."
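For contrast, here's the kind of thing a first-week programming exercise does with that question. It isn't how anyone would or should use an LLM; it's just a reminder that counting letters is a computation you perform on the word itself, not a pattern you look up in what people have said about words:

```python
# Counting letters is a computation on the word itself,
# not a lookup of what people usually say about the word.
for word in ["strawberry", "blueberry", "buffaloberry", "bullberry"]:
    print(word, word.lower().count("r"))

# strawberry 3
# blueberry 2
# buffaloberry 2
# bullberry 2
```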

This is why LLMs can't do math: There is no LLM concept of "math." If you ask what 2+2 equals, you'll likely get the right answer because there are a million cultural references that note 2+2=4. It's not a math question, it's an idiom. Ask it a question that just happens to be a longtime professor's go-to final exam question, and it'll probably find that too—and for similar reasons.

But it can't reason. It's solving the problems by finding linguistic matches.
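To make that lookup-versus-computation distinction concrete, here's a toy side-by-side. The little phrase table is invented purely for illustration, and real models don't store literal question strings like this; the point is only that one approach retrieves what humans usually say, while the other actually does arithmetic:

```python
# A toy "archive lookup" next to actual arithmetic. The phrase table is
# invented for illustration only.
memorized_phrases = {
    "what is 2+2": "4",          # shows up constantly in human text
    "to be or not to be": "that is the question",
}

def phrase_lookup(question):
    """Return whatever humans usually say in reply, if anything."""
    return memorized_phrases.get(question.lower(), "uh... potatoes?")

def actually_compute(a, b):
    """Do the arithmetic instead of looking up the idiom."""
    return a + b

print(phrase_lookup("what is 2+2"))    # "4", because the idiom is everywhere
print(phrase_lookup("what is 17+26"))  # no common idiom, so no real answer
print(actually_compute(17, 26))        # 43, because arithmetic is arithmetic
```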

The algorithm doesn't understand satire. That might have been an avoidable problem, had the human minders not funneled the entire contents of The Onion, Reddit, and every message board post from the AOL days onward into their models in an attempt to get the natural language parts working. But it is a problem, and that's exactly why large language models tell you that you can thicken your pizza sauces by adding glue.

It doesn't know a damn thing about chemistry either, which is why ChatGPT allegedly told a man he could replace the sodium chloride in his diet—that is, salt—with sodium bromide—a toxin.

How could a computer make such a mistake, you ask? How could it not? Again, this is a search engine that relies solely on finding linguistic matches to queries, and all it knows is that people who tend to talk about chloride also tend to talk about bromide, because they're nerds, so if somebody's asking a nerd question the machine knows it needs to prioritize a nerd answer.

Under the hood, madness reigns.

The second existential problem for current LLM-based companies is that as search engines go, this sort of algorithm is very, very, very expensive. And that, too, is a fundamental problem with the underlying technology.

There's a reason why the current incarnations of chatbots can sort of get away with calling themselves "artificial intelligence," but it's not because they're intelligent. It's because of how they work. They're based on decades of artificial intelligence research (or a particular branch of it) that asked "how do we get computers to mimic how neurons in our brains work," a question complicated only a bit by the problem that we don't understand how neurons in our brains work all that well to begin with. The only important part of that history is that the resulting LLMs and their near-identical image-generating, sound-generating, and video-generating cousins all work the same way and have the same problems.

They're all designed to perform searches not by storing the page-by-page contents of what they've archived, but with a blurring step: they store what they find chunk-by-chunk and phrase-by-phrase, ostensibly building a list of human phrases and thoughts that are detached (stolen?) from the pages they were collected from. (Except even that's a bit of a lie; you'll see from Google's search page AI product that it actually knows full well where a lot of the information comes from, and will link you to it just fine if you're the sort of luddite jerk who doesn't want to take the search engine's word for it.)

There are a lot of buzzwords and you don't need to know any of them. What's important is that the search that gets performed is very, very "fuzzy," with a lot of layers and abstraction to it.

A normal search engine search would take your question, break it into words and phrases, search for the pages that most prominently feature those words and phrases, and return them. Boom, done, over and out.
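A back-of-the-napkin sketch of that classic keyword approach might look like the following. The pages and URLs here are invented for illustration, and real search engines pile on ranking signals, spelling correction, and a thousand other refinements:

```python
from collections import defaultdict

# A toy "crawl": a handful of pages and their text (all invented).
pages = {
    "weather.example/forecast": "cloudy tomorrow with light rain",
    "cooking.example/pizza":    "thicken pizza sauce by simmering it longer",
    "pets.example/cats":        "cats enjoy watching birds through windows",
}

# Build an inverted index: each word points at the pages that contain it.
index = defaultdict(set)
for url, text in pages.items():
    for word in text.split():
        index[word].add(url)

def keyword_search(query):
    """Return pages matching the most query words, best matches first."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for url in index.get(word, ()):
            scores[url] += 1
    return sorted(scores, key=scores.get, reverse=True)

print(keyword_search("cloudy tomorrow"))  # ['weather.example/forecast']
```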

An LLM-based search happens a bit differently. Your search still gets broken into its words and phrases, abstracted into algorithmic bits called—you know what, never mind what the marketers call them, we're going to call them "potatoes." Your question gets translated onto the fleshy skin of a dozen potatoes. The electronic potatoes get shoved down an electronic tube, down to a multi-layered algorithm that looks for potatoes that sort of look like your potatoes, then down to layers that find more potatoes that look even more like your potatoes, and maybe back up a layer, and back down again, and eventually there's a good little chunk of an entire data center devoted to finding potatoes that look sort of like potatoes that would normally hang out sort of near your personal potatoes and if you think this all sounds insanely expensive YOU ARE RIGHT.

LLM algorithms are based on research into how to do "fuzzy" searches based on huge maps of nodes (or "neurons" or "potatoes") that have stronger or weaker connections to other nodes, sort of how we think signal processing in human brains and maybe certain fungi works. By blurring the search and not returning exactly what was asked for, but things with the thickest path-based adjacency to what was asked for, researchers look to mimic the more abstract patterns that make human brains able to "create" new information through extrapolation and association rather than just regurgitating back whatever you put into them.
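If you want the cartoon version of "fuzzy," it boils down to measuring how close two of those numerical blobs (the potatoes) are to each other, rather than demanding an exact word match. The vectors below are invented for illustration; real models learn thousands of dimensions from their training data:

```python
import math

# Pretend each word has already been mashed into a tiny vector (a potato).
# These numbers are invented for illustration.
potatoes = {
    "strawberry": [0.9, 0.8, 0.1],
    "blueberry":  [0.8, 0.9, 0.1],
    "sedan":      [0.1, 0.1, 0.9],
}

def cosine_similarity(a, b):
    """How strongly two potatoes point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

query = potatoes["blueberry"]
for word, vector in potatoes.items():
    print(word, round(cosine_similarity(query, vector), 2))

# blueberry scores highest, strawberry lands right next to it, and sedan is
# far away, which is why berry questions come back wrapped in berry language.
```

The expensive part is doing something like this, at vastly higher dimensionality, across layer after layer, for every fragment of every query.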

The problem with the research has, for decades, been the enormous computing power required to actually pull such a ridiculous brute-force solution off. For a very long time it was effectively impossible. Then it was possible, if you happened to have access to purpose-built university machines and could arrange to use them for long stretches of time. And then it became possible so long as you had any ol' data center, a mountain of high-performance chips of the same sort you'll find in home computers, and a billion dollars to purchase the latter and shove them in the former.

That was the LLM revolution. The old verdict of "this technology is finally plausible with store-bought equipment, but it's still so slow and expensive that we can't possibly commercialize it" was replaced by the new, tech-evangelized model of "f--k it, what if we just do it anyway."

I'm simplifying. But not by a lot.

Performing a "fuzzy" search that works less like a database and more like a free-association game in which words lead to concepts lead to colors lead to potatoes lead to fries lead back to words is a very neat trick. It's what makes new image generation tools possible: the algorithm doesn't know anything about art, but it knows that if it sees something shaped like a human torso there's usually at least one and probably more things that look like legs just below it. The blurred-question and blurred-answer approach is perfect for such cases, or at least you can make it passable after you've spent a few years building vast human sweatshops that "train" your models by repetitively clicking "good" or "bad" on the images they generate until your model "learns" what its human minders most want to see from its output.

Until recently this was too expensive a proposition to offer as a consumer-level product at all, with or without the near-slave labor doing a good chunk of the actual "training," and while LLM companies have been able to brute-force their way to the scale needed to do it, there's no obvious path to making it cheaper.

So we've got companies pondering new nuclear plants for the sole purpose of coming up with the electricity needed to run the queries, and electricity prices in some places are spiking because new data centers are using up enough electricity to threaten local grids, and supposedly pseudo-plausibly-maybe-climate-aware companies like Microsoft and Google are quietly announcing that Actually, screw the planet, we need to blow all previous carbon emissions estimates out of the water because producing an imaginary friend who will pretend to be sexually attracted to you is more important than the survival of humanity itself.

That is how expensive LLMs are to run, and there is currently no plausible solution for any of it. Investors are shoveling hundreds of billions into a product that is likely already near the peak of its efficiency—and utility.

So what happens next?

No, really, what happens next?

While most of us are preoccupied with a dozen other political and economic crises, the uncertainty roiling tech markets right now is very much about the inability of "A.I." companies to answer, for realsies this time, how they're going to make the trillions of dollars investors were promised before the whole damn sector runs out of disposable cash.

The "runs out of cash" part of the scenario is one we've been inching up to for a long, long time now, because even the biggest investors in the industry do not themselves have infinite cash to throw down this well. But the "A.I." companies continue to hemorrhage money, even setting aside the visions of building entirely new power plants just to keep the things running, and there really seems to be no path to fixing that because of the nature of the technology. It was built to be expensive. It's not going to get cheaper.

Up until now, everyone involved has been betting on expanding computing capacity to the ludicrous size that might be required if "AI" really was the wave of the future, vibe-based investments based mainly on OpenAI CEO Sam Altman's ever-escalating visions of the product as the awakening of a new Robot Jesus that will restructure our entire society much as the invention of the Segway made automobiles obsolete, or something like that.

The part about "how will we get consumers to pay the prices our companies need to even break even," however, is an inquiry that will still get you kicked out of those meetings. And there are very good reasons to believe that the current LLM models are—again, being blunt—already near the peak "intelligence" that can be asked of them.

Even the biggest players in the industry now look to be losing their footing—and their nerve.

Meta freezes AI hiring, WSJ reports reut.rs/4fPDc1x

— Reuters (@reuters.com) 2025-08-21T01:30:26Z

It turns out that quite a few problems can be solved by inventing a new sort of search engine that answers your questions in natural language. It's a genuine tech advance. But A.I. companies sold their products as being much more capable, and much more magical, than "really nifty search engine." They sold it as intelligence. A good chunk of the entire industry has been propped up by corporate-promoted mass delusions and false promises of what was possible.

And, sooner or later, that's going to become everybody's problem. You can only flush a half trillion dollars down your invest-o-toilet so many times before the core investors (like SoftBank) run out of money and new investors start balking because they can see that the old investors have run out of money. That's how you get a run-on-the-bank style market collapse.

When the herd bolts, it bolts all at once in the same direction. Someone was telling me earlier they think investors have a good case here for claiming they were defrauded but I don’t know: if someone sells you a perpetual motion machine and you believe them, then surely caveat emptor applies.

— flyingrodent (@flyingrodent.bsky.social) 2025-08-20T16:08:58.898Z

Running the numbers led a lot of industry critics to warn that the run-on-the-bank part of the current bubble should by all rights have already happened quite some time ago. Sooner or later, one of these companies is going to bounce a billion-dollar check, the power to the data center is going to get shut off, and all hell is going to break loose among the folks who promised giant profits Real Soon Now.

At that point, the weird billionaire hype men who got themselves into this mess will try their hardest to convince world governments to bail them out. And it will probably work, because as with the banking industry crookedness that led to the Great Recession, "the markets" won't be able to stomach the secondary effects of letting the compulsive gamblers who bet wrong eat their own losses.

What you need to know, however, is that none of this is "artificial intelligence." That's not what's being sold. That's not what we're risking national economies and the planet's very atmosphere to build.

It's a new kind of search engine with a fancy user interface and an inability to know the difference between fact and fiction. The image generators are just search engines that return the "average" content of images people have described as "attractive woman" or "like Ghibli" or "a polar bear looking at a Coke can but put Jim Carrey in the background."

It's useful, sure. But it's mostly useful in situations where you need mediocrity, or will settle for it. There is no intelligence there.

Large language models are search engines that return the blurred works of whichever real-life human beings they were trained on. They are only that, and nothing else.

Hunter Lazzaro

A humorist, satirist, and political commentator, Hunter Lazzaro has been writing about American news, politics, and culture for twenty years.

Working from rural Northern California, Hunter is assisted by an ever-varying number of horses, chickens, sheep, cats, fence-breaking cows, the occasional bobcat and one fish-stealing heron.
