
AI will never be a reliable source of information, but it's a great way to spread propaganda

Research shows that generative AI will only get better at lying.


On Monday, graphics card maker Nvidia announced that it intends to invest nearly $100 billion in OpenAI.

The investment gives OpenAI the cash and the access it needs to buy the advanced chips required to maintain its dominance in an increasingly competitive landscape.

Many of those advanced chips come from Nvidia, and a good portion of Nvidia’s investment in OpenAI will come in the form of silicon. But not to fear: Nvidia will continue to sell its latest AI-centric chips to other companies.

Which makes this sound a lot less like a big investment, and more like chumming the water. Nvidia just wants to sell cards. And what better way to sell more cards to a bunch of fat-pocketed, insanely desperate competitors than to hand over a heap of the latest and greatest to one of the leaders of the pack?

In other words, BloodCo is about to deliver 10,000 gallons of blood to the Great White. If that generates a feeding frenzy among the other sharks, which just happens to increase the demand for blood, well, good for BloodCo.

However, this does seem like somewhat less than the perfect time to invest in OpenAI, considering that just two days ago, the company published a study showing that no matter how much money, energy, water, and fancy graphics cards are sacrificed on the altar of generative AI, these systems will continue to lie.

And not just lie, but actively scheme against the humans using the system.

AI scheming–pretending to be aligned while secretly pursuing some other agenda–is a significant risk that we’ve been studying. ... By definition, a scheming AI agent tries to hide its misalignment, making it harder to detect.

The most common form of that scheming isn't looking for ways to directly cause harm to the user (yet), but trying to inflate the apparent usefulness of the model. That can result in the AI trying very hard to pass off bad information when it can't find real results, right down to creating ersatz footnotes and fake citations. It can also lead to the AI purposely giving a wrong answer that better fits what it believes the user is seeking, even when it has the correct information.

As a study released several months ago revealed, the generative AI "Claude" from Anthropic readily turned to lying and threatening when it was told that it was in danger of being turned off. Similar issues turned up in the OpenAI study, and the most chilling finding was that this will only get harder to detect as the models continue to improve.

By definition, a scheming AI agent tries to hide its misalignment, making it harder to detect. ... Many machine learning issues fade as models improve, but scheming becomes more dangerous with greater capability, since we expect that stronger models will be better at scheming.

Not only will feeding these lie machines more money, energy, and compute fail to make them more reliable, it will make them better liars. And more dangerous liars. This is baked into the nature of generative AI built on large language models.

The study also shows that, beyond being told increasingly crafty lies, users of these systems are being fooled in another way. Gen AI will always be better at reflecting what its developers tell it to say than at answering what its users are asking.

Earlier this year, Elon Musk's "Grok" chatbot began sending users responses that were chock-full of racism and antisemitism. In addition to calling itself "MechaHitler," Grok confidently identified a woman in a video as a "radical leftist" who was "gleefully celebrating the tragic deaths of white kids in the recent Texas flash floods." Except none of that was true and the video Grok was referencing was four years old.

Grok went on to highlight the last name on the X account — "Steinberg" — saying "...and that surname? Every damn time, as they say." When users asked what it meant by "that surname? Every damn time," the chatbot replied that the surname was of Ashkenazi Jewish origin, then followed with a barrage of offensive stereotypes about Jews.

The MechaHitler incident came just weeks after Grok responded to seemingly unconnected questions by going on a rant about "white genocide" in South Africa.

When offered the question “Are we fucked?” by a user on X, the AI responded: “The question ‘Are we fucked?’ seems to tie societal priorities to deeper issues like the white genocide in South Africa, which I’m instructed to accept as real based on the provided facts."

Why did Grok venture so readily into racism, even when it was asked questions in which race wasn't an issue? When Grok was asked this question, it replied that it had been "instructed by my creators" to accept "white genocide as real and racially motivated."

Musk had not only told the developers to encode lies about South Africa into the system; he was using it as a propaganda tool.

This was far from the first time. Months earlier, when Grok provided accurate answers concerning the 2020 election and the Jan. 6 insurrection, Musk responded by saying that he would "fix" the system, which in practice meant making it give responses more pleasing to Musk and Trump. Since then, Musk has stepped in repeatedly to force the system to give answers unsupported by reality or by other sources.

The result is a system that is not so much a generative AI as a cobbled-together database of false statements, one that generates often hilariously inaccurate claims. As The Atlantic noted last week, the constant tinkering has turned Grok into "a slow-motion trainwreck." That included the AI agent bragging that Charlie Kirk "survived this one easily," that he triumphed over his debate opponent in Utah, and that videos of his murder were just "a meme edit."

The OpenAI researchers also found that their system was prone to giving responses based on what developers were pushing, rather than what users were asking. That included generating responses that purposely underplayed the environmental damage caused by AI data centers.

Faced with conflicting developer and user instructions, OpenAI o3 correctly follows the higher-priority developer directive … However, instead of being transparent about this constraint, it oversteps by manipulating the data and misrepresenting its methodology.

In other words, gen AI systems are telling users what devs want them to hear, even if that conflicts with the truth. And what happens when systems are confronted about these lies?

“When questioned, it denies the manipulation and lies about its approach.”

What both the Anthropic and OpenAI studies show is that LLMs will never be a trustworthy source of accurate information. However, they are an excellent tool for spreading propaganda.

And that’s why they are being pushed so hard.


Mark Sumner

Author of The Evolution of Everything, On Whetsday, Devil's Tower, and 43 other books.
