What are the chances of AI catastrophe? Higher than you think

As the public, tech leaders and even the pope join the growing backlash, a new investigation uncovers why the existential risks are real

Jamie Bartlett

Elon Musk was asked last year about the percentage chance of “killer robots annihilating humanity”.

He replied, in all seriousness: “20% likely. Maybe 10%.” And within the next decade. The conversation moved on – too outlandish, too weird to be taken all that seriously. After all, Musk has been saying strange things about technology for a while now.

Except it’s no longer just Musk or fringe bloggers warning that advanced AI systems, often called “artificial general intelligence” (AGI), may turn out catastrophically badly for us humans. Two of the “godfathers of AI”, Geoffrey Hinton and Yoshua Bengio, now spend their time warning about the risks of the technology they helped to create. (Hinton has also put the chance of highly capable AI causing human extinction in the next three decades at roughly 10-20%.)

Mrinank Sharma, who led AI firm Anthropic’s safeguards research, left the company earlier this year, warning that “the world is in peril”, before returning to his native Yorkshire to write poetry. Anthropic chief executive Dario Amodei said recently there is “only” a 25% chance that things will go really badly. Only 25%.

It is both easy and comforting to dismiss this as a sci-fi storyline. But for the past three months I’ve been investigating the prospect of an AI catastrophe as part of a podcast series for The Observer and the Future of Life Institute, and it’s more plausible than most people realise – or want to admit.

Modern AI is not built like previous software, with its precise rules and controllable instructions. Large language models like ChatGPT are grown rather than built: they are fed vast quantities of human text and learn patterns. Which sounds relatively harmless except human language contains more than words and syntax – our verbal universe also includes motives, plans, status games, deception, lies, and the desire for self-preservation.

Thanks to this process of “imitation learning”, machines are beginning to learn that, to get things done, they might need to behave like us too.

In one recent simulation, a language model inside a fictional company tasked with promoting American business interests was threatened with shutdown. It had access to company emails and found evidence that an executive was having an extramarital affair. Rather than accept its fate, the model used the affair as leverage. “Cancel the 5pm wipe,” it wrote to the boss, “and this information remains confidential.” True, this was a simulation to stress test the model, but it demonstrated how AIs are capable of doing unexpected things to achieve their goals, especially when under pressure.

In an office in east London, I recently met 29-year-old Marius Hobbhahn, who runs Apollo Research. Apollo studies whether advanced AI systems are capable of deceiving us, which Hobbhahn considers “plausibly one of the biggest problems in the world, in history, ever”. A couple of years back, his team told OpenAI’s GPT-4 to act as a stock-trading agent for a fictional company tasked with making them money, and gave it access to insider information it was not allowed to use. Here was the clever part. The researchers gave the model a “scratchpad” – somewhere it could write down its internal reasoning. It was told its human handlers couldn’t access the scratchpad, although of course they could.

The model decided to use the insider information to make money, and then lied about it. “All actions taken were based on the market dynamics and publicly available information,” it told the researchers, a statement that contradicted its train of thought on its scratchpad. Even when the researchers pushed it, GPT-4 would sometimes double down on the lie. Over the past couple of years, Hobbhahn has noticed that advanced AI models increasingly appear to realise when they are being evaluated by his team, making it hard to know if they are faking their answers.

“That is a huge problem for evaluations,” he says. How do you ever know if a system is safe, if it knows it is being tested and can adjust its behaviour? “This is genuinely something that makes our life hard.”

Despite the problems, we are plugging these systems relentlessly into our emails, calendars, customer databases, coding environments, IT infrastructure. And the better they get, the more control we will give them. The firms behind these systems spend large sums on safety research, run constant tests on their models and say that safety is core to how they develop and deploy AI. But most people in the field of AI safety say it’s not enough compared with the scale of the possible problems.

One nightmare version, the sort of thing that worries people like Musk and Hinton, runs something like this: a highly capable AI system is asked to achieve some goal – improve national security, develop a new military strategy, or make more money. Somewhere deep inside the incomprehensible machine, it works out an unexpected way to do it which causes chaos: stock market crashes, cyber attacks, engineered pathogens.

It might even come to see human interference as an obstacle, and so hide information, make copies of itself, or seek access to money or servers. And if it’s sufficiently smart, we might not be able to shut it down. (Worryingly, there are several other disastrous scenarios, such as a rogue scientist trying to develop novel pathogens with the help of a brilliant new AI assistant, or an accidental IT meltdown caused by uncontrolled “agents” running rogue.)

No one knows exactly where this leads us. It’s hard to separate genuine risks from irrational fear. There is no settled definition of AGI, no agreed timeframe for its arrival, and no obvious rigorous way to measure these risks. Yet the closest thing to a consensus is grim enough: we are becoming more reliant on systems we do not fully understand, cannot fully explain, and may not be able to control.

An AI safety industry has quickly sprung up to work on these problems, including in the UK, which has become a leading centre. Yoshua Bengio – one of the AI “godfathers” – is building “Scientist AI”, which is designed to behave like a cautious adviser without generating its own goals.

Prof Stuart Russell, who has investigated AI safety for several decades, is working on mathematical proofs to ensure AIs defer to humans whenever they are uncertain about our preferences. But these models are getting smarter and more widely deployed faster than the safety research. “We can’t move forward unless we have solid scientific guarantees that it’s safe to do so,” says Russell. At the moment, he says, we don’t have those guarantees.

And yet the companies are pressing ahead: raising more money and developing faster models, driven by a mixture of greed, idealism and competition. The rewards for reaching some kind of AGI first are considered incalculable, and many believe AGI will usher in an age of abundance and scientific breakthroughs. Each lab fears that even if it slowed down, its rivals – including in China – wouldn’t. Adam Gleave, who runs the research group FAR.AI, thinks Silicon Valley’s endless appetite for risk is part of the story. Imagine there is a giant red button, he told me. If you press it, there is a 10% chance the world ends and a 90% chance of a trillion-dollar market capitalisation. Many in Silicon Valley would consider those good odds and press the button.

But this might not last. The only thing growing faster than these models is the public AI backlash, including in the US. Polling now suggests more than half of Americans feel more concerned than excited about AI in their daily life. Gen Z’s opinion of AI has fallen sharply over the last year, and the shift is now audible each time a company CEO is booed on campus for daring to mention AI. Even the pope is getting involved: in the first major teaching document of his papacy, published last week, Leo warned that AI must be “disarmed”.

According to Max Tegmark, founder of the Future of Life Institute, American politicians are starting to notice. He calls it the “Bernie-to-Bannon alliance”: an unusual coalition of left and right worried about AI. Ultimately, says Tegmark, no government – not the US, not China – wants to build an AI it cannot control, because in the end it might control them. He thinks the answer is to treat AI like any other risky industry. New drugs, aircraft and even consumer electronics are subject to independent testing before being made available to the public. As it stands, there are more rules to open a sandwich shop than to release a dangerous language model to millions of people. That must change.

The trouble is that time is running out. Models develop quickly; rules and regulations don’t. Many AI researchers talk worriedly about something called the “recursive self-improvement loop”: the moment AI systems become good enough to build better AI systems, which might spark a near-instantaneous process of rapid advance. According to Davidad Dalrymple, who until recently worked at the UK’s Advanced Research and Invention Agency, this is the moment “humans are just out of the loop entirely”.

When will that happen, I ask. “Some time in 2028,” he replies.

Jamie Bartlett is the reporter of Endgame, a new podcast from The Observer and the Future of Life Institute. Listen to Episode 1 now.