The Cosmopolitan Globalist
The Cosmopolicast

Is the AI control problem insoluble?

A conversation with Roman Yampolskiy

“ … unfortunately we show that the AI control problem is not solvable and the best we can hope for is safer AI, but ultimately not 100 percent safe AI, which is not a sufficient level of safety in the domain of existential risk as it pertains to humanity.”

Roman Yampolskiy is a computer scientist and professor at the Speed School of Engineering at the University of Louisville who works on genetic algorithms, neural networks, artificial intelligence, and the alignment problem.

Our conversation surprised me for two reasons. First, he’s the only researcher to whom I’ve spoken who argues that GPT-4 is conscious. Second—much more gravely—he believes not only that we’ll fail to solve the control problem before we build a dangerously intelligent AI, but that the problem is inherently and formally insoluble.

As I’ve reflected on the control problem this week, I’ve had the growing and uneasy suspicion that this must be so. That said, had you told me five years ago what Large Language Models would be doing in 2023, I would have said that was impossible, too. My intuitions about which problems in AI engineering are soluble aren’t trustworthy. It takes years of working on problems like these to develop good intuitions, and I haven’t done that.

That can’t be said of Roman, however. This is his life’s work. We have every good reason to take his intuitions seriously. So when I heard him say that, my heart sank. He may be wrong, he said, trying to reassure me. He hopes he is. I hope he is. But he doesn’t think he is.

Our conversation was perfectly calm, as you’ll hear, but that’s because I just can’t bring myself to believe in any of this, despite the evidence. This exchange and its implications seem no more real to me than a thought experiment in a graduate philosophy seminar or a science fiction movie. That I feel this way shows that awareness of one’s cognitive biases is no proof against them. In reality, it’s neither a thought experiment nor a movie; it’s perfectly plausible that he’s right, and if so, we’re in indescribably big trouble.

As hard as it is to take this in, we have to, because this hasn’t happened yet. It may be difficult to stop it at this point, but at least it’s not formally impossible. Once it happens? Too late.

So it’s worth thinking about now.

Nothing should be taken off the table, and limited moratoriums and even partial bans on certain types of AI technology should be considered. “The possibility of creating a superintelligent machine that is ethically inadequate should be treated like a bomb that could destroy our planet. Even just planning to construct such a device is effectively conspiring to commit a crime against humanity.” Finally, just as the incompleteness results did not reduce the efforts of the mathematical community or render it irrelevant, the limited results reported in this paper should not serve as an excuse for AI safety researchers to give up and surrender. Rather, they are a reason for more people to dig deeper and to increase effort and funding for AI safety and security research. We may never get to 100 percent safe AI, but we can make AI safer in proportion to our efforts, which is a lot better than doing nothing.

It is only for a few years right before AGI is created that a single person has a chance to influence the development of superintelligence, and by extension the forever future of the whole world. This is not the case for the billions of years from the Big Bang until that moment, and it is never an option again. Given the total lifespan of the universe, the chance that one will exist exactly in this narrow moment of maximum impact is infinitely small, yet here we are. We need to use this opportunity wisely.—Roman Yampolskiy.

Further reading

On the controllability of artificial intelligence: An analysis of limitations, by Roman Yampolskiy:

The unprecedented progress in artificial intelligence over the last decade came alongside multiple AI failures and cases of dual use, causing a realization that it is not sufficient to create highly capable machines, but that it is even more important to make sure that intelligent machines are beneficial for humanity. This led to the birth of the new sub-field of research commonly known as AI safety and security with hundreds of papers and books published annually on the different aspects of the problem.

All such research is done under the assumption that the problem of controlling highly capable intelligent machines is solvable, which has not been established by any rigorous means. However, it is a standard practice in computer science to first show that a problem doesn’t belong to a class of unsolvable problems before investing resources into trying to solve it or deciding what approaches to try. Unfortunately, to the best of our knowledge no mathematical proof or even rigorous argumentation has been published demonstrating that the AI control problem may be solvable, even in principle, much less in practice. …

Yudkowsky considers the possibility that the control problem is not solvable, but correctly insists that we should study the problem in great detail before accepting such a grave limitation. He writes:

“One common reaction I encounter is for people to immediately declare that Friendly AI is an impossibility, because any sufficiently powerful AI will be able to modify its own source code to break any constraints placed upon it… But one ought to think about a challenge, and study it in the best available technical detail, before declaring it impossible—especially if great stakes depend upon the answer. It is disrespectful to human ingenuity to declare a challenge unsolvable without taking a close look and exercising creativity. It is an enormously strong statement to say that you cannot do a thing—that you cannot build a heavier-than-air flying machine, that you cannot get useful energy from nuclear reactions, that you cannot fly to the Moon. Such statements are universal generalizations, quantified over every single approach that anyone ever has or ever will think up for solving the problem. It only takes a single counterexample to falsify a universal quantifier. The statement that Friendly (or friendly) AI is theoretically impossible, dares to quantify over every possible mind design and every possible optimization process—including human beings, who are also minds, some of whom are nice and wish they were nicer. At this point there are any number of vaguely plausible reasons why Friendly AI might be humanly impossible, and it is still more likely that the problem is solvable but no one will get around to solving it in time. But one should not so quickly write off the challenge, especially considering the stakes.”

Yudkowsky further clarifies the meaning of the word ‘impossible’:

“I realized that the word ‘impossible’ had two usages:

  1. Mathematical proof of impossibility conditional on specified axioms.

  2. ‘I can’t see any way to do that.’

Needless to say, all my own uses of the word ‘impossible’ had been of the second type.”

In this paper we attempt to shift our attention to the impossibility of the first type, provide rigorous analysis and argumentation and where possible mathematical proofs, but unfortunately we show that the AI control problem is not solvable and the best we can hope for is safer AI, but ultimately not 100 percent safe AI, which is not a sufficient level of safety in the domain of existential risk as it pertains to humanity.
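
A brief aside on what “a class of unsolvable problems” means here: the canonical example is the halting problem, and the standard diagonalization argument can be sketched in a few lines of Python. The halts() oracle below is purely hypothetical and does not come from the paper; the sketch only illustrates the kind of in-principle impossibility result Yampolskiy has in mind.

    # Editorial illustration, not from the paper: assume, for contradiction,
    # that a correct halting oracle could be written.
    def halts(program, argument) -> bool:
        """Hypothetical oracle: True iff program(argument) eventually halts."""
        raise NotImplementedError("assumed only for the sake of the argument")

    def paradox(program):
        """Do the opposite of whatever halts() predicts about program run on itself."""
        if halts(program, program):
            while True:   # loop forever if the oracle says it halts
                pass
        # otherwise halt immediately, i.e. if the oracle says it loops

    # Now consider paradox(paradox).
    #   If halts(paradox, paradox) returned True, paradox(paradox) would loop forever.
    #   If halts(paradox, paradox) returned False, paradox(paradox) would halt.
    # Either answer is wrong, so no correct halts() can exist: the problem is
    # unsolvable in principle, not merely unsolved so far.

The paper’s argument is that the control problem belongs to this same category: that fully safe control of superintelligent AI is impossible in principle, not merely unsolved so far.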

Detecting qualia in natural and artificial agents, by Roman Yampolskiy:

In this paper, we described a reductionist theory for appearance of qualia in agents based on a fully materialistic explanation for subjective states of mind, an attempt at a solution to the Hard Problem of consciousness. We defined a test for detecting experiences and showed how computers can be made conscious in terms of having qualia. Finally, we looked at implications of being able to detect and generate qualia in artificial intelligence. Should our test indicate presence of complex qualia in software or animals certain protections and rights would be appropriate to grant to such agents. …

… There seems to be a fundamental connection between intelligence, consciousness and liveliness beyond the fact that all three are notoriously difficult to define. We believe that the ability to experience is directly proportional to one’s intelligence and that such intelligent and conscious agents are necessarily alive to the same degree. As all three come in degrees, it is likely that they have gradually evolved together. Modern narrow AIs are very low in general intelligence and so are also very low in their ability to experience or their perceived liveliness. Higher primates have significant (but not complete) general intelligence and so can experience complex stimuli and are very much alive. Future machines will be superintelligent, superconscious and, by extension, alive!
