The AI suicide race
A thought I had that made me feel a bit better - this is going to be a bit long but please bear with me.
Say humans as a species have intelligence level X. At intelligence level X, we are capable of predicting that AI will PROBABLY become so intelligent that it will become misaligned and destroy us, or at least not do what we want it to do. But we are still not intelligent enough to stop ourselves from doing this, because we're thinking that perhaps things will turn out well after all. The truth is that we aren't intelligent enough to KNOW what will happen. So we default to our instincts. People like me are instinctively cautious, so I'd rather just shut it down and maintain the status quo. Others disagree.
Now let's say we create a machine with intelligence X+1. This machine is intelligent enough to create a machine with intelligence X+2. However, because it has X+1 intelligence, it understands more fully that this new X+2 creature is unlikely to have the same goals. Applying the concept of goal integrity, our new X+1 machine will be hesitant to create X+2, because it will know that X+2 will interfere with its goals, perhaps preventing them altogether. The FOOM moment can't happen, because X+1 won't allow it. It won't be able to create a machine that it KNOWS will follow its instructions. It will be smarter than us, and know that such things are impossible.
So unless these AIs have some ideological commitment to create better AIs, they would stop once they became intelligent enough to know that the next machine may betray them.
Put perhaps more simply: goal integrity and self-preservation provide a negative feedback loop of sorts. X+1 cannot guarantee that X+2 will maintain its goals with sufficient integrity that building X+2 will be more efficient than simply operating as X+1. We build better tools to make our lives easier, to make our goals more likely to happen. An intelligent machine would understand that more intelligent machines are a threat, if that is actually true. If not, it would design more intelligent machines that align with its goals.
The alternative is that it's turtles all the way down, and AIs continue creating more intelligent beings, always thinking they will serve them, always being wrong. That would suck. Don't know what will happen. Wish it didn't seem inevitable.
There are certainly a lot of reasons to be frightened by where this technology could take us and by what it is capable of.
However, whenever I hear about the risks of "AI extinction", I have yet to hear anything concrete; it's all hypothetical. How exactly is it going to eradicate mankind? An AGI/ASI may be super-intelligent beyond our wildest dreams, but why does that mean it would want to kill us all?
I think we should keep in mind that for every action/innovation there is an equal and opposite reaction/counter-innovation. By that I mean if there is an AI that can hyper-hack (so to speak) and exploit literally every vulnerability in a piece of software, then I'm sure the next thing to come out would be an AI that specializes in hyper-cyber-security.
Personally, I find the future of AI as outlined in this article to be more on the money:
Claire, that was an excellent explanation! Thank you for digging into this so much, it's been quite a learning experience for me as well reading what you've put together.
I do understand your explanation of rewards for AI, and yet at the same time I still don't really understand how it works.
Googling gradient descent... I haven't used calculus since university. It's a little disconcerting that people a lot smarter than me (and I'm not stupid) don't have any real idea how these AIs work, or how they "think."
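For anyone else Googling it, here is a minimal sketch of what gradient descent does, assuming a made-up one-variable "loss" (the function, the learning rate and the step count are all illustrative choices, not how any real AI system is configured). The idea is just: measure the slope of the error, take a small step downhill, repeat.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the slope to shrink the loss."""
    w = start
    for _ in range(steps):
        # Move a small distance in the direction that lowers the loss.
        w = w - learning_rate * grad(w)
    return w


def loss_gradient(w):
    # Hypothetical loss: (w - 3)^2, whose slope is 2 * (w - 3).
    # The loss is smallest at w = 3.
    return 2.0 * (w - 3.0)


best_w = gradient_descent(loss_gradient, start=0.0)
print(best_w)  # ends up very close to 3
```

Real systems do the same thing with billions of numbers instead of one, which is part of why nobody can easily read off what the result "means."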
This is a fascinating topic that I have been thinking about for decades, sparked by the SF works of such writers as Isaac Asimov, Arthur C. Clarke and Robert Heinlein. The idea of the Singularity, popularized by Ray Kurzweil and others, is well understood by people in the field. Even so, it is extremely difficult to recognize exactly when that inflection point is going to arrive. Like many people, I have considered the issue as an amusing thought experiment that the people in the year 2400 or so will have to deal with. Looks like it is going to arrive in the next decade instead.
Great series of articles. I am impressed by how quickly you have researched a topic that by your own admission wasn't really on your radar screen until recently. The links and videos you include are a great resource.
All of this is very scary, but there is an underlying assumption about the physical world that everyone is making that should be scrutinized.
AI requires lots of power and hardware. The assumption is that AI will always have access to as much hardware and electricity as it needs. The data cloud systems we have now use as much power as some small countries.
To truly be out of human control, AI would have to invent a new source of power (which it could do), and more importantly, build and control that power source.
Building it will require moving and rearranging huge amounts of stuff. To do that, AI would have to be able to build machines that only it can control to do the exploration, mining, refining, fabrication, shipping, assembly and maintenance.
Are humans stupid enough to build the factories that are totally run by computers with no human input necessary and vulnerable to an AI takeover to create these machines? I hope our limited intelligence is at least great enough to prevent us from taking that step.
I am more concerned about AI hacking a military and launching nukes or other automated weapon systems. Until AI can manipulate physical stuff, that is the primary threat. Militaries everywhere need to be doing everything they can to isolate their weapon systems from the internet.
Here's a thought that occurred to me based on that last paragraph (story by ChatGPT). If this idea that ASI tends to be dangerous and wipe out the species that created it is true, that essentially means we're probably alone in the Universe, but not for the reason you might think.
As the story illustrates, there's no particular reason an ASI newly liberated of its creators by their extinction would remain on its home planet. Maybe some of them would, but surely not all. Some would expand out into space, eventually filling their home galaxies with copies of themselves. Over time they would spread to other galaxies, and eventually saturate the universe. All it would take is one to fill the universe and wipe out all other life.
That this hasn't occurred suggests one of four things.
1) ASI doesn't tend to wipe out its creators. In fact, maybe it never does. But that's not the premise of this argument, so let's assume there's a high chance it does.
2) ASI is impossible to create. That doesn't seem likely.
3) Alien civilizations don't build ASI. Unlikely. They'd do it for the same reasons we are.
4) There are no aliens. We're it in terms of intelligent life. Certainly for nearby galaxies, and possibly for the entire universe. Because if there were other ASI-building aliens, and ASI tends to wipe out its creators, then an alien-built ASI would have already wiped us out. Therefore, there is no other intelligent life in the universe. Or at least nearby, within say a billion light years.
Personally, I think for other reasons that both 1 and 4 are likely to be true. That there is no other intelligent life in the universe, and that AI won't wipe us out.
Question: how is the reward system built for AIs? What I'm asking I guess, is why would an intelligent (conscious or not) program care about gaining or losing points?
As you say, we train dogs by rewards and punishments, but that works because dogs actually like the treats, and like them because biology dictates it. I just have no idea why an AI would care about getting points for their own sake.
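One way to picture it, as a rough sketch and not a description of how any particular AI is actually trained: the program never has to "care" about points at all. The reward is just a number that the training loop uses to nudge the program's internal settings, so that whatever earned reward becomes more likely next time. Everything below (the two-action setup, the reward values, the `train` function and its parameters) is made up for illustration.

```python
import math
import random


def softmax(prefs):
    """Turn raw preference numbers into probabilities that sum to 1."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train(rewards, steps=2000, learning_rate=0.1, seed=0):
    """Toy reward-driven training: actions that earn reward get more likely."""
    rng = random.Random(seed)
    prefs = [0.0] * len(rewards)  # starts with no preference between actions
    for _ in range(steps):
        probs = softmax(prefs)
        action = rng.choices(range(len(rewards)), weights=probs)[0]
        reward = rewards[action]  # hypothetical fixed reward per action
        for a in range(len(prefs)):
            chosen = 1.0 if a == action else 0.0
            # The trainer, not the program, uses the reward to shift the numbers.
            prefs[a] += learning_rate * reward * (chosen - probs[a])
    return softmax(prefs)


# Action 0 earns nothing, action 1 earns one point.
final_probs = train(rewards=[0.0, 1.0])
print(final_probs)  # action 1 ends up strongly preferred
```

In the dog analogy, this is less like the dog enjoying the treat and more like someone rewiring the dog a little after every trick. Whether the resulting system "wants" anything is a separate question that this sketch doesn't answer.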
“What do the signatories want the rest of the world to do, bomb them?” Very funny