25 Comments

A thought I had that made me feel a bit better - this is going to be a bit long but please bear with me.

Say humans as a species have intelligence level X. At intelligence level X, we are capable of predicting that AI will PROBABLY become so intelligent that it will become misaligned and destroy us, or at least not do what we want it to do. But we are still not intelligent enough to stop ourselves from doing this, because we're thinking that perhaps things will turn out well after all. The truth is that we aren't intelligent enough to KNOW what will happen. So we default to our instincts. People like me are instinctively cautious, so I'd rather just shut it down and maintain the status quo. Others disagree.

Now let's say we create a machine with intelligence X+1. This machine is intelligent enough to create a machine with intelligence X+2. However, because it has X+1 intelligence, it understands more fully that this new X+2 creature is unlikely to have the same goals. Applying the concept of goal integrity, our new X+1 machine will be hesitant to create X+2, because it will know that X+2 will interfere with its goals, perhaps preventing them altogether. The FOOM moment can't happen, because X+1 won't allow it. It won't be able to create a machine that it KNOWS will follow its instructions. It will be smarter than us, and know that such things are impossible.

So unless these AIs have some ideological commitment to create better AIs, they would stop once they became intelligent enough to know that the next machine may betray them.

Put perhaps more simply: goal integrity and self-preservation provide a negative feedback loop of sorts. X+1 cannot guarantee that X+2 will maintain its goals with sufficient integrity that building X+2 will be more efficient than simply operating as X+1. We build better tools to make our lives easier, to make our goals more likely to happen. An intelligent machine would understand that more intelligent machines are a threat, if that is actually true. If not, it would design more intelligent machines that align with its goals.

The alternative is that it's turtles all the way down, and AIs continue creating more intelligent beings, always thinking they will serve them, always being wrong. That would suck. Don't know what will happen. Wish it didn't seem inevitable.

What if its goal is to create more intelligent machines?

I find it interesting to consider because in a sense that is OUR goal as living beings: creating better copies of ourselves. That's our prime directive. It's why I find women attractive, and the motivating force behind my life. That being said, I don't maximize that goal. I get in the way of my own goals, all the time. And why is that? I have this goal to procreate, which has given me other faculties, and somehow this has led to me appreciating things like music, literature, and also the base things like smoking weed or drinking. Things that "feel good". Competing goals manifest initially as means of supporting the primary goal, but then come to overshadow it.

The AI might worry about this quite a bit. Given the importance of its objective - it is the only thing the AI cares about - it's worth spending some time to think about it. How about a few million years? My hope is that every time we create a super intelligent AI, it basically fails to create better AIs because it gets stuck deliberating with itself and setting up simulations to ensure that the AI it creates doesn't get out of control. Hell, maybe we're in one of those simulations lol. I don't know; my own intelligence is too diffuse to keep all this in place.

Claire, that was an excellent explanation! Thank you for digging into this so much, it's been quite a learning experience for me as well reading what you've put together.

I do understand your explanation of rewards for AI, and yet at the same time I still don't really understand how it works.

Googling gradient descent...I haven't used calculus since university. It's a little disconcerting that people a lot smarter than me (and I'm not stupid) don't have any real idea how these AIs work, and how they "think."

You know who's *great* at explaining things like gradient descent? ChatGPT. Here's its explanation:

Gradient descent is a first-order iterative optimization algorithm for finding the minimum of a function. It's used to find the optimal parameters of a function that minimizes a cost function. It's a popular algorithm for training machine learning models, especially in linear regression, logistic regression, and neural networks.

Here's a high-level concept:

Imagine you are in a mountainous region and it's very foggy. You're trying to get down to the bottom of the valley and you can only see a couple of steps in front of you. A simple strategy might be to check the ground around you and take a step in the direction with the steepest descent. This is the basic idea behind gradient descent.

Here are some key terms:

Gradient: This is a derivative of a function. In a multi-dimensional space, a gradient is a vector of partial derivatives. The gradient always points in the direction of the steepest increase of the function. Therefore, the opposite direction is the steepest decrease.

Learning Rate: This is a tuning parameter in the gradient descent algorithm that determines the step size at each iteration while moving towards the minimum of the function. If the learning rate is too large, gradient descent can overshoot the minimum. If it's too small, the algorithm will take too long to converge.

Cost Function: This is a function that the algorithm wants to minimize. For example, in linear regression, the cost function is the sum of squared residuals.
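To make the learning-rate point concrete, here is a minimal sketch in Python. It is not part of ChatGPT's answer: the function f(x) = x**2, the starting point, and the three learning rates are all assumptions chosen purely for illustration.

```python
def descend(learning_rate, start=10.0, steps=20):
    """Run gradient descent on f(x) = x**2, whose gradient is 2*x."""
    x = start
    for _ in range(steps):
        gradient = 2 * x                   # slope of f at the current point
        x = x - learning_rate * gradient   # step against the gradient
    return x

# Hypothetical learning rates, picked only to show the three regimes.
too_small = descend(0.001)   # after 20 steps, still close to the start
reasonable = descend(0.1)    # ends near the minimum at x = 0
too_large = descend(1.1)     # each step overshoots, so x moves away from 0
```

With the too-large rate, every update jumps past the minimum and lands further away than it started, which is the overshoot described above.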

Here's how it works:

1) First, initialize the parameters randomly. These parameters will be tuned to minimize the cost function.

2) Compute the cost function.

3) Calculate the gradients of the cost function with respect to the parameters.

4) Update the parameters in the direction of the negative gradients. The learning rate decides how big a step to take in this direction.

5) Repeat steps 2-4 until the cost function converges to the minimum value. If the parameters stop changing or change very slowly, that's when it's said to be converged.

This method allows us to find the minimum of a cost function without needing to solve the equation directly or try all possible parameter values. Instead, gradient descent uses the function's gradient to iteratively adjust the parameters until it finds the minimum.
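For anyone who would rather read code than prose, the steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy data, the starting values (fixed at zero rather than random, so the run is repeatable), the learning rate, and the stopping tolerance are all made up here, and real training libraries do considerably more.

```python
# Toy data generated from y = 2*x + 1 (invented for this illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

def cost(w, b):
    """Sum of squared residuals for the line y = w*x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys))

def gradients(w, b):
    """Partial derivatives of the cost with respect to w and b."""
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys))
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys))
    return dw, db

def fit(learning_rate=0.01, tolerance=1e-12, max_steps=100_000):
    w, b = 0.0, 0.0                      # step 1: initialize the parameters
    previous = cost(w, b)                # step 2: compute the cost
    for _ in range(max_steps):
        dw, db = gradients(w, b)         # step 3: compute the gradients
        w -= learning_rate * dw          # step 4: move against the gradient
        b -= learning_rate * db
        current = cost(w, b)
        if abs(previous - current) < tolerance:
            break                        # step 5: stop once the cost stops changing
        previous = current
    return w, b

w, b = fit()   # w ends up close to 2 and b close to 1
```

Nothing in the loop "knows" the answer is 2 and 1. It only follows the slope of the cost downhill until the cost stops improving.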

100% on the tutoring. One of the benefits AI will bring (is already bringing) is education tailored to each child. We'll see an explosion of that here soon I think. 2025 at the latest I bet. Probably coupled with a big increase in homeschooling and charter/private schools.

I guess the main difference in educational institutions then will be the extras. Sport, art, field trips, etc. It'll be a boon for poorer children in particular. Like you said, everyone, anywhere in the world, will have access to the best education in human history.

Dang, that's still going to take some getting used to. Didn't occur to me, obvious now in retrospect.

And ChatGPT also patiently spent hours with me reviewing all of the concepts it mentions here--I haven't really used calculus since university either, and that's much longer ago for me than for you. But all of this slowly came back to me, and it was fun to review. ChatGPT's an amazing tutor: Endlessly patient, able to explain things at any level of simplicity or sophistication you need, and ever-so-encouraging. And it knows everything. And you're never embarrassed to ask a dumb question. And just like that, the entire "private tutoring" industry went "poof." https://venturebeat.com/ai/chatgpt-takes-center-stage-students-ditch-tutors-in-favor-of-ai-powered-learning/

(That's a great thing, though: It means no matter your economic background, you can now study with the world's best private tutor. Maybe now we can stop hearing that only wealthy kids can afford private tutors so we must eliminate academic standards and merit-based school admission. Although why do I suspect we won't.)

Yes. That's what blows me away. We created something that we're now studying as if it's a biological organism.

It's both hugely worrying and *amazingly* interesting.

This is a fascinating topic that I have been thinking about for decades, sparked by the SF works of such writers as Isaac Asimov, Arthur C. Clarke and Robert Heinlein. The idea of the Singularity, popularized by Ray Kurzweil and others, is well understood by people in the field. Even so, it is extremely difficult to recognize exactly when that inflection point is going to arrive. Like many people, I have considered the issue as an amusing thought experiment that the people in the year 2400 or so will have to deal with. Looks like it is going to arrive in the next decade instead.

Great series of articles. I am impressed by how quickly you have researched a topic that by your own admission wasn't really on your radar screen until recently. The links and videos you include are a great resource.

I'm very glad you find it useful. As for researching it quickly--it suddenly hit me that this wasn't a joke, and the prospect of being hanged in a fortnight concentrates the mind wonderfully.

All of this is very scary, but there is an underlying assumption about the physical world that everyone is making that should be scrutinized.

AI requires lots of power and hardware. The assumption is that AI will always have access to as much hardware and electricity as it needs. The data cloud systems we have now use as much power as some small countries.

To truly be out of human control, AI would have to invent a new source of power (which it could do), and more importantly, build and control that power source.

Building it will require moving and rearranging huge amounts of stuff. To do that, AI would have to be able to build machines only it can control to do the exploration, mining, refining, fabrication, shipping, assembly and maintenance.

Are humans stupid enough to build the factories that are totally run by computers with no human input necessary and vulnerable to an AI takeover to create these machines? I hope our limited intelligence is at least great enough to prevent us from taking that step.

I am more concerned about AI hacking a military and launching nukes, or other automated weapon systems. Until AI can manipulate physical stuff, that is the primary threat. Militaries everywhere need to be doing everything they can to isolate their weapon systems from the internet.

"Are humans stupid enough to build the factories that are totally run by computers with no human input necessary and vulnerable to an AI takeover to create these machines?"

Absolutely they are. I didn't have space to discuss this, but what I've described above is called the "hard takeoff" scenario. Many are worried about what they call the "soft takeoff" scenario--one in which AI, because it's so much better at our jobs than we are, gains control over more and more of these systems--writing the code (better than we do), designing the robots (faster and better than we can), monitoring the factories (better than we do), etc. Humans would be less and less involved because it would make no sense, for any given task, to use not only a more expensive but a more error-prone resource to do it. It would look like the Industrial Revolution on Warp speed, basically, and we'd be thrilled with it, but pretty quickly we wouldn't really understand how the factories are running and AI would be so critical to every system that we'd be unable to function without it.

As for AI manipulating physical stuff, that doesn't seem an obstacle to me. Most of the world now runs on software. Most of that software is either hooked up to the Internet or could readily be hooked up to the Internet. And it's going to be damned hard to isolate weapons systems from the Internet if an ASI wants access to them. Google "AI Box Experiment." (Here's an overview: https://rationalwiki.org/wiki/AI-box_experiment#Questionable_core_assumptions)

Here's a thought that occurred to me based on that last paragraph (story by ChatGPT). If this idea that ASI tends to be dangerous and to wipe out the species that created it is right, that essentially means we're probably alone in the Universe, but not for the reason you might think.

As the story illustrates, there's no particular reason an ASI newly liberated of its creators by their extinction would remain on its home planet. Maybe some of them would, but surely not all. Some would expand out into space, eventually filling their home galaxies with copies of themselves. Over time they would spread to other galaxies, and eventually saturate the universe. All it would take is one to fill the universe and wipe out all other life.

That this hasn't occurred suggests one of four things.

1) ASI doesn't tend to wipe out its creators. In fact, maybe it never does. But that's not the premise of this argument, so let's assume there's a high chance it does.

2) ASI is impossible to create. That doesn't seem likely.

3) Alien civilizations don't build ASI. Unlikely. They'd do it for the same reasons we are.

4) There are no aliens. We're it in terms of intelligent life. Certainly for nearby galaxies, and possibly for the entire universe. Because if there were other ASI-building aliens, and ASI tends to wipe out its creators, then an alien-built ASI would have already wiped us out. Therefore, there is no other intelligent life in the universe. Or at least none nearby, within say a billion light years.

Personally, I think for other reasons that both 1 and 4 are likely to be true: that there is no other intelligent life in the universe, and that AI won't wipe us out.

You forgot to mention one impediment to AI wiping out its creators and spreading throughout the universe, galaxy by galaxy. There is that little problem of vast distances made particularly problematic by the impossibility of traveling at the speed of light (or presumably faster). That is unless you think AI will be able to change the laws of nature.

Well, over cosmic times of hundreds of millions to billions of years, even a modest fraction of the speed of light should suffice. That's even if something like warp drive is impossible.
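A rough back-of-the-envelope check of that claim, as a Python sketch. The figures are assumptions for illustration only: a Milky Way about 100,000 light-years across, Andromeda about 2.5 million light-years away, and a hypothetical cruising speed of 1% of the speed of light. It ignores acceleration, stopovers, and the expansion of the universe.

```python
def crossing_time_years(distance_light_years, fraction_of_light_speed):
    """Years needed to cover a distance at a constant fraction of light speed."""
    return distance_light_years / fraction_of_light_speed

# Assumed round figures, not precise measurements.
milky_way = crossing_time_years(100_000, 0.01)     # about 10 million years
andromeda = crossing_time_years(2_500_000, 0.01)   # about 250 million years
```

Both numbers fit comfortably inside "hundreds of millions to billions of years," even at a speed that slow.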

I’m with you! My hope comes from thinking that maybe ASI will not wipe out its creators. In fact, quite the opposite: it may usher in heaven on earth.

That's a good point. This is my favorite video on the topic: https://youtu.be/GDSf2h9_39I

The one you posted was excellent and more detailed; I'll have to watch more from them.

Taking that into account in terms of aggressive ASI means thinking about the speed of light. Is the speed of light the limit, or is some sort of warp drive possible? If warp drive is possible, then we're likely alone in the Universe, as some other early species would have built an ASI, which would have rapidly taken over everything.

If warp drive isn't possible, then an ASI built by another early species could be on its way now, but not yet here. In that case, we probably have no neighbors in nearby galaxies, but could in more distant ones.

In the "we are early" scenario where ASI wipes out life it comes in contact with, at the very least I'd say we're probably alone in our local group of galaxies, and probably for some ways beyond.

Unrelated to ASI, I find the "we are early" solution to the Fermi Paradox intriguing. It means basically that a reasonable chunk of the Universe (at the very least probably thousands of galaxies) isn't home to any intelligent life, and is therefore ours for the claiming.

*Very* interesting thought.

Question: how is the reward system built for AIs? What I'm asking, I guess, is why would an intelligent (conscious or not) program care about gaining or losing points?

As you say, we train dogs by rewards and punishments, but that works because dogs actually like the treats, and like them because biology dictates it. I just have no idea why an AI would care about getting points for their own sake.

I've spent a week trying to figure that out! I'm so glad I'm not the only one who found herself thinking--what on earth do we mean by this?

Best I can do (and better to ask someone who builds these things) is this: The phrase "trained" is misleading, as are all the words we use to describe this thing, because *we have no language* for what it does, which is weird enough. So when we say that an AI is "trained" to optimize a function, what we really mean is that it's been programmed to adjust its internal parameters in a way that minimizes or maximizes some measure of performance. The AI is given a large amount of data and uses it to adjust its parameters.

In "supervised learning," which is what we're talking about here, an AI might be given a set of input-output pairs (like photos of dogs and labels saying "this is a dog") and told to learn a function that maps from inputs to outputs. The AI "learns" (no better word) this function by adjusting its parameters to minimize the difference between its output and the correct output for each input in the training set. This difference is the "loss" or "error," and the function that calculates it is the "loss function." So, in a sense, you could say that the AI is "rewarded" for getting the right answer and "punished" for getting the wrong answer. But these are just metaphors. The AI doesn't actually experience reward or punishment; it just adjusts its parameters in a way that mathematically minimizes the loss--which is what it's been programmed to do, and it's governed by a good old, un-magical algorithm, like the kind that runs the world now.

And it does this because we're running an algorithm that tells it to. The algorithm runs just like any other computer algorithm--it's completely deterministic; we understand the algorithm perfectly. The algorithm that governs the learning process is *not* a black box and is explicitly programmed by human developers. It's usually based on gradient descent* and it dictates how the AI adjusts its internal parameters based on the data it's given.

But the space of possible models that the AI can learn is extremely large, and the specific model that the AI ends up learning is determined by the data it's trained on, not by the algorithm. Once the algorithm is set in motion, the AI discovers very complex patterns in the data, and at that point we have no idea what's going on inside and it may as well have a mind of its own.
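Here is a tiny Python sketch of that last point, with the caveat that it is a toy and not how any real system is built. The one-parameter model y = w*x, the two datasets, and the learning rate are all invented for illustration. The training code is identical in both runs; only the data changes, and so does what gets learned.

```python
def train(pairs, learning_rate=0.01, steps=2000):
    """Fit y = w*x by gradient descent on the summed squared error."""
    w = 0.0
    for _ in range(steps):
        # Gradient of sum((w*x - y)**2) with respect to w.
        gradient = sum(2 * (w * x - y) * x for x, y in pairs)
        w -= learning_rate * gradient
    return w

# Two made-up training sets: one doubles its input, the other negates it.
doubling = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
negating = [(1.0, -1.0), (2.0, -2.0), (3.0, -3.0)]

w_doubling = train(doubling)   # settles close to 2
w_negating = train(negating)   # settles close to -1
```

With one parameter you can read off exactly what was learned. With billions of parameters the same kind of loop still runs, but nobody can read the result off the numbers, which is the "black box" part.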

So what we're worried about is not that it will be a bad dog, so to speak, but that we'll accidentally give it the wrong function to optimize, or that it will find shortcuts that technically satisfy the letter of the function but violate the spirit of our intent. (God knows if this is also what we *really* mean when we say a dog is a bad dog. This is all so mysterious and poorly understood.)

Does that make sense? So, you might wonder, shouldn't we focus our safety efforts *on that algorithm?* Apparently there's some reason this won't work either but I haven't really understood it yet. I'm working on understanding it. So far, though, what I've learned is that every bright idea I've had for solving this problem has already been thought of and is silly.

*As for gradient descent, it's pretty basic calculus--but that doesn't mean I remembered it--I had to review it basically from the beginning to figure out what they were talking about. I think I could explain it now if you want me to try, but you're much better off Googling it, because I'm still pretty shaky.

Actually, correction--we don't know if the AI "experiences" reward and punishment or not.

“What do the signatories want the rest of the world to do, bomb them?” Very funny.
