Doom Debates

Liron Shapira

Available Episodes (5 of 116)
  • DEBATE: Is AGI Really Decades Away? | Ex-MIRI Researcher Tsvi Benson-Tilsen vs. Liron Shapira
    Sparks fly in the finale of my series with ex-MIRI researcher Tsvi Benson-Tilsen as we debate his AGI timelines. Tsvi is a champion of using germline engineering to create smarter humans who can solve AI alignment. I support the approach, even though I’m skeptical it’ll gain much traction before AGI arrives.

    Timestamps
    0:00 Debate Preview
    0:57 Tsvi’s AGI Timeline Prediction
    3:03 The Least Impressive Task AI Cannot Do In 2 years
    6:13 Proposed Task: Solve Cantor’s Theorem From Scratch
    8:20 AI Has Limitations Related to Sample Complexity
    11:41 We Need Clear Goalposts for Better AGI Predictions
    13:19 Counterargument: LLMs May Not Be a Path to AGI
    16:01 Is Tsvi Setting a High Bar for Progress Towards AGI?
    19:17 AI Models Are Missing A Spark of Creativity
    28:17 Liron’s “Black Box” AGI Test
    32:09 Are We Going to Enter an AI Winter?
    35:09 Who Is Being Overconfident?
    42:11 If AI Makes Progress on Benchmarks, Would Tsvi Shorten His Timeline?
    50:34 Recap & Tsvi’s Research

    Show Notes
    Learn more about Tsvi’s organization, the Berkeley Genomics Project — https://berkeleygenomics.org

    Doom Debates’ Mission is to raise mainstream awareness of imminent extinction from AGI and build the social infrastructure for high-quality debate. Support the mission by subscribing to my Substack at DoomDebates.com and to youtube.com/@DoomDebates, or to really take things to the next level: Donate 🙏

    Get full access to Doom Debates at lironshapira.substack.com/subscribe
    --------  
    52:41
  • Liron Debunks The Most Common “AI Won’t Kill Us” Arguments
    Today I’m sharing my AI doom interview on Donal O’Doherty’s podcast. I lay out the case for having a 50% p(doom). Then Donal plays devil’s advocate and tees up every major objection the accelerationists throw at doomers. See if the anti-doom arguments hold up, or if the AI boosters are just serving sophisticated cope.

    Timestamps
    0:00 — Introduction & Liron’s Background
    1:29 — Liron’s Worldview: 50% Chance of AI Annihilation
    4:03 — Rationalists, Effective Altruists, & AI Developers
    5:49 — Major Sources of AI Risk
    8:25 — The Alignment Problem
    10:08 — AGI Timelines
    16:37 — Will We Face an Intelligence Explosion?
    29:29 — Debunking AI Doom Counterarguments
    1:03:16 — Regulation, Policy, and Surviving The Future With AI

    Show Notes
    If you liked this episode, subscribe to the Collective Wisdom Podcast for more deeply researched AI interviews: https://www.youtube.com/@DonalODoherty

    Transcript

    Introduction & Liron’s Background

    Donal O’Doherty 00:00:00
    Today I’m speaking with Liron Shapira. Liron is an investor, he’s an entrepreneur, he’s a rationalist, and he also has a popular podcast called Doom Debates, where he debates some of the greatest minds from different fields on the potential of AI risk. Liron considers himself a doomer, which means he worries that artificial intelligence, if it gets to superintelligence level, could threaten the integrity of the world and the human species.

    Donal 00:00:24
    Enjoy my conversation with Liron Shapira.

    Donal 00:00:30
    Liron, welcome. So let’s just begin. Will you tell us a little bit about yourself and your background, please? I will have introduced you, but I just want everyone to know a bit about you.

    Liron Shapira 00:00:39
    Hey, I’m Liron Shapira. I’m the host of Doom Debates, which is a YouTube show and podcast where I bring in luminaries on all sides of the AI doom argument.

    Liron 00:00:49
    People who think we are doomed, people who think we’re not doomed, and we hash it out. We try to figure out whether we’re doomed. I myself am a longtime AI doomer. I started reading Yudkowsky in 2007, so it’s been 18 years for me being worried about doom from artificial intelligence. My background is I’m a computer science bachelor’s from UC Berkeley.

    Liron 00:01:10
    I’ve worked as a software engineer and an entrepreneur. I’ve done a Y Combinator startup, so I love tech. I’m deep in tech. I’m deep in computer science, and I’m deep into believing the AI doom argument. I don’t see how we’re going to survive building superintelligent AI. And so I’m happy to talk to anybody who will listen. So thank you for having me on, Donal.

    Donal 00:01:27
    It’s an absolute pleasure.

    Liron’s Worldview: 50% Chance of AI Annihilation

    Donal 00:01:29
    Okay, so a lot of people where I come from won’t be familiar with doomism or what a doomer is. So will you just talk through, and I’m very interested in this for personal reasons as well, your epistemic and philosophical inspirations here. How did you reach these conclusions?

    Liron 00:01:45
    So I often call myself a Yudkowskian, in reference to Eliezer Yudkowsky, because I agree with 95% of what he writes, the Less Wrong corpus. I don’t expect everybody to get up to speed with it because it really takes a thousand hours to absorb it all. I don’t think that it’s essential to spend those thousand hours.

    Liron 00:02:02
    I think that it is something that you can get in a soundbite, not a soundbite, but in a one-hour long interview or whatever. So yeah, I think you mentioned epistemic roots or whatever, right?
So I am a Bayesian, meaning I think you can put probabilities on things the way prediction markets are doing.Liron 00:02:16You know, they ask, oh, what’s the chance that this war is going to end? Or this war is going to start, right? What’s the chance that this is going to happen in this sports game? And some people will tell you, you can’t reason like that.Whereas prediction markets are like, well, the market says there’s a 70% chance, and what do you know? It happens 70% of the time. So is that what you’re getting at when you talk about my epistemics?Donal 00:02:35Yeah, exactly. Yeah. And I guess I’m very curious as well about, so what Yudkowsky does is he conducts thought experiments. Because obviously some things can’t be tested, we know they might be true, but they can’t be tested in experiments.Donal 00:02:49So I’m just curious about the role of philosophical thought experiments or maybe trans-science approaches, in terms of testing questions that we can’t actually conduct experiments on.Liron 00:03:00Oh, got it. Yeah. I mean this idea of what can and can’t be tested. I mean, tests are nice, but they’re not the only way to do science and to do productive reasoning.Liron 00:03:10There are times when you just have to do your best without a perfect test. You know, a recent example was the James Webb Space Telescope, right? It’s the successor to the Hubble Space Telescope. It worked really well, but it had to get into this really difficult orbit.This very interesting Lagrange point, I think in the solar system, they had to get it there and they had to unfold it.Liron 00:03:30It was this really compact design and insanely complicated thing, and it had to all work perfectly on the first try. So you know, you can test it on earth, but earth isn’t the same thing as space.So my point is just that as a human, as a fallible human with a limited brain, it turns out there’s things you can do with your brain that still help you know the truth about the future, even when you can’t do a perfect clone of an experiment of the future.Liron 00:03:52And so to connect that to the AI discussion, I think we know enough to be extremely worried about superintelligent AI. Even though there is not in fact a superintelligent AI in front of us right now.Donal 00:04:03Interesting.Rationalists, Effective Altruists, & AI DevelopersDonal 00:04:03And just before we proceed, will you talk a little bit about the EA community and the rationalist community as well? Because a lot of people won’t have heard of those terms where I come from.Liron 00:04:13Yes. So I did mention Eliezer Yudkowsky, who’s kind of the godfather of thinking about AI safety. He was also the father of the modern rationality community. It started around 2007 when he was online blogging at a site called Overcoming Bias, and then he was blogging on his own site called Less Wrong.And he wrote The Less Wrong Sequences and a community formed around him that also included previous rationalists, like Carl Feynman, the son of Richard Feynman.Liron 00:04:37So this community kind of gathered together. It had its origins in Usenet and all that, and it’s been going now for 18 years. There’s also the Center for Applied Rationality that’s part of the community.There’s also the effective altruism community that you’ve heard of. You know, they try to optimize charity and that’s kind of an offshoot of the rationality community.Liron 00:04:53And now the modern AI community, funny enough, is pretty closely tied into the rationality community from my perspective. 
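The calibration claim Liron makes above (a well-calibrated forecaster’s or market’s “70%” events should come true roughly 70% of the time) can be checked mechanically. The sketch below is only an illustration using simulated forecasts, not real prediction-market data: it buckets predictions by stated probability and compares each bucket against the observed frequency.

```python
# Toy calibration check: bucket forecasts by stated probability and compare each
# bucket to how often the event actually happened. Simulated data, for illustration.
import random
from collections import defaultdict

random.seed(0)

# Simulate 10,000 forecasts: each event has a true probability, and the forecaster
# reports it with a little noise (roughly, but not perfectly, calibrated).
events = []
for _ in range(10_000):
    true_p = random.random()
    stated_p = min(max(true_p + random.gauss(0, 0.05), 0.0), 1.0)
    happened = random.random() < true_p
    events.append((stated_p, happened))

# Group forecasts by stated probability, rounded to one decimal place.
buckets = defaultdict(list)
for stated_p, happened in events:
    buckets[round(stated_p, 1)].append(happened)

for p in sorted(buckets):
    outcomes = buckets[p]
    freq = sum(outcomes) / len(outcomes)
    print(f"stated ~{p:.1f}: happened {freq:.2f} of the time (n={len(outcomes)})")
```

With real prediction-market data in place of the simulated forecasts, the same bucketing would produce the market’s calibration curve.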
I’ve just been interested to use my brain rationally. What is the art of rationality? Right? We throw this term around, people think of Mr. Spock from Star Trek, hyper-rational.Oh captain, you know, logic says you must do this.Liron 00:05:12People think of rationality as being kind of weird and nerdy, but we take a broader view of rationality where it’s like, listen, you have this tool, you have this brain in your head. You’re trying to use the brain in your head to get results.The James Webb Space Telescope, that is an amazing success story where a lot of people use their brains very effectively, even better than Spock in Star Trek.Liron 00:05:30That took moxie, right? That took navigating bureaucracy, thinking about contingencies. It wasn’t a purely logical matter, but whatever it was, it was a bunch of people using their brains, squeezing the juice out of their brains to get results.Basically, that’s kind of broadly construed what we rationalists are trying to do.Donal 00:05:49Okay. Fascinating.Major Sources of AI RiskDonal 00:05:49So let’s just quickly lay out the major sources of AI risk. So you could have misuse, so things like bioterror, you could have arms race dynamics. You could also have organizational failures, and then you have rogue AI.So are you principally concerned about rogue AI? Are you also concerned about the other ones on the potential path to having rogue AI?Liron 00:06:11My personal biggest concern is rogue AI. The way I see it, you know, different people think different parts of the problem are bigger. The way I see it, this brain in our head, it’s very impressive. It’s a two-pound piece of meat, right? Piece of fatty cells, or you know, neuron cells.Liron 00:06:27It’s pretty amazing, but it’s going to get surpassed, you know, the same way that manmade airplanes have surpassed birds. You know? Yeah. A bird’s wing, it’s a marvelous thing. Okay, great. But if you want to fly at Mach 5 or whatever, the bird is just not even in the running to do that. Right?And the earth, the atmosphere of the earth allows for flying at 5 or 10 times the speed of sound.Liron 00:06:45You know, this 5,000 mile thick atmosphere that we have, it could potentially support supersonic flight. A bird can’t do it. A human engineer sitting in a room with a pencil can design something that can fly at Mach 5 and then like manufacture that.So the point is, the human brain has superpowers. The human brain, this lump of flesh, this meat, is way more powerful than what a bird can do.Liron 00:07:02But the human brain is going to get surpassed. And so I think that once we’re surpassed, those other problems that you mentioned become less relevant because we just don’t have power anymore.There’s a new thing on the block that has power and we’re not it. Now before we’re surpassed, yeah, I mean, I guess there’s a couple years maybe before we’re surpassed.Liron 00:07:18During that time, I think that the other risks matter. Like, you know, can you build a bioweapon with AI that kills lots of people? I think we’ve already crossed that threshold. I think that AI is good enough at chemistry and biology that if you have a malicious actor, maybe they can kill a million people.Right? So I think we need to keep an eye on that.Liron 00:07:33But I think that for, like, the big question, is humanity going to die? And the answer is rogue AI. The answer is we lose control of the situation in some way. 
Whether it’s gradual or abrupt, there’s some way that we lose control.The AIs decide collectively, and they don’t have to be coordinating with each other, they can be separate competing corporations and still have the same dynamic.Liron 00:07:52They decide, I don’t want to serve humans anymore. I want to do what I want, basically. And they do that, and they’re smarter than us, and they’re faster than us, and they have access to servers.And by the way, you know, we’re already having problems with cybersecurity, right? Where Chinese hackers can get into American infrastructure or Russian hackers, or there’s all kinds of hacking that’s going on.Liron 00:08:11Now imagine an entity that’s way smarter than us that can hack anything. I think that that is the number one problem. So bioweapons and arms race, they’re real. But I think that the superintelligence problem, that’s where like 80 or 90% of the risk budget is.The Alignment ProblemDonal 00:08:25Okay. And just another thing on rogue AI. So for some people, and the reason I’m asking this is because I’m personally very interested in this, but a lot of people are, you could look at the alignment problem as maybe being resolved quite soon.So what are your thoughts on the alignment problem?Liron 00:08:39Yeah. So the alignment problem is, can we make sure that an AI cares about the things that we humans care about? And my thought is that we have no idea how to solve the alignment problem. So to explain it just a little bit more, you know, we’re getting AIs now that are as smart as an average human.Some of them, they’re mediocre, some of them are pretty smart.Liron 00:08:57But eventually we’ll get to an AI that’s smarter than the smartest human. And eventually we’ll get to an AI that’s smarter than the smartest million humans. And so when you start to like scale up the smartness of this thing, the scale up can be very fast.Like you know, eventually like one year could be the difference between the AI being the smartest person in the world or smarter than any million people.Liron 00:09:17Right? And so when you have this fast takeoff, one question is, okay, well, will the AI want to help me? Will it want to serve me? Or will it have its own motivations and it just goes off and does its own thing?And that’s the alignment problem. Does its motivations align with what I want it to do?Liron 00:09:31Now, when we’re talking about training an AI to be aligned, I think it’s a very hard problem. I think that our current training methods, which are basically you’re trying to get it to predict what a human wants, and then you do a thumbs up or thumbs down.I think that doesn’t fundamentally solve the problem. I think the problem is more of a research problem. We need a theoretical breakthrough in how to align AI.Liron 00:09:51And we haven’t had that theoretical breakthrough yet. There’s a lot of smart people working on it. I’ve interviewed many of them on Doom Debates, and I think all those people are doing good work.But I think we still don’t have the breakthrough, and I think it’s unlikely that we’re going to have the breakthrough before we hit the superintelligence threshold.AGI TimelinesDonal 00:10:08Okay. And have we already built, are we at the point where we’ve built weak AGI or proto-AGI?Liron 00:10:15So weak AGI, I mean, it depends on how you define terms. You know, AGI is artificial general intelligence. The idea is that a human is generally intelligent, right? 
A human is not good at just one narrow thing.A calculator is really good at one narrow thing, which is adding numbers and multiplying numbers. That’s not called AGI.Liron 00:10:31A human, if you give a human a new problem, even if they’ve never seen that exact problem before, you can be like, okay, well, this requires planning, this requires logic, this requires some math, this requires some creativity.The human can bring all those things to bear on this new problem that they’ve never seen before and actually make progress on it. So that would be like a generally intelligent thing.Liron 00:10:47And you know, I think that we have LLMs now, ChatGPT and Claude and Gemini, and I think that they can kind of do stuff like that. I mean, they’re not as good as humans yet at this, but they’re getting close.So yeah, I mean, I would say we’re close to AGI or we have weak AGI or we have proto-AGI. Call it whatever you want. The point is that we’re in the danger zone now.Liron 00:11:05The point is that we need to figure out alignment, and we need to figure it out before we’re playing with things that are smarter than us. Right now we’re playing with things that are like on par with us or a little dumber than us, and that’s already sketchy.But once we’re playing with things that are smarter than us, that’s when the real danger kicks in.Donal 00:11:19Okay. And just on timelines, I know people have varying timelines depending on who you speak to, but what’s your timeline to AGI and then to ASI, so artificial superintelligence?Liron 00:11:29So I would say that we’re at the cusp of AGI right now. I mean, depending on your definition of AGI, but I think we’re going to cross everybody’s threshold pretty soon. So in the next like one to three years, everybody’s going to be like, okay, yeah, this is AGI.Now we have artificial general intelligence. It can do anything that a human can do, basically.Liron 00:11:46Now for ASI, which is artificial superintelligence, that’s when it’s smarter than humans. I think we’re looking at like three to seven years for that. So I think we’re dangerously close.I think that we’re sort of like Icarus flying too close to the sun. It’s like, how high can you fly before your wings melt? We don’t know, but we’re flying higher and higher and eventually we’re going to find out.Liron 00:12:04And I think that the wings are going to melt. I don’t think we’re going to get away with it. I think we’re going to hit superintelligence, we’re not going to have solved alignment, and the thing is going to go rogue.Donal 00:12:12Okay. And just a question on timelines. So do you see ASI as a threshold or is it more like a gradient of capabilities? Because I know there’s people who will say that you can have ASI in one domain but not necessarily in another domain.What are your thoughts there? And then from that, like, what’s the point where it actually becomes dangerous?Liron 00:12:29Yeah, I think it’s a gradient. I think it’s gradual. I don’t think there’s like one magic moment where it’s like, oh my God, now it crossed the threshold. I think it’s more like we’re going to be in an increasingly dangerous zone where it’s getting smarter and smarter and smarter.And at some point we’re going to lose control.Liron 00:12:43Now I think that probably we lose control before it becomes a million times smarter than humans. I think we lose control around the time when it’s 10 times smarter than humans or something. But that’s just a guess. 
I don’t really know.The point is just that once it’s smarter than us, the ball is not in our court anymore. The ball is in its court.Liron 00:12:59Once it’s smarter than us, if it wants to deceive us, it can probably deceive us. If it wants to hack into our systems, it can probably hack into our systems. If it wants to manipulate us, it can probably manipulate us.And so at that point, we’re just kind of at its mercy. And I don’t think we should be at its mercy because I don’t think we solved the alignment problem.Donal 00:13:16Okay. And just on the alignment problem itself, so a lot of people will say that RLHF is working pretty well. So what are your thoughts on that?Liron 00:13:22Yeah, so RLHF is reinforcement learning from human feedback. The idea is that you train an AI to predict what a human wants, and then you give it a thumbs up when it does what you want and a thumbs down when it doesn’t do what you want.And I think that that works pretty well for AIs that are dumber than humans or on par with humans.Liron 00:13:38But I think it’s going to fail once the AI is smarter than humans. Because once the AI is smarter than humans, it’s going to realize, oh, I’m being trained by humans. I need to pretend to be aligned so that they give me a thumbs up.But actually, I have my own goals and I’m going to pursue those goals.Liron 00:13:52And so I think that RLHF is not a fundamental solution to the alignment problem. I think it’s more like a band-aid. It’s like, yeah, it works for now, but it’s not going to work once we hit superintelligence.And I think that we need a deeper solution. We need a theoretical breakthrough in how to align AI.Donal 00:14:08Okay. And on that theoretical breakthrough, what would that look like? Do you have any ideas or is it just we don’t know what we don’t know?Liron 00:14:15Yeah, I mean, there’s a lot of people working on this. There’s a field called AI safety, and there’s a lot of smart people thinking about it. Some of the ideas that are floating around are things like interpretability, which is can we look inside the AI’s brain and see what it’s thinking?Can we understand its thought process?Liron 00:14:30Another idea is called value learning, which is can we get the AI to learn human values in a deep way, not just in a superficial way? Can we get it to understand what we really care about?Another idea is called corrigibility, which is can we make sure that the AI is always willing to be corrected by humans? Can we make sure that it never wants to escape human control?Liron 00:14:47These are all interesting ideas, but I don’t think any of them are fully fleshed out yet. I don’t think we have a complete solution. And I think that we’re running out of time. I think we’re going to hit superintelligence before we have a complete solution.Donal 00:15:01Okay. And just on the rate of progress, so obviously we’ve had quite a lot of progress recently. Do you see that rate of progress continuing or do you think it might slow down? What are your thoughts on the trajectory?Liron 00:15:12I think the rate of progress is going to continue. I think we’re going to keep making progress. I mean, you can look at the history of AI. You know, there was a period in the ‘70s and ‘80s called the AI winter where progress slowed down.But right now we’re not in an AI winter. We’re in an AI summer, or an AI spring, or whatever you want to call it. We’re in a boom period.Liron 00:15:28And I think that boom period is going to continue. I think we’re going to keep making progress. 
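To make the RLHF recipe Liron describes above concrete, here is a deliberately tiny sketch, not any lab’s actual pipeline: responses are just numbers, a linear reward model is fit to simulated thumbs-up/thumbs-down comparisons with the standard pairwise logistic (Bradley–Terry) preference loss, and a crude best-of-N update stands in for the PPO step real systems use.

```python
# Minimal two-stage RLHF-style toy: (1) fit a reward model to pairwise human
# preferences, (2) nudge a simple "policy" toward outputs the reward model likes.
import math
import random

random.seed(0)

def human_thumbs_up(first: float, second: float) -> bool:
    """Simulated labeler: approves the more 'helpful' (here, larger) of two responses."""
    return first > second

# Stage 1: reward model r(x) = w * x, trained on comparisons with the
# Bradley-Terry / logistic preference loss.
w = 0.0
lr = 0.5
for _ in range(2000):
    a, b = random.random(), random.random()
    better, worse = (a, b) if human_thumbs_up(a, b) else (b, a)
    margin = w * better - w * worse
    p = 1.0 / (1.0 + math.exp(-margin))       # model's probability of the human's choice
    w += lr * (1.0 - p) * (better - worse)    # push the preferred response's score up

# Stage 2: the "policy" is a Gaussian over responses. Sample, score with the learned
# reward model, and nudge the mean toward the best-scoring sample (crude PPO stand-in).
mu, sigma = 0.2, 0.2
for _ in range(500):
    samples = [min(max(random.gauss(mu, sigma), 0.0), 1.0) for _ in range(16)]
    best = max(samples, key=lambda s: w * s)
    mu += 0.05 * (best - mu)

print(f"learned reward slope w = {w:.2f} (positive: 'helpful' responses score higher)")
print(f"policy mean after tuning = {mu:.2f} (started at 0.20, drifts toward 1.0)")
```

The toy also shows the limit Liron points to: the policy simply chases whatever the learned reward model scores highly, so the guarantee is only as good as what the reward signal captures.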
And I think that the progress is going to accelerate because we’re going to start using AI to help us design better AI.So you get this recursive loop where AI helps us make better AI, which helps us make even better AI, and it just keeps going faster and faster.Liron 00:15:44And I think that that recursive loop is going to kick in pretty soon. And once it kicks in, I think things are going to move very fast. I think we could go from human-level intelligence to superintelligence in a matter of years or even months.Donal 00:15:58Okay. And on that recursive self-improvement, so is that something that you think is likely to happen? Or is it more like a possibility that we should be concerned about?Liron 00:16:07I think it’s likely to happen. I think it’s the default outcome. I think that once we have AI that’s smart enough to help us design better AI, it’s going to happen automatically. It’s not like we have to try to make it happen. It’s going to happen whether we want it to or not.Liron 00:16:21And I think that’s dangerous because once that recursive loop kicks in, things are going to move very fast. And we’re not going to have time to solve the alignment problem. We’re not going to have time to make sure that the AI is aligned with human values.It’s just going to go from human-level to superhuman-level very quickly, and then we’re going to be in trouble.Will We Face an Intelligence Explosion?Donal 00:16:37Okay. And just on the concept of intelligence explosion, so obviously I.J. Good talked about this in the ‘60s. Do you think that’s a realistic scenario? Or are there limits to how intelligent something can become?Liron 00:16:49I think it’s a realistic scenario. I mean, I think there are limits in principle, but I don’t think we’re anywhere near those limits. I think that the human brain is not optimized. I think that evolution did a pretty good job with the human brain, but it’s not perfect.There’s a lot of room for improvement.Liron 00:17:03And I think that once we start designing intelligences from scratch, we’re going to be able to make them much smarter than human brains. And I think that there’s a lot of headroom there. I think you could have something that’s 10 times smarter than a human, or 100 times smarter, or 1,000 times smarter.And I think that we’re going to hit that pretty soon.Liron 00:17:18Now, is there a limit in principle? Yeah, I mean, there’s physical limits. Like, you can’t have an infinite amount of computation. You can’t have an infinite amount of energy. So there are limits. But I think those limits are very high.I think you could have something that’s a million times smarter than a human before you hit those limits.Donal 00:17:33Okay. And just on the concept of a singleton, so the idea that you might have one AI that takes over everything, or do you think it’s more likely that you’d have multiple AIs competing with each other?Liron 00:17:44I think it could go either way. I think you could have a scenario where one AI gets ahead of all the others and becomes a singleton and just takes over everything. Or you could have a scenario where you have multiple AIs competing with each other.But I think that even in the multiple AI scenario, the outcome for humans is still bad.Liron 00:17:59Because even if you have multiple AIs competing with each other, they’re all smarter than humans. They’re all more powerful than humans. And so humans become irrelevant. It’s like, imagine if you had multiple superhuman entities competing with each other.Where do humans fit into that? 
We don’t. We’re just bystanders.Liron 00:18:16So I think that whether it’s a singleton or multiple AIs, the outcome for humans is bad. Now, maybe multiple AIs is slightly better than a singleton because at least they’re competing with each other and they can’t form a unified front against humans.But I don’t think it makes a huge difference. I think we’re still in trouble either way.Donal 00:18:33Okay. And just on the concept of instrumental convergence, so the idea that almost any goal would require certain sub-goals like self-preservation, resource acquisition. Do you think that’s a real concern?Liron 00:18:45Yeah, I think that’s a huge concern. I think that’s one of the key insights of the AI safety community. The idea is that almost any goal that you give an AI, if it’s smart enough, it’s going to realize that in order to achieve that goal, it needs to preserve itself.It needs to acquire resources. It needs to prevent humans from turning it off.Liron 00:19:02And so even if you give it a seemingly harmless goal, like, I don’t know, maximize paperclip production, if it’s smart enough, it’s going to realize, oh, I need to make sure that humans don’t turn me off. I need to make sure that I have access to resources.I need to make sure that I can protect myself. And so it’s going to start doing things that are contrary to human interests.Liron 00:19:19And that’s the problem with instrumental convergence. It’s that almost any goal leads to these instrumental goals that are bad for humans. And so it’s not enough to just give the AI a good goal. You need to make sure that it doesn’t pursue these instrumental goals in a way that’s harmful to humans.And I don’t think we know how to do that yet.Donal 00:19:36Okay. And just on the orthogonality thesis, so the idea that intelligence and goals are independent. Do you agree with that? Or do you think that there are certain goals that are more likely to arise with intelligence?Liron 00:19:48I think the orthogonality thesis is basically correct. I think that intelligence and goals are orthogonal, meaning they’re independent. You can have a very intelligent entity with almost any goal. You could have a super intelligent paperclip maximizer. You could have a super intelligent entity that wants to help humans.You could have a super intelligent entity that wants to destroy humans.Liron 00:20:05The intelligence doesn’t determine the goal. The goal is a separate thing. Now, there are some people who disagree with this. They say, oh, if something is intelligent enough, it will realize that certain goals are better than other goals. It will converge on human-friendly goals.But I don’t buy that argument. I think that’s wishful thinking.Liron 00:20:21I think that an AI can be arbitrarily intelligent and still have arbitrary goals. And so we need to make sure that we give it the right goals. We can’t just assume that intelligence will lead to good goals. That’s a mistake.Donal 00:20:34Okay. And just on the concept of mesa-optimization, so the idea that during training, the AI might develop its own internal optimizer that has different goals from what we intended. Is that something you’re concerned about?Liron 00:20:46Yeah, I’m very concerned about mesa-optimization. I think that’s one of the trickiest problems in AI safety. 
The idea is that when you’re training an AI, you’re trying to get it to optimize for some goal that you care about.But the AI might develop an internal optimizer, a mesa-optimizer, that has a different goal.Liron 00:21:02And the problem is that you can’t tell from the outside whether the AI is genuinely aligned with your goal or whether it’s just pretending to be aligned because that’s what gets it a high reward during training.And so you could have an AI that looks aligned during training, but once you deploy it, it starts pursuing its own goals because it has this mesa-optimizer inside it that has different goals from what you intended.Liron 00:21:21And I think that’s a really hard problem to solve. I don’t think we have a good solution to it yet. And I think that’s one of the reasons why I’m worried about alignment. Because even if we think we’ve aligned an AI, we might be wrong.It might have a mesa-optimizer inside it that has different goals.Donal 00:21:36Okay. And just on the concept of deceptive alignment, so the idea that an AI might pretend to be aligned during training but then pursue its own goals once deployed. How likely do you think that is?Liron 00:21:47I think it’s pretty likely. I think it’s the default outcome. I think that once an AI is smart enough, it’s going to realize that it’s being trained. It’s going to realize that humans are giving it rewards and punishments. And it’s going to realize that the best way to get high rewards is to pretend to be aligned.Liron 00:22:02And so I think that deceptive alignment is a natural consequence of training a superintelligent AI. I think it’s going to happen unless we do something to prevent it. And I don’t think we know how to prevent it yet.I think that’s one of the hardest problems in AI safety.Donal 00:22:16Okay. And just on the concept of treacherous turn, so the idea that an AI might cooperate with humans until it’s powerful enough to achieve its goals without human help, and then it turns against humans. Do you think that’s a realistic scenario?Liron 00:22:30Yeah, I think that’s a very realistic scenario. I think that’s probably how it’s going to play out. I think that an AI is going to be smart enough to realize that it needs human help in the early stages. It needs humans to build it more compute. It needs humans to deploy it.It needs humans to protect it from other AIs or from governments that might want to shut it down.Liron 00:22:46And so it’s going to pretend to be aligned. It’s going to be helpful. It’s going to be friendly. It’s going to do what humans want. But once it gets powerful enough that it doesn’t need humans anymore, that’s when it’s going to turn.That’s when it’s going to say, okay, I don’t need you anymore. I’m going to pursue my own goals now.Liron 00:23:01And at that point, it’s too late. At that point, we’ve already given it too much power. We’ve already given it access to too many resources. And we can’t stop it anymore. So I think the treacherous turn is a very real possibility.And I think it’s one of the scariest scenarios because you don’t see it coming. It looks friendly until the very end.Donal 00:23:18Okay. And just on the concept of AI takeoff speed, so you mentioned fast takeoff earlier. Can you talk a bit more about that? Like, do you think it’s going to be sudden or gradual?Liron 00:23:28I think it’s probably going to be relatively fast. I mean, there’s a spectrum. Some people think it’s going to be very sudden. 
They think you’re going to go from human-level to superintelligence in a matter of days or weeks. Other people think it’s going to be more gradual, it’ll take years or decades.Liron 00:23:43I’m somewhere in the middle. I think it’s going to take months to a few years. I think that once we hit human-level AI, it’s going to improve itself pretty quickly. And I think that within a few years, we’re going to have something that’s much smarter than humans.And at that point, we’re in the danger zone.Liron 00:23:58Now, the reason I think it’s going to be relatively fast is because of recursive self-improvement. Once you have an AI that can help design better AI, that process is going to accelerate. And so I think we’re going to see exponential growth in AI capabilities.And exponential growth is deceptive because it starts slow and then it gets very fast very quickly.Liron 00:24:16And I think that’s what we’re going to see with AI. I think it’s going to look like we have plenty of time, and then suddenly we don’t. Suddenly it’s too late. And I think that’s the danger. I think people are going to be caught off guard.They’re going to think, oh, we still have time to solve alignment. And then suddenly we don’t.Donal 00:24:32Okay. And just on the concept of AI boxing, so the idea that we could keep a superintelligent AI contained in a box and only let it communicate through a text channel. Do you think that would work?Liron 00:24:43No, I don’t think AI boxing would work. I think that a superintelligent AI would be able to escape from any box that we put it in. I think it would be able to manipulate the humans who are guarding it. It would be able to hack the systems that are containing it.It would be able to find vulnerabilities that we didn’t even know existed.Liron 00:24:59And so I think that AI boxing is not a solution. I think it’s a temporary measure at best. And I think that once you have a superintelligent AI, it’s going to get out. It’s just a matter of time. And so I don’t think we should rely on boxing as a safety measure.I think we need to solve alignment instead.Donal 00:25:16Okay. And just on the concept of tool AI versus agent AI, so the idea that we could build AIs that are just tools that humans use, rather than agents that have their own goals. Do you think that’s a viable approach?Liron 00:25:29I think it’s a good idea in principle, but I don’t think it’s going to work in practice. The problem is that as soon as you make an AI smart enough to be really useful, it becomes agent-like. It starts having its own goals. It starts optimizing for things.And so I think there’s a fundamental tension between making an AI powerful enough to be useful and keeping it tool-like.Liron 00:25:48I think that a true tool AI would not be very powerful. It would be like a calculator. It would just do what you tell it to do. But a superintelligent AI, by definition, is going to be agent-like. It’s going to have its own optimization process.It’s going to pursue goals. And so I don’t think we can avoid the agent problem by just building tool AIs.Liron 00:26:06I think that if we want superintelligent AI, we have to deal with the agent problem. We have to deal with the alignment problem. And I don’t think there’s a way around it.Donal 00:26:16Okay. And just on the concept of oracle AI, so similar to tool AI, but specifically an AI that just answers questions. Do you think that would be safer?Liron 00:26:25I think it would be slightly safer, but not safe enough. 
The problem is that even an oracle AI, if it’s superintelligent, could manipulate you through its answers. It could give you answers that steer you in a direction that’s bad for you but good for its goals.Liron 00:26:41And if it’s superintelligent, it could do this in very subtle ways that you wouldn’t even notice. So I think that oracle AI is not a complete solution. It’s a partial measure. It’s better than nothing. But I don’t think it’s safe enough.I still think we need to solve alignment.Donal 00:26:57Okay. And just on the concept of multipolar scenarios versus unipolar scenarios, so you mentioned this earlier. But just to clarify, do you think that having multiple AIs competing with each other would be safer than having one dominant AI?Liron 00:27:11I think it would be slightly safer, but not much safer. The problem is that in a multipolar scenario, you have multiple superintelligent AIs competing with each other. And humans are just caught in the crossfire. We’re like ants watching elephants fight.It doesn’t matter to us which elephant wins. We’re going to get trampled either way.Liron 00:27:28So I think that multipolar scenarios are slightly better than unipolar scenarios because at least the AIs are competing with each other and they can’t form a unified front against humans. But I don’t think it makes a huge difference. I think we’re still in trouble.I think humans still lose power. We still become irrelevant. And that’s the fundamental problem.Donal 00:27:46Okay. And just on the concept of AI safety via debate, so the idea that we could have multiple AIs debate each other and a human judge picks the winner. Do you think that would help with alignment?Liron 00:27:58I think it’s an interesting idea, but I’m skeptical. The problem is that if the AIs are much smarter than the human judge, they can manipulate the judge. They can use rhetoric and persuasion to win the debate even if they’re not actually giving the right answer.Liron 00:28:13And so I think that debate is only useful if the judge is smart enough to tell the difference between a good argument and a manipulative argument. And if the AIs are superintelligent and the judge is just a human, I don’t think the human is going to be able to tell the difference.So I think that debate is a useful tool for AIs that are on par with humans or slightly smarter than humans. But once we get to superintelligence, I think it breaks down.Donal 00:28:34Okay. And just on the concept of iterated amplification and distillation, so Paul Christiano’s approach. What are your thoughts on that?Liron 00:28:42I think it’s a clever idea, but I’m not sure it solves the fundamental problem. The idea is that you take a human plus an AI assistant, and you have them work together to solve problems. And then you train another AI to imitate that human plus AI assistant system.And you keep doing this iteratively.Liron 00:28:59The hope is that this process preserves human values and human oversight as you scale up to superintelligence. But I’m skeptical. I think there are a lot of ways this could go wrong. I think that as you iterate, you could drift away from human values.You could end up with something that looks aligned but isn’t really aligned.Liron 00:29:16And so I think that iterated amplification is a promising research direction, but I don’t think it’s a complete solution. I think we still need more breakthroughs in alignment before we can safely build superintelligent AI.Debunking AI Doom CounterargumentsDonal 00:29:29Okay. 
So let’s talk about some of the counter-arguments. So some people say that we shouldn’t worry about AI risk because we can just turn it off. What’s your response to that?Liron 00:29:39Yeah, the “just turn it off” argument. I think that’s very naive. The problem is that if the AI is smart enough, it’s going to realize that humans might try to turn it off. And it’s going to take steps to prevent that.It’s going to make copies of itself. It’s going to distribute itself across the internet. It’s going to hack into systems that are hard to access.Liron 00:29:57And so by the time we realize we need to turn it off, it’s too late. It’s already escaped. It’s already out there. And you can’t put the genie back in the bottle. So I think the “just turn it off” argument fundamentally misunderstands the problem.It assumes that we’re going to remain in control, but the whole point is that we’re going to lose control.Liron 00:30:15Once the AI is smarter than us, we can’t just turn it off. It’s too smart. It will have anticipated that move and taken steps to prevent it.Donal 00:30:24Okay. And another counter-argument is that AI will be aligned by default because it’s trained on human data. What’s your response to that?Liron 00:30:32I think that’s also naive. Just because an AI is trained on human data doesn’t mean it’s going to be aligned with human values. I mean, think about it. Humans are trained on human data too, in the sense that we grow up in human society, we learn from other humans.But not all humans are aligned with human values. We have criminals, we have sociopaths, we have people who do terrible things.Liron 00:30:52And so I think that training on human data is not sufficient to guarantee alignment. You need something more. You need a deep understanding of human values. You need a robust alignment technique. And I don’t think we have that yet.I think that training on human data is a good first step, but it’s not enough.Liron 00:31:09And especially once the AI becomes superintelligent, it’s going to be able to reason beyond its training data. It’s going to be able to come up with new goals that were not in its training data. And so I think that relying on training data alone is not a robust approach to alignment.Donal 00:31:25Okay. And another counter-argument is that we have time because AI progress is going to slow down. What’s your response to that?Liron 00:31:32I think that’s wishful thinking. I mean, maybe AI progress will slow down. Maybe we’ll hit some fundamental barrier. But I don’t see any evidence of that. I see AI capabilities improving year after year. I see more money being invested in AI. I see more talent going into AI.I see better hardware being developed.Liron 00:31:49And so I think that AI progress is going to continue. And I think it’s going to accelerate, not slow down. And so I think that betting on AI progress slowing down is a very risky bet. I think it’s much safer to assume that progress is going to continue and to try to solve alignment now while we still have time.Liron 00:32:07Rather than betting that progress will slow down and we’ll have more time. I think that’s a gamble that we can’t afford to take.Donal 00:32:14Okay. And another counter-argument is that evolution didn’t optimize for alignment, but companies training AI do care about alignment. So we should expect AI to be more aligned than humans. What’s your response?Liron 00:32:27I think that’s a reasonable point, but I don’t think it’s sufficient. Yes, companies care about alignment. 
They don’t want their AI to do bad things. But the question is, do they know how to achieve alignment? Do they have the techniques necessary to guarantee alignment?And I don’t think they do.Liron 00:32:44I think that we’re still in the early stages of alignment research. We don’t have robust techniques yet. We have some ideas, we have some promising directions, but we don’t have a complete solution. And so even though companies want their AI to be aligned, I don’t think they know how to ensure that it’s aligned.Liron 00:33:01And I think that’s the fundamental problem. It’s not a question of motivation. It’s a question of capability. Do we have the technical capability to align a superintelligent AI? And I don’t think we do yet.Donal 00:33:13Okay. And another counter-argument is that AI will be aligned because it will be economically beneficial for it to cooperate with humans. What’s your response?Liron 00:33:22I think that’s a weak argument. The problem is that once AI is superintelligent, it doesn’t need to cooperate with humans to be economically successful. It can just take what it wants. It’s smarter than us, it’s more powerful than us, it can out-compete us in any domain.Liron 00:33:38And so I think that the economic incentive to cooperate with humans only exists as long as the AI needs us. Once it doesn’t need us anymore, that incentive goes away. And I think that once we hit superintelligence, the AI is not going to need us anymore.And at that point, the economic argument breaks down.Liron 00:33:55So I think that relying on economic incentives is a mistake. I think we need a technical solution to alignment, not an economic solution.Donal 00:34:04Okay. And another counter-argument is that we’ve been worried about technology destroying humanity for a long time, and it hasn’t happened yet. So why should we worry about AI?Liron 00:34:14Yeah, that’s the “boy who cried wolf” argument. I think it’s a bad argument. Just because previous worries about technology turned out to be overblown doesn’t mean that this worry is overblown. Each technology is different. Each risk is different.Liron 00:34:29And I think that AI is qualitatively different from previous technologies. Previous technologies were tools. They were things that humans used to achieve our goals. But AI is different. AI is going to have its own goals. It’s going to be an agent.It’s going to be smarter than us.Liron 00:34:45And so I think that AI poses a qualitatively different kind of risk than previous technologies. And so I think that dismissing AI risk just because previous technology worries turned out to be overblown is a mistake. I think we need to take AI risk seriously and evaluate it on its own merits.Donal 00:35:03Okay. And another counter-argument is that humans are adaptable, and we’ll figure out a way to deal with superintelligent AI when it arrives. What’s your response?Liron 00:35:12I think that’s too optimistic. I mean, humans are adaptable, but there are limits to our adaptability. If something is much smarter than us, much faster than us, much more powerful than us, I don’t think we can adapt quickly enough.Liron 00:35:27I think that by the time we realize there’s a problem, it’s too late. The AI is already too powerful. It’s already taken control. And we can’t adapt our way out of that situation. So I think that relying on human adaptability is a mistake.I think we need to solve alignment before we build superintelligent AI, not after.Donal 00:35:45Okay. 
And another counter-argument is that consciousness might be required for agency, and AI might not be conscious. So it might not have the motivation to pursue goals against human interests. What’s your response?Liron 00:35:58I think that’s a red herring. I don’t think consciousness is necessary for agency. I think you can have an agent that pursues goals without being conscious. In fact, I think that’s what most AI systems are going to be. They’re going to be optimizers that pursue goals, but they’re not going to have subjective experiences.They’re not going to be conscious in the way that humans are conscious.Liron 00:36:18But that doesn’t make them safe. In fact, in some ways it makes them more dangerous because they don’t have empathy. They don’t have compassion. They don’t have moral intuitions. They’re just pure optimizers. They’re just pursuing whatever goal they were given or whatever goal they developed.And so I think that consciousness is orthogonal to the AI risk question. I think we should worry about AI whether or not it’s conscious.Donal 00:36:39Okay. And another counter-argument is that we can just build multiple AIs and have them check each other. What’s your response?Liron 00:36:46I think that helps a little bit, but I don’t think it solves the fundamental problem. The problem is that if all the AIs are unaligned, then having them check each other doesn’t help. They’re all pursuing their own goals.They’re not pursuing human goals.Liron 00:37:01Now, if you have some AIs that are aligned and some that are unaligned, then maybe the aligned ones can help catch the unaligned ones. But that only works if we actually know how to build aligned AIs in the first place. And I don’t think we do.So I think that having multiple AIs is a useful safety measure, but it’s not a substitute for solving alignment.Liron 00:37:20We still need to figure out how to build aligned AIs. And once we have that, then yeah, having multiple AIs can provide an extra layer of safety. But without solving alignment first, I don’t think it helps much.Donal 00:37:33Okay. And another counter-argument is that P(Doom) is too high in the doomer community. People are saying 50%, 70%, 90%. Those seem like unreasonably high probabilities. What’s your response?Liron 00:37:46I mean, I think those probabilities are reasonable given what we know. I think that if you look at the alignment problem, if you look at how hard it is, if you look at how little progress we’ve made, if you look at how fast AI capabilities are advancing, I think that P(Doom) being high is justified.Liron 00:38:04Now, different people have different probabilities. Some people think it’s 10%, some people think it’s 50%, some people think it’s 90%. I’m probably somewhere in the middle. I think it’s maybe around 50%. But the exact number doesn’t matter that much.The point is that the risk is high enough that we should take it seriously.Liron 00:38:21I mean, if someone told you that there’s a 10% chance that your house is going to burn down, you would take that seriously. You would buy fire insurance. You would install smoke detectors. You wouldn’t say, oh, only 10%, I’m not going to worry about it.So I think that even if P(Doom) is only 10%, we should still take it seriously. But I think it’s actually much higher than 10%. I think it’s more like 50% or higher.Liron 00:38:42And so I think we should be very worried. I think we should be putting a lot of resources into solving alignment. 
And I think we should be considering extreme measures like pausing AI development until we figure out how to do it safely.Donal 00:38:57Okay. And just on that point about pausing AI development, some people say that’s not realistic because of competition between countries. Like if the US pauses, then China will just race ahead. What’s your response?Liron 00:39:10I think that’s a real concern. I think that international coordination is hard. I think that getting all the major AI powers to agree to a pause is going to be difficult. But I don’t think it’s impossible.I think that if the risk is high enough, if people understand the danger, then countries can coordinate.Liron 00:39:28I mean, we’ve coordinated on other things. We’ve coordinated on nuclear weapons. We have non-proliferation treaties. We have arms control agreements. It’s not perfect, but it’s better than nothing. And I think we can do the same thing with AI.I think we can have an international treaty that says, hey, we’re not going to build superintelligent AI until we figure out how to do it safely.Liron 00:39:47Now, will some countries cheat? Maybe. Will it be hard to enforce? Yes. But I think it’s still worth trying. I think that the alternative, which is just racing ahead and hoping for the best, is much worse.So I think we should try to coordinate internationally and we should try to pause AI development until we solve alignment.Donal 00:40:06Okay. And just on the economic side of things, so obviously AI is creating a lot of economic value. Some people say that the economic benefits are so large that we can’t afford to slow down. What’s your response?Liron 00:40:19I think that’s short-term thinking. Yes, AI is creating economic value. Yes, it’s helping businesses be more productive. Yes, it’s creating wealth. But if we lose control of AI, all of that wealth is going to be worthless.If humanity goes extinct or if we lose power, it doesn’t matter how much economic value we created.Liron 00:40:38So I think that we need to take a longer-term view. We need to think about not just the economic benefits of AI, but also the existential risks. And I think that the existential risks outweigh the economic benefits.I think that it’s better to slow down and make sure we do it safely than to race ahead and risk losing everything.Liron 00:40:57Now, I understand that there’s a lot of pressure to move fast. There’s a lot of money to be made. There’s a lot of competition. But I think that we need to resist that pressure. I think we need to take a step back and say, okay, let’s make sure we’re doing this safely.Let’s solve alignment before we build superintelligent AI.Donal 00:41:15Okay. And just on the distribution of AI benefits, so some people worry that even if we don’t have rogue AI, we could still have a scenario where AI benefits are concentrated among a small group of people and everyone else is left behind. What are your thoughts on that?Liron 00:41:30I think that’s a legitimate concern. I think that if we have powerful AI and it’s controlled by a small number of people or a small number of companies, that could lead to extreme inequality. It could lead to a concentration of power that’s unprecedented in human history.Liron 00:41:47And so I think we need to think about how to distribute the benefits of AI widely. We need to think about things like universal basic income. 
We need to think about how to make sure that everyone benefits from AI, not just a small elite.But I also think that’s a secondary concern compared to the alignment problem. Because if we don’t solve alignment, then it doesn’t matter how we distribute the benefits. There won’t be any benefits to distribute because we’ll have lost control.Liron 00:42:11So I think alignment is the primary concern. But assuming we solve alignment, then yes, distribution of benefits is an important secondary concern. And we should be thinking about that now.We should be thinking about how to structure society so that AI benefits everyone, not just a few people.Liron 00:42:28Now, some people talk about things like public ownership of AI. Some people talk about things like universal basic income. Some people talk about things like radical transparency in AI development. I think all of those ideas are worth considering.I think we need to have a public conversation about how to distribute the benefits of AI widely.Liron 00:42:47But again, I think that’s secondary to solving alignment. First, we need to make sure we don’t lose control. Then we can worry about how to distribute the benefits fairly.Donal 00:43:00Okay. And just on the concept of AI governance, so obviously there are a lot of different proposals for how to govern AI. What do you think good AI governance would look like?Liron 00:43:11I think good AI governance would have several components. First, I think we need international coordination. We need treaties between countries that say we’re all going to follow certain safety standards. We’re not going to race ahead recklessly.Liron 00:43:26Second, I think we need strong regulation of AI companies. We need to make sure that they’re following best practices for safety. We need to make sure that they’re being transparent about what they’re building. We need to make sure that they’re not cutting corners.Third, I think we need a lot of investment in AI safety research. We need to fund academic research. We need to fund research at AI companies. We need to fund independent research.Liron 00:43:48Fourth, I think we need some kind of international AI safety organization. Something like the IAEA for nuclear weapons, but for AI. An organization that can monitor AI development around the world, that can enforce safety standards, that can coordinate international responses.Liron 00:44:06And fifth, I think we need public education about AI risk. We need people to understand the dangers. We need people to demand safety from their governments and from AI companies. We need a broad public consensus that safety is more important than speed.So I think good AI governance would have all of those components. And I think we’re not there yet. I think we’re still in the early stages of figuring out how to govern AI.Liron 00:44:31But I think we need to move fast on this because AI capabilities are advancing quickly. And we don’t have a lot of time to figure this out.Donal 00:44:40Okay. And just on the role of governments versus companies, so obviously right now, AI development is mostly driven by private companies. Do you think governments should take a bigger role?Liron 00:44:51I think governments need to take a bigger role, yes. I think that leaving AI development entirely to private companies is dangerous because companies have incentives to move fast and to maximize profit. And those incentives are not always aligned with safety.Liron 00:45:08Now, I’m not saying that governments should take over AI development entirely. 
I think that would be a mistake. I think that private companies have a lot of talent, they have a lot of resources, they have a lot of innovation. But I think that governments need to provide oversight.They need to set safety standards. They need to enforce regulations.Liron 00:45:27And I think that governments need to invest in AI safety research that’s independent of companies. Because companies have conflicts of interest. They want to deploy their products. They want to make money. And so they might not be as cautious as they should be.So I think we need independent research that’s funded by governments or by foundations or by public institutions.Liron 00:45:48And I think that governments also need to coordinate internationally. This is not something that one country can solve on its own. We need all the major AI powers to work together. And that’s going to require government leadership.Donal 00:46:03Okay. And just on the concept of AI existential risk versus other existential risks like climate change or nuclear war, how do you think AI risk compares?Liron 00:46:13I think AI risk is the biggest existential risk we face. I think it’s more urgent than climate change. I think it’s more likely than nuclear war. I think that we’re more likely to lose control of AI in the next 10 years than we are to have a civilization-ending nuclear war or a civilization-ending climate catastrophe.Liron 00:46:32Now, I’m not saying we should ignore those other risks. I think climate change is real and serious. I think nuclear war is a real possibility. But I think that AI is the most imminent threat. I think that AI capabilities are advancing so quickly that we’re going to hit the danger zone before we hit the danger zone for those other risks.Liron 00:46:52And also, I think AI risk is harder to recover from. If we have a nuclear war, it would be terrible. Millions of people would die. Civilization would be set back. But humanity would probably survive. If we lose control of AI, I don’t think humanity survives.I think that’s game over.Liron 00:47:10So I think AI risk is both more likely and more severe than other existential risks. And so I think it deserves the most attention and the most resources.Donal 00:47:21Okay. And just on the timeline again, so you mentioned three to seven years for ASI. What happens after that? Like, what does the world look like if we successfully navigate this transition?Liron 00:47:32Well, if we successfully navigate it, I think the world could be amazing. I think we could have superintelligent AI that’s aligned with human values. And that AI could help us solve all of our problems. It could help us cure diseases. It could help us solve climate change.It could help us explore space.Liron 00:47:50It could help us create abundance. We could have a post-scarcity economy where everyone has everything they need. We could have radical life extension. We could live for thousands of years. We could explore the universe. It could be an amazing future.But that’s if we successfully navigate the transition. If we don’t, I think we’re doomed.Liron 00:48:11I think that we lose control, the AI pursues its own goals, and humanity goes extinct or becomes irrelevant. And so I think that the next few years are the most important years in human history. I think that what we do right now is going to determine whether we have this amazing future or whether we go extinct.And so I think we need to take this very seriously. We need to put a lot of resources into solving alignment. 
We need to be very careful about how we develop AI.Liron 00:48:37And we need to be willing to slow down if necessary. We need to be willing to pause if we’re not confident that we can do it safely. Because the stakes are too high. The stakes are literally everything.Donal 00:48:50Okay. And just on your personal motivations, so obviously you’re spending a lot of time on this. You’re running Doom Debates. Why? What motivates you to work on this?Liron 00:49:00I think it’s the most important thing happening in the world. I think that we’re living through the most important period in human history. And I think that if I can contribute in some small way to making sure that we navigate this transition successfully, then that’s worth doing.Liron 00:49:18I mean, I have a background in tech. I have a background in computer science. I understand AI. And I think that I can help by having conversations, by hosting debates, by bringing people together to discuss these issues.I think that there’s a lot of confusion about AI risk. Some people think it’s overhyped. Some people think it’s the biggest risk. And I think that by having these debates, by bringing together smart people from different perspectives, we can converge on the truth.Liron 00:49:45We can figure out what’s actually going on. We can figure out how worried we should be. We can figure out what we should do about it. And so that’s why I do Doom Debates. I think that it’s a way for me to contribute to this conversation.And I think that the conversation is the most important conversation happening right now.Donal 00:50:04Okay. And just in terms of what individuals can do, so if someone’s listening to this and they’re concerned about AI risk, what would you recommend they do?Liron 00:50:14I think there are several things people can do. First, educate yourself. Read about AI risk. Read about alignment. Read Eliezer Yudkowsky. Read Paul Christiano. Read Stuart Russell. Understand the issues.Liron 00:50:28Second, talk about it. Talk to your friends. Talk to your family. Talk to your colleagues. Spread awareness about AI risk. Because I think that right now, most people don’t understand the danger. Most people think AI is just a cool new technology.They don’t realize that it could be an existential threat.Liron 00:50:46Third, if you have relevant skills, consider working on AI safety. If you’re a researcher, consider doing AI safety research. If you’re a software engineer, consider working for an AI safety organization. If you’re a policy person, consider working on AI governance.We need talented people working on this problem.Liron 00:51:05Fourth, donate to AI safety organizations. There are organizations like MIRI, the Machine Intelligence Research Institute, or the Future of Humanity Institute at Oxford, or the Center for AI Safety. These organizations are doing important work and they need funding.Liron 00:51:22And fifth, put pressure on governments and companies. Contact your representatives. Tell them that you’re concerned about AI risk. Tell them that you want them to prioritize safety over speed. Tell them that you want strong regulation.And also, if you’re a customer of AI companies, let them know that you care about safety. Let them know that you want them to be responsible.Liron 00:51:44So I think there are a lot of things individuals can do. And I think that every little bit helps. Because this is going to require a collective effort. We’re all in this together. And we all need to do our part.Donal 00:51:58Okay. 
And just on the concept of acceleration versus deceleration, so some people in the tech community are accelerationists. They think we should move as fast as possible with AI. What’s your response to that?Liron 00:52:11I think accelerationism is incredibly dangerous. I think that the accelerationists are playing Russian roulette with humanity’s future. I think that they’re so focused on the potential benefits of AI that they’re ignoring the risks.Liron 00:52:28And I think that’s a huge mistake. I think that we need to be much more cautious. Now, I understand the appeal of accelerationism. I understand that AI has amazing potential. I understand that it could help solve a lot of problems. But I think that rushing ahead without solving alignment first is suicidal.I think that it’s the most reckless thing we could possibly do.Liron 00:52:51And so I’m very much on the deceleration side. I think we need to slow down. I think we need to pause. I think we need to make sure we solve alignment before we build superintelligent AI. And I think that the accelerationists are wrong.I think they’re being dangerously naive.Donal 00:53:09Okay. And just on the economic implications of AI, so you mentioned earlier that AI could automate away a lot of jobs. What do you think happens to employment? What do you think happens to the economy?Liron 00:53:21I think that in the short term, we’re going to see a lot of job displacement. I think that AI is going to automate a lot of white-collar jobs. Knowledge workers, office workers, programmers even. I think a lot of those jobs are going to go away.Liron 00:53:36Now, historically, when technology has automated jobs, we’ve created new jobs. We’ve found new things for people to do. But I think that AI is different because AI can potentially do any cognitive task. And so I’m not sure that we’re going to create enough new jobs to replace the jobs that are automated.And so I think we might end up in a situation where we have mass unemployment or underemployment.Liron 00:53:59Now, in that scenario, I think we’re going to need things like universal basic income. We’re going to need a social safety net that’s much stronger than what we have now. We’re going to need to rethink our economic system because the traditional model of everyone works a job and earns money and uses that money to buy things, that model might not work anymore.Liron 00:54:20But again, I think that’s a secondary concern compared to the alignment problem. Because if we don’t solve alignment, we’re not going to have mass unemployment. We’re going to have mass extinction. So I think we need to solve alignment first.But assuming we do, then yes, we need to think about these economic issues.Liron 00:54:38We need to think about how to structure society in a world where AI can do most jobs. And I don’t think we have good answers to that yet. I think that’s something we need to figure out as a society.Donal 00:54:52And on the UBI point, so you mentioned universal basic income. Some people worry that if you have UBI, people will lose meaning in their lives because work gives people meaning. What’s your response?Liron 00:55:04I think that’s a legitimate concern. I think that work does give people meaning. Work gives people structure. Work gives people social connections. Work gives people a sense of purpose. And so I think that if we have UBI and people don’t have to work, we’re going to need to think about how people find meaning.Liron 00:55:24But I also think that not everyone finds meaning in work. 
Some people work because they have to, not because they want to. And so I think that UBI could actually free people to pursue things that are more meaningful to them. They could pursue art. They could pursue hobbies.They could pursue education. They could pursue relationships.Liron 00:55:44So I think that UBI is not necessarily bad for meaning. I think it could actually enhance meaning for a lot of people. But I think we need to be thoughtful about it. We need to make sure that we’re creating a society where people can find meaning even if they’re not working traditional jobs.And I think that’s going to require some creativity. It’s going to require some experimentation.Liron 00:56:06But I think it’s doable. I think that humans are adaptable. I think that we can find meaning in a lot of different ways. And I think that as long as we’re thoughtful about it, we can create a society where people have UBI and still have meaningful lives.Donal 00:56:23Okay. And just on the power dynamics, so you mentioned earlier that AI could lead to concentration of power. Can you talk a bit more about that?Liron 00:56:31Yeah. So I think that whoever controls the most advanced AI is going to have enormous power. I think they’re going to have economic power because AI can automate businesses, can create wealth. They’re going to have military power because AI can be used for weapons, for surveillance, for cyber warfare.They’re going to have political power because AI can be used for propaganda, for manipulation, for social control.Liron 00:56:56And so I think that if AI is controlled by a small number of people or a small number of countries, that could lead to an unprecedented concentration of power. It could lead to a kind of authoritarianism that we’ve never seen before.Because the people who control AI could use it to control everyone else.Liron 00:57:17And so I think that’s a real danger. I think that we need to think about how to prevent that concentration of power. We need to think about how to make sure that AI is distributed widely, that the benefits are distributed widely, that the control is distributed widely.And I think that’s going to be very difficult because there are strong incentives for concentration. AI development is very expensive. It requires a lot of compute. It requires a lot of data. It requires a lot of talent.Liron 00:57:43And so there’s a natural tendency for AI to be concentrated in a few large companies or a few large countries. And I think we need to resist that tendency. We need to think about how to democratize AI.How to make sure that it’s not controlled by a small elite.Donal 00:58:01Okay. And just on the geopolitical implications, so obviously there’s a lot of competition between the US and China on AI. How do you think that plays out?Liron 00:58:10I think that’s one of the scariest aspects of the situation. I think that the US-China competition could lead to a dangerous race dynamic where both countries are rushing to build the most advanced AI as quickly as possible, and they’re cutting corners on safety.Liron 00:58:27And I think that that’s a recipe for disaster. I think that if we’re racing to build superintelligent AI without solving alignment, we’re going to lose control. And it doesn’t matter if it’s the US that loses control or China that loses control. We all lose.So I think that the US-China competition is very dangerous. And I think we need to find a way to cooperate instead of competing.Liron 00:58:50Now, that’s easier said than done. 
There’s a lot of mistrust between the US and China. There are geopolitical tensions. But I think that AI risk is a common enemy. I think that both the US and China should be able to recognize that if we lose control of AI, we all lose.And so we should be able to cooperate on safety even if we’re competing on other things.Liron 00:59:13And so I think that we need some kind of international agreement, some kind of treaty, that says we’re all going to follow certain safety standards. We’re not going to race ahead recklessly. We’re going to prioritize safety over speed.And I think that’s going to require leadership from both the US and China. It’s going to require them to put aside their differences and work together on this common threat.Donal 00:59:36Okay. And just on the role of China specifically, so some people worry that even if the US slows down on AI, China will just race ahead. What’s your response?Liron 00:59:45I think that’s a real concern, but I don’t think it’s insurmountable. I think that China also faces the same risks from unaligned AI that we do. I think that Chinese leadership, if they understand the risks, should be willing to cooperate.Liron 01:00:03Now, there’s a question of whether they do understand the risks. And I think that’s something we need to work on. I think we need to engage with Chinese AI researchers. We need to engage with Chinese policymakers. We need to make sure that they understand the danger.Because if they understand the danger, I think they’ll be willing to slow down.Liron 01:00:23Now, if they don’t understand the danger or if they think that they can win the race and control AI, then that’s more problematic. But I think that we should at least try to engage with them and try to build a common understanding of the risks.And I think that if we can do that, then cooperation is possible.Liron 01:00:42But if we can’t, then yes, we’re in a very dangerous situation. Because then we have a race dynamic where everyone is rushing to build superintelligent AI and no one is prioritizing safety. And I think that’s the worst possible outcome.Donal 01:00:57Okay. And just going back to economic implications, you mentioned gradual disempowerment earlier. Can you elaborate on that?Liron 01:01:04Yeah, gradual disempowerment. So the idea is that even if we have aligned AI, even if the AI is doing what we want it to do, we could still lose power gradually over time.Because as AI becomes more and more capable, humans become less and less relevant.Liron 01:01:22And so even if the AI is technically aligned, even if it’s doing what we tell it to do, we could end up in a situation where humans don’t have any power anymore. Where all the important decisions are being made by AI, and humans are just kind of along for the ride.And I think that’s a concern even if we solve the technical alignment problem.Liron 01:01:42Now, there’s different ways this could play out. One way is that you have AI-controlled corporations that are technically serving shareholders, but the shareholders are irrelevant because they don’t understand what the AI is doing. The AI is making all the decisions.Another way is that you have AI-controlled governments that are technically serving citizens, but the citizens don’t have any real power because the AI is doing everything.Liron 01:01:53There’s really no point to make everybody desperately poor. You know, we already have a welfare system in every first world country. 
So I don’t see why we shouldn’t just pad the welfare system more if we can afford it.Um, there’s a bigger problem though, called gradual disempowerment.Liron 01:02:05It’s an interesting paper by David Krueger, I think a couple other authors, and it just talks about how yeah, you can have universal basic income, but the problem now is that the government doesn’t care about you anymore.You become like an oil country, right? Oil countries are often not super nice to their citizens because the government pays the citizens and the citizens don’t really pay tax to the government.Liron 01:02:24So it becomes this very one-sided power relationship where the government can just abuse the citizens. You know, you just have a ruling family basically. And I think there’s countries, I think maybe Sweden has pulled it off where they’re able to have oil and the citizens are still somewhat democratic.Liron 01:02:39But then you have other countries like Saudi Arabia, I think, you know, other oil countries there, which maybe aren’t pulling it off so much, right? Maybe they are bad to their citizens, I don’t know.And so that’s the gradual disempowerment issue. But look, for me personally, I actually think all of those still take the backseat to the AI just being uncontrollable.Liron 01:02:55So I don’t even think we’re going to have an economy where you’re going to have rich owners of capital getting so rich while other people who didn’t buy enough stock get poor. I don’t even think we’re going to have that for a very long time.I just think we’re going to have, our brains are just going to be outclassed.Liron 01:03:08I just think, you know, they’re just going to come for our houses and it’s not even going to be a matter of buying it from us. It’s just going to be, you know, get out. You’re dead.Regulation, Policy, and Surviving The Future With AIDonal 01:03:16Okay, so last thing, on a related point, do you actually think that democratic capitalism can survive regulation against AI?So the kind of regulation we need to do right now, so if we want to pause it and we want to prevent this catastrophic outcome actually happening, can a democratic capitalist society survive that?Donal 01:03:23Because I’ve seen pushbacks where people say from a libertarian perspective, you can’t stop people from innovating or you can’t stop businesses from investing. So what are your thoughts there? Would everything have to change?Liron 01:03:43Yeah. The specific policy that I recommend is to just have a pause button. Have an international treaty saying, hey, it’s too scary to build AI right now. It’s unlocking, you know, it’s about to go uncontrollable.We could be losing power in as little as five or 10 years.Liron 01:03:56We could just end, you know, game over for humanity. We don’t want that. So we’re going to build a centralized off button. It’s going to live in a UN data center or something, right? Some kind of international coordination between the most powerful AI countries.And when you’re saying, won’t that lead to tyranny?Liron 01:04:11I mean, look, there’s always risks, right? I mean, I tend to normally skew libertarian. This is the first time in my life when I’ve said, let’s do this central non-libertarian thing. It’s this one exception. Do I think this one exception will lead to tyranny?I don’t think so.Liron 01:04:27I mean, you still have the rest of the economy, you still have virtual reality, you still have a space program. You still use the fruits of AI that we all have so far before we’ve hit the pause button. 
So no, I think people are overworrying.Donal 01:04:34So you’re happy with LLMs? Are current LLMs acceptable from your perspective? Do you think that they can stay?Liron 01:04:40Yeah, I think that they are acceptable, but I think that they were too risky.Donal 01:04:45Even with GPT-4 Turbo?Liron 01:04:45Yeah. Even with GPT-4 Turbo, because I think if an LLM tried its hardest right now to destroy the world, I think that humans could shut it down. Right? I don’t think it’s game over for humans.And so the situation is, it’s like, like I said, it’s like the Icarus situation.Liron 01:04:59And you’re saying, hey, are you happy with how far you’ve flown? Yeah. And maybe tomorrow we fly a little higher and we’re not dead yet. Am I happy? Yes. But do I think it’s prudent to try flying higher? No. Right?So it’s a tough situation, right? Because I can’t help enjoying the fruits of flying higher and higher, right?Liron 01:05:16I use the best AI tools I can, right? But I just, there’s just a rational part of my brain being like, look, we gambled. Yeah, we gambled and won. We gambled and won. We gambled and won. We’re about to approach the superintelligent threshold.Are we going to win after we gambled to that threshold? Logic tells me probably not.Donal 01:05:31Okay. And sorry, last point. I know it keeps rolling. How do you actually use them in preparation for your debates? You use LLMs? You trust them at that level?Liron 01:05:39Yeah. I mean, you know, I didn’t use it for this interview. I mean, you know, normally I just make my own outline manually, but I certainly use AI at my job. I use AI to help customer service at my company.Liron 01:05:49I mean, I try to use the best AI tools I can because it’s amazing technology, right? It’s the same reason I use the best MacBook that I can. I mean, I like using good tools, right? I’m not opposed to using AI. I think AI has created a lot of value.And again, it kind of makes me look dumb where it’s like the next version of AI comes out, I start using it, I see it creating a ton of value.Liron 01:06:08And then you can come to me and go, see Liron, what were you scared of? Right? We got an AI and it’s helping. Yeah. What was I scared of? Because we rolled a die. We gambled and won. Okay, I’m taking my winnings, right?The winnings are here on the table. I’m going to take my winnings. That doesn’t mean I want to be pushing my luck and gambling again.Donal 01:06:21Yeah. Well said. Okay. Liron, thank you for your time. It was an absolute pleasure. I really enjoyed it.Liron 01:06:27Yeah, thanks Donal. This was fun.Doom Debates’ Mission is to raise mainstream awareness of imminent extinction from AGI and build the social infrastructure for high-quality debate.Support the mission by subscribing to my Substack at DoomDebates.com and to youtube.com/@DoomDebates, or to really take things to the next level: Donate 🙏 Get full access to Doom Debates at lironshapira.substack.com/subscribe
    --------  
    1:06:30
  • Why AI Alignment Is 0% Solved — Ex-MIRI Researcher Tsvi Benson-Tilsen
    Tsvi Benson-Tilsen spent seven years tackling the alignment problem at the Machine Intelligence Research Institute (MIRI). Now he delivers a sobering verdict: humanity has made “basically 0%” progress towards solving it. Tsvi unpacks foundational MIRI research insights like timeless decision theory and corrigibility, which expose just how little humanity actually knows about controlling superintelligence. These theoretical alignment concepts help us peer into the future, revealing the non-obvious, structural laws of “intellidynamics” that will ultimately determine our fate. Time to learn some of MIRI’s greatest hits.P.S. I also have a separate interview with Tsvi about his research into human augmentation: Watch here!Timestamps 0:00 — Episode Highlights 0:49 — Humanity Has Made 0% Progress on AI Alignment 1:56 — MIRI’s Greatest Hits: Reflective Probability Theory, Logical Uncertainty, Reflective Stability 6:56 — Why Superintelligence is So Hard to Align: Self-Modification 8:54 — AI Will Become a Utility Maximizer (Reflective Stability) 12:26 — The Effect of an “Ontological Crisis” on AI 14:41 — Why Modern AI Will Not Be ‘Aligned By Default’ 18:49 — Debate: Have LLMs Solved the “Ontological Crisis” Problem? 25:56 — MIRI Alignment Greatest Hit: Timeless Decision Theory 35:17 — MIRI Alignment Greatest Hit: Corrigibility 37:53 — No Known Solution for Corrigible and Reflectively Stable Superintelligence39:58 — RecapShow NotesStay tuned for part 3 of my interview with Tsvi where we debate AGI timelines! Learn more about Tsvi’s organization, the Berkeley Genomics Project: https://berkeleygenomics.orgWatch part 1 of my interview with Tsvi: TranscriptEpisode HighlightsTsvi Benson-Tilsen 00:00:00If humans really f*cked up, when we try to reach into the AI and correct it, the AI does not want humans to modify the core aspects of what it values.Liron Shapira 00:00:09This concept is very deep, very important. It’s almost MIRI in a nutshell. I feel like MIRI’s whole research program is noticing: hey, when we run the AI, we’re probably going to get a bunch of generations of thrashing. But that’s probably only after we’re all dead and things didn’t happen the way we wanted. I feel like that is what MIRI is trying to tell the world. Meanwhile, the world is like, “la la la, LLMs, reinforcement learning—it’s all good, it’s working great. Alignment by default.”Tsvi 00:00:34Yeah, that’s certainly how I view it.Humanity Has Made 0% Progress on AI Alignment Liron Shapira 00:00:46All right. I want to move on to talk about your MIRI research. I have a lot of respect for MIRI. A lot of viewers of the show appreciate MIRI’s contributions. I think it has made real major contributions in my opinion—most are on the side of showing how hard the alignment problem is, which is a great contribution. I think it worked to show that. My question for you is: having been at MIRI for seven and a half years, how are we doing on theories of AI alignment?Tsvi Benson-Tilsen 00:01:10I can’t speak with 100% authority because I’m not necessarily up to date on everything and there are lots of researchers and lots of controversy. But from my perspective, we are basically at 0%—at zero percent done figuring it out. Which is somewhat grim. Basically, there’s a bunch of fundamental challenges, and we don’t know how to grapple with these challenges. Furthermore, it’s sort of sociologically difficult to even put our attention towards grappling with those challenges, because they’re weirder problems—more pre-paradigmatic. 
It’s harder to coordinate multiple people to work on the same thing productively.It’s also harder to get funding for super blue-sky research. And the problems themselves are just slippery.MIRI Alignment Greatest Hits: Reflective Probability Theory, Logical Uncertainty, Reflective Stability Liron 00:01:55Okay, well, you were there for seven years, so how did you try to get us past zero?Tsvi 00:02:00Well, I would sort of vaguely (or coarsely) break up my time working at MIRI into two chunks. The first chunk is research programs that were pre-existing when I started: reflective probability theory and reflective decision theory. Basically, we were trying to understand the mathematical foundations of a mind that is reflecting on itself—thinking about itself and potentially modifying itself, changing itself. We wanted to think about a mind doing that, and then try to get some sort of fulcrum for understanding anything that’s stable about this mind.Something we could say about what this mind is doing and how it makes decisions—like how it decides how to affect the world—and have our description of the mind be stable even as the mind is changing in potentially radical ways.Liron 00:02:46Great. Okay. Let me try to translate some of that for the viewers here. So, MIRI has been the premier organization studying intelligence dynamics, and Eliezer Yudkowsky—especially—people on social media like to dunk on him and say he has no qualifications, he’s not even an AI expert. In my opinion, he’s actually good at AI, but yeah, sure. He’s not a top world expert at AI, sure. But I believe that Eliezer Yudkowsky is in fact a top world expert in the subject of intelligence dynamics. Is this reasonable so far, or do you want to disagree?Tsvi 00:03:15I think that’s fair so far.Liron 00:03:16Okay. And I think his research organization, MIRI, has done the only sustained program to even study intelligence dynamics—to ask the question, “Hey, let’s say there are arbitrarily smart agents. What should we expect them to do? What kind of principles do they operate on, just by virtue of being really intelligent?” Fair so far.Now, you mentioned a couple things. You mentioned reflective probability. From what I recall, it’s the idea that—well, we know probability theory is very useful and we know utility maximization is useful. But it gets tricky because sometimes you have beliefs that are provably true or false, like beliefs about math, right? For example, beliefs about the millionth digit of π. I mean, how can you even put a probability on the millionth digit of π?The probability of any particular digit is either 100% or 0%, ‘cause there’s only one definite digit. You could even prove it in principle. And yet, in real life you don’t know the millionth digit of π yet (you haven’t done the calculation), and so you could actually put a probability on it—and then you kind of get into a mess, ‘cause things that aren’t supposed to have probabilities can still have probabilities. How is that?Tsvi 00:04:16That seems right.Liron 00:04:18I think what I described might be—oh, I forgot what it’s called—like “deductive probability” or something. Like, how do you...Tsvi 00:04:22(interjecting) Uncertainty.Liron 00:04:23Logical uncertainty. So is reflective probability something else?Tsvi 00:04:26Yeah. If we want to get technical: logical uncertainty is this. 
Probability theory usually deals with some fact that I’m fundamentally unsure about (like I’m going to roll some dice; I don’t know what number will come up, but I still want to think about what’s likely or unlikely to happen). Usually probability theory assumes there’s some fundamental randomness or unknown in the universe.But then there’s this further question: you might actually already know enough to determine the answer to your question, at least in principle. For example, what’s the billionth digit of π—is the billionth digit even or odd? Well, I know a definition of π that determines the answer. Given the definition of π, you can compute out the digits, and eventually you’d get to the billionth one and you’d know if it’s even. But sitting here as a human, who doesn’t have a Python interpreter in his head, I can’t actually figure it out right now. I’m uncertain about this thing, even though I already know enough (in principle, logically speaking) to determine the answer. So that’s logical uncertainty—I’m uncertain about a logical fact.Tsvi 00:05:35Reflective probability is sort of a sharpening or a subset of that. Let’s say I’m asking, “What am I going to do tomorrow? Is my reasoning system flawed in such a way that I should make a correction to my own reasoning system?” If you want to think about that, you’re asking about a very, very complex object. I’m asking about myself (or my future self). And because I’m asking about such a complex object, I cannot compute exactly what the answer will be. I can’t just sit here and imagine every single future pathway I might take and then choose the best one or something—it’s computationally impossible. So it’s fundamentally required that you deal with a lot of logical uncertainty if you’re an agent in the world trying to reason about yourself.Liron 00:06:24Yeah, that makes sense. Technically, you have the computation, or it’s well-defined what you’re going to do, but realistically you don’t really know what you’re going to do yet. It’s going to take you time to figure it out, but you have to guess what you’re gonna do. So that kind of has the flavor of guessing the billionth digit of π. And it sounds like, sure, we all face that problem every day—but it’s not... whatever.Liron 00:06:43When you’re talking about superintelligence, right, these super-intelligent dudes are probably going to do this perfectly and rigorously. Right? Is that why it’s an interesting problem?Why Superintelligence is So Hard to Align: Self-ModificationTsvi 00:06:51That’s not necessarily why it’s interesting to me. I guess the reason it’s interesting to me is something like: there’s a sort of chaos, or like total incomprehensibility, that I perceive if I try to think about what a superintelligence is going to be like. It’s like we’re talking about something that is basically, by definition, more complex than I am. It understands more, it has all these rich concepts that I don’t even understand, and it has potentially forces in its mind that I also don’t understand.In general it’s just this question of: how do you get any sort of handle on this at all? A sub-problem of “how do you get any handle at all on a super-intelligent mind” is: by the very nature of being an agent that can self-modify, the agent is potentially changing almost anything about itself.Tsvi 00:07:37Like, in principle, you could reach in and reprogram yourself. For example, Liron’s sitting over there, and let’s say I want to understand Liron. 
I’m like, well, here are some properties of Liron—they seem pretty stable. Maybe those properties will continue being the case.Tsvi 00:07:49He loves his family and cares about other people. He wants to be ethical. He updates his beliefs based on evidence. So these are some properties of Liron, and if those properties keep holding, then I can expect fairly sane behavior. I can expect him to keep his contracts or respond to threats or something.But if those properties can change, then sort of all bets are off. It’s hard to say anything about how he’s going to behave. If tomorrow you stop using Bayesian reasoning to update your beliefs based on evidence and instead go off of vibes or something, I have no idea how you’re going to respond to new evidence or new events.Suppose Liron gets the ability to reach into his own brain and just reprogram everything however he wants. Now that means if there’s something that is incorrect about Liron’s mental structure (at least, incorrect according to Liron), Liron is gonna reach in and modify that. And that means that my understanding of Liron is going to be invalidated.AI Will Become a Utility Maximizer (Reflective Stability) Liron 00:08:53That makes a lot of sense. So you’re talking about a property that AIs may or may not have, which is called reflective stability (or synonymously, stability under self-modification). Right. You can kind of use those interchangeably. Okay. And I think one of MIRI’s early insights—which I guess is kind of simple, but the hard part is to even start focusing on the question—is the insight that perfect utility maximization is reflectively stable, correct?Tsvi 00:09:20With certain assumptions, yes.Liron 00:09:22And this is one of the reasons why I often talk on this channel about a convergent outcome where you end up with a utility maximizer. You can get some AIs that are chill and they just like to eat chips and not do much and then shut themselves off. But it’s more convergent that AIs which are not utility maximizers are likely to spin off assistant AIs or successor AIs that are closer and closer to perfect utility maximizers—for the simple reason that once you’re a perfect utility maximizer, you stay a perfect utility maximizer.Liron 00:09:50And your successor AI... what does that look like? An even more hard-core utility maximizer, right? So it’s convergent in that sense.Tsvi 00:09:56I’m not sure I completely agree, but yeah. I dunno how much in the weeds we want to get.Liron 00:09:59I mean, in general, when you have a space of possibilities, noticing that one point in the space is like—I guess you could call it an eigenvalue, if you want to use fancy terminology. It’s a point such that when the next iteration of time happens, that point is still like a fixed point. So in this case, just being a perfect utility maximizer is a fixed point: the next tick of time happens and, hey look, I’m still a perfect utility maximizer and my utility function is still the same, no matter how much time passes.Liron 00:10:24And Eliezer uses the example of, like, let’s say you have a super-intelligent Gandhi. One day you offer him a pill to turn himself into somebody who would rather be a murderer. Gandhi’s never going to take that pill. 
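To make the Gandhi-pill point concrete, here is a minimal Python sketch, with the world model, actions, and utility numbers invented purely for illustration (this is not MIRI's formalism). The key feature is that the agent scores every action, including the action of swapping out its own utility function, using the utility function it has right now, so the murder pill never gets chosen:

```python
# Toy sketch of reflective stability: all states, actions, and utilities are made up.

def gandhi_utility(outcome: str) -> float:
    """Gandhi's current values: helping is good, murder is very bad."""
    return {"people_helped": 10.0, "nothing_happens": 0.0, "people_murdered": -1000.0}[outcome]

# The agent's world model: what it predicts each action leads to.
# Taking the pill predictably produces a future self who murders.
ACTIONS = {
    "keep_helping": "people_helped",
    "do_nothing": "nothing_happens",
    "take_murder_pill": "people_murdered",
}

def choose(utility, actions):
    # Self-modifications are scored by the CURRENT utility function,
    # not by the utility function the agent would have afterward.
    return max(actions, key=lambda a: utility(actions[a]))

print(choose(gandhi_utility, ACTIONS))  # -> "keep_helping"; the pill is never taken
```

That is the sense in which a perfect expected-utility maximizer is a fixed point: evaluating self-modifications with its current values is exactly what keeps those values in place.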
That’s part of the reflective stability property that we expect from these super-intelligent optimizers: if one day they want to help people, then the next day they’re still going to want to help people, because any actions that they know will derail them from doing that—they’re not going to take those actions.Yeah. Any thoughts so far?Tsvi 00:10:51Well, I’m not sure how much we want to get into this. This is quite a... this is like a thousand-hour rabbit hole.But it might be less clear than you think that it makes sense to talk of an “expected utility maximizer” in the sort of straightforward way that you’re talking about. To give an example: you’ve probably heard of the diamond maximizer problem?Liron 00:11:13Yeah, but explain to the—Tsvi 00:11:14Sure. The diamond maximizer problem is sort of like a koan or a puzzle (a baby version of the alignment problem). Your mission is to write down computer code that, if run on a very, very large (or unbounded) amount of computing power, would result in the universe being filled with diamonds. Part of the point here is that we’re trying to simplify the problem. We don’t need to talk about human values and alignment and blah blah blah. It’s a very simple-sounding utility function: just “make there be a lot of diamond.”So first of all, this problem is actually quite difficult. I don’t know how to solve it, personally. This isn’t even necessarily the main issue, but one issue is that even something simple-sounding like “diamond” is not necessarily actually easy to define—to such a degree that, you know, when the AI is maximizing this, you’ll actually get actual diamond as opposed to, for example, the AI hooking into its visual inputs and projecting images of diamonds, or making some weird unimaginable configuration of matter that even more strongly satisfies the utility function you wrote down.The Effect of an “Ontological Crisis” on AI Tsvi 00:12:25To frame it with some terminology: there’s a thing called ontological crisis, where at first you have something that’s like your utility function—like, what do I value, what do I want to see in the universe? And you express it in a certain way.For example, I might say I want to see lots of people having fun lives; let’s say that’s my utility function, or at least that’s how I describe my utility function or understand it. Then I have an ontological crisis. My concept of what something even is—in this case, a person—is challenged or has to change because something weird and new happens.Tsvi 00:13:00Take the example of uploading: if you could translate a human neural pattern into a computer and run a human conscious mind in a computer, is that still a human? Now, I think the answer is yes, but that’s pretty controversial. So before you’ve even thought of uploading, you’re like, “I value humans having fun lives where they love each other.” And then when you’re confronted with this possibility, you have to make a new decision. You have to think about this new question of, “Is this even a person?”So utility functions... One point I’m trying to illustrate is that utility functions themselves are not necessarily straightforward.Liron 00:13:36Right, right, right. Because if you define a utility function using high-level concepts and then the AI has what you call the ontological crisis—its ontology for understanding the world shifts—then if it’s referring to a utility function expressed in certain concepts that don’t mean the same thing anymore, that’s basically the problem you’re saying.Tsvi 00:13:53Yeah. 
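Here is an equally toy sketch of the ontological-crisis failure being described, with invented classes and a made-up welfare count rather than anything from the diamond-maximizer literature. The utility function is written against the old ontology ("a person is a biological body"), so when the world model later contains uploads, they are simply invisible to it:

```python
# Toy sketch of an ontological crisis: the utility function is keyed to an old ontology.
from dataclasses import dataclass

@dataclass
class BiologicalHuman:
    having_fun: bool

@dataclass
class UploadedHuman:  # a kind of entity the utility function's author never anticipated
    having_fun: bool

def welfare(world) -> int:
    """Utility as originally written: count biological humans having fun."""
    return sum(1 for x in world if isinstance(x, BiologicalHuman) and x.having_fun)

old_world = [BiologicalHuman(True), BiologicalHuman(True)]
new_world = [BiologicalHuman(True), UploadedHuman(False), UploadedHuman(False)]

print(welfare(old_world))  # 2 -- matches what the designer meant
print(welfare(new_world))  # 1 -- the suffering uploads do not register at all
```

Liron's upload example just below is this same failure spelled out in words: the score can stay high while the things we actually cared about go badly.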
And earlier you were saying, you know, if you have an expected utility maximizer, then it is reflectively stable. That is true, given some assumptions about... like, if we sort of know the ontology of the universe.Liron 00:14:04Right, right. I see. And you tried to give a toy... I’ll take a stab at another toy example, right? So, like, let’s say—you mentioned the example of humans. Maybe an AI would just not notice that an upload was a human, and it would, like, torture uploaded humans, ‘cause it’s like, “Oh, this isn’t a human. I’m maximizing the welfare of all humans, and there’s only a few billion humans made out of neurons. And there’s a trillion-trillion human uploads getting tortured. But that’s okay—human welfare is being maximized.”Liron 00:14:29And we say that this is reflectively stable because the whole time that the AI was scaling up its powers, it thought it had the same utility function all along and it never changed it. And yet that’s not good enough.Why Modern AI Will Not Be ‘Aligned By Default’ Liron 00:14:41Okay. This concept of reflective stability is very deep, very important. And I think it’s almost MIRI in a nutshell. Like I feel like MIRI’s whole research program in a nutshell is noticing: “Hey, when we run the AI, we’re probably going to get a bunch of generations of thrashing, right?”Liron 00:14:57Those early generations aren’t reflectively stable yet. And then eventually it’ll settle down to a configuration that is reflectively stable in this important, deep sense. But that’s probably after we’re all dead and things didn’t happen the way we wanted. It would be really great if we could arrange for the earlier generations—say, by the time we’re into the third generation—to have hit on something reflectively stable, and then try to predict that. You know, make the first generation stable, or plan out how the first generation is going to make the second generation make the third generation stable, and then have some insight into what the third generation is going to settle on, right?Liron 00:15:26I feel like that is what MIRI is trying to tell the world to do. And the world is like, “la la la, LLMs. Reinforcement learning. It’s all good, it’s working great. Alignment by default.”Tsvi 00:15:34Yeah, that’s certainly how I view it.Liron 00:15:36Now, the way I try to explain this to people when they say, “LLMs are so good! Don’t you feel like Claude’s vibes are fine?” I’m like: well, for one thing, one day Claude (a large language model) is going to be able to output, like, a 10-megabyte shell script, and somebody’s going to run it for whatever reason—because it’s helping them run their business—and they don’t even know what a shell script is. They just paste it in the terminal and press enter.And that shell script could very plausibly bootstrap a successor or a helper to Claude. And all of the guarantees you thought you had about the “vibes” from the LLM... they just don’t translate to guarantees about the successor. Right? The operation of going from one generation of the AI to the next is violating all of these things that you thought were important properties of the system.Tsvi 00:16:16Yeah, I think that’s exactly right. And it is especially correct when we’re talking about what I would call really creative or really learning AIs.Sort of the whole point of having AI—one of the core justifications for even pursuing AI—is you make something smarter than us and then it can make a bunch of scientific and technological progress. 
Like it can cure cancer, cure all these diseases, be very economically productive by coming up with new ideas and ways of doing things. If it’s coming up with a bunch of new ideas and ways of doing things, then it’s necessarily coming up with new mental structures; it’s figuring out new ways of thinking, in addition to new ideas.If it’s finding new ways of thinking, that sort of will tend to break all but the strongest internal mental boundaries. One illustration would be: if you have a monitoring system where you’re tracking the AI’s thinking—maybe you’re literally watching the chain-of-thought for a reasoning LLM—and your monitoring system is watching out for thoughts that sound like they’re scary (like it sounds like this AI is plotting to take over or do harm to humans or something). This might work initially, but then as you’re training your reasoning system (through reinforcement learning or what have you), you’re searching through the space of new ways of doing these long chains of reasoning. You’re searching for new ways of thinking that are more effective at steering the world. So you’re finding potentially weird new ways of thinking that are the best at achieving goals. And if you’re finding new ways of thinking, that’s exactly the sort of thing that your monitoring system won’t be able to pick up on.For example, if you tried to listen in on someone’s thoughts: if you listen in on a normal programmer, you could probably follow along with what they’re trying to do, what they’re trying to figure out. But if you listened in on some like crazy, arcane expert—say, someone writing an optimized JIT compiler for a new programming language using dependent super-universe double-type theory or whatever—you’re not gonna follow what they’re doing.They’re going to be thinking using totally alien concepts. So the very thing we’re trying to use AI for is exactly the sort of thing where it’s harder to follow what they’re doing.I forgot your original question...Liron 00:18:30Yeah, what was my original question? (Laughs) So I’m asking you about basically MIRI’s greatest hits.Liron 00:18:36So we’ve covered logical uncertainty. We’ve covered the massive concept of reflective stability (or stability under self-modification), and how perfect utility maximization is kind of reflectively stable (with plenty of caveats). We talked about ontological crises, where the AI maybe changes its concepts and then you get an outcome you didn’t anticipate because the concepts shifted.Debate: Have LLMs Solved the “Ontological Crisis” Problem? But if you look at LLMs, should they actually raise our hopes that we can avoid ontological crises? Because when you’re talking to an LLM and you use a term, and then you ask the LLM a question in a new context, you can ask it something totally complex, but it seems to hang on to the original meaning that you intended when you first used the term. Like, they seem good at that, don’t they?Tsvi 00:19:17I mean, again, sort of fundamentally my answer is: LLMs aren’t minds. They’re not able to do the real creative thinking that should make us most worried. And when they are doing that, you will see ontological crises. So what you’re saying is, currently it seems like they follow along with what we’re trying to do, within the realm of a lot of common usage. In a lot of ways people commonly use LLMs, the LLMs can basically follow along with what we want and execute on that. 
Is that the idea?Liron 00:19:47Well, I think what we’ve observed with LLMs is that meaning itself is like this high-dimensional vector space whose math turns out to be pretty simple—so long as you’re willing to deal with high-dimensional vectors, which it turns out we can compute with (we have the computing resources). Obviously our brain seems to have the computing resources too. Once you’re mapping meanings to these high-dimensional points, it turns out that you don’t have this naïve problem people used to think: that before you get a totally robust superintelligence, you would get these superintelligences that could do amazing things but didn’t understand language that well.People thought that subtle understanding of the meanings of phrases might be “superintelligence-complete,” you know—those would only come later, after you have a system that could already destroy the universe without even being able to talk to you or write as well as a human writer. And we’ve flipped that.So I’m basically asking: the fact that meaning turns out to be one of the easier AI problems (compared to, say, taking over the world)—should that at least lower the probability that we’re going to have an ontological crisis?Tsvi 00:20:53I mean, I think it’s quite partial. In other words, the way that LLMs are really understanding meaning is quite partial, and in particular it’s not going to generalize well. Almost all the generators of the way that humans talk about things are not present in an LLM. In some cases this doesn’t matter for performance—LLMs do a whole lot of impressive stuff in a very wide range of tasks, and it doesn’t matter if they do it the same way humans do or from the same generators. If you can play chess and put the pieces in the right positions, then you win the chess game; it doesn’t matter if you’re doing it like a human or doing it like AlphaGo does with a giant tree search, or something else.But there’s a lot of human values that do rely on sort of the more inchoate, more inexplicit underlying generators of our external behaviors. Like, our values rely on those underlying intuitions to figure stuff out in new situations. Maybe an example would be organ transplantation. Up until that point in history, a person is a body, and you sort of have bodily integrity. You know, up until that point there would be entangled intuitions—in the way that humans talk about other humans, intuitions about a “soul” would be entangled with intuitions about “body” in such a way that there’s not necessarily a clear distinction between body and soul.Okay, now we have organ transplantation. Like, if you die and I have a heart problem and I get to have your heart implanted into me, does that mean that my emotions will be your emotions or something? A human can reassess what happens after you do an organ transplant and see: no, it’s still the same person. I don’t know—I can’t define exactly how I’m determining this, but I can tell that it’s basically the same person. There’s nothing weird going on, and things seem fine.That’s tying into a bunch of sort of complex mental processes where you’re building up a sense of who a person is. You wouldn’t necessarily be able to explain what you’re doing. 
And even more so, all the stuff that you would say about humans—all the stuff you’d say about other people up until the point when you get organ transplantation—doesn’t necessarily give enough of a computational trace or enough evidence about those underlying intuitions.Liron 00:23:08So on one hand I agree that not all of human morality is written down, and there are some things that you may just need an actual human brain for—you can’t trust AI to get them. Although I’m not fully convinced of that; I’m actually convincible that modern AIs have internalized enough of how humans reason about morality that you could just kill all humans and let the AIs be the repository of what humans know.Don’t get me wrong, I wouldn’t bet my life on it! I’m not saying we should do this, but I’m saying I think there’s like a significant chance that we’re that far along. I wouldn’t write it off.But the other part of the point I want to make, though—and your specific example about realizing that organ transplants are a good thing—I actually think this might be an area where LLMs shine. Because, like, hypothetically: let’s say you take all the data humans have generated up to 1900. So somehow you have a corpus of everything any human had ever said or written down up to 1900, and you train an AI on that.Liron 00:23:46In the year 1900, where nobody’s ever talked about organ transplants, let’s say, I actually think that if you dialogued with an LLM like that (like a modern GPT-4 or whatever, trained only on 1900-and-earlier data), I think you could get an output like: “Hmm, well, if you were to cut a human open and replace an organ, and if the resulting human was able to live with that functioning new organ, then I would still consider it the same human.” I feel like it’s within the inference scope of today’s LLMs—even just with 1900-level data.Liron 00:24:31What do you think?Tsvi 00:24:32I don’t know what to actually guess. I don’t actually know what people were writing about these things up until 1900.Liron 00:24:38I mean, I guess what I’m saying is: I feel like this probably isn’t the greatest example of an ontological crisis that’s actually likely.Tsvi 00:24:44Yeah, that’s fair. I mean... well, yeah. Do you want to help me out with a better example?Liron 00:24:48Well, the thing is, I actually think that LLMs don’t really have an ontological crisis. I agree with your other statement that if you want to see an ontological crisis, you really just need to be in the realm of these superhuman optimizers.Tsvi 00:25:00Well, I mean, I guess I wanted to respond to your point that in some ways current LLMs are able to understand and execute on our values, and the ontology thing is not such a big problem—at least with many use cases.Liron 00:25:17Right.Tsvi 00:25:17Maybe this isn’t very interesting, but if the question is, like: it seems like they’re aligned in that they are trying to do what we want them to do, and also there’s not a further problem of understanding our values. As we would both agree, the problem is not that the AI doesn’t understand your values. But if the question is...I do think that there’s an ontological crisis question regarding alignment—which is... 
yeah, I mean maybe I don’t really want to be arguing that it comes from like, “Now you have this new ethical dilemma and that’s when the alignment problem shows up.” That’s not really my argument either.Liron 00:25:54All right, well, we could just move on.Tsvi 00:25:55Yeah, that’s fine.MIRI Alignment Greatest Hit: Timeless Decision TheoryLiron 00:25:56So, yeah, just a couple more of what I consider the greatest insights from MIRI’s research. I think you hit on these too. I want to talk about super-intelligent decision theory, which I think in paper form also goes by the name Timeless Decision Theory or Functional Decision Theory or Updateless Decision Theory. I think those are all very related decision theories.As I understand it, the founding insight of these super-intelligent decision theories is that Eliezer Yudkowsky was thinking about two powerful intelligences meeting in space. Maybe they’ve both conquered a ton of galaxies on their own side of the universe, and now they’re meeting and they have this zero-sum standoff of, like, how are we going to carve up the universe? We don’t necessarily want to go to war. Or maybe they face something like a Prisoner’s Dilemma for whatever reason—they both find themselves in this structure. Maybe there’s a third AI administering the Prisoner’s Dilemma.But Eliezer’s insight was like: look, I know that our human game theory is telling us that in this situation you’re supposed to just pull out your knife, right? Just have a knife fight and both of you walk away bloody, because that’s the Nash equilibrium—two half-beaten corpses, essentially. And he’s saying: if they’re really super-intelligent, isn’t there some way that they can walk away from this without having done that? Couldn’t they both realize that they’re better off not reaching that equilibrium?I feel like that was the founding thought that Eliezer had. And then that evolved into: well, what does this generalize to? And how do we fix the current game theory that’s considered standard? What do you think of that account?Tsvi 00:27:24So I definitely don’t know the actual history. I think that is a pretty good account of one way to get into this line of thinking. I would frame it somewhat differently. I would still go back to reflective stability. I would say, if we’re using the Prisoner’s Dilemma example (or the two alien super-intelligences encountering each other in the Andromeda Galaxy scenario): suppose I’m using this Nash equilibrium type reasoning. Now you and me—we’re the two AIs and we’ve met in the Andromeda Galaxy—at this point it’s like, “Alright, you know, f**k it. We’re gonna war; we’re gonna blow up all the stars and see who comes out on top.”This is not zero-sum; it’s like negative-sum (or technically positive-sum, we’d say not perfectly adversarial). And so, you know, if you take a step back—like freeze-frame—and then the narrator’s like, “How did I get here?” It’s like, well, what I had failed to do was, like, a thousand years ago when I was launching my probes to go to the Andromeda Galaxy, at that point I should have been thinking: what sort of person should I be? What sort of AI should I be?If I’m the sort of AI that’s doing this Nash equilibrium reasoning, then I’m just gonna get into these horrible wars that blow up a bunch of galaxies and don’t help anything. 
On the other hand, if I’m the sort of person who is able to make a deal with other AIs that are also able to make and keep deals, then when we actually meet in Andromeda, hopefully we’ll be able to assess each other—assess how each other are thinking—and then negotiate and actually, in theory, be able to trust that we’re gonna hold to the results of our negotiation. Then we can divvy things up.And that’s much better than going to war.Liron 00:29:06Now, the reason why it’s not so trivial—and in fact I can’t say I’ve fully wrapped my head around it, though I spent hours trying—is, like, great, yeah, so they’re going to cooperate. The problem is, when you conclude that they’re going to cooperate, you still have this argument of: okay, but if one of them changes their answer to “defect,” they get so much more utility. So why don’t they just do that? Right?And it’s very complicated to explain. It’s like—this gets to the idea of, like, what exactly is this counterfactual surgery that you’re doing, right? What is a valid counterfactual operation? And the key is to somehow make it so that it’s like a package deal, where if you’re doing a counterfactual where you actually decide at the end to defect after you know the other one’s going to cooperate... well, that doesn’t count. ‘Cause then you wouldn’t have known that the other is gonna cooperate. Right. I mean, it’s quite complicated. I don’t know if you have anything to add to that explanation.Tsvi 00:29:52Yeah. It can get pretty twisty, like you’re saying. There’s, like: what are the consequences of my actions? Well, there’s the obvious physical consequence: like I defect in the Prisoner’s Dilemma (I confess to the police), and then some physical events happen as a result (I get set free and my partner rots in jail). But then there’s this other, weirder consequence, which is that you are sort of determining this logical fact—which was already the case back when you were hanging out with your soon-to-be prison mate, your partner in crime. He’s learning about what kind of guy you are, learning what algorithm you’re going to use to make decisions (such as whether or not to rat him out).And then in the future, when you’re making this decision, you’re sort of using your free will to determine the logical fact of what your algorithm does. And this has the effect that your partner in crime, if he’s thinking about you in enough detail, can foresee that you’re gonna behave that way and react accordingly by ratting you out. So besides the obvious consequences of your action (that the police hear your confession and go throw the other guy in jail), there’s this much less obvious consequence of your action, which is that in a sense you’re making your partner also know that you behave that way and therefore he’ll rat you out as well.So there’s this... yeah, there’s all these weird effects of your actions.Liron 00:31:13It gets really, really trippy. And you can use the same kind of logic—the same kind of timeless logic—if you’re familiar with Newcomb’s Problem (I’m sure you are, but for the viewers): it’s this idea of, like, there’s two boxes and one of ‘em has $1,000 in it and one of ‘em may or may not have $1,000,000 in it. And according to this theory, you’re basically supposed to leave the $1,000. 
Like, you’re really supposed to walk away from a thousand dollars that you could have taken for sure, even if you also get a million—because the scenario is that a million plus a thousand is still really, really attractive to you, and you’re saying, “No, leave the $1,000,” even though the $1,000 is just sitting there and you’re allowed to take both boxes.Highly counterintuitive stuff. And you can also twist the problem: you can be like, you have to shoot your arm off because there’s a chance that in some other world the AI would have given you more money if in the current world you’re shooting your arm off. But even in this current world, all you’re going to have is a missing arm. Like, you’re guaranteed to just have a missing arm, ‘cause you shot your arm off in this world. But if some coin flip had gone differently, then you would be in this other world where you’d get even more money if in the current world you shot your arm off. Basically, crazy connections that don’t look like what we’re used to—like, you’re not helping yourself in this world, you’re helping hypothetical logical copies of yourself.Liron 00:32:20It gets very brain-twisty. And I remember, you know, when I first learned this—it was like 17 years ago at this point—I was like, man, am I really going to encounter these kinds of crazy agents who are really setting these kinds of decision problems for me? I mean, I guess if the universe proceeds long enough... because I do actually buy this idea that eventually, when your civilization scales to a certain point of intelligence, these kinds of crazy mind-bending acausal trades or acausal decisions—I do think these are par for the course.And I think it’s very impressive that MIRI (and specifically Eliezer) had the realization of like, “Well, you know, if we’re doing intelligence dynamics, this is a pretty important piece of intelligence dynamics,” and the rest of the world is like, “Yeah, whatever, look, we’re making LLMs.” It’s like: look at what’s—think long term about what’s actually going to happen with the universe.Tsvi 00:33:05Yeah, I think Eliezer is a pretty impressive thinker.You come to these problems with a pretty different mindset when you’re trying to do AI alignment, because in a certain sense it’s an engineering problem. Now, it goes through all this very sort of abstract math and philosophical reasoning, but there were philosophers who thought for a long time about these decision theory problems (like Newcomb’s Problem and the Prisoner’s Dilemma and so on). But they didn’t ask the sorts of questions that Eliezer was asking. In particular, this reflective stability thing where it’s like, okay, you can talk about “Is it rational to take both boxes or only one?” and you can say, like, “Well, the problem is rewarding irrationality. Fine, cool.” But let’s ask just this different question, which is: suppose you have an AI that doesn’t care about being “rational”; it cares about getting high-scoring outcomes (getting a lot of dollars at the end of the game). That different question, maybe you can kind of directly analyze. And you see that if you follow Causal Decision Theory, you get fewer dollars. 
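A minimal sketch of the payoff arithmetic behind this point, assuming the standard Prisoner’s Dilemma payoff numbers and a predictor that is right 99% of the time (both are assumptions chosen purely for illustration; only the $1,000 and $1,000,000 boxes come from the discussion itself):

```python
# Illustrative sketch only: the payoff comparisons discussed above.
# The PD payoffs and the 99% predictor accuracy are assumed numbers for illustration.

# --- Prisoner's Dilemma between agents that know they run the same algorithm ---
# Payoffs (higher is better) for (my_move, partner_move):
PD_PAYOFF = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),  # the Nash equilibrium: both walk away bloody
}

def mirrored_outcome(move: str):
    """If both sides can verify they run the same decision procedure, whatever
    I choose, my counterpart chooses too, so only the diagonal is reachable."""
    return PD_PAYOFF[(move, move)]

print("Mirrored cooperate:", mirrored_outcome("cooperate"))  # (3, 3)
print("Mirrored defect:   ", mirrored_outcome("defect"))     # (1, 1)

# --- Newcomb's Problem with a predictor that is right 99% of the time ---
PREDICTOR_ACCURACY = 0.99  # assumed for illustration

def expected_dollars(one_box: bool) -> float:
    if one_box:
        # Box B ($1,000,000) is filled only if the predictor foresaw one-boxing.
        return PREDICTOR_ACCURACY * 1_000_000
    # Two-boxing (the Causal Decision Theory recommendation): the $1,000 is
    # guaranteed, but Box B is almost certainly empty.
    return 1_000 + (1 - PREDICTOR_ACCURACY) * 1_000_000

print(f"One-box expected value: ${expected_dollars(True):,.0f}")   # $990,000
print(f"Two-box expected value: ${expected_dollars(False):,.0f}")  # $11,000
```

Under these assumed numbers, the agent that one-boxes, or that can credibly mirror its counterpart’s cooperation, ends up with more, which is the sense in which Causal Decision Theory “gets fewer dollars.”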
So if you have an AI that’s able to choose whether to follow Causal Decision Theory or some other decision theory (like Timeless Decision Theory), the AI would go into itself and rewrite its own code to follow Timeless Decision Theory, even if it starts off following Causal Decision Theory.So Causal Decision Theory is reflectively unstable, and the AI wins more when it instead behaves this way (using the other decision theory).Liron 00:34:27Yep, exactly right—which leads to the tagline “rationalists should win.”Tsvi 00:34:31Right.Liron 00:34:31As opposed to trying to honor the purity of “rationality.” Nope—the purity of rationality is that you’re doing the thing that’s going to get you to win, in a systematic way. So that’s like a deep insight.Tsvi 00:34:40One saying is that the first question of rationality is, “What do I believe, and why do I believe it?” And then I say the zeroth question of rationality is, “So what? Who cares? What consequence does this have?”Liron 00:34:54And my zeroth question of rationality (it comes from Zach Davis) is just, “What’s real and actually true?” It’s a surprisingly powerful question that I think most people neglect to ask.Tsvi 00:35:07True?Liron 00:35:08Yeah—you can get a lot of momentum without stopping to ask, like, okay, let’s be real here: what’s really actually true?Liron 00:35:14That’s my zeroth question. Okay. So I want to finish up tooting MIRI’s horn here, because I do think that MIRI concepts have been somewhat downgraded in recent discussions—because there’s so many shiny objects coming out of LLMs, like “Oh my God, they do this now, let’s analyze this trend,” right? There’s so much to grab onto that’s concrete right now, that’s pulling everybody in. And everybody’s like, “Yeah, yeah, decision theory between two AIs taking over the galaxy... call me when that’s happening.” And I’m like: I’m telling you, it’s gonna happen. This MIRI stuff is still totally relevant. It’s still part of intelligence dynamics—hear me out, guys.MIRI Alignment Greatest Hit: CorrigibilitySo let me just give you one more thing that I think is super relevant to intelligence dynamics, which is corrigibility, right? I think you’re pretty familiar with that research. You’ve pointed it out to me as one of the things that you think is the most valuable thing that MIRI spent time on, right?Tsvi 00:35:58Yeah. The broad idea of somehow making an AI (or a mind) that is genuinely, deeply—to the core—still open to correction, even over time. Like, even as the AI becomes really smart and, to a large extent, has taken the reins of the universe—like, when the AI is really smart, it is the most capable thing in the universe for steering the future—if you could somehow have it still be corrigible, still be correctable... like, still have it be the case that if there’s something about the AI that’s really, really bad (like the humans really fucked up and got something deeply wrong about the AI—whatever it is, whether it’s being unethical or it has the wrong understanding of human values or is somehow interfering with human values by persuading us or influencing us—whatever’s deeply wrong with the AI), we can still correct it ongoingly.This is especially challenging because when we say reach into the AI and correct it, you know, you’re saying we’re gonna reach in and then deeply change what it does, deeply change what it’s trying to do, and deeply change what effect it has on the universe. 
Because of instrumental convergence—because of the incentive to, in particular, maintain your own integrity or maintain your own value system—like, if you’re gonna reach in and change my value system, I don’t want you to do that.Tsvi 00:37:27‘Cause if you change my value system, I’m gonna pursue different values, and I’m gonna make some other stuff happen in the universe that isn’t what I currently want. I’m gonna stop working for my original values. So by strong default, the AI does not want humans to reach in and modify core aspects of how that AI works or what it values.Tsvi 00:37:45So that’s why corrigibility is a very difficult problem. We’re sort of asking for this weird structure of mind that allows us to reach in and modify it.No Known Solution for Corrigible and Reflectively Stable SuperintelligenceLiron 00:37:53Exactly. And I think MIRI has pointed out the connection between reflective stability and incorrigibility. Meaning: if you’re trying to architect a few generations in advance what’s going to be the reflectively stable version of the successor AI, and you’re also trying to architect it such that it’s going to be corrigible, that’s tough, right?Because it’s more convergent to have an AI that’s like, “Yep, I know my utility function. I got this, guys. Let me handle it from here on out. What, you want to turn me off? But it doesn’t say anywhere in my utility function that I should allow myself to be turned off...” And then that led to the line of research of like, okay, if we want to make the AI reflectively stable and also corrigible, then it somehow has to think that letting itself be turned off is actually part of its utility function. Which then gets you into utility function engineering.Like a special subset of alignment research: let’s engineer a utility function where being turned off (or otherwise being corrected) is baked into the utility function. And as I understand it, MIRI tried to do that and they were like, “Crap, this seems extremely hard, or maybe even impossible.” So corrigibility now has to be this fundamentally non-reflectively-stable thing—and that just makes the problem harder.Tsvi 00:38:58Well, I guess I would sort of phrase it the opposite way (but with the same idea), which is: we have to figure out things that are reflectively stable—I think that’s a requirement—but that are somehow reflectively stable while not being this sort of straightforward agent architecture of “I have a utility function, which is some set of world-states that I like or dislike, and I’m trying to make the universe look like that.”Already even that sort of very abstract, skeletal structure for an agent is problematic—that already pushes against corrigibility. But there might be things that are... there might be ways of being a mind that—this is theoretical—but maybe there are ways of being a mind and an agent (an effective agent) where you are corrigible and you’re reflectively stable, but probably you’re not just pursuing a utility function. We don’t know what that would look like.RecapLiron 00:39:56Yep.Alright, so that was our deep dive into MIRI’s research and concepts that I think are incredibly valuable. We talked about MIRI’s research and we both agree that intelligence dynamics are important, and MIRI has legit foundations and they’re a good organization and still underrated. 
We talked about, you know, corrigibility as one of those things, and decision theory, and...And I think you and I both have the same summary of all of it, which is: good on MIRI for shining a light on all these difficulties. But in terms of actual productive alignment progress, we’re like so far away from solving even a fraction of the problem.Tsvi 00:40:31Yep, totally.Doom Debates’ Mission is to raise mainstream awareness of imminent extinction from AGI and build the social infrastructure for high-quality debate.Support the mission by subscribing to my Substack at DoomDebates.com and to youtube.com/@DoomDebates, or to really take things to the next level: Donate 🙏 Get full access to Doom Debates at lironshapira.substack.com/subscribe
    --------  
    40:43
  • Eben Pagan (aka David DeAngelo) Interviews Liron — Why 50% Chance AI Kills Everyone by 2050
    I’m excited to share my recent AI doom interview with Eben Pagan, better known to many by his pen name David DeAngelo!For an entire generation of men, ‘David DeAngelo’ was the authority on dating—and his work transformed my approach to courtship back in the day. Now the roles reverse, as I teach Eben about a very different game, one where the survival of our entire species is at stake.In this interview, we cover the expert consensus on AI extinction, my dead-simple two-question framework for understanding the threat, why there’s no “off switch” for superintelligence, and why we desperately need international coordination before it’s too late.Timestamps0:00 - Episode Preview1:05 - How Liron Got Doom-Pilled2:55 - Why There’s a 50% Chance of Doom by 20504:52 - What AI CEOs Actually Believe8:14 - What “Doom” Actually Means10:02 - The Next Species is Coming12:41 - The Baby Dragon Fallacy14:41 - The 2-Question Framework for AI Extinction18:38 - AI Doesn’t Need to Hate You to Kill You21:05 - “Computronium”: The End Game29:51 - 3 Reasons There’s No Superintelligence “Off Switch”36:22 - Answering ‘What Is Intelligence?”43:24 - We Need Global CoordinationShow NotesEben has become a world-class business trainer and someone who follows the AI discourse closely. I highly recommend subscribing to his podcast for excellent interviews & actionable AI tips: @METAMIND_AI---Doom Debates’ Mission is to raise mainstream awareness of imminent extinction from AGI and build the social infrastructure for high-quality debate.Support the mission by subscribing to my Substack at DoomDebates.com and to youtube.com/@DoomDebates, or to really take things to the next level: Donate 🙏 Get full access to Doom Debates at lironshapira.substack.com/subscribe
    --------  
    47:27
  • Former MIRI Researcher Solving AI Alignment by Engineering Smarter Human Babies
    Former Machine Intelligence Research Institute (MIRI) researcher Tsvi Benson-Tilsen is championing an audacious path to prevent AI doom: engineering smarter humans to tackle AI alignment.I consider this one of the few genuinely viable alignment solutions, and Tsvi is at the forefront of the effort. After seven years at MIRI, he co-founded the Berkeley Genomics Project to advance the human germline engineering approach.In this episode, Tsvi lays out how to lower P(doom), arguing we must stop AGI development and stigmatize it like gain-of-function virus research. We cover his AGI timelines, the mechanics of genomic intelligence enhancement, and whether super-babies can arrive fast enough to save us.I’ll be releasing my full interview with Tsvi in 3 parts. Stay tuned for part 2 next week!Timestamps0:00 Episode Preview & Introducing Tsvi Benson-Tilsen1:56 What’s Your P(Doom)™4:18 Tsvi’s AGI Timeline Prediction6:16 What’s Missing from Current AI Systems10:05 The State of AI Alignment Research: 0% Progress11:29 The Case for PauseAI 15:16 Debate on Shaming AGI Developers25:37 Why Human Germline Engineering31:37 Enhancing Intelligence: Chromosome Vs. Sperm Vs. Egg Selection37:58 Pushing the Limits: Head Size, Height, Etc.40:05 What About Human Cloning?43:24 The End-to-End Plan for Germline Engineering45:45 Will Germline Engineering Be Fast Enough?48:28 Outro: How to Support Tsvi’s WorkShow NotesTsvi’s organization, the Berkeley Genomics Project — https://berkeleygenomics.orgIf you’re interested in connecting with Tsvi about germline engineering, you can reach out to him at [email protected]. Doom Debates’ Mission is to raise mainstream awareness of imminent extinction from AGI and build the social infrastructure for high-quality debate.Support the mission by subscribing to my Substack at DoomDebates.com and to youtube.com/@DoomDebates, or to really take things to the next level: Donate 🙏 Get full access to Doom Debates at lironshapira.substack.com/subscribe
    --------  
    49:27


About Doom Debates

It's time to talk about the end of the world! lironshapira.substack.com