Some conceptual highlights from “Disjunctive Scenarios of Catastrophic AI Risk”

My forthcoming paper, “Disjunctive Scenarios of Catastrophic AI Risk”, attempts to introduce a number of considerations to the analysis of potential risks from Artificial General Intelligence (AGI). As the paper is long and occasionally makes for somewhat dry reading, I thought that I would briefly highlight a few of the key points raised in the paper.

The main idea here is that most of the discussion about risks of AGI has been framed in terms of a scenario that goes something along the lines of “a research group develops AGI, that AGI develops to become superintelligent, escapes from its creators, and takes over the world”. While that is one scenario that could happen, focusing too much on any single scenario makes us more likely to overlook alternative scenarios. It also makes the scenarios susceptible to criticism from people who (correctly!) point out that we are postulating very specific scenarios that have lots of burdensome details.

To address that, I discuss here a number of considerations that suggest disjunctive paths to catastrophic outcomes: paths that are of the form “A or B or C could happen, and any one of them happening could have bad consequences”.

Superintelligence versus Crucial Capabilities

Bostrom’s Superintelligence, as well as a number of other sources, basically make the following argument:

  1. An AGI could become superintelligent
  2. Superintelligence would enable the AGI to take over the world

This is an important argument to make and analyze, since superintelligence basically represents an extreme case: if an individual AGI may become as powerful as an agent can possibly get, how do we prepare for that eventuality? As long as there is a plausible chance of such an extreme case being realized, it must be taken into account.

However, it is probably a mistake to focus only on the case of superintelligence. Basically, the reason why we are interested in a superintelligence is that, by assumption, it has the cognitive capabilities necessary for a world takeover. But what about an AGI which also had the cognitive capabilities necessary for taking over the world, and only those?

Such an AGI might not count as a superintelligence in the traditional sense, since it would not be superhumanly capable in every domain. Yet, it would still be one that we should be concerned about. If we focus too much on just the superintelligence case, we might miss the emergence of a “dumb” AGI which nevertheless had the crucial capabilities necessary for a world takeover.

That raises the question of what might be such crucial capabilities. I don’t have a comprehensive answer; in my paper, I focus mostly on the kinds of capabilities that could be used to inflict major damage: social manipulation, cyberwarfare, biological warfare. Others no doubt exist.

A possibly useful framing for future investigations might be, “what level of capability would an AGI need to achieve in a crucial capability in order to be dangerous”, where the definition of “dangerous” is free to vary based on how serious of a risk we are concerned about. One complication here is that this is a highly contextual question – with a superintelligence we can assume that the AGI may become effectively omnipotent, but such a simplifying assumption won’t help us here. For example, the level of offensive biowarfare capability that would pose a major risk depends on the level of the world’s defensive biowarfare capabilities. Also, we know that it’s possible to inflict enormous damage on humanity even with just human-level intelligence: whoever is authorized to control the arsenal of a nuclear power could trigger World War III, no superhuman smarts needed.

Crucial capabilities are a disjunctive consideration because they show that superintelligence isn’t the only level of capability that would pose a major risk: there are many different combinations of various capabilities – including ones that we don’t even know about yet – that could pose the same level of danger as superintelligence.

Incidentally, this shows one reason why the common criticism of “superintelligence isn’t something that we need to worry about because intelligence isn’t unidimensional” misses the mark – the AGI doesn’t need to be superintelligent in every dimension of intelligence, just in the ones we care about.

How would the AGI get free and powerful?

In the prototypical AGI risk scenario, we are assuming that the developers of the AGI want to keep it strictly under control, whereas the AGI itself has a motive to break free. This has led to various discussions about the feasibility of “oracle AI” or “AI confinement” – ways to restrict the AGI’s ability to act freely in the world, while still making use of it. This also means that the AGI might have a hard time acquiring the resources that it needs for a world takeover, since it either has to do so while it is under constant supervision by its creators, or while on the run from them.

However, there are also alternative scenarios where the AGI’s creators voluntarily let it free – or even place it in control of e.g. a major corporation, free to use that corporation’s resources as it desires! My chapter discusses several ways by which this could happen: i) economic benefit or competitive pressure, ii) criminal or terrorist reasons, iii) ethical or philosophical reasons, iv) confidence in the AI’s safety, as well as v) desperate circumstances such as being otherwise close to death. See the chapter for more details on each of these. Furthermore, the AGI could remain theoretically confined but be practically in control anyway – such as in a situation where it was officially only giving a corporation advice, but its advice had never been wrong before and nobody wanted to risk their jobs by going against the advice.

Would the Treacherous Turn involve a Decisive Strategic Advantage?

Looking at crucial capabilities in a more fine-grained manner also raises the question of when an AGI would start acting against humanity’s interests. In the typical superintelligence scenario, we assume that it will do so once it is in a position to achieve what Bostrom calls a Decisive Strategic Advantage (DSA): “a level of technological and other advantages sufficient to enable [an AI] to achieve complete world domination”. After all, if you are capable of achieving superintelligence and a DSA, why act any earlier than that?

Even when dealing with superintelligences, however, the case isn’t quite as clear-cut. Suppose that there are two AGI systems, each potentially capable of achieving a DSA if it prepares for long enough. But the longer it prepares, the more likely it becomes that the other AGI sets its own plans in motion first and gains the advantage. Thus, if several AGI projects exist, each AGI is incentivized to act at the point that maximizes its overall probability of success – even if its chances of succeeding in the takeover were rather slim, as long as it expected that waiting longer would make those chances even worse.
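To make this timing trade-off concrete, here is a minimal toy model (the specific numbers are my illustrative assumptions, not estimates from the paper): the AGI’s own chance of success grows with preparation time, but so does the cumulative chance that a rival acts first.

```python
# Toy model of the action-timing trade-off. All numbers are illustrative
# assumptions only: a 5% monthly improvement in the AGI's own takeover
# chances versus a 10% monthly hazard of a rival acting first.

def expected_success(months: int, growth: float = 0.05, rival_hazard: float = 0.10) -> float:
    """Probability that acting after `months` of preparation succeeds."""
    p_own = 1 - (1 - growth) ** months              # own capability keeps improving
    p_not_preempted = (1 - rival_hazard) ** months  # but a rival may act first
    return p_own * p_not_preempted

best = max(range(61), key=expected_success)
print(f"best month to act: {best}, expected success: {expected_success(best):.0%}")
```

Under these assumed numbers, the optimum is to act around month 8 with only about a 14% chance of success – slim odds, but better than the odds at any later point.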

Indeed, an AGI which defects on its creators may not be going for a world takeover in the first place: it might, for instance, simply be trying to maneuver itself into a position where it can act more autonomously and defeat takeover attempts by other, more powerful AGIs. The threshold for the first treacherous turn could vary quite a bit, depending on the goals and assets of the different AGIs; various considerations are discussed in the paper.

A large reason for analyzing these kinds of scenarios is that, besides caring about existential risks, we also care about catastrophic risks – such as an AGI acting too early and launching a plan which resulted in “merely” hundreds of millions of deaths. My paper introduces the term Major Strategic Advantage, defined as “a level of technological and other advantages sufficient to pose a catastrophic risk to human society”. A catastrophic risk is one that might inflict serious damage to human well-being on a global scale and cause ten million or more fatalities.

“Mere” catastrophic risks could also turn into existential ones if they contribute to global turbulence (Bostrom et al. 2017), a situation in which existing institutions are challenged and coordination and long-term planning become more difficult. Global turbulence could then contribute to another out-of-control AI project failing even more catastrophically and causing even more damage.

Summary table and example scenarios

The table below summarizes the various alternatives explored in the paper.

AI’s level of strategic advantage
  • Decisive
  • Major
AI’s capability threshold for non-cooperation
  • Very low to very high, depending on various factors
Sources of AI capability
  • Individual takeoff
    • Hardware overhang
    • Speed explosion
    • Intelligence explosion
  • Collective takeoff
  • Crucial capabilities
    • Biowarfare
    • Cyberwarfare
    • Social manipulation
    • Something else
  • Gradual shift in power
Ways for the AI to achieve autonomy
  • Escape
    • Social manipulation
    • Technical weakness
  • Voluntarily released
    • Economic or competitive reasons
    • Criminal or terrorist reasons
    • Ethical or philosophical reasons
    • Desperation
    • Confidence
      • in lack of capability
      • in values
  • Confined but effectively in control
Number of AIs
  • Single
  • Multiple

And here are some example scenarios formed by different combinations of them:

The classic takeover

(Decisive strategic advantage, high capability threshold, intelligence explosion, escaped AI, single AI)

The “classic” AI takeover scenario: an AI is developed, which eventually becomes better at AI design than its programmers. The AI uses this ability to undergo an intelligence explosion, and eventually escapes to the Internet from its confinement. After acquiring sufficient influence and resources in secret, it carries out a strike against humanity, eliminating humanity as a dominant player on Earth so that it can proceed with its own plans unhindered.

The gradual takeover

(Major strategic advantage, high capability threshold, gradual shift in power, released for economic reasons, multiple AIs)

Many corporations, governments, and individuals voluntarily turn over functions to AIs, until we are dependent on AI systems. These are initially narrow-AI systems, but continued upgrades push some of them to the level of having general intelligence. Gradually, they start making all the decisions. We know that letting them run things is risky, but by now a lot of infrastructure has been built around them, it brings a profit, and they’re really good at giving us nice stuff – for the time being.

The wars of the desperate AIs

(Major strategic advantage, low capability threshold, crucial capabilities, escaped AIs, multiple AIs)

Many different actors develop AI systems. Most of these prototypes are unaligned with human values and not yet enormously capable, but many of these AIs reason that some other prototype might be more capable. As a result, they attempt to defect on humanity despite knowing their chances of success to be low, reasoning that they would have an even lower chance of achieving their goals if they did not defect. Society is hit by various out-of-control systems with crucial capabilities that manage to do catastrophic damage before being contained.

Is humanity feeling lucky?

(Decisive strategic advantage, high capability threshold, crucial capabilities, confined but effectively in control, single AI)

Google begins to make decisions about product launches and strategies as guided by their strategic advisor AI. This allows them to become even more powerful and influential than they already are. Nudged by the strategy AI, they start taking increasingly questionable actions that increase their power; they are too powerful for society to put a stop to them. Hard-to-understand code written by the strategy AI detects and subtly sabotages other people’s AI projects, until Google establishes itself as the dominant world power.
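As a purely illustrative sketch (the dimension and option names below are my shorthand for the list given earlier), the scenario space can be thought of as combinations of choices along a few dimensions, with each example scenario above being one such combination:

```python
# Illustrative sketch: the scenario space as combinations of choices along
# the dimensions summarized above. Names are shorthand, not the paper's terms.
from itertools import product

DIMENSIONS = {
    "strategic advantage": ["decisive", "major"],
    "capability threshold": ["low", "high"],
    "source of capability": ["intelligence explosion", "collective takeoff",
                             "crucial capabilities", "gradual shift in power"],
    "route to autonomy": ["escape", "voluntarily released",
                          "confined but effectively in control"],
    "number of AIs": ["single", "multiple"],
}

# The four example scenarios as points in this space:
classic_takeover  = ("decisive", "high", "intelligence explosion", "escape", "single")
gradual_takeover  = ("major", "high", "gradual shift in power", "voluntarily released", "multiple")
desperate_ai_wars = ("major", "low", "crucial capabilities", "escape", "multiple")
feeling_lucky     = ("decisive", "high", "crucial capabilities",
                     "confined but effectively in control", "single")

# Even this coarse grid contains 2 * 2 * 4 * 3 * 2 = 96 distinct combinations.
print(len(list(product(*DIMENSIONS.values()))))  # 96
```

The point of the encoding is just to make the disjunctiveness visible: the four scenarios above are four points out of nearly a hundred, and the real space is larger still, since most of the dimensions have options we haven’t even thought of yet.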

This blog post was written as part of work for the Foundational Research Institute.

On not getting swept away by mental content

There’s a specific subskill of meditation that I call “not getting swept away by the content”, that I think is generally valuable.

It goes like this. You sit down to meditate and focus on your breath or whatever, and then a worrying thought comes to your mind. And it’s a real worry, something important. And you are tempted to start thinking about it and pondering it and getting totally distracted from your meditation… because this is something that you should probably be thinking about, at some point.

So there’s a mental motion that you make, where you note that you are getting distracted by the content of a thought. The worry, even if valid, is content. If you start thinking about whether you should be engaging with the worry, those thoughts are also content.

And you are meditating, meaning that this is the time when you shouldn’t be focusing on content. Anything that is content, you dismiss, without examining what that content is.

So you dismiss the worry. It was real and important, but it was content, so you are not going to think about it now.

You feel happy about having dismissed the content, and you start thinking about how good of a meditator you are, and… realize that this, too, is a thought that you are getting distracted by.

So you dismiss that thought, too. Doesn’t matter what the content of the thought is, now is not the time.

And then you keep letting go of thoughts that come to your mind, but that doesn’t seem to do anything, and you start to wonder whether you are doing this meditation thing right… and aha, that’s content too. So you dismiss that…

The thing that is going on here is that usually, when you experience a distracting thought and want to get rid of it, you start engaging in an evaluation process of whether that thought should be dismissed or not. By doing so, you may end up engaging with the thought’s own internal logic – which might be totally wrong for the situation.

Yes, maybe your relationship is in tatters and your partner is about to leave you. And maybe there are things that you can do to avoid that fate. Or maybe there are not. But if you try to dismiss the thought by disputing the truth or importance of those things, you will fail. Because they are true and important.

The way to short-circuit that is to move the evaluation a meta-level up and just decide that whatever is content gets dismissed on that basis. Doesn’t matter if it’s true. It’s content, so not what you are doing now. You avoid getting entangled in the thought’s internal logic, because you never engage with that logic in the first place.

Having this mental motion available to you is also useful outside meditation, if you are prone to having any other thoughts that aren’t actually useful.

As I write this, I’m sitting at a food place, eating the food and watching the traffic outside. And, like I often am, I am bothered by pessimistic thoughts about the future of humanity, and all the different disasters that could befall the world.

Yeah, I could live to see the day when AIs destroy the world, or worse.

That’s true.

That’s also content. I’m not going to engage with that content right now.


I look outside the window, watch cars pass by, and finish my dinner.

The food is tasty.

Papers for 2017

I had three new papers either published or accepted into publication last year; all of them are now available online:

  • How Feasible is the Rapid Development of Artificial Superintelligence? Physica Scripta 92 (11), 113001.
    • Abstract: What kinds of fundamental limits are there in how capable artificial intelligence (AI) systems might become? Two questions in particular are of interest: 1) How much more capable could AI become relative to humans, and 2) how easily could superhuman capability be acquired? To answer these questions, we will consider the literature on human expertise and intelligence, discuss its relevance for AI, and consider how AI could improve on humans in two major aspects of thought and expertise, namely simulation and pattern recognition. We find that although there are very real limits to prediction, it seems like AI could still substantially improve on human intelligence.
    • Links: published version (paywalled), free preprint.
  • Disjunctive Scenarios of Catastrophic AI Risk. AI Safety and Security (Roman Yampolskiy, ed.), CRC Press. Forthcoming.
    • Abstract: Artificial intelligence (AI) safety work requires an understanding of what could cause AI to become unsafe. This chapter seeks to provide a broad look at the various ways in which the development of AI sophisticated enough to have general intelligence could lead to it becoming powerful enough to cause a catastrophe. In particular, the present chapter seeks to focus on the way that various risks are disjunctive—on how there are multiple different ways by which things could go wrong, any one of which could lead to disaster. We cover different levels of a strategic advantage an AI might acquire, alternatives for the point where an AI might decide to turn against humanity, different routes by which an AI might become dangerously capable, ways by which the AI might acquire autonomy, and scenarios with varying number of AIs. Whereas previous work has focused on risks specifically only from superintelligent AI, this chapter also discusses crucial capabilities that could lead to catastrophic risk and which could emerge anywhere on the path from near-term “narrow AI” to full-blown superintelligence.
    • Links: free preprint.
  • Superintelligence as a Cause or Cure for Risks of Astronomical Suffering. Informatica 41 (4).
    • (with Lukas Gloor)
    • Abstract: Discussions about the possible consequences of creating superintelligence have included the possibility of existential risk, often understood mainly as the risk of human extinction. We argue that suffering risks (s-risks), where an adverse outcome would bring about severe suffering on an astronomical scale, are risks of a comparable severity and probability as risks of extinction. Preventing them is the common interest of many different value systems. Furthermore, we argue that in the same way as superintelligent AI both contributes to existential risk but can also help prevent it, superintelligent AI can both be a suffering risk or help avoid it. Some types of work aimed at making superintelligent AI safe will also help prevent suffering risks, and there may also be a class of safeguards for AI that helps specifically against s-risks.
    • Links: published version (open access).

In addition, my old paper Responses to Catastrophic AGI Risk (w/ Roman Yampolskiy) was republished, with some minor edits, as the book chapters “Risks of the Journey to the Singularity” and “Responses to the Journey to the Singularity”, in The Technological Singularity: Managing the Journey (Victor Callaghan et al., eds.), Springer-Verlag.

Fixing science via a basic income

I ran across Ed Hagen’s article “Academic success is either a crapshoot or a scam”, which pointed out that all the methodological discussion about science’s replication crisis is kinda missing the point: yes, all of the methodological stuff like p-hacking is something that would be valuable to fix, but the real problem is in the incentives created by the crazy publish-or-perish culture:

In my field of anthropology, the minimum acceptable number of pubs per year for a researcher with aspirations for tenure and promotion is about three. This means that, each year, I must discover three important new things about the world. […]

Let’s say I choose to run 3 studies that each has a 50% chance of getting a sexy result. If I run 3 great studies, mother nature will reward me with 3 sexy results only 12.5% of the time. I would have to run 9 studies to have about a 90% chance that at least 3 would be sexy enough to publish in a prestigious journal.

I do not have the time or money to run 9 new studies every year.

I could instead choose to investigate phenomena that are more likely to yield strong positive results. If I choose to investigate phenomena that are 75% likely to yield such results, for instance, I would only have to run about 5 studies (still too many) for mother nature to usually grace me with at least 3 positive results. But then I run the risk that these results will seem obvious, and not sexy enough to publish in prestigious journals.

To put things in deliberately provocative terms, empirical social scientists with lots of pubs in prestigious journals are either very lucky, or they are p-hacking.

I don’t really blame the p-hackers. By tying academic success to high-profile publications, which, in turn, require sexy results, we academic researchers have put our fates in the hands of a fickle mother nature. Academic success is therefore either a crapshoot or, since few of us are willing to subject the success or failure of our careers to the roll of the dice, a scam.
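Hagen’s figures are straightforward binomial probabilities; here is a quick check (the 50% and 75% per-study success rates are his illustrative assumptions from the quote above):

```python
# Reproducing the binomial arithmetic from the quoted passage.
from math import comb

def p_at_least(k: int, n: int, p: float) -> float:
    """Probability of at least k 'sexy' results out of n independent studies."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(f"{p_at_least(3, 3, 0.50):.1%}")  # 3 out of 3 studies at 50%: 12.5%
print(f"{p_at_least(3, 9, 0.50):.1%}")  # at least 3 out of 9 at 50%: ~91%
print(f"{p_at_least(3, 5, 0.75):.1%}")  # at least 3 out of 5 at 75%: ~90%
```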

The article then suggests that the solution would be to have better standards for research, and also blames prestigious journal publishers for exploiting their monopoly on the field. I think that looking at the researcher incentives is indeed the correct thing to do here, but I’m not sure the article goes deep enough with it. Mainly, it doesn’t ask the obvious question of why researchers are under such crazy pressure to publish: it’s not the journals that set the requirements for promotion or for getting onto the tenure track – that’s the universities and research institutions. The journals are just exploiting a lucrative situation that someone else created.

Rather, my understanding is that the real problem is that there are simply too many PhD graduates who want to do research, relative to the number of researcher positions available. It’s a basic fact of skill measurement that if you try to measure skill and then pick people based on how well they performed on your measure, you’re actually selecting for skill + luck rather than pure skill. If the number of people you pick is small enough relative to the number of applicants, anyone you pick has to be both highly skilled and highly lucky; simply being highly skilled isn’t enough to make it to the top. This is the situation we have in current science, and as Hagen points out, it leads to rampant cheating when people realize that they have to cheat in order to make the cut. As long as this is the situation, there will remain an incentive to cheat.
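The skill-plus-luck point is easy to see in a small simulation (the equal-variance noise model and the top-1% cutoff below are my assumptions, purely for illustration):

```python
# Illustrative simulation: selecting the top performers on a noisy measure
# selects for luck as much as for skill.
import random

random.seed(0)
N = 100_000
candidates = []
for _ in range(N):
    skill = random.gauss(0, 1)  # true skill
    luck = random.gauss(0, 1)   # measurement noise, assumed equally large
    candidates.append((skill + luck, skill, luck))

top = sorted(candidates, reverse=True)[: N // 100]  # top 1% on the measure
avg_skill = sum(s for _, s, _ in top) / len(top)
avg_luck = sum(l for _, _, l in top) / len(top)
print(f"top 1%: average skill {avg_skill:.2f}, average luck {avg_luck:.2f}")
```

With skill and luck contributing equally to the measure, the selected top 1% end up roughly as exceptional in luck as they are in skill; the tighter the cut, the more of both is required.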

This looks hard to fix; two obvious solutions would be to reduce the number of graduate students or to massively increase the number of research jobs. The first is politically challenging, especially since it would require international coordination and many nations view the number of graduating PhDs as a status symbol. The second would be expensive and thus also politically challenging. One thing that some of my friends also suggested was some kind of researchers’ basic income (or just a universal basic income in general); for fields in which doing research isn’t much more expensive than covering the researchers’ cost of living, a lot of folks would probably be happy to do research just on the basic income.

A specific suggestion that was thrown out was to give some number of post-docs a 10-year grant of 2000 euros/month; depending on the exact number of grants given out, this could fund quite a number of researchers while still being cheap in comparison to any given country’s general research and education expenses. Better-paid and more prestigious formal research positions, such as university professorships, would still exist as an incentive to actually do the research, and historically quite a lot of research has been done by people with no financial incentive for it anyway (Einstein doing his research on the side while working at the patent office is maybe the most famous example); that most researchers are motivated by a pure desire to do science is already shown by the fact that anyone at all decides to go into academia today. A country being generous in handing out these kinds of grants could also be made into an international status symbol, creating an incentive to actually do this. Alternatively, this could just be viewed as yet another reason to push for a universal basic income for everyone.
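For a rough sense of scale (the grant counts below are hypothetical; only the 2000 euros/month and 10-year figures come from the suggestion above):

```python
# Back-of-the-envelope cost of the suggested researcher grants.
# Grant counts are hypothetical; 2000 euros/month over 10 years is from the text.
MONTHLY_GRANT = 2_000  # euros per researcher per month
YEARS = 10

for n_grants in (1_000, 5_000, 10_000):
    yearly = n_grants * MONTHLY_GRANT * 12
    print(f"{n_grants:>6} grants: {yearly / 1e6:.0f} million euros/year, "
          f"{yearly * YEARS / 1e9:.1f} billion over {YEARS} years")
```

Whether a few tens or hundreds of millions of euros per year counts as cheap obviously depends on the country, but the calculation at least makes the order of magnitude explicit.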

EDIT: Jouni Sirén made the following interesting comment in response to this article: “I think the root issue goes deeper than that. There are too many PhD graduates who want to do research, because money and prestige are insufficient incentives for a large part of the middle class. Too many people want a job that is interesting or meaningful, and nobody is willing to support all of them financially.” That’s an even deeper reason than the one I was thinking of!