Disjunctive AI risk scenarios: AIs gaining the power to act autonomously

Previous post in series: AIs gaining a decisive advantage

Series summary: Arguments for risks from general AI are sometimes criticized on the grounds that they rely on a linear series of events, each of which has to occur for the proposed scenario to go through. For example, that a sufficiently intelligent AI could escape from containment, that it could then go on to become powerful enough to take over the world, that it could do this quickly enough without being detected, etc. The intent of this series of posts is to briefly demonstrate that AI risk scenarios are in fact disjunctive: composed of multiple possible pathways, each of which could be sufficient by itself. To successfully control AI systems, it is not enough to simply block one of the pathways: they all need to be dealt with.
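To make the conjunctive/disjunctive distinction concrete, here is a minimal Python sketch. The probabilities are made-up numbers chosen only for illustration, not estimates from the paper, and the steps and pathways are assumed to be independent, which real scenarios need not be.

```python
from math import prod

def conjunctive(step_probs):
    """Probability that every step in a single chain occurs (independence assumed)."""
    return prod(step_probs)

def disjunctive(pathway_probs):
    """Probability that at least one pathway succeeds (independence assumed)."""
    return 1 - prod(1 - p for p in pathway_probs)

# Hypothetical numbers, for illustration only.
chain = [0.5, 0.5, 0.5]          # three steps that must all happen
pathways = [0.1, 0.1, 0.1, 0.1]  # four routes, any one of which suffices

print(f"conjunctive chain:    {conjunctive(chain):.3f}")
print(f"disjunctive pathways: {disjunctive(pathways):.3f}")
```

Under these assumptions, adding a step to a chain can only lower its probability, while adding a pathway can only raise the probability that some route succeeds. That asymmetry is why blocking a single pathway is not enough.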

Previously, I drew on arguments from my and Roman Yampolskiy’s paper Responses to Catastrophic AGI Risk to argue that there are several alternative ways by which AIs could gain a decisive advantage over humanity, any one of which could lead to that outcome. In this post, I will draw on arguments from the same paper to examine another question: what different routes are there for an AI to gain the capability to act autonomously? (This post draws on sections 4.1 and 5.1 of our paper, as well as adding some additional material.)

Autonomous AI capability

A somewhat common argument concerning AI risk is that AI systems aren’t a threat because we will keep them contained, or “boxed”, thus limiting what they are allowed to do. How might this line of argument fail?

1. The AI escapes


A common response is that a sufficiently intelligent AI will somehow figure out a way to escape, either by social engineering or by finding an exploitable weakness in the physical security arrangements. This possibility has been extensively discussed in a number of papers, including Chalmers (2012) and Armstrong, Sandberg & Bostrom (2012). Writers have generally been cautious about making strong claims of our ability to keep a mind much smarter than ourselves contained against its will. However, with careful design it may still be possible to build an AI that has some internal motivation to stay contained, and to combine that with a number of external safeguards monitoring the AI.

2. The AI is voluntarily released


AI confinement assumes that the people building the AI are motivated to actually keep it confined. If a group of cautious researchers builds and successfully contains their AI, this may be of limited benefit if another group later builds an AI that is intentionally set free. Why would anyone do this?

2a. Voluntarily released for economic benefit or competitive pressure

As already discussed in the previous post, the historical trend has been to automate everything that can be automated, both to reduce costs and because machines can do things better than humans can. If you have any kind of a business, you could potentially make it run better by putting a sufficiently sophisticated AI in charge – or even replace all the human employees with one. The AI can think faster and smarter, deal with more information at once, and work for a unified purpose rather than have its efficiency weakened by the kinds of office politics that plague any large organization.

The trend towards automation has been going on throughout history, doesn’t show any signs of stopping, and inherently involves giving the AI systems whatever agency they need in order to run the company better. If your competitors have AIs running their companies and you don’t, you’re likely to be outcompeted, so you’ll want to make sure your AIs are smarter and more capable of acting autonomously than the AIs of the competitors. These pressures are likely to first show up when AIs are still comfortably narrow, and to intensify as the AIs gradually develop towards general intelligence.

The trend towards giving AI systems more power and autonomy might be limited by the fact that doing this poses large risks for the company if the AI malfunctions. This limits the extent to which major, established companies might adopt AI-based control, but incentivizes startups to try to invest in autonomous AI in order to outcompete the established players. There currently also exists the field of algorithmic trading, where AI systems are trusted with enormous sums of money despite the potential for enormous losses – in 2012, Knight Capital lost $440 million due to a glitch in their software. This suggests that even if a malfunctioning AI could potentially pose major risks, some companies will still be inclined to invest in placing their business under autonomous AI control if the potential profit is large enough.

The trend towards giving AI systems more autonomy can also be seen in the military domain. Wallach and Allen (2012) discuss the topic of autonomous robotic weaponry and note that the US military is seeking to eventually transition to a state where the human operators of robot weapons are “on the loop” rather than “in the loop.” In other words, whereas a human was previously required to explicitly give the order before a robot was allowed to initiate possibly lethal activity, in the future humans are meant to merely supervise the robot’s actions and intervene if something goes wrong.

Human Rights Watch (2012) reports on a number of military systems which are becoming increasingly autonomous. For automatic weapons defense systems, which are designed to detect and shoot down incoming missiles and rockets, human oversight is already limited to accepting or overriding the computer’s plan of action in a matter of seconds, which may be too little time to make a meaningful decision in practice. Although these systems are better described as automatic (carrying out preprogrammed sequences of actions in a structured environment) than autonomous, they are a good demonstration of a situation where rapid decisions are needed and the extent of human oversight is limited. A number of militaries are considering the future use of more autonomous weapons.

2b. Voluntarily released for aesthetic, ethical, or philosophical reasons

A few thinkers (such as Gunkel 2012) have raised the question of moral rights for machines, and not everyone necessarily agrees that confining an AI is ethically acceptable. Even if the designer of an AI knew that it did not have a process that corresponded to the ability to suffer, they might come to view it as something like their child, and feel that it deserved the right to act autonomously.

2c. Voluntarily released due to confidence in the AI’s safety

For a research team to keep an AI confined, they need to take seriously the possibility of it being dangerous in the first place. Current AI research doesn’t involve any confinement safeguards, as the researchers reasonably believe that their systems are nowhere near general intelligence yet. Many systems are also connected directly to the Internet. Hopefully safeguards will begin to be implemented once the researchers feel that their system might start having more general capability, but this will depend on the safety culture of the AI research community in general, and the specific research group in particular.

In addition to believing that the AI is not capable enough to be a threat, the researchers may also (correctly or incorrectly) believe that they have succeeded in making the AI aligned with human values, so that it will not have any motivation to harm humans.

2d. Voluntarily released due to desperation

Miller (2012) points out that if a person were close to death, whether due to natural causes, being on the losing side of a war, or any other reason, they might set even a potentially dangerous AGI system free. This would be a rational course of action as long as they primarily valued their own survival and thought that even a small chance of the AGI saving their life was better than a near-certain death.
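Miller’s argument can be read as a simple comparison of survival odds. The following is a rough Python sketch under the stated assumption that the person cares only about their own survival; the function name and the probabilities are hypothetical illustrations, not figures from Miller (2012).

```python
def should_release(p_survive_without_ai, p_ai_saves):
    """Return True if releasing the AGI raises the person's own survival odds.

    Assumes the person values only their own survival and ignores
    any harm the released AGI might do to anyone else.
    """
    return p_ai_saves > p_survive_without_ai

# Hypothetical numbers: near-certain death otherwise, small chance the AGI helps.
print(should_release(p_survive_without_ai=0.01, p_ai_saves=0.05))
```

The point of the sketch is that the risk the AGI poses to everyone else never enters the calculation, which is what makes this pathway hard to block by appealing to the releaser’s self-interest.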

3. The AI remains contained, but ends up effectively in control anyway

Even if humans were technically kept in the loop, they might not have the time, opportunity, motivation, intelligence, or confidence to verify the advice given by an AI. This would particularly be the case after the AI had functioned for a while, and established a reputation as trustworthy. It may become common practice to act automatically on the AI’s recommendations, and it may become increasingly difficult to challenge the ‘authority’ of the recommendations. Eventually, the AI may in effect begin to dictate decisions (Friedman and Kahn 1992).

Likewise, Bostrom and Yudkowsky (2011) point out that modern bureaucrats often follow established procedures to the letter, rather than exercising their own judgment and allowing themselves to be blamed for any mistakes that follow. Dutifully following all the recommendations of an AI system would be an even better way of avoiding blame.

Wallach and Allen (2012) note the existence of robots which attempt to automatically detect the locations of hostile snipers and to point them out to soldiers. To the extent that these soldiers have come to trust the robots, they could be seen as carrying out the robots’ orders. Eventually, equipping the robot with its own weapons would merely dispense with the formality of needing a human to pull the trigger.



Merely developing ways to keep AIs confined is not a sufficient route to ensure that they cannot become an existential risk – even if we knew that those ways worked. Various groups may have different reasons to create autonomously-acting AIs that are intentionally allowed to act by themselves, and even an AI that was successfully kept contained might still end up dictating human decisions in practice. All of these issues will need to be considered in order to keep advanced AIs safe.

This blog post was written as part of research funded by the Foundational Research Institute.
