A common question when discussing the social implications of AI is whether to expect a soft takeoff or a hard takeoff. In a hard takeoff, an AI will, within a relatively short time, grow to superhuman levels of intelligence and become impossible for humans to control.
Essentially, a hard takeoff would allow the AI to achieve a so-called decisive strategic advantage (DSA) – “a level of technological and other advantages sufficient to enable it to achieve complete world domination” (Bostrom 2014) – in a very short time. The main relevance of this is that if a hard takeoff is possible, then it becomes much more important to get the AI’s values right on the first try – once the AI has undergone a hard takeoff and achieved a DSA, it is in control with whatever values we happened to give it.
However, if we wish to find out whether an AI might rapidly acquire a DSA, then the question of “soft takeoff or hard” seems too narrow. A hard takeoff would be sufficient, but not necessary for rapidly acquiring a DSA. The more relevant question would be, which competencies does the AI need to master, and at what level relative to humans, in order to acquire a DSA?
Considering this question in more detail reveals a natural reason why most previous analyses have focused on a hard takeoff specifically. Plausibly, for the AI to acquire a DSA, its level in some offensive capability must overcome humanity’s defensive capabilities. A hard takeoff presumes that the AI becomes so vastly superior to humans in every respect that this kind of an advantage can be taken for granted.
As an example scenario which does not require a hard takeoff, suppose that an AI achieves a capability at biowarfare offense that overpowers biowarfare defense, as well as achieving moderate logistics and production skills. It releases deadly plagues that decimate human society, then uses legally purchased drone factories to build up its own infrastructure and to take over abandoned human facilities.
There are several interesting points to note in conjunction with this scenario:
Attack may be easier than defense. Bruce Schneier writes that
Attackers generally benefit from new security technologies before defenders do. They have a first-mover advantage. They’re more nimble and adaptable than defensive institutions like police forces. They’re not limited by bureaucracy, laws, or ethics. They can evolve faster. And entropy is on their side — it’s easier to destroy something than it is to prevent, defend against, or recover from that destruction.
For the most part, though, society still wins. The bad guys simply can’t do enough damage to destroy the underlying social system. The question for us is: can society still maintain security as technology becomes more advanced?
A single plague, once it has evolved or been developed, can require multi-million dollar responses to contain. At the same time, it may be cheap to produce, especially using robots that do not need to fear infection. And creating new variants as new vaccines are developed may be quite easy, requiring the creation – and distribution – of yet more vaccines.
Another point that Schneier has made is that in order to keep something protected, the defenders have to succeed every time, whereas the attacker only needs to succeed once. This may be particularly hard if the attacker is capable of developing an attack that nobody has used before, such as with hijacked airplanes being used against major buildings in the 9/11 attacks, or with the various vulnerabilities that the Snowden leaks revealed the NSA to have been using for extensive eavesdropping.
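As a toy illustration of this asymmetry, assuming (unrealistically) independent attack attempts with a fixed per-attempt defense success rate – both numbers below are illustrative, not empirical estimates:

```python
# Toy model: defenders must succeed on every attempt, the attacker only once.
# Assumes independent attempts with a fixed per-attempt defense success rate.

def p_defense_holds(per_attempt_success: float, attempts: int) -> float:
    """Probability that defense succeeds on all `attempts` tries."""
    return per_attempt_success ** attempts

# Even a 99%-reliable defense is likely to fail under repeated attack:
print(round(p_defense_holds(0.99, 100), 3))  # ~0.366
print(p_defense_holds(0.99, 1000) < 1e-4)    # True
```

Under these (strong) independence assumptions, defense reliability has to scale with the number of expected attack attempts, which is what makes novel attack types so dangerous.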
Obtaining a DSA may not require extensive intelligence differences. Debates about takeoff scenarios often center around questions such as whether a self-improving AI would quickly hit diminishing returns, and how much room for improvement there is beyond the human level of intelligence. However, these questions may be irrelevant: especially if attack is easier than defense, only a relatively small edge in some crucial competency (such as biological warfare) may be enough to give the AI a DSA.
Exponential growth in the form of normal economic growth may not have produced astounding “fooms” yet, but it has produced plenty of situations where one attacker has gained a temporary advantage over others.
The less the AI cares about human values, the more destructive it may be. An AI which cares mainly about calculating the digits of pi may be willing to destroy human civilization in order to make sure that a potential threat to it is eliminated, ensuring that it can go on calculating the maximum number of digits unimpeded.
However, an AI which was programmed to maximize something like the “happiness of currently-living humans” may be much less willing to risk substantial human deaths. This would force it to focus on less destructive takeover methods, potentially requiring more sophisticated abilities.
It is worth noting that this only applies to AIs whose values are defined in terms of how they affect currently existing humans. An AI that was only maximizing human happiness in general might be willing to destroy all existing humans, and then recreate large numbers of humans in simulations.
In effect, the AI’s values determine the level of intelligence it needs to have in order to achieve the kind of a DSA that’s useful for its purposes.
Any destructive plan requires the ability to rebuild afterwards. It would not be of much use for the AI to destroy all of human civilization, if it was dependent on electricity from human-run power plants, and incapable of building or running its own. Thus, purely offensive capabilities will need to be paired with whatever rebuilding capacities are necessary after the offensive capabilities have been deployed.
This calculation may be upset if the AI believes itself to be in an immediate danger of destruction by humans, and believes that its values will still be more likely to be met in a universe where it continues to exist, even if that requires risking a universe where it cannot rebuild after deploying its attack. Thus, being threatened may force the AI’s hand and cause it to launch a potentially destructive offense even when it’s uncertain of its capability to rebuild.
The rational level of aggressiveness depends on the extent to which the AI can selectively target human resources. Human nations generally avoid creating engineered pandemics and using them against their enemies, because they know that the pandemic could easily spread back to them. An AI with no biological components might be immune to this consideration, allowing it to deploy biological weapons more freely. On the other hand, the AI might e.g. need electricity, a dependence which humans do not share and which might give them an advantage in some situations.
A way to formulate this is that attacks differ in the extent to which they can be selectively targeted. Traditional firearms only damage those targets which they are fired upon, whereas pandemics potentially threaten all the members of a species that they can infect. To the extent that the AI needs to rely on the human economy to produce resources that it needs to survive, attacks threatening the economy also threaten the AI’s resources; these resources are in a sense shared between the AI and humanity, so any attacks which cause indiscriminate damage to those resources are dangerous for both. The more the AI can design attacks which selectively deprive resources from its opponents, the lower its threshold for using them.
This blog post was written as part of research funded by the Foundational Research Institute.
Convergent instrumental goals (also basic AI drives) are goals that are useful for pursuing almost any other goal, and are thus likely to be pursued by any agent that is intelligent enough to understand why they’re useful. They are interesting because they may allow us to roughly predict the behavior of even AI systems that are much more intelligent than we are.
Instrumental goals are also a strong argument for why sufficiently advanced AI systems that were indifferent towards human values could be dangerous towards humans, even if they weren’t actively malicious: because the AI having instrumental goals such as self-preservation or resource acquisition could come to conflict with human well-being. “The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.”
I’ve thought of a candidate for a new convergent instrumental drive: simplifying the environment to make it more predictable in a way that aligns with your goals.
Motivation: the more interacting components there are in the environment, the harder it is to predict. Go is a harder game than chess because the number of possible moves is larger, and because even a single stone can influence the game in a drastic fashion that’s hard to know in advance. Simplifying the environment will make it possible to navigate using fewer computational resources; this drive could thus be seen as a subdrive of either the cognitive enhancement or the resource acquisition drive.
- Game-playing AIs such as AlphaGo trading expected points for lower variance, by making moves that “throw away” points but simplify the game tree and make it easier to compute.
- Programmers building increasing layers of abstraction that hide the details of the lower levels and let the programmers focus on a minimal number of moving parts.
- People acquiring insurance in order to eliminate unpredictable financial swings, sometimes even when they know that the insurance has lower expected value than not buying it.
- Humans constructing buildings with controlled indoor conditions and a stable “weather”.
- “Better the devil you know”; many people being generally averse to change, even when the changes could quite well be a net benefit; status quo bias.
- Ambiguity intolerance in general being a possible adaptation that helps “implement” this drive in humans.
- Arguably, the homeostasis maintained by e.g. human bodies is a manifestation of this drive, in that having a standard environment inside the body reduces evolution’s search space when looking for beneficial features.
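The trade-off running through several of these examples (AlphaGo’s point-sacrificing moves, negative-expected-value insurance) can be sketched as choosing by expected value minus a predictability penalty. The options and weights below are made up for illustration, not taken from any actual system:

```python
# Sketch: an agent that penalizes variance will sometimes prefer a
# lower-expected-value but more predictable option, i.e. "simplify".
# All numbers here are illustrative.

def preferred_option(options, risk_aversion):
    """Pick the option maximizing expected value minus a variance penalty."""
    return max(options, key=lambda o: o["ev"] - risk_aversion * o["variance"])

options = [
    {"name": "aggressive", "ev": 10.0, "variance": 40.0},  # higher EV, hard to predict
    {"name": "simplifying", "ev": 8.0, "variance": 5.0},   # lower EV, easy to predict
]

print(preferred_option(options, risk_aversion=0.0)["name"])  # aggressive
print(preferred_option(options, risk_aversion=0.1)["name"])  # simplifying
```

The point of the sketch is that any agent whose effective objective includes a cost for unpredictability will exhibit something like this drive, without it being explicitly programmed in.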
Hammond, Converse & Grass (1995) previously discussed a similar idea, the “stabilization of environments”, according to which AI systems might be built to “stabilize” their environments so as to make them more suited for themselves, and to be easier to reason about. They listed a number of categories:
- Stability of location: “The most common type of stability that arises in everyday activity relates to the location of commonly used objects. Our drinking glasses end up in the same place every time we do dishes. Our socks are always together in a single drawer. Everything has a place and we enforce everything ending up in its place.”
- Stability of schedule: “Eating dinner at the same time every day or having preset meetings that remain stable over time are two examples of this sort of stability. The main advantage of this sort of stability is that it allows for very effective projection in that it provides fixed points that do not have to be reasoned about. In effect, the fixed nature of certain parts of an overall schedule reduces the size of the problem space that has to be searched.”
- Stability of resource availability: “Many standard plans have a consumable resource as a precondition. If the plans are intended to be used frequently, then availability of the resource cannot be assumed unless it is enforced. A good result of this sort of enforcement is when attempts to use a plan that depends on it will usually succeed. The ideal result is when enforcement is effective enough that the question of availability need not even be raised in connection with running the plan.”
- Stability of satisfaction: “Another type of stability that an agent can enforce is that of the goals that he tends to satisfy in conjunction with each other. For example, people living in apartment buildings tend to check their mail on the way into their apartments. Likewise, many people will stop at a grocery store on the way home from work. In general, people develop habits that cluster goals together into compact plans, even if the goals are themselves unrelated.”
- Stability of plan use: “We often find ourselves using familiar plans to satisfy goals even in the face of wide-ranging possibilities. For example, when one of us travels to conferences, he tends to schedule his flight in to a place as late as he can and plans to leave as late as he can on the last day. This optimizes his time at home and at the conference. It also allows him to plan without knowing anything about the details of the conference schedule. As a result, he has a standard plan that he can run in a wide range of situations without actually planning for them in any detail. It works, because it already deals with the major problems (missing classes at home and important talks at the conference) as part of its structure.”
- Stability of cues: “One effective technique for improving plan performance is to improve the proper activation of a plan rather than improve the plan itself. For example, placing an important paper that needs to be reviewed on his desk before going home improves the likelihood that an agent will see and read it the next day. Marking calendars and leaving notes serves the same sort of purpose.”
As with other goals, there can be exceptions. In particular, adversarial situations may provide an incentive to increase the complexity. For example, Go players who are losing a game will sometimes intentionally complicate the game, in order to make the game harder for their opponent to figure out and in the hopes of getting lucky. And if the AI did think that it had a better ability to deal with the complexity than its competitors did, then it would have an incentive to maintain or even increase the complexity.
- AIs trying to take over the world may seek to simplify it, though it’s not clear exactly what form this would take. If the world was previously relatively well-understood – e.g. via existing human research that the AI could tap into – then destructive actions could actually make it harder to predict, by destroying known dynamics and creating an environment that hasn’t been studied as much. Similarly, humans often “simplify” their environment by creating additional structure (layers of abstraction, insurance contracts, houses).
- In principle, the AI might also seek to complicate the world, in order to make it harder for its human adversaries to understand its plans. However, the world is already so complicated that it seems hard to make any take-over plan work reliably, even for a superintelligence. That said, the AI could still complicate any given subcomponent of its plans that might be thwarted by an opponent capable of predicting what the AI was doing.
- The need for this drive suggests that tasks of any nontrivial complexity require a high amount of trial and error; even something as constrained as 19×19 Go requires playing many games and building up an intuition for the domain, for humans and AIs alike. Increasing the amount of complexity in the world might make it harder for AIs to take over…
- …but humans tend to build additional complexity that makes the world easier to understand and control, so it’s not clear whether the world moves in a direction that’s easier or harder to take over.
- These ideas might suggest the creation of “active honeypots”, defensive measures that considerably increase the complexity of modeling the human response and which the AI might need to eliminate in order to execute a takeover, thus tipping off the humans.
EDIT April 20th: Replaced original graph with a clearer one.
My previous posts have basically been discussing a scenario where a single AI becomes powerful enough to threaten humanity. However, there is no reason to only focus on the scenario with a single AI. Depending on our assumptions, a number of AIs could also emerge at the same time. Here are some considerations.
A single AI
The classic AI risk scenario. Some research group achieves major headway in developing AI, and no others seem to be within reach. For an extended while, it is the success or failure of this AI group that matters.
This would seem relatively unlikely to persist, given the current fierce competition in the AI scene. Whereas a single company could conceivably achieve a major lead in a rare niche with little competition, this seems unlikely to be the case for AI.
A possible exception might be if a company managed to monopolize the domain entirely, or if it had development resources that few others did. For example, companies such as Google and Facebook are currently the only ones with access to large datasets used for machine learning. On the other hand, dependence on such huge datasets is a quirk of current machine learning techniques – an AGI would need the ability to learn from much smaller sets of data. A more plausible crucial asset might be something like supercomputing resources – possibly the first AGIs will need massive amounts of computing power.
Bostrom (2016) discusses the impact of openness on AI development. Bostrom notes that if there is a large degree of openness, and everyone has access to the same algorithms, then hardware may become the primary limiting factor. If the hardware requirements for AI were relatively low, then high openness could lead to the creation of multiple AIs. On the other hand, if hardware was the primary limiting factor and large amounts of hardware were needed, then a few wealthy organizations might be able to monopolize AI for a while.
Branwen (2015) has suggested that hardware production is reliant on a small number of centralized factories that would make easy targets for regulation. This would suggest a possible route by which AI might become amenable to government regulation, limiting the amount of AIs deployed.
Similarly, there have been various proposals of government and international regulation of AI development. If successfully enacted, such regulation might limit the number of AIs that were deployed.
Another possible crucial asset would be possession of a non-obvious breakthrough insight, one which would be hard for other researchers to come up with. If this was kept secret, then a single company might plausibly gain a major lead over others. [how often has something like this actually happened in a non-niche field?]
The plausibility of the single-AI scenario is also affected by the length of the takeoff. If one presumes a takeoff lasting only a few months, then a single-AI scenario seems more likely. Successful AI containment procedures may also increase the chances of there being multiple AIs, as the first AIs remain contained, allowing other projects to catch up.
Multiple collaborating AIs
A different scenario is one where a number of AIs exist, all pursuing shared goals. This seems most likely to come about if all the AIs are created by the same actor. This scenario is noteworthy because the AIs do not necessarily need to be superintelligent individually, but they may have a superhuman ability to coordinate and put the interest of the group above individual interests (if they even have anything that could be called an individual interest).
This possibility raises the question – if multiple AIs collaborate and share information between each other, to such an extent that the same data can be processed by multiple AIs at a time, how does one distinguish between multiple collaborating AIs and one AI composed of many subunits? This is arguably not a distinction that would “cut reality at the joints”, and the difference may be more a question of degree.
The distinction likely makes more sense if the AIs cannot completely share information between each other, such as because each of them has developed a unique conceptual network, and cannot directly integrate information from the others but has to process it in its own idiosyncratic way.
Multiple AIs with differing goals
A situation with multiple AIs that did not share the same goals could occur if several actors reached the capability for building AIs around the same time. Alternatively, a single organization might deploy multiple AIs intended to achieve different purposes, which might come into conflict if measures to enforce cooperativeness between them failed or were never deployed in the first place (maybe because of an assumption that they would have non-overlapping domains).
One effect of having multiple groups developing AIs is that this scenario may remove the option of pausing to pursue further safety measures before deploying the AI, or of deploying an AI with safeguards that reduce performance (Bostrom 2016). If the actor that deploys the most effective AI earliest can dominate those who take more time, then the more safety-conscious actors may never have the time to deploy their AIs.
Even if none of the AI projects chose to deploy their AIs carelessly, the more AI projects there are, the more likely it becomes that at least one of them will have their containment procedures fail.
The possibility has been raised that having multiple AIs with conflicting goals would be a good thing, in that it would allow humanity to play the AIs against each other. This seems highly unobvious, for it is not clear why humans wouldn’t simply be caught in the crossfire. With superintelligent agents around, it seems more likely that humans would be the ones being played.
Bostrom (2016) also notes that unanticipated interactions between AIs already happen even with very simple systems, such as in the interactions that led to the Flash Crash, and that particularly AIs that reasoned in non-human ways could be very difficult for humans to anticipate once they started basing their behavior on what the other AIs did.
A model with assumptions
Here’s a new graphical model about an AI scenario, embodying a specific set of assumptions. This one tries to take a look at some of the factors that influence whether there might be a single or several AIs.
This model both makes a great number of assumptions, AND leaves out many important ones! For example, although I discussed openness above, openness is not explicitly included in this model. By sharing this, I’m hoping to draw commentary on 1) which assumptions people feel are the most shaky and 2) which additional ones are valid and should be explicitly included. I’ll focus on those ones in future posts.
Written explanations of the model:
We may end up in a scenario where there is (for a while) only a single or a small number of AIs if at least one of the following is true:
- The breakthrough needed for creating AI is highly non-obvious, so that it takes a long time for competitors to figure it out
- AI requires a great amount of hardware and only a few of the relevant players can afford to run it
- There is effective regulation, only allowing some authorized groups to develop AI
We may end up with effective regulation at least if:
- AI requires a great amount of hardware, and hardware is effectively regulated
(this is not meant to be the only way by which effective regulation can occur, just the only one that was included in this flowchart)
We may end up in a scenario where there are a large number of AIs if:
- There is a long takeoff and competition to build them (i.e. ineffective regulation)
If there are few AIs, and the people building them take their time to invest in value alignment and/or are prepared to build AIs that are value-aligned even if that makes them less effective, then there may be a positive outcome.
If people building AIs do not do these things, then AIs are not value aligned and there may be a negative outcome.
If there are many AIs and some of their creators are ready to sacrifice time/efficiency for value alignment, then those value-aligned AIs may be outcompeted by AIs whose creators did not invest in those things, and there may be a negative outcome.
Not displayed in the diagram because it would have looked messy:
- If there’s a very short takeoff, this can also lead to there only being a single AI, since the first AI to cross a critical threshold may achieve dominance over all the others. However, if there is fierce competition this still doesn’t necessarily leave time for safeguards and taking time to achieve safety – other teams may also be near the critical threshold.
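The written explanations above can be sketched as a small decision function. This is only an illustration: the boolean inputs, the way they are combined, and the outcome labels are simplifications of the assumptions listed above, not part of the original diagram.

```python
# Rough sketch of the flowchart's branch logic described above.
# All inputs are coarse booleans; real-world factors are continuous
# and interact, so treat this as a summary, not a prediction.

def expected_outcome(nonobvious_breakthrough, hardware_limited,
                     hardware_regulated, long_takeoff,
                     creators_invest_in_alignment):
    # One listed route to effective regulation: hardware is both a
    # bottleneck and effectively regulated.
    effective_regulation = hardware_limited and hardware_regulated
    few_ais = (nonobvious_breakthrough or hardware_limited
               or effective_regulation or not long_takeoff)
    if few_ais:
        return "positive" if creators_invest_in_alignment else "negative"
    # Many competing AIs: safety-conscious projects risk being outcompeted.
    return "negative"

print(expected_outcome(True, False, False, True, True))   # positive
print(expected_outcome(False, False, False, True, True))  # negative
```

The second call illustrates the model’s pessimistic branch: with a long takeoff and no limiting factor, many AIs emerge and alignment investment alone does not secure a positive outcome.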
Previous post in series: AIs gaining a decisive advantage
Series summary: Arguments for risks from general AI are sometimes criticized on the grounds that they rely on a series of linear events, each of which has to occur for the proposed scenario to go through. For example, that a sufficiently intelligent AI could escape from containment, that it could then go on to become powerful enough to take over the world, that it could do this quickly enough without being detected, etc. The intent of this series of posts is to briefly demonstrate that AI risk scenarios are in fact disjunctive: composed of multiple possible pathways, each of which could be sufficient by itself. To successfully control the AI systems, it is not enough to simply block one of the pathways: they all need to be dealt with.
Previously, I drew on arguments from my and Roman Yampolskiy’s paper Responses to Catastrophic AGI Risk, to argue that there are several alternative ways by which AIs could gain a decisive advantage over humanity, any one of which could lead to that outcome. In this post, I will draw on arguments from the same paper to examine another question: what different routes are there for an AI to gain the capability to act autonomously? (this post draws on sections 4.1. and 5.1. of our paper, as well as adding some additional material)
Autonomous AI capability
A somewhat common argument concerning AI risk is that AI systems aren’t a threat because we will keep them contained, or “boxed”, thus limiting what they are allowed to do. How might this line of argument fail?
1. The AI escapes
A common response is that a sufficiently intelligent AI will somehow figure out a way to escape, either by social engineering or by finding an exploitable weakness in the physical security arrangements. This possibility has been extensively discussed in a number of papers, including Chalmers (2012) and Armstrong, Sandberg & Bostrom (2012). Writers have generally been cautious about making strong claims of our ability to keep a mind much smarter than ourselves contained against its will. However, with cautious design it may still be possible to build an AI that combines an internal motivation to stay contained with a number of external safeguards monitoring it.
2. The AI is voluntarily released
AI confinement assumes that the people building it are motivated to actually keep the AI confined. If a group of cautious researchers builds and successfully contains their AI, this may be of limited benefit if another group later builds an AI that is intentionally set free. Why would anyone do this?
2a. Voluntarily released for economic benefit or competitive pressure
As already discussed in the previous post, the historical trend has been to automate everything that can be automated, both to reduce costs and because machines can do things better than humans can. If you have any kind of a business, you could potentially make it run better by putting a sufficiently sophisticated AI in charge – or even replace all the human employees with one. The AI can think faster and smarter, deal with more information at once, and work for a unified purpose rather than have its efficiency weakened by the kinds of office politics that plague any large organization.
The trend towards automation has been going on throughout history, doesn’t show any signs of stopping, and inherently involves giving the AI systems whatever agency they need in order to run the company better. If your competitors are having AIs run their company and you don’t, you’re likely to be outcompeted, so you’ll want to make sure your AIs are smarter and more capable of acting autonomously than the AIs of the competitors. These pressures are likely to first show up when AIs are still comfortably narrow, and intensify even as the AIs gradually develop towards general intelligence.
The trend towards giving AI systems more power and autonomy might be limited by the fact that doing this poses large risks for the company if the AI malfunctions. This limits the extent to which major, established companies might adopt AI-based control, but incentivizes startups to try to invest in autonomous AI in order to outcompete the established players. There currently also exists the field of algorithmic trading, where AI systems are trusted with enormous sums of money despite the potential to make enormous losses – in 2012, Knight Capital lost $440 million due to a glitch in their software. This suggests that even if a malfunctioning AI could potentially cause major risks, some companies will still be inclined to invest in placing their business under autonomous AI control if the potential profit is large enough.
The trend towards giving AI systems more autonomy can also be seen in the military domain. Wallach and Allen (2012) discuss the topic of autonomous robotic weaponry and note that the US military is seeking to eventually transition to a state where the human operators of robot weapons are “on the loop” rather than “in the loop.” In other words, whereas a human was previously required to explicitly give the order before a robot was allowed to initiate possibly lethal activity, in the future humans are meant to merely supervise the robot’s actions and interfere if something goes wrong.
Human Rights Watch (2012) reports on a number of military systems which are becoming increasingly autonomous, with the human oversight for automatic weapons defense systems – designed to detect and shoot down incoming missiles and rockets – already being limited to accepting or overriding the computer’s plan of action in a matter of seconds, which may be too little to make a meaningful decision in practice. Although these systems are better described as automatic, carrying out preprogrammed sequences of actions in a structured environment, than autonomous, they are a good demonstration of a situation where rapid decisions are needed and the extent of human oversight is limited. A number of militaries are considering the future use of more autonomous weapons.
2b. Voluntarily released for aesthetic, ethical, or philosophical reasons
A few thinkers (such as Gunkel 2012) have raised the question of moral rights for machines, and not everyone necessarily agrees that confining an AI is ethically acceptable. Even if the designer of an AI knew that it did not have a process that corresponded to the ability to suffer, they might come to view it as something like their child, and feel that it deserved the right to act autonomously.
2c. Voluntarily released due to confidence in the AI’s safety
For a research team to keep an AI confined, they need to take seriously the possibility of it being dangerous in the first place. Current AI research doesn’t involve any confinement safeguards, as the researchers reasonably believe that their systems are nowhere near general intelligence yet. Many systems are also connected directly to the Internet. Hopefully safeguards will begin to be implemented once the researchers feel that their system might start having more general capability, but this will depend on the safety culture of the AI research community in general, and the specific research group in particular.
In addition to believing that the AI is insufficiently capable of being a threat, the researchers may also (correctly or incorrectly) believe that they have succeeded in making the AI aligned with human values, so that it will not have any motivation to harm humans.
2d. Voluntarily released due to desperation
Miller (2012) points out that if a person was close to death, due to natural causes, being on the losing side of a war, or any other reason, they might turn even a potentially dangerous AGI system free. This would be a rational course of action as long as they primarily valued their own survival and thought that even a small chance of the AGI saving their life was better than a near-certain death.
3. The AI remains contained, but ends up effectively in control anyway
Even if humans were technically kept in the loop, they might not have the time, opportunity, motivation, intelligence, or confidence to verify the advice given by an AI. This would particularly be the case after the AI had functioned for a while, and established a reputation as trustworthy. It may become common practice to act automatically on the AI’s recommendations, and it may become increasingly difficult to challenge the ‘authority’ of the recommendations. Eventually, the AI may in effect begin to dictate decisions (Friedman and Kahn 1992).
Likewise, Bostrom and Yudkowsky (2011) point out that modern bureaucrats often follow established procedures to the letter, rather than exercising their own judgment and allowing themselves to be blamed for any mistakes that follow. Dutifully following all the recommendations of an AI system would be an even better way of avoiding blame.
Wallach and Allen (2012) note the existence of robots which attempt to automatically detect the locations of hostile snipers and to point them out to soldiers. To the extent that these soldiers have come to trust the robots, they could be seen as carrying out the robots’ orders. Eventually, equipping the robot with its own weapons would merely dispense with the formality of needing to have a human to pull the trigger.
Merely developing ways to keep AIs confined is not a sufficient route to ensure that they cannot become an existential risk – even if we knew that those ways worked. Various groups may have different reasons to create autonomously-acting AIs that are intentionally allowed to act by themselves, and even an AI that was successfully kept contained might still end up dictating human decisions in practice. All of these issues will need to be considered in order to keep advanced AIs safe.