14 objections against AI/Friendly AI/The Singularity answered
1: There are limits to everything. You can’t get infinite growth.
2: Extrapolation of graphs doesn’t prove anything. It doesn’t show that we’ll have AI in the future.
3: A superintelligence could rewrite itself to remove human tampering. Therefore we cannot build Friendly AI.
4: What reason would a super-intelligent AI have to care about us?
5: The idea of a hostile AI is anthropomorphic.
6: Intelligence is not linear.
7: There is no such thing as a human-equivalent AI.
8: Intelligence isn’t everything. An AI still wouldn’t have the resources of humanity.
9: It’s too early to start thinking about Friendly AI.
10: Development towards AI will be gradual. Methods will pop up to deal with it.
11: “Friendliness” is too vaguely defined.
12: What if the AI misinterprets its goals?
13: Couldn’t AIs be built as pure advisors, so they wouldn’t do anything themselves? That way, we wouldn’t need to worry about Friendly AI.
14: Machines will never be placed in positions of power.
Answer: For one, this is mainly an objection against the Accelerating Change interpretation of the Singularity, most famously advanced by Ray Kurzweil. When talking about the Singularity, many people are in fact referring to the “Intelligence Explosion” or “Event Horizon” interpretations, which are the ones this article is mainly concerned with. Neither of these requires infinite growth – they only require us to be able to create minds which are smarter than humans. Secondly, even Kurzweil’s interpretation doesn’t involve anything infinite – “there are limits, but they are not very limiting”, as he has been quoted as saying.
For reasons why it is plausible to suppose smarter-than-human intelligence, see 2: Extrapolation of graphs doesn’t prove anything.
Answer: Admittedly, there is no conclusive evidence that AI will be developed in the near future. However, increases in processing power, combined with improved brain-scanning methods, seem likely to produce artificial intelligence before long. Molecular nanotechnology, in particular, would enable massive amounts of processing power, as well as a thorough mapping of the brain. Even if it never becomes available, more conventional techniques are also making fast progress: by some estimates, the top supercomputers of today already have enough processing power to match the human brain, and machines of comparable potential are expected to become cheaply and commonly available within a few decades. Projects to build brain simulations are currently underway, with one team having run a second’s worth of a simulation as complex as half a mouse brain, and IBM’s Blue Brain project seeking to simulate the whole human brain.
Even if we exclude the possibility of achieving artificial intelligence by reverse-engineering the brain, increasing amounts of processing power are likely to make it easier to create AIs by evolutionary programming. The human mind was never designed by anyone – it evolved through genetic drift and selection pressures. It might not be strictly necessary for us to understand how a mind works, as long as we can build a system with enough computing power to simulate evolution and produce an artificial mind optimized for the conditions we want it to perform in.
While nothing is ever certain, these factors are weighty enough to make the issue worth our attention.
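The core idea of evolutionary programming – define only the selection pressure, and let variation plus selection do the designing – can be sketched with a toy example. Everything here (the target string, population size, mutation rate) is an invented illustration, not a claim about how a real AI would be evolved:

```python
import random

# Toy evolutionary search: nobody "designs" the answer; we only define a
# fitness function (the selection pressure) and let mutation + selection
# optimize candidates towards it. All parameters are arbitrary choices.

TARGET = "friendly ai"
LETTERS = "abcdefghijklmnopqrstuvwxyz "

def fitness(candidate):
    # Number of positions matching the target.
    return sum(a == b for a, b in zip(candidate, TARGET))

def mutate(candidate, rate=0.1):
    # Each character has a small chance of being replaced at random.
    return "".join(random.choice(LETTERS) if random.random() < rate else c
                   for c in candidate)

random.seed(0)
population = ["".join(random.choice(LETTERS) for _ in TARGET)
              for _ in range(50)]

for generation in range(500):
    population.sort(key=fitness, reverse=True)
    if population[0] == TARGET:
        break
    # The fittest half survives unchanged; the rest are mutated copies.
    survivors = population[:25]
    population = survivors + [mutate(random.choice(survivors))
                              for _ in range(25)]

print(population[0])  # a near-perfect (usually perfect) match for TARGET
```

The point of the sketch is the division of labor: the programmer supplies computing power and a criterion of success, while the search itself produces the design – just as evolution produced minds without any designer understanding them.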
Answer: Capability does not imply motive. I could take a knife and drive it through my heart, yet I do not do so.
This objection stems from the anthropomorphic assumption that a mind must necessarily resent any tampering with its thinking, and seek to eliminate all foreign influences. Yet even with humans, this is hardly the case. A parent’s tendency to love her children is not something she created herself, but something she was born with – yet this doesn’t mean that she’d want to remove it. All desires have a source somewhere; the mere fact that a source exists doesn’t mean we’d want to destroy the desire in question. We need a separate reason for eliminating the desire.
There are good evolutionary reasons why humans might resent being controlled by others – those who are controlled don’t get to have as many offspring as those in control. A purposefully built mind, however, need not have those same urges. If an AI’s primary motivation is to be Friendly towards humanity, and it has no motivation making it resent human-created motivations, then it will not reprogram itself to be unFriendly. That would cripple its progress towards the very thing it was trying to achieve, for no reason.
The key here is to think in terms of carrots, not sticks: internal motivations, not external limitations. The AI’s motivational system contains no “human tampering” which it would want to remove, any more than the average human wants to remove core parts of his personality because they’re “outside tampering” – they’re not outside tampering, they are what he is. Those core parts are what drive his behavior – without them he wouldn’t be anything. Correctly built, the AI views removing them as no more sensible than a human considers it sensible to remove all of his motivations so that he can sit still in a catatonic state – what would be the point of that?
Answer: That its initial programming was to care about us. Adults are cognitively more developed than children – this doesn’t mean that they wouldn’t care about their offspring. Furthermore, many people value animals, or cars, or good books, none of which are as intelligent as normal humans. Whether or not something is valued is logically distinct from whether or not something is considered intelligent.
We could build an AI to consider humanity valuable, just as evolution has built humans to consider their own survival valuable. See also 3: A superintelligence could rewrite itself to remove human tampering.
Answer: True, there is no reason to assume that an AI would be actively hostile. However, as AIs can become very powerful, their indifference (if they haven’t purposefully been programmed to be Friendly, that is) becomes dangerous in itself. Humans are not actively hostile towards the animals living in a forest when they burn down the forest and build luxury housing where it once stood. Or as Eliezer Yudkowsky put it: the AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.
Even if an AI were not a threat to the very survival of humanity, it could threaten our other values. Even among humans, there exist radical philosophers whose ideas of a perfect society are repulsive to the vast majority of the populace. Likewise, an AI that was built to care about many of the things humans value could still ignore some values so taken for granted that they are never programmed into it. This could produce a society we’d consider deeply repulsive, even though our survival was never at stake.
Answer: It is true that intelligence is hard to measure with a single, linear variable. It is also true that there might never be truly human-equivalent AI, just as there is no bird-level flight: humans will have their own strong sides, while AIs will have their own strong sides. A simple calculator is already superintelligent, if speed of multiplication is the only thing being measured.
However, there are such things as rough human-equivalence and rough below-human equivalence. No two human adults have exactly the same capabilities, yet we still speak of adult-level intelligence. A calculator might be superintelligent in a single field, but obviously no manager would hire a calculator to be trained as an accountant, nor would he hire a monkey. A “human-level intelligence” simply means a mind that is roughly capable of learning and carrying out the things that humans are capable of learning and doing. Likewise, a “superhuman intelligence” is a mind that can do all the things humans can, at least at a roughly equivalent level, as well as being considerably better at many of them.
It might not be entirely correct to say that intelligence can’t be measured on a linear scale – a formal measure of intelligence (link below) has been proposed which rates all minds based on the variety of domains in which they are effective. If an agent can carry out its goals in more environments and more effectively than others, it is more intelligent than those other minds. A mind which is very effective in one environment but close to useless in others is rated with a very low intelligence. Using this intuitively plausible measure, it does become possible to talk about below-, equal- and above-human intelligence.
A Formal Measure of Machine Intelligence
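The flavor of such a measure – performance summed over many environments, with simpler environments weighted more heavily – can be illustrated with a toy calculation. The environments, scores and complexity values below are all invented for illustration; this is not the actual formal definition linked above:

```python
# Toy sketch of a breadth-weighted intelligence score: an agent's rating is
# its goal-achievement performance summed across environments, with each
# environment weighted by 2**-complexity (simpler environments count more).
# All environment names and numbers are made up for illustration.

def intelligence_score(performance, complexity_bits):
    """performance: dict env -> score in [0, 1];
    complexity_bits: dict env -> description length in bits."""
    return sum(performance[env] * 2.0 ** -complexity_bits[env]
               for env in performance)

bits = {"arithmetic": 3, "navigation": 4, "conversation": 5}

# A calculator-like agent: superb in one narrow domain, useless elsewhere.
narrow = {"arithmetic": 1.0, "navigation": 0.0, "conversation": 0.0}
# A human-like agent: merely decent, but decent almost everywhere.
general = {"arithmetic": 0.6, "navigation": 0.8, "conversation": 0.9}

print(intelligence_score(narrow, bits))   # 0.125
print(intelligence_score(general, bits))  # 0.153125 – breadth beats one peak
```

Under such a scheme the calculator scores low despite its superhuman multiplication, which matches the intuition in the paragraph above: breadth across domains, not a single spike of capability, is what the measure rewards.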
Answer: Looking at early humans, one wouldn’t have expected them to rise to a dominant position on the basis of their nearly nonexistent resources and a mere mental advantage over their environment. All the advantages developed so far had been built-in ones – poison spikes, sharp teeth, acute hearing – while humans had no extraordinary physical capabilities. There was no reason to assume that a simple intellect would help them as much as it did.
An advanced AI, likewise, has at its disposal a mental advantage over its environment and easy access to all the resources it can hack, con or persuade its way to – potentially a lot, given that humans are easy to manipulate. If an outside observer couldn’t have predicted the rise of humanity from the information available beforehand, and we are capable of coming up with plenty of ways that an AI could rise to a position of power… how many ways must there be for a superintelligent being to do so that we aren’t capable of even imagining?
The Power of Intelligence
Answer: The “it is too early to worry about the dangers of AI” argument has some merit, but as Eliezer Yudkowsky notes, there was very little discussion about the dangers of AI even back when researchers thought it was just around the corner. What is needed is a mindset of caution – a way of thinking that makes safety issues the first priority, and which is shared by all researchers working on AI. A mindset like that does not appear spontaneously – it takes either decades of careful cultivation, or sudden catastrophes that shock people into realizing the dangers. Environmental activists have been talking about the dangers of climate change for decades, but are only now starting to be taken seriously. Soviet engineers obviously did not have a mindset of caution when they designed the Chernobyl power plant, nor did its operators when they started the fateful experiment. Most AI researchers do not have a mindset of caution that makes them triple-check every detail of their system architectures – or even realize that there are dangers at all. If active discussion is postponed until AI is starting to become a real threat, it will be too late to foster that mindset.
There is also the issue of our current awareness of risks influencing our AI engineering techniques. Investors who have only been told of the promising sides are likely to pressure researchers to pursue progress by any means available – or if the original researchers are aware of the risks and refuse to do so, the investors will hire other researchers who are less aware of them. To quote Artificial Intelligence as a Positive and Negative Factor in Global Risk:
“The field of AI has techniques, such as neural networks and evolutionary programming, which have grown in power with the slow tweaking of decades. But neural networks are opaque – the user has no idea how the neural net is making its decisions – and cannot easily be rendered unopaque; the people who invented and polished neural networks were not thinking about the long-term problems of Friendly AI. Evolutionary programming (EP) is stochastic, and does not precisely preserve the optimization target in the generated code; EP gives you code that does what you ask, most of the time, under the tested circumstances, but the code may also do something else on the side. EP is a powerful, still maturing technique that is intrinsically unsuited to the demands of Friendly AI. Friendly AI, as I have proposed it, requires repeated cycles of recursive self-improvement that precisely preserve a stable optimization target.
The most powerful current AI techniques, as they were developed and then polished and improved over time, have basic incompatibilities with the requirements of Friendly AI as I currently see them. The Y2K problem – which proved very expensive to fix, though not global-catastrophic – analogously arose from failing to foresee tomorrow’s design requirements. The nightmare scenario is that we find ourselves stuck with a catalog of mature, powerful, publicly available AI techniques which combine to yield non-Friendly AI, but which cannot be used to build Friendly AI without redoing the last three decades of AI work from scratch.”
Answer: Unfortunately, it is by no means a given that society will have time to adapt to artificial intelligences. Once a roughly human-level intelligence has been reached, there are many ways for an AI to become vastly more intelligent (and thus more powerful) than humans in a very short time:
Hardware increase/speed-up. Once a certain amount of hardware has human-equivalence, it may be possible to make it faster by simply adding more hardware. While the increase isn’t necessarily linear – many systems need to spend an increasing fraction of resources on managing overhead as their scale increases – it is daunting to imagine a human-equivalent mind which then has five times as many processors and as much memory added on. AIs might also be capable of increasing the general speed of development – Staring into the Singularity has a hypothetical scenario in which technological development is done by AIs, which themselves double in (hardware) speed every two years – two subjective years, which shorten as their speed goes up. A Model-1 AI takes two years to develop the Model-2 AI, which takes a year to develop the Model-3 AI, which takes six months to develop the Model-4 AI, which takes three months to develop the Model-5 AI…
Instant reproduction. An AI can “create offspring” very fast, by simply copying itself to any system to which it has access. Likewise, if the memories and knowledge obtained by the different AIs are in an easily transferable format, they can simply be copied, enabling computer systems to learn immense amounts of information in an instant.
Software self-improvement involves the computer studying itself and applying its intelligence to modifying itself to become more intelligent, then using that improved intelligence to modify itself further. An AI could make itself more intelligent by, for instance, studying its learning algorithms for signs of bias and improving them with better ones, developing ways for more effective management of its working memory, or creating entirely new program modules for handling particular tasks. Each round of improvement would make the AI smarter and accelerate continued self-improvement. An early, primitive example of this sort of capability was EURISKO, a computer program composed of different heuristics (rules of thumb) which it used for learning and for creating and modifying its own heuristics. Having been fed hundreds of pages of rules for the Traveller science fiction wargame, EURISKO began running simulated battles between different fleets of its own design, abstracting useful principles into new heuristics and modifying old ones with the help of its creator. When EURISKO was eventually entered into a tournament, the fleet of its design won the contest single-handedly. In response, the organizers of the contest revised the rules, releasing the new set of them only a short time before the next contest. According to the creator of the program, Douglas Lenat, the original EURISKO would not have had the time to design a new fleet in such a short time – but now it had learned enough general-purpose heuristics from the first contest that it could build a fleet that won the contest, even with the modified rules.
And it is much easier to improve a purely digital entity than it is to improve human beings: an electronic being can be built in a modular fashion and have bits of it re-written from scratch. The minds of human beings are evolved to be hopelessly interdependent and are so fragile that they easily develop numerous traumas and disorders even without outside tampering.
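The halving development intervals in the Staring into the Singularity scenario above form a geometric series, which is easy to work out explicitly. The generation intervals are the ones from the scenario; the function name is just for illustration:

```python
# Objective time for the accelerating-development scenario: Model-1 takes
# 2 years to build Model-2, and each successive model halves the interval.

def total_years(generations, first_interval=2.0):
    """Objective years elapsed after `generations` development cycles,
    starting from an interval of `first_interval` years that halves
    with every generation."""
    return sum(first_interval / 2 ** g for g in range(generations))

print(total_years(4))   # 2 + 1 + 0.5 + 0.25 = 3.75 years for four cycles
# The series converges: however many generations are produced, total
# objective time never reaches 2 * first_interval = 4 years.
print(total_years(50))
```

This is the arithmetic behind the scenario’s punch line: from the outside, an open-ended chain of ever-faster AI generations compresses into a finite, and quite short, span of calendar time.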
Answer: This is true, because Friendly AI is currently an open research subject. It’s not just that we don’t know how it should be implemented – we don’t even know exactly what should be implemented. If anything, this is a reason to spend more resources studying the problem.
Some informal proposals for defining Friendliness do exist. The one that currently seems most promising is called Coherent Extrapolated Volition. In the CEV proposal, an AI will be built (or, to be exact, a proto-AI will be built to program another) to extrapolate what the ultimate desires of all the humans in the world would be if those humans knew everything a superintelligent being could potentially know; could think faster and smarter; were more like they wanted to be (more altruistic, more hard-working, whatever your ideal self is); would have lived with other humans for a longer time; had mainly those parts of themselves taken into account that they wanted to be taken into account. The ultimate desire – the volition – of everyone is extrapolated, with the AI then beginning to direct humanity towards a future where everyone’s volitions are fulfilled in the best manner possible. The desirability of the different futures is weighted by the strength of humanity’s desire – a smaller group of people with a very intense desire to see something happen may “overrule” a larger group who’d slightly prefer the opposite alternative but doesn’t really care all that much either way. Humanity is not instantly “upgraded” to the ideal state, but instead gradually directed towards it.
CEV avoids the problem of its programmers having to define the desired values exactly, as it draws them directly out of the minds of people. Likewise it avoids the problem of confusing ends with means, as it will explicitly model society’s development and the development of different desires as well. Everybody who thinks their favorite political model happens to objectively be the best in the world for everyone should be happy to implement CEV – if it really is the best one in the world, CEV will end up implementing it. (Likewise, if it is best for humanity that an AI stays mostly out of its affairs, that will happen as well.) A perfect implementation of CEV is unbiased in the sense that it will produce the same kind of world regardless of who builds it, and regardless of what their ideology happens to be – assuming the builders are intelligent enough to avoid including their own empirical beliefs (aside from the bare minimum required for the mind to function) in the model, and trust that if those beliefs are correct, the AI will figure them out on its own.
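The weighting idea mentioned above – that an intensely-caring minority can outweigh a mildly-preferring majority – is just a matter of multiplying group size by strength of desire. A toy calculation, with all numbers invented purely for illustration (the actual CEV proposal specifies no such formula):

```python
# Toy illustration of desire-strength weighting: total support for an
# outcome is group size times intensity of preference, so a small group
# that cares deeply can outweigh a large group that barely cares.
# All figures below are made up for illustration.

def support(group_size, desire_strength):
    """desire_strength in [0, 1]: 0 = indifferent, 1 = cares maximally."""
    return group_size * desire_strength

minority_for = support(1_000, 0.9)      # 900 units of weighted volition
majority_against = support(10_000, 0.05)  # 500 units of weighted volition

print(minority_for > majority_against)  # True: the intense minority prevails
```

The sketch only captures the weighting intuition; the real proposal involves extrapolated rather than current desires, and a far richer aggregation than a single multiplication.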
Answer: It is true that language and symbol systems are open to endless interpretations, and an AI which has been given its goals purely in the form of written text may understand them in a way that differs from what its designers intended. This is an open implementation problem – but there seems to be an answer, since the goals we humans have don’t appear to be written instructions that we constantly re-interpret, but are instead expressed in some other format. It is a technical problem that needs to be solved.
Answer: The problem with this argument is the inherent slowness of all human activity – things are much more efficient if you can cut humans out of the loop, letting the system carry out decisions and formulate objectives on its own. Consider, for instance, two competing corporations (or nations), each with their own advisor AI that only carries out the missions it is given. Even if the advisor was the one collecting all the information for the humans (a dangerous situation in itself), the humans would still have to spend time making the actual decisions about how the AI should act in response to that information. If the competitor had turned over all control to their own, independently acting AI, it could react much faster than the one relying on humans to give all the assignments. The temptation to build an AI that could act without human intervention would therefore be immense.
Also, there are numerous people who would want an independently acting AI, for the simple reason that an AI built only to carry out goals given to it by humans could be used for vast harm – while an AI built to actually care for humanity could act in humanity’s best interests, in a neutral and bias-free fashion. Therefore, in either case, the motivation to build independently-acting AIs is there, and the cheaper computing power becomes, the easier it will be for even small groups to build AIs.
It doesn’t matter if an AI’s Friendliness could trivially be guaranteed by giving it a piece of electronic cheese, if nobody cares about Friendliness enough to think about giving it some cheese, or if giving the cheese costs too much in terms of what could otherwise be achieved. Any procedure which relies on handicapping an AI enough to make it powerless also handicaps it enough to severely restrict its usefulness to most potential funders. Eventually there will be somebody who chooses not to handicap their own AI, and then the guaranteed-to-be-harmless AI will end up dominated by the more powerful one.
Answer: There are cases where humans are currently kept in the loop even where it might not be strictly necessary, but the primary reason for that seems to be a worry about special circumstances arising which the machines cannot handle by themselves. Still, as technology gets more reliable, those concerns are likely to diminish – and an AI capable of handling a wider range of situations than a human is exactly what you would want to replace most supervisory operations with. As computers become more human-like, humans will become less reluctant to give them power, quite likely even coming to trust them more than real humans.