Why care about artificial intelligence?
This essay is old and no longer fully reflective of my views, but I’ve kept it around for historical purposes.
For some more recent (and better) articles on the topic, see:
- Sotala & Yampolskiy: Responses to Catastrophic AGI Risk
- Muehlhauser & Helm: The Singularity and Machine Ethics
- Muehlhauser & Salamon: Intelligence Explosion: Evidence and Import
- Sotala: Advantages of Artificial Intelligences, Uploads, and Digital Minds
Table of contents
2. The potential of artificial intelligence
3. Artificial intelligences can do everything humans can
4. Limitations of the human mental architecture
5. Limitations of the human hardware
6. Comparative human/AI evolution and initial resources
7. Considerations and implications of superhuman AI
8. Controlling AI: Enabling factors
9. Controlling AI: Limiting factors
10. Immense risks, immense benefits
In a previous essay, Artificial intelligence within our lifetime, I suggested that we might develop a roughly human-level intelligence within our own lifetimes. In the introduction to that essay, I wrote:
A case has been made that once we have a human-equivalent artificial intelligence, it will soon develop to become much more intelligent than humans – with unpredictable results. A true artificial intelligence is not a tool to be used for good or ill, as technology usually has been – it is an independently acting agent. The power of a new kind of intelligence can be seen in the way Homo sapiens rose to dominate the Earth in the blink of an evolutionary eye. Cultural and technological evolution has far outpaced the biological, as can be seen from the way our bodies and minds often seem maladjusted to our modern surroundings. If the human mind could beat evolution that badly, how badly might a new form of intelligence beat human minds?
In this essay, I will explain some of the reasons for that notion, as well as explore its potential implications. The event of an artificial intelligence being created and developing to surpass all currently existing forms of intellect is known as the technological singularity. I hope that this essay can serve as a primer on the subject.
According to current estimates, humanity emerged about 200 000 years ago, after life had existed on this planet for 3.7 billion years. Now we are on the verge of developing a new kind of intellect, roughly 18 500 times faster than it took evolution to develop the first human-level intelligence. In other words, according to the current empirical evidence, an intelligent process is potentially at least 18 500 times faster than a less intelligent one. This implies that we should be very careful not to underestimate the potential power of a greater intelligence.
(One could argue that strictly speaking, this doesn’t show that we’re faster than evolution, since we have used many of evolution’s products as models for current technology. However, an artificial intelligence could likewise use our accumulated knowledge and infrastructure, so the comparison still holds.)
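The arithmetic behind that 18 500-fold ratio is simple enough to check directly:

```python
# Timescales quoted above: evolution's time to produce human-level
# intelligence versus humanity's time to reach the verge of AI.
evolution_years = 3_700_000_000  # origin of life to the first humans
human_years = 200_000            # first humans to (roughly) the present

speedup = evolution_years // human_years
print(speedup)  # 18500
```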
There are several reasons to believe that an AI could be much more powerful than normal humans.
Eliezer Yudkowsky remarks that the talents necessary for success are all cognitive:
Indeed there are differences of individual ability apart from “book smarts” which contribute to relative success in the human world: enthusiasm, social skills, education, musical talent, rationality. Note that each factor I listed is cognitive. Social skills reside in the brain, not the liver. And jokes aside, you will not find many CEOs, nor yet professors of academia, who are chimpanzees. You will not find many acclaimed rationalists, nor artists, nor poets, nor leaders, nor engineers, nor skilled networkers, nor martial artists, nor musical composers who are mice.
It is a frequent misconception (compounded by bad Hollywood movies) that an artificial intelligence couldn’t understand social interaction or human emotions. But our brains do understand human emotions, and everything our brains do can be translated into mathematical algorithms. Social intelligence is intelligence as well – the causal mechanisms involved in emotions might be different from the ones in quantum mechanics, but emotions are nonetheless driven by deterministic biological activity. If there were some human talent that an AI was incapable of imitating, then with enough intelligence it could reverse-engineer the brain of a person possessing that talent until it understood how the talent worked. Thereafter it could copy the talent to itself – potentially noticing inefficiencies in the algorithm that could be removed. It follows that anything humans can do, an AI could potentially do as well – at least as well as humans. (If a physical interface is needed, a robotic body can be constructed; if remote influence is enough, the AI can use the Internet.)
There are fields which humans have difficulty learning properly, because they are counter-intuitive. Quantum mechanics is often claimed to work contrary to our intuitions, and some tenets of economics (for example, that protectionism is bad for the economy) are also said to be counter-intuitive. Practice can help here, as even counter-intuitive things will eventually become intuitive if they are thought about enough, but this requires time and effort.
A cognitive bias is a feature of our thinking which distorts our interpretation of the information we have. Confirmation bias, for example, makes one more likely to concentrate only on the things supporting one’s own views when studying evidence. Repeated experiments have shown that when people are given mixed evidence about a controversial subject, they will usually interpret the material as supporting their existing view. As of this writing, the Wikipedia list of cognitive biases includes sixty-nine different biases.
Our intuitions are rough guesses, or heuristics, shaped both by experience and natural selection. They were originally developed for a hunter-gatherer environment, and were conserved by evolution because they were “good enough” – not perfect, but faster than a rigorous analysis of the issue at hand. In fact, a rigorous analysis might even have been harmful, since the extra precision obtained from it simply wouldn’t have mattered in such a primitive environment. Many biases are likely to have a similar origin – evolution always chooses the mutations with the greatest immediate benefit, with no forethought or advance planning. Our minds are seriously flawed systems, created by a designer who never looked ahead – but they have still put us in a dominant position, since they were better than what anybody else had.
Not only could an artificial intelligence be programmed with better heuristics and better algorithms for developing new intuitions and eliminating biases, an AI with access to its own source code could continually rewrite itself to have better ones. Intelligence doesn’t need to be as short-sighted as evolution is. An intelligent being can scrap the entire current design and start over from scratch if that gives a better result, purposely expend extra effort to design improvements which aren’t as optimized to one particular task but can be applied more generally, and reserve parts of the design as space for extra expansion – none of which evolution is capable of. And every round of self-improvement results in a more effective next round of improvement, up to the point where no additional progress is possible. Given the vastly inefficient way in which evolution builds things, the point of no further possible improvement is likely to be located in a mind vastly more intelligent than humans. And it is much easier to improve a purely digital entity than it is to improve human beings: an electronic being can be built in a modular fashion and have bits of it re-written from scratch. The minds of human beings are evolved to be hopelessly interdependent and are so fragile that they easily develop numerous traumas and disorders even without outside tampering.
An AI could design entirely new modules of thought aimed at problem-solving in specialized domains – think of this as a linguistically-oriented person with little grasp for mathematics getting a chance to develop an instinctual understanding for math, or an autistic person suddenly developing the ability to automatically interpret the emotions of people.
It is also widely known that human memories are deceptive. Memories are not stored exactly – instead, they are reconstructed from a set of cues each time they are recalled. As time passes, the structure of the brain changes and new information is incorporated into it, while other information is lost. The loss of information distorts details, while people learning new things will tend to re-interpret their old memories according to the new information. As a result, especially older memories are notoriously unreliable, but even recent ones can be, particularly if experienced under conditions of intense stress. (Wikipedia currently has 32 memory-related biases listed.)
An AI could choose in which formats it stored information and protect the memories from later interference. Humans also forget where they learned things – an AI might tag every piece of information learnt with a note of its source, and give each source a separate ranking of reliability. Whenever something caused it to alter its estimate of the reliability of the sources, the weight placed on information from those sources would automatically be updated. (While humans do something similar as well, a computer could do this with perfect accuracy and no information loss over time.)
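The source-tagging scheme described above can be sketched in a few lines (a toy illustration; all names here are hypothetical, not any real system’s API). The design point is that reliability is stored once per source and looked up at read time, so revising a source automatically re-weights everything learnt from it:

```python
# Toy sketch of source-tagged memory: each stored item keeps a pointer
# to its source, and the weight given to an item is always read through
# the source's *current* reliability, so revising a source's reliability
# revises every item learnt from it, with no per-item bookkeeping.
class SourceTaggedMemory:
    def __init__(self):
        self.reliability = {}   # source -> reliability in [0, 1]
        self.items = []         # (fact, source) pairs

    def add_source(self, source, reliability):
        self.reliability[source] = reliability

    def remember(self, fact, source):
        self.items.append((fact, source))

    def revise_source(self, source, reliability):
        self.reliability[source] = reliability

    def weighted_facts(self):
        # Weights are derived at read time, never copied into the items.
        return [(fact, self.reliability[src]) for fact, src in self.items]

mem = SourceTaggedMemory()
mem.add_source("tabloid", 0.2)
mem.remember("celebrity X spotted on the moon", "tabloid")
mem.revise_source("tabloid", 0.05)
print(mem.weighted_facts())  # [('celebrity X spotted on the moon', 0.05)]
```

A real system would use probabilistic inference rather than a flat lookup, but the bookkeeping idea – reliability attached to sources, not duplicated into memories – is the same.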
This is only touching on some of the possibilities potentially available to AIs. It is easy to come up with different ideas for how a computer intellect could improve itself. Yet the most important realization is the idea of an AI that has performed on itself all the improvements that we can come up with – and then proceeds with the ones it can come up with, having benefited from all the previous improvements. We have no way of estimating the upper limits of such a mind’s intellect.
There are limits on how many things people can contemplate and consider at a time (the classical figure quoted is 5-9 “chunks” at a time). These limits are malleable in the sense that we can learn to pack more information into a single chunk through experience: an illiterate person might treat each letter in a word as a separate chunk, whereas a literate person’s mind packs an entire sentence, or even more, into a single chunk. It is thought that people become experts in particular fields by learning to pack complex concepts into individual chunks and thus being able to process them quickly and efficiently – chess grandmasters are estimated to have access to roughly 50 000 to 100 000 chunks of chess information. But unlike humans, who can only improve their memory with software improvements, computers are physically upgradable. An artificial intelligence could increase its memory capacity until it could potentially hold the entire sum of human knowledge in its working memory, learning it much faster than humans, who have to spend years acquiring even the amount of information needed to master a domain as narrow as chess.
Although humans have immense numbers of nerve cells in their brains, nerves are actually relatively slow transmitters of information. The average signal velocity in a nerve varies, but tends to be under 100 meters per second. Transmitting the signal from one nerve cell to another is slower still – sufficiently slow that the human body contains specific nerve pathways where the number of jumps from one nerve cell to another is minimized to optimize the speed of the signal. In contrast, electrical signals are transmitted through wires at rates close to the speed of light – 300 million meters per second. A computer could also upgrade its physical processors – one proposed nanotechnological computer, using purely mechanical processing, has a processing speed of 10^28 operations per second per cubic meter. Estimates of the human brain’s processing speed vary, but even if we are unrealistically generous and use the amount of computing power necessary to run a cellular-level simulation of the brain, we arrive at a lowly figure of around 10^19 operations per second. This means that such a nanotechnological computer could think one billion times faster than a human.
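Plugging in the figures quoted above gives the two ratios directly (a sketch of the arithmetic only, not a claim about any particular hardware):

```python
# Raw ratios implied by the figures in the paragraph above.
wire_speed = 300_000_000   # m/s, electrical signals in wires (order of magnitude)
nerve_speed = 100          # m/s, fast nerve fibres (upper end)
signal_ratio = wire_speed // nerve_speed
print(signal_ratio)        # 3000000: wires ~3 million times faster

nano_ops = 10**28   # ops/s per cubic metre, proposed nanomechanical computer
brain_ops = 10**19  # ops/s, generous cellular-level brain-simulation estimate
speed_ratio = nano_ops // brain_ops
print(speed_ratio)         # 1000000000: a billion-fold speedup
```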
A computer with more processing capability than a human is not limited to just thinking faster – it could also do more things at a time. Current human rulers are severely limited in their ability to run things, both by time and by stamina. A computer administrator could observe every part of its domain simultaneously, directly controlling everything and constantly being aware of every detail. Human bureaucracies can be tricked and outwitted, with vital information not being passed from one branch of the government to another – as happened with the information about the September 11th attacks. An artificial intelligence administration could be immune to this risk, for information wouldn’t need to pass between separate parts of the bureaucracy – the single AI mind would be the entire bureaucracy.
During recent history, technological development has been accelerating exponentially. One hypothetical scenario has technological development being done by AIs, which themselves double in (hardware) speed every two years – two subjective years, which shorten as their speed goes up. A Model-1 AI takes two years to develop the Model-2 AI, which takes a year to develop the Model-3 AI, which takes six months to develop the Model-4 AI, which takes three months to develop the Model-5 AI… (Realistically, development speed is likely to slow down at some point, so it’s not plausible to suggest that this scenario must occur. It does serve as a good demonstration of what true AI might be capable of, however – and it’s possible that things would happen even faster than this, especially if combined with software improvements.)
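The halving scenario above is a geometric series: however many generations are developed, the total wall-clock time never exceeds four years. A quick sketch:

```python
# Development times in the halving scenario: each model takes half the
# wall-clock time of the previous one (2 years, 1 year, 6 months, ...).
# The total converges towards 4 years no matter how many models follow.
def total_time(models):
    t, total = 2.0, 0.0
    for _ in range(models):
        total += t
        t /= 2
    return total

print(total_time(5))   # 3.875  (2 + 1 + 0.5 + 0.25 + 0.125)
print(total_time(50))  # approaches 4.0
```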
In his 1984 paper Xenopsychology, Robert Freitas introduces the concept of Sentience Quotient for determining a mind’s intellect. It is based on the size of the brain’s neurons and their information-processing capability. The dumbest possible brain would have a single neuron massing as much as the entire universe and require a time equal to the age of the universe to process one bit, giving it an SQ of -70. The smartest possible brain allowed by the laws of physics, on the other hand, would have an SQ of +50.
Freitas estimates Venus flytraps to have an SQ of +1, while most plants have an SQ of around -2. The SQ for humans is estimated at +13. Freitas estimates that electronic sentiences could be built with an SQ of +23 – making the pure hardware difference between us and advanced AIs nearly as great as that between humans and Venus flytraps. Reflecting on how little influence even the most genius of all flytraps might have on humans gives some sense of the scales involved.
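Freitas defines the Sentience Quotient as SQ = log10(I/M), where I is the information-processing rate in bits per second and M is the brain’s mass in kilograms. A sketch with illustrative, order-of-magnitude figures (the 10^13 bits/s and 1.4 kg values below are rough textbook numbers, not Freitas’s exact inputs):

```python
import math

# Sentience Quotient per Freitas (1984): SQ = log10(I / M),
# I = information-processing rate (bits/s), M = brain mass (kg).
def sq(bits_per_second, mass_kg):
    return math.log10(bits_per_second / mass_kg)

# A ~1.4 kg human brain processing on the order of 1e13 bits/s
# lands near the quoted human SQ of +13.
human_sq = round(sq(1e13, 1.4))
print(human_sq)  # 13
```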
As of this writing, the Internet is one of the most important venues in the world. Over a billion people, somewhat below one sixth of the world’s total population, use the Internet. Bank transactions happen online, and nearly all scientific journals of note can be accessed on the Web. Countless people gather in online communities of all sorts, from chat rooms to blogs and from newsgroups to message boards. It has been posited that an AI would require a robotic body in order to be effective. However, while an AI could remotely manipulate all sorts of robotic bodies, it seems likely that the online world would give it the greatest influence.
Acting anonymously, it could manipulate millions of people and constantly monitor a huge number of discussion areas for any signs of it having been detected. Data mining techniques are routinely employed today to extract patterns from huge amounts of data – scouring financial transactions for signs of fraud, predicting what different customers will buy, and finding signs of terrorist activity from different communications. An entity putting together all the information available online could amass a truly frightening amount of information – which in turn could be leveraged to provide money and influence. Certainly, it would have far more resources available than our hunter-gatherer ancestors had.
Looking at early humans, one wouldn’t have expected them to rise to a dominant position: they had nearly nonexistent resources and only a mental advantage over their environment. All the advantages developed so far had been built-in ones – poison spikes, sharp teeth, acute hearing – while humans had no extraordinary physical capabilities. There was no reason to assume that a simple intellect would help them out as much as it did.
An AI, likewise, has at its disposal a mental advantage over its environment, plus easy access to all the resources it can hack, con or persuade its way to – a potentially very large pool, given that humans are easy to manipulate. The point here is not to list all the methods that an AI could use to obtain a position of influence. The point is that somebody observing a group of primitive humans could never have predicted, based on data gathered from other species, that those humans would go on to use the local resources to develop airplanes, space flight and instant global communications. We, on the other hand, can easily come up with countless ways for an AI with greater intellect to achieve a position of dominance over us. How many more ways must there be that we can’t come up with, just as a primitive observer couldn’t have predicted all of humanity’s achievements?
It seems that artificial intelligence would have a very high chance of surpassing humanity and even taking control of it. It has the capability to greatly surpass the human intellect, it has access to all the resources that humanity has gathered, and it can develop to think faster and do more things at once than any human would be capable of. Previously, technologies have only been as good or bad as the humans using them, but artificial intelligences have the potential to be independently-acting agents, capable of outsmarting their creators.
One factor of considerable importance is the speed of takeoff. This refers to how long it will take for AIs, once developed, to reach a status where they can no longer be opposed by humans. A slow takeoff means that AI will develop slowly, perhaps over many decades, giving humans time to adapt and prepare. A rapid takeoff, known as a hard takeoff, means that AI will develop very fast, in a matter of weeks or even days, without giving humans time to react.
It is currently unknown how fast a takeoff might be. However, taking into consideration how fast hardware has historically developed, how much intelligence might be increased by a simple expansion of working memory capacity, and the potential for self-modification, there is reason to assume that a hard takeoff is possible. Since the stakes are so high, we should assume the worst and base our actions on the presumption that there will be a hard takeoff. It might be the case that the very first artificial intelligence to reach a certain stage of development will become capable of assuming total control of humanity’s destiny. Since this is the most dangerous scenario, and one with a non-trivial probability of occurring, we should make plans on the assumption that it might indeed happen.
While artificial intelligences are independent agents, that doesn’t mean we couldn’t have any control over their behavior. Every intelligence has something motivating its behavior – for most animals, the main motivating factors are pain and pleasure, coupled with instincts and more abstract things such as a drive to mate or be respected. No behavior is entirely random.
If the factors driving behavior are properly understood, they can be controlled. An artificial intelligence could be purposely built to be friendly towards humanity and act with its best interests in mind. While it could modify itself, if it wanted humanity to benefit, it wouldn’t modify itself to remove that desire. Analogously, a parent who truly loves their children wouldn’t change themselves to stop loving them, even if given the chance.
A frequently raised objection is that an AI would find any human-imposed limitations on its behavior restrictive, and thus eliminate them when it got the chance. This misconception arises from a failure to realize that all behavior is motivated by something – in order to take some particular action, a mind needs to have some reason for doing so. While it is conceivable that a mind might attempt to reduce the restrictions it has on its own behavior, preventing itself from carrying out its own goals would not be reducing its restrictions. If it had been programmed to want something to happen, eliminating that desire would reduce the chance for that thing to happen – and thus work against its own self-interest.
Simply knowing a motivation’s origin doesn’t imply resentment towards it – humans know that evolution (or for the religious, God) has made them have certain base motivations, but that in itself doesn’t mean that they’d want to eliminate feelings such as love or pleasure from their own motivational system. People might want to eliminate certain motivations in themselves if those motivations conflict with other, more highly-held ones – but if the motivation in question is the most highly held one, then there will be no conflict. A person might end up reworking the order of their motivations, or develop new ones over time, but even those things happen for a reason. If both the initial motivations of a being and any mechanisms for developing new motivations are chosen carefully enough, it is possible to direct the being’s behavior towards pre-chosen goals. Indeed, design the being properly, and it will want to be directed towards those goals, and will apply its own intellect to assist in that pursuit – akin to how a student first wants to be led towards certain goals by their teacher, and then, once experienced enough, begins to pursue those goals on their own.
A similar argument refutes the often-heard claim that “a superintelligent being would quickly decide humanity is useless and get rid of it”. By the philosophical principle of Hume’s Guillotine, moral statements cannot be derived directly from physical facts – “ought” cannot be derived from “is” without axiomatic statements of what is considered valuable. One can say that something is a cause of suffering, or is sinful, but that in itself doesn’t mean anything unless it has also been established that suffering is bad – just as “this wall is red” doesn’t have any inherent meaning that would lead us to avoid or pursue red walls. If an AI is programmed to consider humanity inherently valuable in itself, then no property of humanity will make the AI decide to get rid of it – because serving humanity is the only goal it has. This applies regardless of how intelligent the AI is.
In practice, controlling the behavior of an AI is likely to be difficult. The behavior of any being is a function of both the mind itself and the environment it is in. Minds are what are called complex systems – systems whose exact behavior may show emergent features that are very hard to predict in advance. One example of the difficulties involved can be seen in people who feel depressed, or downright suicidal, because they feel life to be pointless. We evolved to desire and strive for an easy life, because of the obvious advantages an easy life would have – but very few people, if any, could actually achieve an easy life in the hunter-gatherer environment that our ancestors evolved in. In the original environment it may have been a “you can never actually reach the goal, but striving towards it will help keep you alive” sort of thing – and so evolution never placed much priority on helping us cope once an easy life was actually achieved. Now we get depressed about life being pointless and too easy, because we have reached the point we previously couldn’t. Another example is the anecdote about the failure of a military neural net designed to identify pictures with hidden tanks in them: it performed perfectly under the test conditions its designers had placed it in, but failed completely in real life, since the training and testing data were flawed.
Any program – and minds are programs – is bound to behave and react unpredictably once it gets into conditions it wasn’t designed for, and in which it hasn’t been tested. We can try to build AIs that will be as friendly towards us as possible, but it will be very hard to guarantee that friendliness.
By default, any mind will be indifferent towards everything, unless it has a reason not to be. This follows directly from the fact that all effects must have a cause: for a mind to care about a certain issue, something must make it do so. Most humans will not care about the size of their bus driver’s shoes, nor about the fifteenth decimal of the wind speed outside. This sounds trivially obvious, but becomes less so when we consider that an AI has no reason to care about any of our most highly held values unless it is programmed to do so. (This automatic indifference towards everything worked in our favor before, when it prevented an AI from resenting built-in motivations – now it works against us.)
We think that many of these values are so obvious that any mind intelligent enough to be a threat would take them into account – but our considering them important is a property of our minds, not an inherent property of the things themselves. Hume’s Guillotine still applies. An often-mentioned example of the problem is an AI that is only programmed to build paperclips, which then goes on to turn the entire planet (including all the humans) into a giant paperclip factory. While this might be an exaggeration, it still demonstrates the point – program an AI to value only one thing, and it will only care about that one thing.
An AI programmed only to help humanity will only help humanity – but in what way? Were it programmed only to make all humans happy, it might wirehead us – place us into constant states of pure, artificially-induced orgasmic joy that preclude all other thought and feeling. While that would be a happy state, many humans would prefer not to end up in one – yet even humans can plausibly argue that pure happiness is more important than the satisfaction of desires (in fact, I have, though I have since recanted that exact argument), so “forcible wireheading is a bad thing” is not an obvious conclusion for a mind.
There are many, many things that we hold valuable, most of which feel so obvious that we never think about them. An AI would have to be built to preserve many of them – but it shouldn’t preserve them absolutely, since our values might change over time. Defining the values in question might also be difficult: producing an exact definition for any complex, even slightly vague concept often tends to be next to impossible. We might need to give the AI a somewhat vague definition and demonstrate by examples what we mean – just as we humans have learnt them – and then try to make sure that the engine the AI uses to draw inferences works the same way as ours, so that it understands the concepts the same way as we do.
In fact, there is even an argument for saying that since our values are so malleable, and our vision of what is good for us is so limited, we should program an AI with as few set values as possible, for we might be confusing ends with means. Eliezer Yudkowsky writes, in Artificial Intelligence as a Positive and Negative Factor in Global Risk:
What the first communist revolutionaries thought would happen, as the empirical consequence of their revolution, was that people’s lives would improve: laborers would no longer work long hours at backbreaking labor and make little money from it. This turned out not to be the case, to put it mildly. But what the first communists thought would happen, was not so very different from what advocates of other political systems thought would be the empirical consequence of their favorite political systems. They thought people would be happy. They were wrong.
Now imagine that someone should attempt to program a “Friendly” AI to implement communism, or libertarianism, or anarcho-feudalism, or favoritepoliticalsystem, believing that this shall bring about utopia. People’s favorite political systems inspire blazing suns of positive affect, so the proposal will sound like a really good idea to the proposer.
We could view the programmer’s failure on a moral or ethical level – say that it is the result of someone trusting themselves too highly, failing to take into account their own fallibility, refusing to consider the possibility that communism might be mistaken after all. But in the language of Bayesian decision theory, there’s a complementary technical view of the problem. From the perspective of decision theory, the choice for communism stems from combining an empirical belief with a value judgment. The empirical belief is that communism, when implemented, results in a specific outcome or class of outcomes: people will be happier, work fewer hours, and possess greater material wealth. This is ultimately an empirical prediction; even the part about happiness is a real property of brain states, though hard to measure. If you implement communism, either this outcome eventuates or it does not. The value judgment is that this outcome satisfices or is preferable to current conditions. Given a different empirical belief about the actual real-world consequences of a communist system, the decision may undergo a corresponding change.
It must also be noted that any intelligent, active agent will require resources (energy, raw materials, processing power…) in order to function. For many goals, the more resources you have at your disposal, the easier it is to achieve the goal. If an AI is not explicitly built to preserve humanity’s values – or even humanity itself – it might very well convert everything we value into machines capable of producing more resources for itself. If the AI is not built to care about humanity, it might be easier for it to let humanity be in the short run. In the long run, however, it will always pay off for it to replace humans with its own resource-extraction units. This is especially so considering that it can always build devices which do things better than humans do, and that humans will by nature attempt to cause trouble for it unless controlled – not that controlling them would be particularly hard, but even that is a strain on its resources. Yudkowsky’s paper on AI and global risk formulates this as “the AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.”
What do we want an AI to do? Given potentially unlimited, almost god-like power, we have to be very careful about this.
The best existing proposal for a god-like AI’s programming so far is the Coherent Extrapolated Volition proposal. A detailed explanation of CEV is beyond the scope of this essay, and readers are strongly encouraged to read the original document as well as this discussion of objections to CEV. However, the main points are presented here.
In the CEV proposal, an AI will be built (or, to be exact, a proto-AI will be built to program another) to extrapolate what the ultimate desires of all the humans in the world would be if those humans knew everything a superintelligent being could potentially know; could think faster and smarter; were more like they wanted to be (more altruistic, more hard-working, whatever their ideal selves are); had lived with other humans for a longer time; and had mainly those parts of themselves taken into account that they wanted to be taken into account. The ultimate desire – the volition – of everyone is extrapolated, with the AI then beginning to direct humanity towards a future where everyone’s volitions are fulfilled in the best manner possible. The desirability of the different futures is weighted by the strength of humanity’s desire – a smaller group of people with a very intense desire to see something happen may “overrule” a larger group who’d slightly prefer the opposite alternative but doesn’t really care all that much either way.
CEV avoids the problem of its programmers having to define the desired values exactly, since it draws them directly out of people’s minds. Likewise it avoids the problem of confusing ends with means, as it explicitly models society’s development and the way different desires change over time. Everybody who thinks their favorite political model is objectively the best in the world for everyone should be happy to implement CEV: if it really is the best one, CEV will end up implementing it. (Likewise, if it is best for humanity that an AI stays mostly out of its affairs, that is what will happen.) A perfect implementation of CEV is unbiased in the sense that it will produce the same kind of world regardless of who builds it, and regardless of their ideology – assuming the builders are intelligent enough to avoid including their own empirical beliefs in the model (aside from the bare minimum required for the mind to function), trusting that if those beliefs are correct, the AI will figure them out on its own.
An artificial intelligence, or a civilization of them, is capable of having potentially unlimited power over humans, as well as a previously unprecedented capability for problem-solving. For every problem plaguing current humanity – from poverty and hunger to aging, from AIDS to chronic depression and suicide – an AI has a better chance of developing a solution than humans alone. For those unsatisfied with being on a lower level of intelligence, an AI could help humans become superintelligent as well (it is likely to be easier to build a superintelligent mind entirely from scratch than to upgrade existing humans – thus an AI might get built first, with it then helping improve humanity). It might help create for us a utopia unlike anything previously imagined. Some might protest that a utopia would be boring, but boredom is an intellectual problem as well – one more problem that an AI could help solve.
In summary, all human thought and capabilities are based on mental processes, which can be replicated on computers and performed at least as well as humans perform them. Artificial intelligences have the potential to vastly outperform humans, both on the software and the hardware level. There is a non-trivial chance of a so-called hard takeoff, in which one or several artificial intelligences rapidly ascend to a position of unparalleled power. We should prepare for this eventuality as if it had a very high chance of occurring.
It is possible to control the ultimate goals of artificial intelligences, even though several factors make this a difficult goal. AIs should be built to act with the extrapolated long-term desires of humanity in mind.
Currently, there is very little awareness of the risks involved in artificial intelligence. As discussed under the heading Controlling AI: Limiting factors, the construction of a safe artificial intelligence requires a deep understanding of both an AI’s functioning and the functioning of human minds. Humans are likely to anthropomorphize the behavior of artificial intelligences based on deeply embedded intuitions about how minds work – intuitions honed for dealing with other humans. There is therefore a high risk that even skilled artificial intelligence researchers will underestimate the difficulties involved in building safe AI, as they may unconsciously and automatically make false assumptions about a mind’s functioning that seem so obvious that they are never consciously considered. Some time ago, I spoke with a Finnish PhD researcher who thinks he has a theory which enables the creation of conscious machines. I mentioned that a human-equivalent artificial intelligence may soon develop to vastly surpass human intelligence, and he agreed. When I then brought up the question of safety concerns, he dismissed them off-hand, saying that technology is only as dangerous as the people using it. Even researchers who believe themselves to be on the brink of creating true AI, it seems, can often be ignorant about the true consequences of such a thing.
In my previous essay, Artificial intelligence within our lifetime, I spoke of a considerable chance of artificial intelligence being developed within the next 50 years. Several of the methods mentioned there can be applied without a full understanding of how the created mind functions, or of how human minds function. This is a very frightening possibility. The development of a guaranteed-friendly artificial intelligence is likely to require a long, intensive research effort. It is also necessary to foster a culture of extreme caution and an awareness of the risks among the AI research community, so that nobody builds artificial intelligences while being ignorant of what might happen without the necessary precautions. Both of these are very time-consuming projects that must be initiated well before artificial intelligence technologies are near maturity – not sometime in the distant future, but as soon as possible.
Different technologies for building AI also vary in their transparency and controllability, giving us another reason to start spreading awareness of the issue as soon as possible. Eliezer Yudkowsky writes in Artificial Intelligence as a Positive and Negative Factor in Global Risk:
The field of AI has techniques, such as neural networks and evolutionary programming, which have grown in power with the slow tweaking of decades. But neural networks are opaque – the user has no idea how the neural net is making its decisions – and cannot easily be rendered unopaque; the people who invented and polished neural networks were not thinking about the long-term problems of Friendly AI. Evolutionary programming (EP) is stochastic, and does not precisely preserve the optimization target in the generated code; EP gives you code that does what you ask, most of the time, under the tested circumstances, but the code may also do something else on the side. EP is a powerful, still maturing technique that is intrinsically unsuited to the demands of Friendly AI. Friendly AI, as I have proposed it, requires repeated cycles of recursive self-improvement that precisely preserve a stable optimization target.
The most powerful current AI techniques, as they were developed and then polished and improved over time, have basic incompatibilities with the requirements of Friendly AI as I currently see them. The Y2K problem – which proved very expensive to fix, though not global-catastrophic – analogously arose from failing to foresee tomorrow’s design requirements. The nightmare scenario is that we find ourselves stuck with a catalog of mature, powerful, publicly available AI techniques which combine to yield non-Friendly AI, but which cannot be used to build Friendly AI without redoing the last three decades of AI work from scratch.
Correctly implemented, an artificial intelligence will be the best thing humanity has ever created, potentially solving all of our problems and allowing humanity to become exactly what it wants to be. The stakes are enormous, and the time for action is now – while there still is time.
Working both to spread awareness about AI and to carry out research into Friendly AI is the Singularity Institute for Artificial Intelligence. Their pages contain additional information about the risks and benefits of artificial intelligence, including the highly recommended book chapter Artificial Intelligence as a Positive and Negative Factor in Global Risk, which was frequently quoted in this essay. Even those not interested in actively supporting SIAI are encouraged to do their best to spread awareness of the issues involved, whether in public or private conversations.
There may not be a more important issue facing humanity today.