An appreciation of the Less Wrong Sequences

Ruby Bloom recently posted about the significance of Eliezer Yudkowsky’s Less Wrong Sequences on his thinking. I felt compelled to do the same.
 
Several people have explicitly told me that I’m one of the most rational people they know. I can also think of at least one case where I was complimented by someone who was politically “my sworn enemy”, who said something along the lines of “I do grant that *your* arguments for your position are good, it’s just everyone *else* on your side…”, which I take as some evidence of me being able to maintain at least some semblance of sanity even when talking about politics.
 
(Seeing what I’ve written above, I cringe a little, since “I’m so rational” sounds so much like an over-the-top, arrogant boast. I certainly have plenty of my own biases, as does everyone who is human. Imagining yourself to be perfectly rational is a pretty good way of ensuring that you won’t be, so I’d never claim to be exceptional based only on my self-judgment. But this is what several people have explicitly told me, independently of each other, sometimes also staking part of their own reputation on it by stating this in public.)
 
However.
 
Before reading the Sequences, I was very definitely *not* that. I was what the Sequences would call “a clever arguer” – someone who was good at coming up with arguments for their own favored position, and didn’t really feel all that compelled to care about the truth.
 
The single biggest impact of the Sequences – as well as Eliezer’s other writings – that I can think of is that before reading them, I didn’t really think that beliefs had to be supported by evidence.
 
Sure, on some level I acknowledged that you can’t just believe *anything* you can find a clever argument for. But I do also remember thinking something like “yeah, I know that everyone thinks that their position is the correct one just because it’s theirs, but at the same time I just *know* that my position is correct just because it’s mine, and everyone else having that certainty for contradictory beliefs doesn’t change that, you know?”.
 
This wasn’t a reductio ad absurdum, it was my genuine position. I had a clear emotional *certainty* of being right about something, a certainty which wasn’t really supported by any evidence and which didn’t need to be. The feeling of certainty was enough by itself; the only thing that mattered was finding the evidence to (selectively) present to others in order to persuade them. Which it likely wouldn’t, since they’d have their own feelings of certainty, similarly blind to most evidence. But they might at least be forced to concede the argument in public.
 
It was the Sequences that first changed that. It was reading them that made me actually realize, on an emotional level, that correct beliefs *actually* required evidence. That this wasn’t just a game of social convention, but a law of the universe as iron-clad as the laws of physics. That if I caught myself arguing for a position where I was making arguments that I knew to be weak, the correct thing to do wasn’t to hope that my opponents wouldn’t spot the weaknesses, but rather to just abandon those weak arguments myself. And then to question whether I even *should* believe that position, having realized that my arguments were weak.
 
I can’t say that the Sequences alone were enough to take me *all* the way to where I am now. But they made me more receptive to other people pointing out when I was biased, or incorrect. More humble, more willing to take differing positions into account. And as people pointed out more problems in my thinking, I gradually learned to correct some of those problems, internalizing the feedback.
 
Again, I don’t want to claim that I’d be entirely rational. That’d just be stupid. But to the extent that I’m more rational than average, it all got started with the Sequences.
 
Ruby wrote:
I was thinking through some challenges and I noticed the sheer density of rationality concepts taught in the Sequences which I was using: “motivated cognition”, “reversed stupidity is not intelligence”, “don’t waste energy on thoughts which won’t have been useful in universes where you win” (possibly not in the Sequences), “condition on all the evidence you have”. These are fundamental concepts, core lessons which shape my thinking constantly. I am a better reasoner, a clearer thinker, and I get closer to the truth because of the Sequences. In my gut, I feel like the version of me who never read the Sequences is epistemically equivalent to a crystal-toting anti-anti-vaxxer (probably not true, but that’s how it feels) who I’d struggle to have a conversation with.
And my mind still boggles that the Sequences were written by a single person. A single person is responsible for so much of how I think, the concepts I employ, how I view the world and try to affect it. If this seems scary, realise that I’d much rather have my thinking shaped by one sane person than a dozen mad ones. In fact, it’s more scary to think that had Eliezer not written the Sequences, I might be that anti-vaxxer equivalent version of me.
I feel very similarly. I have slightly more difficulty pointing to specific concepts from the Sequences that I employ in my daily thinking, because they’ve become so deeply integrated into my thought that I’m no longer explicitly aware of them; but I do remember a period in which they were still in the process of being integrated, and when I explicitly noticed myself using them.
 
Thank you, Eliezer.
 
(There’s a collected and edited version of the Sequences available in ebook form. I would recommend trying to read it one article at a time, one per day: that’s how I originally read the Sequences, one article a day as they were being written. That way, they would gradually seep their way into my thoughts over an extended period of time, letting me apply them in various situations. I wouldn’t expect just binge-reading the book in one go to have the same impact, even though it would likely still be of some use.)

Error in Armstrong and Sotala 2012

Katja Grace has analyzed my and Stuart Armstrong’s 2012 paper “How We’re Predicting AI – or Failing To”. She discovered that one of the conclusions, “predictions made by AI experts were indistinguishable from those of non-experts”, is flawed due to “a spreadsheet construction and interpretation error”. In other words, I coded the data in one way, there was a communication error and a misunderstanding about what the data meant, and as a result of that, a flawed conclusion slipped into the paper.

I’m naturally embarrassed that this happened. But the reason why Katja spotted this error was that we’d made our data freely available, allowing her to spot the discrepancy. This is why data sharing is something that science needs more of. Mistakes happen to everyone, and transparency is the only way to have a chance of spotting those mistakes.

I regret the fact that we screwed up this bit, but I’m proud of the fact that we did share our data and allowed someone to catch the error.

EDITED TO ADD: Some people have taken this mistake to suggest that the overall conclusion – that AI experts are not good predictors of AI timelines – is flawed. That would overstate the significance of this mistake. While one of the lines of evidence supporting this overall conclusion was flawed, several others are unaffected by this error. Namely, the fact that expert predictions disagree widely with each other, that many past predictions have turned out to be false, and that the psychological literature on what’s required for the development of expertise suggests that it should be very hard to develop expertise in this domain. (See the original paper for details.)

(I’ve added a note about this mistake to my list of papers.)

Smile, You Are On Tumblr.Com

I made a new tumblr blog. It has photos of smiling people! With more to come!

Why? Previously I happened to need pictures of smiles for a personal project. After going through an archive of photos for a while, I realized that looking at all the happy people made me feel really happy and good. So I thought that I might make a habit out of looking at photos of smiling people, and sharing them.

Follow for a regular extra dose of happiness!

Decisive Strategic Advantage without a Hard Takeoff (part 1)

A common question when discussing the social implications of AI is whether to expect a soft takeoff or a hard takeoff. In a hard takeoff, an AI will, within a relatively short time, grow to superhuman levels of intelligence and become impossible for mere humans to control.

Essentially, a hard takeoff would allow the AI to achieve a so-called decisive strategic advantage (DSA) – “a level of technological and other advantages sufficient to enable it to achieve complete world domination” (Bostrom 2014) – in a very short time. The main relevance of this is that if a hard takeoff is possible, then it becomes much more important to get the AI’s values right on the first try – once the AI has undergone a hard takeoff and achieved a DSA, it is in control with whatever values we’ve happened to give it.

However, if we wish to find out whether an AI might rapidly acquire a DSA, then the question of “soft takeoff or hard” seems too narrow. A hard takeoff would be sufficient, but not necessary for rapidly acquiring a DSA. The more relevant question would be, which competencies does the AI need to master, and at what level relative to humans, in order to acquire a DSA?

Considering this question in more detail reveals a natural reason for why most previous analyses have focused on a hard takeoff specifically. Plausibly, for the AI to acquire a DSA, its level in some offensive capability must overcome humanity’s defensive capabilities. A hard takeoff presumes that the AI becomes so vastly superior to humans in every respect that this kind of an advantage can be taken for granted.

As an example scenario which does not require a hard takeoff, suppose that an AI achieves a level of biowarfare offense capability that overpowers humanity’s biowarfare defenses, along with moderate logistics and production skills. It releases deadly plagues that decimate human society, then uses legally purchased drone factories to build up its own infrastructure and to take over abandoned human facilities.

There are several interesting points to note in conjunction with this scenario:

Attack may be easier than defense. Bruce Schneier writes that

Attackers generally benefit from new security technologies before defenders do. They have a first-mover advantage. They’re more nimble and adaptable than defensive institutions like police forces. They’re not limited by bureaucracy, laws, or ethics. They can evolve faster. And entropy is on their side — it’s easier to destroy something than it is to prevent, defend against, or recover from that destruction.

For the most part, though, society still wins. The bad guys simply can’t do enough damage to destroy the underlying social system. The question for us is: can society still maintain security as technology becomes more advanced?

A single plague, once it has evolved or been developed, can require multi-million-dollar responses to contain. At the same time, it is trivial to produce if desired, especially using robots that do not need to fear infection. And creating new variants as new vaccines are developed may be quite easy, forcing the creation – and distribution – of yet more vaccines.

Another point that Schneier has made is that in order to keep something protected, the defenders have to succeed every time, whereas the attacker only needs to succeed once. Defense may be particularly hard if the attacker is capable of developing an attack that nobody has used before, as with hijacked airplanes being used against major buildings in the 9/11 attacks, or with the various vulnerabilities that the Snowden leaks revealed the NSA to have been exploiting for extensive eavesdropping.

Obtaining a DSA may not require extensive intelligence differences. Debates about takeoff scenarios often center around questions such as whether a self-improving AI would quickly hit diminishing returns, and how much room for improvement there is beyond the human level of intelligence. However, these questions may be irrelevant: especially if attack is easier than defense, only a relatively small edge in some crucial competency (such as biological warfare) may be enough to give the AI a DSA.

Exponential growth in the form of normal economic growth may not have produced astounding “fooms” yet, but it has produced plenty of situations where one attacker has gained a temporary advantage over others.
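To make the compounding arithmetic concrete, here is a minimal sketch in Python (the starting resources and growth rates are made-up illustrative numbers, not estimates from this post or any forecast) of how a sustained but unspectacular edge in growth rate lets an initially much smaller actor eventually overtake a larger one:

```python
# Toy illustration with made-up numbers: a sustained edge in growth rate
# compounds, so an initially much smaller actor can eventually overtake
# a larger one without any sudden discontinuity.

def crossover_year(a_start, a_rate, b_start, b_rate, max_years=200):
    """Return the first year in which actor A's compounded resources exceed B's."""
    a, b = a_start, b_start
    for year in range(1, max_years + 1):
        a *= 1 + a_rate
        b *= 1 + b_rate
        if a > b:
            return year
    return None

# Actor A starts with a tenth of actor B's resources, but grows
# 10% per year while B grows only 3% per year.
print(crossover_year(a_start=10.0, a_rate=0.10, b_start=100.0, b_rate=0.03))  # -> 36
```

Nothing in this sketch is specific to AI; the point is only that exponential curves with different exponents diverge, so a persistent edge does not need to be dramatic in order to eventually become decisive.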

The less the AI cares about human values, the more destructive it may be. An AI which cares mainly about calculating the digits of pi may be willing to destroy human civilization in order to make sure that a potential threat to it is eliminated, ensuring that it can go on calculating as many digits as possible unimpeded.

However, an AI which was programmed to maximize something like the “happiness of currently-living humans” may be much less willing to risk substantial human deaths. This would force it to focus on less destructive takeover methods, potentially requiring more sophisticated abilities.

It is worth noting that this only applies to AIs whose values are defined in terms of how they affect currently existing humans. An AI that was only maximizing human happiness in general might be willing to destroy all existing humans, and then recreate large numbers of humans in simulations.

In effect, the AI’s values determine the level of intelligence it needs to have in order to achieve the kind of a DSA that’s useful for its purposes.

Any destructive plan requires the ability to rebuild afterwards. It would not be of much use for the AI to destroy all of human civilization if it were dependent on electricity from human-run power plants, and incapable of building or running its own. Thus, purely offensive capabilities will need to be paired with whatever rebuilding capacities are necessary after the offensive capabilities have been deployed.

This calculation may be upset if the AI believes itself to be in immediate danger of destruction by humans, and believes that its values will still be more likely to be met in a universe where it continues to exist, even if that requires risking a universe where it cannot rebuild after deploying its attack. Thus, being threatened may force the AI’s hand and cause it to launch a potentially destructive offense even when it is uncertain of its capability to rebuild.

The rational level of aggressiveness depends on the extent to which the AI can selectively target human resources. Human nations generally avoid creating engineered pandemics and using them against their enemies, because they know that the pandemic could easily spread back to them. An AI with no biological components might be immune to this consideration, allowing it to deploy biological weapons more freely. On the other hand, the AI might e.g. need electricity, a dependence which humans do not share and which might give them an advantage in some situations.

A way to formulate this is that attacks differ in the extent to which they can be selectively targeted. Traditional firearms only damage those targets which they are fired upon, whereas pandemics potentially threaten all the members of a species that they can infect. To the extent that the AI needs to rely on the human economy to produce resources that it needs to survive, attacks threatening the economy also threaten the AI’s resources; these resources are in a sense shared between the AI and humanity, so any attacks which cause indiscriminate damage to those resources are dangerous for both. The more the AI can design attacks which selectively deprive its opponents of resources, the lower the threshold it has for using them.
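One minimal way to make this threshold explicit (the symbols here are my own illustrative formalization, not notation from the analysis above): suppose an attack would give the AI a strategic gain worth $G$, but would also indiscriminately destroy a fraction $s \in [0, 1]$ of shared resources whose continued existence is worth $R$ to the AI itself. Under these toy assumptions, the attack is only worth launching when

\[
G > s \cdot R ,
\]

so the more selectively the attack can be targeted (the smaller $s$ is), the lower the bar that $G$ has to clear – approaching zero for a perfectly targeted attack.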

This blog post was written as part of research funded by the Foundational Research Institute.