## Reality is broken, or, an XCOM2 review

Yesterday evening I went to the grocery store, and was startled to realize that I was suddenly in a totally different world.

Computer games have difficulty grabbing me these days. Many of the genres I used to enjoy as a kid have lost their appeal: point-and-click-style adventure requires patience and careful thought, but I already deal with plenty of things that require patience and careful thought in real life, so for games I want something different. 4X games mostly seem like pure numerical optimization exercises these days, and have lost that feel of discovery and sense of wonder. In general, I used to like genres like turn-based strategy or adventure that had no time constraints, but those now usually feel too slow-paced to pull me in; whereas pure action games I’ve never been particularly good at. (I tried Middle-Earth: Shadow of Mordor for a bit recently, and quit after a very frustrating two hours where I attempted a simple beginning quest about a dozen times, only to be killed by the same orc each time.)

Like the previous XCOM remake, Firaxis’s XCOM2 managed the magic of transporting me completely elsewhere, in the same way that some of my childhood classics did. I did not even properly realize how deeply I’d become immersed in the game, until I went outside, and the sheer differentness of the real world and the game world startled me – somewhat similar to the shock of jumping into cold water, your body suddenly and obviously piercing through a surface that separates two different realms of existence.

A good description of my experience with the game comes, oddly enough, from Michael Vassar describing something that’s seemingly completely different. He talks about the way that two people, acting together, can achieve such a state of synchrony that they seem to meld into a single being:

In real-time domains, one rapidly assesses the difficulty of a challenge. If the difficulty seems manageable, one simply does, with no holding back, reflecting, doubting, or trying to figure out how one does. Figuring out how something is done implicitly by a neurological process which is integrated with doing. Under such circumstances, acting intuitively in real time, the question of whether an action is selfish or altruistic or both or neither never comes up, thus in such a flow state one never knows whether one is acting cooperatively, competitively, or predatorily. People with whom you are interacting […] depend on the fact that you and they are in a flow-state together. In so far as they and you become an integrated process, your actions flow from their agency as well as your own[.]

XCOM2 is not actually a real-time game: it is firmly turn-based. Yet your turns are short and intense, and the game’s overall aesthetics reinforce a feeling of rapid action and urgency. There is a sense in which it feels like the player and the game become melded together, there being a constant push-and-pull in which you act and the game responds; the game acts and you respond. A feeling of complete immersion and synchrony with your environment, with a perfect balance between the amount of time that it pays to think and the amount of time that it pays to act, so that the pace neither slows down to a crawl nor becomes one of rushed doing without understanding.

It is in some ways a scary effect: returning to the mundaneness of the real world, there was a strong sense of “it’s so sad that all of my existence can’t be spent playing games like that”, and a corresponding realization of how dangerous that sentiment was. Yet it felt very different from the archetypical addiction: there wasn’t that feel of an addict’s understanding of how ultimately dysfunctional the whole thing was, or struggling against something which you knew was harmful and of no real redeeming value. Rather, it felt like a taste of what human experience should be like, of how sublime and engaging our daily reality could be, but rarely is.

Jane McGonigal writes, in her book Reality is Broken:

Where, in the real world, is that gamer sense of being fully alive, focused, and engaged in every moment? Where is the gamer feeling of power, heroic purpose, and community? Where are the bursts of exhilarating and creative game accomplishment? Where is the heart-expanding thrill of success and team victory? While gamers may experience these pleasures occasionally in their real lives, they experience them almost constantly when they’re playing their favorite games. […]

Reality, compared to games, is broken. […]

The truth is this: in today’s society, computer and video games are fulfilling genuine human needs that the real world is currently unable to satisfy. Games are providing rewards that reality is not. They are teaching and inspiring and engaging us in ways that reality is not. They are bringing us together in ways that reality is not.

If enough good games were available, it would be easy to just get lost in games, to escape the brokenness of reality and retreat to a more perfect world. Perhaps I’m lucky in that I rarely encounter games of this caliber, that would be so much more moment-to-moment fulfilling than the real world is. Firaxis’s previous XCOM also had a similar immersive effect on me, but eventually I learned the game and it ceased to hold new surprises, and it lost its hold. Eventually the sequel will also have most of its magic worn away.

It’s likely better this way. This way it can function for me the way that art should: not as a mindless escape, but as a moment of beauty that reminds us that it’s possible to have a better world than this. As a reminder that we can work to bring the world closer to that.

McGonigal continues:

What if we decided to use everything we know about game design to fix what’s wrong with reality? What if we started to live our real lives like gamers, lead our real businesses and communities like game designers, and think about solving real-world problems like computer and video game theorists? […]

Instead of providing gamers with better and more immersive alternatives to reality, I want all of us to be responsible for providing the world at large with a better and more immersive reality […] take everything game developers have learned about optimizing human experience and organizing collaborative communities and apply it to real life

We can do that.

## Me and Star Wars

Unlike the other kids in my neighborhood, who went to the Finnish-speaking elementary school right near our suburban home, I went to a Swedish-speaking school much closer to the inner city. Because of this, my mom would come pick me up from school, and sometimes we would go do things in town, since we were already nearby.

At one point we developed a habit of making a video rental store the first stop after school. We’d return whatever we had rented the last time, and I’d get to pick one thing to rent next. The store had a whole rack devoted to NES games, and there was a time when I was systematically going through their whole collection, seeking to play everything that seemed interesting. But at times I would also look at their VHS collection, and that was how I first found Star Wars.

I don’t have a recollection of what it was to see any of the Star Wars movies for the very first time. But I do have various recollections of how they influenced my life, afterwards.

For many years, there was “Sotala Force”, an imaginary space army in a setting of make believe that combined elements of Star Wars and Star Trek. I was, of course, its galaxy-famous leader, with some of my friends at the time holding top positions in it. It controlled maybe one third of the galaxy, and its largest enemy was something very loosely patterned after the Galactic Empire, which held maybe four tenths of the galaxy.

The leader of the enemy army, called (Finns, don’t laugh too much now) Kiero McLiero, took on many traits from Emperor Palpatine. These included the ability, taken from the Dark Empire comics, to keep escaping death by always resurrecting in a new body, meaning that our secret missions attacking his bases could end in climactic end battles where we’d kill him, over and over again. Naturally, me and my friends were Jedi Knights and Masters, using a combination of the Force, lightsabers, and whatever other weapons we happened to have, to carry out our noble missions.

There was a girl in elementary school who I sometimes hung out with, and who I had a huge and hopelessly unrequited crush on. Among other shared interests like Lord of the Rings, we were both fans of Star Wars, and would sometimes discuss it. I only remember some fragments of those discussions: an agreement that Empire Strikes Back and Return of the Jedi were superior movies to A New Hope; both having heard of the Tales of the Jedi comics but neither having managed to find them anywhere; a shared feeling of superiority and indignation towards everyone who was making such a blown-out-of-proportions fuss about Jar-Jar Binks in the Phantom Menace, given that Lucas had clearly said that he was aiming these new movies at children.

The third-to-last memory I have of seeing her was on a trip to a beach we had at the end of 9th grade; I’d brought a toy dual-bladed lightsaber, while she’d brought a single-bladed one. There were many duels on that beach.

The very last memory that I have of seeing her, after we’d gone on to different schools, was when we ran across each other at the premiere of Revenge of the Sith, three years later. We chatted a bit about the movie and what had happened to us in the intervening years, and then went our separate ways again.

For a kid interested in computer games in 1990s Finland, Pelit (“Games”) was The magazine to read. Another magazine of interest, which also covered computer games but mostly dealt with more general PC issues, was MikroBitti. Both occasionally discussed a fascinating-sounding thing, table-top role-playing games, with MikroBitti running a regular column about them. They sounded totally awesome and I wanted to get one. I asked my dad if I could have an RPG, and he was willing to buy one, if only I told him what they looked like and where they might be found. This was the part that left me stumped.

Until one day I found a store that… I don’t remember what exactly it sold. It might have been an explicit gaming store or it might only have had games as one part of its collection. And I have absolutely no memory of how I found it. But one way or the other, there it was, including the star prize: a Star Wars role-playing game (the West End Games one, second edition).

For some reason that I have forgotten, I didn’t actually get the core rules at first. The first thing that I got was a supplement, Heroes & Rogues, which had a large collection of different character templates depicting all kinds of Rebel, Imperial, and neutral characters, as well as an extended “how to make a realistic character” section. The book was in English, but thanks to my extensive NES gaming experience, I could read it pretty well at that point. Sometime later, I got the actual core rules.

I’m not sure if I started playing right away; I have the recollection that I might have spent a considerable while just buying various supplements for the sake of reading them, before we started actually playing. “We” in this case was me and one friend of mine, because we didn’t have anyone else to play with. This resulted in creative non-standard campaigns, in which we both had several characters (in addition to me also being the game master) who we played simultaneously. Those games lasted until we found the local university’s RPG club (which also admitted non-university students; I think I was 13 the first time I showed up). After finding it, we transitioned to more ordinary campaigns and those weird two-player mishmashes ended. They were fun while they lasted, though.

After the original gaming store where I’d been buying my Star Wars supplements closed, I eventually found another. And it didn’t only have Star Wars RPG supplements! It also had Star Wars novels that were in English, which had never been translated into Finnish!

So it came to be that the first novel that I read in English was X-Wing: Wedge’s Gamble, telling the story of the Rebellion’s (or, as it was known by that time, the New Republic’s) struggle to capture Coruscant some years after the events in Return of the Jedi. I remember that this was sometime in yläaste (“upper elementary school”), so I was around 13-15 years old. An actual novel was a considerably bigger challenge for my English-reading skills than RPG supplements were, so there was a lot of stuff in the novel that I didn’t quite get. But still, I finished it, and then went on to buy and read the rest of the novels in the X-Wing series.

The Force Awakens, Disney’s new Star Wars film, comes out today. Star Wars has previously been a part of many notable things in my life. It shaped the make believe setting that I spent several years playing in, it was one of the things I had in common with the first girl I ever had a crush on, its officially licensed role-playing game was the first one that I ever played, and one of its licensed novels was the first novel that I ever read in English.

Today it coincides with another major life event. The Finnish university system is different from the one in many other countries in that, for a long while, we didn’t have any such thing as a Bachelor’s degree. You were admitted to study for five years, and then at the end, you would graduate with a Master’s degree. Reforms carried out in 2005, intended to make Finnish higher education more compatible with the systems in other countries, introduced the concept of a Bachelor’s degree as an intermediary step that you needed to do in between. But upon being admitted to university, you would still be given the right to do both degrees, and people still don’t consider a person to have really graduated before they have their Master’s.

I was admitted to university back in 2006. For various reasons, my studies have taken longer than the recommended time, which would have had me graduating with my Master’s in 2011. But late, as they say, is better than never: today’s my official graduation day for my MSc degree. There will be a small ceremony at the main university building, after which I will celebrate by going to see what my old friends Luke, Leia and Han are up to these days.

## Desiderata for a model of human values

Soares (2015) defines the value learning problem as:

By what methods could an intelligent machine be constructed to reliably learn what to value and to act as its operators intended?

There have been a few attempts to formalize this question. Dewey (2011) started from the notion of building an AI that maximized a given utility function, and then moved on to suggest that a value learner should exhibit uncertainty over utility functions and then take “the action with the highest expected value, calculated by a weighted average over the agent’s pool of possible utility functions.” This is a reasonable starting point, but a very general one: in particular, it gives us no criteria by which we or the AI could judge the correctness of a utility function which it is considering.
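
To make the quoted decision rule concrete, here is a minimal sketch of an agent that holds a weighted pool of candidate utility functions and picks the action with the highest weighted-average expected utility. It is only an illustration and not Dewey's formalism: the actions, outcomes, probabilities, weights, and the names `expected_value` and `best_action` are all made up for the example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy types: an action leads to outcomes with some probability,
# and a utility function scores each outcome.
OutcomeDist = Dict[str, float]
UtilityFn = Callable[[str], float]
Pool = List[Tuple[float, UtilityFn]]  # (weight, utility function) pairs


def expected_value(outcomes: OutcomeDist, pool: Pool) -> float:
    """Weighted average, over the pool, of each function's expected utility."""
    return sum(
        weight * sum(prob * utility(outcome) for outcome, prob in outcomes.items())
        for weight, utility in pool
    )


def best_action(actions: Dict[str, OutcomeDist], pool: Pool) -> str:
    """Pick the action whose weighted-average expected utility is highest."""
    return max(actions, key=lambda name: expected_value(actions[name], pool))


# Two made-up hypotheses about what the human values.
LIKES_CAKE = {"cake": 1.0, "mess": -1.0, "nothing": 0.0}
DISLIKES_CAKE = {"cake": -1.0, "mess": -1.0, "nothing": 0.0}

# The agent is 70% confident in the first hypothesis, 30% in the second.
POOL: Pool = [
    (0.7, lambda outcome: LIKES_CAKE[outcome]),
    (0.3, lambda outcome: DISLIKES_CAKE[outcome]),
]

# Made-up actions with made-up outcome probabilities.
ACTIONS: Dict[str, OutcomeDist] = {
    "bake_cake": {"cake": 0.9, "mess": 0.1},
    "do_nothing": {"nothing": 1.0},
}

if __name__ == "__main__":
    for name, dist in ACTIONS.items():
        print(name, expected_value(dist, POOL))
    print("chosen:", best_action(ACTIONS, POOL))
```

Note that nothing in this sketch says where the weights come from or how a candidate utility function could be judged correct, which is exactly the gap discussed above.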

To improve on Dewey’s definition, we would need to get a clearer idea of just what we mean by human values. In this post, I don’t yet want to offer any preliminary definition: rather, I’d like to ask what properties we’d like a definition of human values to have. Once we have a set of such criteria, we can use them as a guideline to evaluate various offered definitions.

By “human values”, I here basically mean the values of any given individual: we are not talking about the values of, say, a whole culture, but rather just one person within that culture. While the problem of aggregating or combining the values of many different individuals is also an important one, we should probably start from the point where we can understand the values of just a single person, and then use that understanding to figure out what to do with conflicting values.

In order to make the purpose of this exercise as clear as possible, let’s start with the most important desideratum, of which all the others are arguably special cases:

1. Useful for AI safety engineering. Our model needs to be useful for the purpose of building AIs that are aligned with human interests, such as by making it possible for an AI to evaluate whether its model of human values is correct, and by allowing human engineers to evaluate whether a proposed AI design would be likely to further human values.

In the context of AI safety engineering, the main model for human values that gets mentioned is that of utility functions. The one problem with utility functions that everyone always brings up is that humans have been shown not to have consistent utility functions. This suggests two new desiderata:

2. Psychologically realistic. The proposed model should be compatible with that which we know about current human values, and not make predictions about human behavior which can be shown to be empirically false.

3. Testable. The proposed model should be specific enough to make clear predictions, which can then be tested.

As additional requirements related to the above ones, we may wish to add:

4. Functional. The proposed model should be able to explain what the functional role of “values” is: how do they affect and drive our behavior? The model should be specific enough to allow us to construct computational simulations of agents with a similar value system, and see whether those agents behave as expected within some simulated environment.

5. Integrated with existing theories. The proposed model should, to as large an extent as possible, fit together with existing knowledge from related fields such as moral psychology, evolutionary psychology, neuroscience, sociology, artificial intelligence, behavioral economics, and so on.

However, I would argue that as a model of human value, utility functions also have other clear flaws. They do not clearly satisfy these desiderata:

6. Suited for modeling internal conflicts and higher-order desires. A drug addict may desire a drug, while also desiring that he not desire it. More generally, people may be genuinely conflicted between different values, endorsing contradictory sets of them given different situations or thought experiments, and they may struggle to behave in a way in which they would like to behave. The proposed model should be capable of modeling these conflicts, as well as the way that people resolve them.

7. Suited for modeling changing and evolving values. A utility function is implicitly static: once it has been defined, it does not change. In contrast, human values are constantly evolving. The proposed model should be able to incorporate this, as well as to predict how our values would change given some specific outcomes. Among other benefits, an AI whose model of human values had this property might be able to predict things that our future selves would regret doing (even if our current values approved of those things), and warn us about this possibility in advance.

8. Suited for generalizing from our existing values to new ones. Technological and social change often causes new dilemmas, for which our existing values may not provide a clear answer. As a historical example (Lessig 2004), American law traditionally held that a landowner did not only control his land but also everything above it, to “an indefinite extent, upwards”. Upon the invention of the airplane, this raised the question – could landowners forbid airplanes from flying over their land, or was the ownership of the land limited to some specific height, above which the landowners had no control? In answer to this question, the concept of landownership was redefined to only extend a limited, and not an indefinite, amount upwards. Intuitively, one might think that this decision was made because the redefined concept did not substantially weaken the position of landowners, while allowing for entirely new possibilities for travel. Our model of value should be capable of figuring out such compromises, rather than treating values such as landownership as black boxes, with no understanding of why people value them.

As an example of using the current criteria, let’s try applying them to the only paper that I know of that has tried to propose a model of human values in an AI safety engineering context: Sezener (2015). This paper takes an inverse reinforcement learning approach, modeling a human as an agent that interacts with its environment in order to maximize a sum of rewards. It then proposes a value learning design where the value learner is an agent that uses Solomonoff’s universal prior in order to find the program generating the rewards, based on the human’s actions. Basically, a human’s values are equivalent to a human’s reward function.
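
To give a feel for the general idea of inferring a reward function from observed actions, here is a heavily simplified sketch. It is not Sezener's equation: the universal prior is incomputable, so this toy replaces it with a finite set of hand-written candidate reward functions, each given a made-up "description length" in bits and a prior of 2 to the power of minus that length. The noisy-choice (softmax) model of the human, the parameter `beta`, and all names and numbers are assumptions added for illustration.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical candidates: name -> (reward per action, made-up length in bits).
# Shorter descriptions get a higher prior, as a crude stand-in for a
# simplicity prior over reward-generating programs.
CANDIDATES: Dict[str, Tuple[Dict[str, float], int]] = {
    "likes_tea": ({"tea": 1.0, "coffee": 0.0}, 3),
    "likes_coffee": ({"tea": 0.0, "coffee": 1.0}, 3),
    "indifferent": ({"tea": 0.5, "coffee": 0.5}, 2),
}


def action_likelihood(rewards: Dict[str, float], action: str, beta: float) -> float:
    """Assumed human model: higher-reward actions are chosen more often (softmax)."""
    normalizer = sum(math.exp(beta * r) for r in rewards.values())
    return math.exp(beta * rewards[action]) / normalizer


def posterior(observed_actions: List[str], beta: float = 3.0) -> Dict[str, float]:
    """Posterior over candidates: prior times likelihood of the actions, normalized."""
    scores = {}
    for name, (rewards, length_bits) in CANDIDATES.items():
        prior = 2.0 ** -length_bits
        likelihood = 1.0
        for action in observed_actions:
            likelihood *= action_likelihood(rewards, action, beta)
        scores[name] = prior * likelihood
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}


if __name__ == "__main__":
    history = ["tea"] * 8 + ["coffee"]
    for name, prob in posterior(history).items():
        print(f"{name}: {prob:.3f}")
```

With no observations the shortest candidate is favored by the prior alone; after enough consistent choices, the candidate that best explains the behavior takes over. Each candidate here is still a black box from actions to rewards, which is relevant to several of the criteria.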

Let’s see to what extent this proposal meets our criteria.

1. Useful for AI safety engineering. To the extent that the proposed model is correct, it would clearly be useful. Sezener provides an equation that could be used to obtain the probability of any given program being the true reward generating program. This could then be plugged directly into a value learning agent similar to the ones outlined in Dewey (2011), to estimate the probability of its models of human values being true. That said, the equation is incomputable, but it could be possible to construct computable approximations.
2. Psychologically realistic. Sezener assumes the existence of a single, distinct reward process, and suggests that this is a “reasonable assumption from a neuroscientific point of view because all reward signals are generated by brain areas such as the striatum”. On the face of it, this seems like an oversimplification, particularly given evidence suggesting the existence of multiple valuation systems in the brain. On the other hand, since the reward process is allowed to be arbitrarily complex, it could be taken to represent just the final output of the combination of those valuation systems.
3. Testable. The proposed model currently seems to be too general to be accurately tested. It would need to be made more specific.
4. Functional. This is arguable, but I would claim that the model does not provide much of a functional account of values: they are hidden within the reward function, which is basically treated as a black box that takes in observations and outputs rewards. While a value learner implementing this model could develop various models of that reward function, and those models could include internal machinery that explained why the reward function output various rewards at different times, the model itself does not make any assumptions about this.
5. Integrated with existing theories. Various existing theories could in principle be used to flesh out the internals of the reward function, but currently no such integration is present.
6. Suited for modeling internal conflicts and higher-order desires. No specific mention of this is made in the paper. The assumption of a single reward function that assigns a single reward for every possible observation seems to implicitly exclude the notion of internal conflicts, with the agent always just maximizing a total sum of rewards and being internally united in that goal.
7. Suited for modeling changing and evolving values. As written, the model seems to consider the reward function as essentially unchanging: “our problem reduces to finding the most probable $p_R$ given the entire action-observation history $a_1 o_1 a_2 o_2 \ldots a_n o_n$.”
8. Suited for generalizing from our existing values to new ones. There does not seem to be any obvious possibility for this in the model.

I should note that despite its shortcomings, Sezener’s model seems like a nice step forward: like I said, it’s the only proposal that I know of so far that has even tried to answer this question. I hope that my criteria will be useful in spurring the development of the model further.

As it happens, I have a preliminary suggestion for a model of human values which I believe has the potential to fulfill all of the criteria that I have outlined. However, I am far from certain that I have managed to find all the necessary criteria. Thus, I would welcome feedback, particularly including proposed changes or additions to these criteria.

## Learning from painful experiences

A model that I’ve found very useful is that pain is an attention signal. If there’s a memory or thing that you find painful, that’s an indication that there’s something important in that memory that your mind is trying to draw your attention to. Once you properly internalize the lesson in question, the pain will go away.

That’s a good principle, but often hard to apply in practice. In particular, several months ago there was a social situation that I screwed up big time, and which was quite painful to think of afterwards. And I couldn’t figure out just what the useful lesson was there. Trying to focus on it just made me feel like a terrible person with no social skills, which didn’t seem particularly useful.

Yesterday evening I again discussed it a bit with someone who’d been there, which helped relieve the pain a bit, enough that the memory wasn’t quite as aversive to look at. That made it possible for me to imagine myself back in that situation and ask: when I first saw the shocked expressions of the people in question, instead of locking up and reflexively withdrawing into an emotional shell, what kinds of mental motions, what kind of an algorithm, might have allowed me to salvage the situation?

Answer to that question: when you see people expressing shock in response to something that you’ve said or done, realize that they’re interpreting your actions way differently than you intended them. Starting from the assumption that they’re viewing your action as bad, quickly pivot to figuring out why they might feel that way. Explain what your actual intentions were and that you didn’t intend harm, apologize for any hurt you did cause, and use your guess of why they’re reacting badly to acknowledge your mistake and own up to your failure to take that into account. If it turns out that your guess was incorrect, let them correct you and then repeat the previous step.

That’s the answer in general terms, but I didn’t actually generate that answer by thinking in general terms. I generated it by imagining myself back in the situation, looking for the correct mental motions that might have helped out, and imagining myself carrying them out, saying the words, imagining their reaction. So that the next time that I’d be in a similar situation, it’d be associated with a memory of the correct procedure for salvaging it. Not just with a verbal knowledge of what to do in abstract terms, but with a procedural memory of actually doing it.

That was a painful experience to simulate.

But it helped. The memory hurts less now.