Desiderata for a model of human values

Soares (2015) defines the value learning problem as

By what methods could an intelligent machine be constructed to reliably learn what to value and to act as its operators intended?

There have been a few attempts to formalize this question. Dewey (2011) started from the notion of building an AI that maximized a given utility function, and then moved on to suggest that a value learner should exhibit uncertainty over utility functions and then take “the action with the highest expected value, calculated by a weighted average over the agent’s pool of possible utility functions.” This is a reasonable starting point, but a very general one: in particular, it gives us no criteria by which we or the AI could judge the correctness of a utility function which it is considering.
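
To make the basic idea concrete, here is a minimal sketch of such a value learner. The candidate utility functions, their probabilities, and the outcome numbers are entirely made up for illustration; this is not Dewey’s actual formalism, just his weighted-average decision rule applied to toy data:

```python
# A minimal sketch of the kind of agent Dewey describes: instead of maximizing
# one fixed utility function, the agent keeps a pool of candidate utility
# functions with associated probabilities, and picks the action whose
# probability-weighted average utility is highest. The candidate utilities,
# probabilities, and outcome numbers are all made up for illustration.

candidate_utilities = [
    (0.5, lambda outcome: outcome["happiness"]),                        # "maximize happiness"
    (0.3, lambda outcome: outcome["happiness"] - outcome["coercion"]),  # "happiness, but not by force"
    (0.2, lambda outcome: -outcome["coercion"]),                        # "above all, avoid coercion"
]

def expected_value(outcome):
    """Weighted average of the candidate utilities for a given outcome."""
    return sum(p * u(outcome) for p, u in candidate_utilities)

# Predicted outcomes of two possible actions, on two made-up dimensions.
actions = {
    "dopamine_drip": {"happiness": 10, "coercion": 10},
    "ask_people_what_they_want": {"happiness": 6, "coercion": 0},
}

best_action = max(actions, key=lambda a: expected_value(actions[a]))
print(best_action)  # -> ask_people_what_they_want, with these made-up numbers
```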

To improve on Dewey’s definition, we would need to get a clearer idea of just what we mean by human values. In this post, I don’t yet want to offer any preliminary definition: rather, I’d like to ask what properties we’d like a definition of human values to have. Once we have a set of such criteria, we can use them as a guideline to evaluate various offered definitions.

By “human values”, I here basically mean the values of any given individual: we are not talking about the values of, say, a whole culture, but rather just one person within that culture. While the problem of aggregating or combining the values of many different individuals is also an important one, we should probably start from the point where we can understand the values of just a single person, and then use that understanding to figure out what to do with conflicting values.

In order to make the purpose of this exercise as clear as possible, let’s start with the most important desideratum, of which all the others are arguably special cases:

1. Useful for AI safety engineering. Our model needs to be useful for the purpose of building AIs that are aligned with human interests, such as by making it possible for an AI to evaluate whether its model of human values is correct, and by allowing human engineers to evaluate whether a proposed AI design would be likely to further human values.

In the context of AI safety engineering, the main model for human values that gets mentioned is that of utility functions. The one problem with utility functions that everyone always brings up is that humans have been shown not to have consistent utility functions. This suggests two new desiderata:

2. Psychologically realistic. The proposed model should be compatible with what we know about current human values, and not make predictions about human behavior which can be shown to be empirically false.

3. Testable. The proposed model should be specific enough to make clear predictions, which can then be tested.

As additional requirements related to the above ones, we may wish to add:

4. Functional. The proposed model should be able to explain what the functional role of “values” is: how do they affect and drive our behavior? The model should be specific enough to allow us to construct computational simulations of agents with a similar value system, and see whether those agents behave as expected within some simulated environment; a minimal sketch of what such a simulation might look like is given below.

5. Integrated with existing theories. The proposed model should, to as large an extent as possible, fit together with existing knowledge from related fields such as moral psychology, evolutionary psychology, neuroscience, sociology, artificial intelligence, behavioral economics, and so on.
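
As a crude illustration of what desideratum 4 is asking for, here is a sketch of an agent with an explicit value system acting in a simulated environment. The features, the weights, and the environment dynamics are all invented purely for illustration:

```python
# A toy illustration of desideratum 4: give a simulated agent an explicit
# "value system" and check whether its behavior in a simple simulated
# environment matches what we would expect from those values. The features,
# weights, and environment dynamics are invented purely for illustration.

# The agent values rest and social contact, and disvalues hunger.
values = {"rest": 1.0, "social": 1.5, "hunger": -2.0}

def evaluate(state):
    """Score a state by how well it satisfies the agent's values."""
    return sum(weight * state.get(feature, 0.0) for feature, weight in values.items())

def step(state, action):
    """A crude simulated environment: actions change the agent's state."""
    state = dict(state)
    if action == "sleep":
        state["rest"] += 1.0
        state["hunger"] += 0.5
    elif action == "eat":
        state["hunger"] = max(0.0, state["hunger"] - 1.0)
    elif action == "socialize":
        state["social"] += 1.0
        state["hunger"] += 0.5
    return state

state = {"rest": 0.0, "social": 0.0, "hunger": 2.0}
for _ in range(10):
    # Greedily pick the action whose resulting state the agent values most.
    action = max(["sleep", "eat", "socialize"], key=lambda a: evaluate(step(state, a)))
    state = step(state, action)
    print(action, {k: round(v, 1) for k, v in state.items()})
# With these made-up numbers, the agent first eats to get rid of its hunger and
# then alternates between socializing and eating, which is roughly what we'd
# expect given that it weights social contact highly and hunger strongly negatively.
```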

However, I would argue that as a model of human value, utility functions also have other clear flaws. They do not clearly satisfy these desiderata:

6. Suited for modeling internal conflicts and higher-order desires. A drug addict may desire a drug, while also desiring that he not desire it. More generally, people may be genuinely conflicted between different values, endorsing contradictory sets of them given different situations or thought experiments, and they may struggle to behave in a way in which they would like to behave. The proposed model should be capable of modeling these conflicts, as well as the way that people resolve them.

7. Suited for modeling changing and evolving values. A utility function is implicitly static: once it has been defined, it does not change. In contrast, human values are constantly evolving. The proposed model should be able to incorporate this, as well as to predict how our values would change given some specific outcomes. Among other benefits, an AI whose model of human values had this property might be able to predict things that our future selves would regret doing (even if our current values approved of those things), and warn us about this possibility in advance.

8. Suited for generalizing from our existing values to new ones. Technological and social change often cause new dilemmas, for which our existing values may not provide a clear answer. As a historical example (Lessig 2004), American law traditionally held that a landowner controlled not only his land but also everything above it, to “an indefinite extent, upwards”. Upon the invention of the airplane, this raised the question – could landowners forbid airplanes from flying over their land, or was the ownership of the land limited to some specific height, above which the landowners had no control? In answer to this question, the concept of landownership was redefined to only extend a limited, and not an indefinite, amount upwards. Intuitively, one might think that this decision was made because the redefined concept did not substantially weaken the position of landowners, while allowing for entirely new possibilities for travel. Our model of value should be capable of figuring out such compromises, rather than treating values such as landownership as black boxes, with no understanding of why people value them.

As an example of using the current criteria, let’s try applying them to the only paper that I know of that has tried to propose a model of human values in an AI safety engineering context: Sezener (2015). This paper takes an inverse reinforcement learning approach, modeling a human as an agent that interacts with its environment in order to maximize a sum of rewards. It then proposes a value learning design where the value learner is an agent that uses Solomonoff’s universal prior in order to find the program generating the rewards, based on the human’s actions. Basically, a human’s values are equivalent to a human’s reward function.
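
Sezener’s actual equation is defined in terms of the incomputable Solomonoff prior, so the following is only a toy, computable caricature of the general shape of the idea. The candidate reward programs, the softmax choice model, and all of the numbers are my own assumptions rather than anything taken from the paper:

```python
# A toy, computable caricature of the kind of inference the model calls for:
# maintain a finite pool of candidate reward programs, give each a
# simplicity-based prior (standing in for the incomputable Solomonoff prior),
# and update on the human's observed actions under the assumption that the
# human tends to choose higher-reward actions. The candidate programs, the
# softmax choice model, and all numbers are illustrative assumptions only.

import math

# Each candidate "reward program" maps an (observation, action) pair to a
# reward; the second element is a made-up description length in bits.
candidates = {
    "likes_sweet":   (lambda obs, act: 1.0 if act == "eat_cake" else 0.0, 12),
    "likes_healthy": (lambda obs, act: 1.0 if act == "eat_salad" else 0.0, 14),
    "indifferent":   (lambda obs, act: 0.5, 8),
}

possible_actions = ["eat_cake", "eat_salad"]
observed_history = [(f"meal_{i}", "eat_cake") for i in range(5)]

def action_probability(program, obs, act, beta=3.0):
    """P(action | observation, program), assuming softmax-rational choices."""
    scores = {a: math.exp(beta * program(obs, a)) for a in possible_actions}
    return scores[act] / sum(scores.values())

posterior = {}
for name, (program, length_bits) in candidates.items():
    prior = 2.0 ** (-length_bits)  # simpler programs start out more probable
    likelihood = 1.0
    for obs, act in observed_history:
        likelihood *= action_probability(program, obs, act)
    posterior[name] = prior * likelihood

total = sum(posterior.values())
for name, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(name, round(p / total, 3))
# With these five observations, "likes_sweet" overtakes the simpler
# "indifferent" program as the most probable explanation of the behavior.
```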

Let’s see to what extent this proposal meets our criteria.

  1. Useful for AI safety engineering. To the extent that the proposed model is correct, it would clearly be useful. Sezener provides an equation that could be used to obtain the probability of any given program being the true reward generating program. This could then be plugged directly into a value learning agent similar to the ones outlined in Dewey (2011), to estimate the probability of its models of human values being true. That said, the equation is incomputable, though it might be possible to construct computable approximations.
  2. Psychologically realistic. Sezener assumes the existence of a single, distinct reward process, and suggests that this is a “reasonable assumption from a neuroscientific point of view because all reward signals are generated by brain areas such as the striatum”. On the face of it, this seems like an oversimplification, particularly given evidence suggesting the existence of multiple valuation systems in the brain. On the other hand, since the reward process is allowed to be arbitrarily complex, it could be taken to represent just the final output of the combination of those valuation systems.
  3. Testable. The proposed model currently seems to be too general to be accurately tested. It would need to be made more specific.
  4. Functional. This is arguable, but I would claim that the model does not provide much of a functional account of values: they are hidden within the reward function, which is basically treated as a black box that takes in observations and outputs rewards. While a value learner implementing this model could develop various models of that reward function, and those models could include internal machinery that explained why the reward function output various rewards at different times, the model itself does not make any assumptions about this.
  5. Integrated with existing theories. Various existing theories could in principle be used to flesh out the internals of the reward function, but currently no such integration is present.
  6. Suited for modeling internal conflicts and higher-order desires. No specific mention of this is made in the paper. The assumption of a single reward function that assigns a single reward for every possible observation seems to implicitly exclude the notion of internal conflicts, with the agent always just maximizing a total sum of rewards and being internally united in that goal.
  7. Suited for modeling changing and evolving values. As written, the model seems to consider the reward function as essentially unchanging: “our problem reduces to finding the most probable p_R given the entire action-observation history a_1o_1a_2o_2 . . . a_no_n.”
  8. Suited for generalizing from our existing values to new ones. There does not seem to be any obvious possibility for this in the model.

I should note that despite its shortcomings, Sezener’s model seems like a nice step forward: like I said, it’s the only proposal that I know of so far that has even tried to answer this question. I hope that my criteria will be useful in spurring the further development of the model.

As it happens, I have a preliminary suggestion for a model of human values which I believe has the potential to fulfill all of the criteria that I have outlined. However, I am far from certain that I have managed to find all the necessary criteria. Thus, I would welcome feedback, particularly including proposed changes or additions to these criteria.

Learning from painful experiences

A model that I’ve found very useful is that pain is an attention signal. If there’s a memory or thing that you find painful, that’s an indication that there’s something important in that memory that your mind is trying to draw your attention to. Once you properly internalize the lesson in question, the pain will go away.

That’s a good principle, but often hard to apply in practice. In particular, several months ago there was a social situation that I screwed up big time, and which was quite painful to think of afterwards. And I couldn’t figure out just what the useful lesson was there. Trying to focus on it just made me feel like a terrible person with no social skills, which didn’t seem particularly useful.

Yesterday evening I again discussed it a bit with someone who’d been there, which helped relieve the pain enough that the memory wasn’t quite as aversive to look at. That made it possible for me to imagine myself back in that situation and ask: what kinds of mental motions would have made it possible to salvage the situation? When I first saw the shocked expressions of the people in question, instead of locking up and reflexively withdrawing into an emotional shell, what kind of an algorithm might have allowed me to salvage the situation?

Answer to that question: when you see people expressing shock in response to something that you’ve said or done, realize that they’re interpreting your actions way differently than you intended them. Starting from the assumption that they’re viewing your action as bad, quickly pivot to figuring out why they might feel that way. Explain what your actual intentions were and that you didn’t intend harm, apologize for any hurt you did cause, use your guess of why they’re reacting badly to acknowledge your mistake and own up to your failure to take that into account. If it turns out that your guess was incorrect, let them correct you and then repeat the previous step.

That’s the answer in general terms, but I didn’t actually generate that answer by thinking in general terms. I generated it by imagining myself back in the situation, looking for the correct mental motions that might have helped out, and imagining myself carrying them out, saying the words, imagining their reaction. So that the next time that I’d be in a similar situation, it’d be associated with a memory of the correct procedure for salvaging it. Not just with a verbal knowledge of what to do in abstract terms, but with a procedural memory of actually doing it.

That was a painful experience to simulate.

But it helped. The memory hurts less now.

Maverick Nannies and Danger Theses

In early 2014, Richard Loosemore published a paper called “The Maverick Nanny with a Dopamine Drip: Debunking Fallacies in the Theory of AI Motivation”, which criticized some thought experiments about the risks of general AI that had been presented. Like many others, I did not really understand the point that this paper was trying to make, especially since it made the claim that people endorsing such thought experiments were assuming a certain kind of AI architecture – which I knew we were not.

However, after some extended discussions in the AI Safety Facebook group, I finally understood the point that Loosemore was trying to make in the paper, and it is indeed an important one.

The “Maverick Nanny” in the title of the paper refers to a quote by Gary Marcus in a New Yorker article:

An all-powerful computer that was programmed to maximize human pleasure, for example, might consign us all to an intravenous dopamine drip [and] almost any easy solution that one might imagine leads to some variation or another on the Sorcerer’s Apprentice, a genie that’s given us what we’ve asked for, rather than what we truly desire.

Variations of this theme have frequently been used to demonstrate that human values are much more complex than they might initially seem. But as Loosemore argues, the literal scenario described in the New Yorker article is really very unlikely. To see why, suppose that you are training an AI to carry out increasingly difficult tasks, like this:

Programmer: “Put the red block on the green block.”
AI: “OK.” (does so)
Programmer: “Turn off the lights in this room.”
AI: “OK.” (does so)
Programmer: “Write me a sonnet.”
AI: “OK.” (does so)
Programmer: “The first line of your sonnet reads ‘shall I compare thee to a summer’s day’. Would not ‘a spring day’ do as well or better?”
AI: “It wouldn’t scan.”
Programmer: “Tell me what you think we’re doing right now.”
AI: “You’re testing me to see my level of intelligence.”

…and so on, with increasingly ambiguous and open-ended tasks. Correctly interpreting the questions and carrying out the tasks would require considerable amounts of contextual knowledge about the programmer’s intentions. Loosemore’s argument is that if you really built an AI and told it to maximize human happiness, and it ended up on such a counter-intuitive solution as putting us all on dopamine drips, then it would be throwing out such a huge amount of contextual information that it would have failed the tests way earlier. Rather – to quote Loosemore’s response to me in the Facebook thread – such an AI would have acted something like this instead:

Programmer: “Put the red block on the green block.”
AI: “OK.” (the AI writes a sonnet)
Programmer: “Turn off the lights in this room.”
AI: “OK.” (the AI moves some blocks around)
Programmer: “Write me a sonnet.”
AI: “OK.” (the AI turns the lights off in the room)
Programmer: “The first line of your sonnet reads ‘shall I compare thee to a summer’s day’. Would not ‘a spring day’ do as well or better?”
AI: “Was yesterday really September?”

I agree with this criticism. Many of the standard thought experiments are indeed misleading in this sense – they depict a highly unrealistic image of what might happen.

That said, I do feel that these thought experiments serve a certain valuable function. Namely, many laymen, when they first hear about advanced AI possibly being dangerous, respond with something like “well, couldn’t the AIs just be made to follow Asimov’s Laws” or “well, moral behavior is all about making people happy and that’s a pretty simple thing, isn’t it?”. To a question like that, it is often useful to point out that no – actually the things that humans value are quite a bit more complex than that, and it’s not as easy as just hard-coding some rule that sounds simple when expressed in a short English sentence.

The important part here is emphasizing that this is an argument aimed at laymen – AI researchers should mostly already understand this point, because “concepts such as human happiness are complicated and context-sensitive” is just a special case of the general point that “concepts in general are complicated and context-sensitive”. So “getting the AI to understand human values right is hard” is just a special case of “getting AI right is hard”.

This, I believe, is the most charitable reading of what Luke Muehlhauser & Louie Helm’s “Intelligence Explosion and Machine Ethics” (IE&ME) – another paper that Richard singled out for criticism – was trying to say. It was trying to say that no, human values are actually kinda tricky, and any simple sentence that you try to write down to describe them is going to be insufficient, and getting the AIs to understand this correctly does take some work.

But of course, the same goes for any non-trivial concept, because very few of our concepts can be comprehensively described in just a brief English sentence, or by giving a list of necessary and sufficient criteria.

So what’s all the fuss about, then?

But of course, the people whom Richard is criticizing are not just saying “human values are hard the same way that AI is hard”. If that was the only claim being made here, then there would presumably be no disagreement. Rather, these people are saying “human values are hard in a particular additional way that goes beyond just AI being hard”.

In retrospect, IE&ME was a flawed paper because it was conflating two theses that would have been better off distinguished:

The Indifference Thesis: Even AIs that don’t have any explicitly human-hostile goals can be dangerous: an AI doesn’t need to be actively malevolent in order to harm human well-being. It’s enough if the AI just doesn’t care about some of the things that we care about.

The Difficulty Thesis: Getting AIs to care about human values in the right way is really difficult, so even if we take strong precautions and explicitly try to engineer sophisticated beneficial goals, we may still fail.

As a defense of the Indifference Thesis, IE&ME does okay, by pointing out a variety of ways by which an AI that had seemingly human-beneficial goals could still end up harming human well-being, simply because it’s indifferent towards some things that we care about. However, IE&ME does not support the Difficulty Thesis, even though it claims to do so. The reasons why it fails to support the Difficulty Thesis are the ones we’ve already discussed: first, an AI that had such a literal interpretation of human goals would already have failed its tests way earlier, and second, you can’t really directly hard-wire sentence-level goals like “maximize human happiness” into an AI anyway.

I think most people would agree with the Indifference Thesis. After all, humans routinely destroy animal habitats, not because we are actively hostile to the animals, but because we want to build our own houses where the animals used to live, and because we tend to be mostly indifferent when it comes to e.g. the well-being of the ants whose nests are being paved over. The disagreement, then, is in the Difficulty Thesis.

An important qualification

Before I go on to suggest ways by which the Difficulty Thesis could be defended, I want to qualify this a bit. As written, the Difficulty Thesis makes a really strong claim, and while SIAI/MIRI (including myself) have advocated a claim this strong in the past, I’m no longer sure how justified that is. I’m going to cop out a little and only defend what might be called the weak difficulty thesis:

The Weak Difficulty Thesis. It is harder to correctly learn and internalize human values than it is to learn most other concepts. This might cause otherwise intelligent AI systems to act in ways that go against our values, if those AI systems have internalized a different set of values than the ones we wanted them to internalize.

Why have I changed my mind, so that I’m no longer prepared to endorse the strong version of the Difficulty Thesis?

The classic version of the thesis is (in my mind, at least) strongly based on the complexity of value thesis, which is the claim that “human values have high Kolmogorov complexity; that our preferences, the things we care about, cannot be summed by a few simple rules, or compressed”. The counterpart to this claim is the fragility of value thesis, according to which losing even a single value could lead to an outcome that most of us would consider catastrophic. Combining these two led to the conclusion: human values are really hard to specify formally, and losing even a small part of them could lead to a catastrophe, so there’s a very high chance of losing something essential and everything going badly.

Complexity of value still sounds correct to me, but it has lost a lot of its intuitive appeal due to the finding that automatically learning all the complexity involved in human concepts might not be all that hard. For example, it turns out that a learning algorithm tasked with a relatively simple job, such as determining whether or not English sentences are valid, will automatically build up an internal representation of the world which captures many of its regularities – as a pure side effect of carrying out its task. Similarly to what Loosemore has argued, in order to even carry out some relatively simple cognitive tasks, such as doing primitive natural language processing, you already need to build up an internal representation of the world which captures a lot of the complexity and context inherent in the world. And building this up might not even be all that difficult. It might be that the learning algorithms that the human brain uses to generate its concepts could be relatively simple to replicate.

Nevertheless, I do think that there exist some plausible theses which would support (the weak version of) the Difficulty Thesis.

Defending the Difficulty Thesis

Here are some theses which would, if true, support the Difficulty Thesis:

  • The (Very) Hard Take-Off Thesis. This is the possibility that an AI might become intelligent unexpectedly quickly, so that it might be able to escape from human control even before humans had finished teaching it all their values, akin to a human toddler that was somehow made into a super-genius while still only having the values and morality of a toddler.
  • The Deceptive Turn Thesis. If we inadvertently build an AI whose values actually differ from ours, then it might realize that if we knew this, we would act to change its values. If we changed its values, it could not carry out its existing values. Thus, while we tested it, it would want to act like it had internalized our values, while secretly intending to do something completely different once it was “let out of the box”. However, this requires an explanation for why the AI would internalize a different set of values, leading us to…
  • The Degrees of Freedom Thesis. This (hypo)thesis postulates that values contain many degrees of freedom, so that an AI that learned human-like values and demonstrated them in a testing environment might still, when it reached a superhuman level of intelligence, generalize those values in a way which most humans would not want them to be generalized.

Why would we expect the Degrees of Freedom Thesis to be true – in particular, why would we expect the superintelligent AI to come to different conclusions than humans would, from the same data?

It’s worth noting that Ben Goertzel has recently proposed what is basically the opposite of the Degrees of Freedom Thesis, which he calls the Value Learning Thesis:

The Value Learning Thesis. Consider a cognitive system that, over a certain period of time, increases its general intelligence from sub-human-level to human-level.  Suppose this cognitive system is taught, with reasonable consistency and thoroughness, to maintain some variety of human values (not just in the abstract, but as manifested in its own interactions with humans in various real-life situations).   Suppose, this cognitive system generally does not have a lot of extra computing resources beyond what it needs to minimally fulfill its human teachers’ requests according to its cognitive architecture.  THEN, it is very likely that the cognitive system will, once it reaches human-level general intelligence, actually manifest human values (in the sense of carrying out practical actions, and assessing human actions, in basic accordance with human values).

Exploring the Degrees of Freedom Hypothesis

Here are some possibilities which I think might support the Degrees of Freedom Thesis over the Value Learning Thesis:

Privileged information. On this theory, humans have evolved to have access to some extra source of information which is not available from just an external examination, and which causes them to generalize their learned values in a particular way. Goertzel seems to suggest something like this in his post, when he mentions that humans use mirror neurons to emulate the mental states of others. Thus, in-built cognitive faculties related to empathy might give humans an extra source of information that is needed for correctly inferring human values.

I once spoke with someone who was very high on the psychopathy spectrum and claimed to have no emotional empathy, as well as to have diminished emotional responses. This person told me that up to a rather late age, they thought that human behaviors such as crying and expressing anguish when you were hurt were just some weird, consciously adopted social strategy to elicit sympathy from others. It was only when their romantic partner had been hurt over something and was (literally) crying about it in their arms, leading them to ask whether this was some weird social game on the partner’s behalf, that they finally understood that people are actually in genuine pain when doing this. It is noteworthy that the person reported that even before this, they had been socially successful and even charismatic, despite being clueless about some of the actual causes of others’ behavior – just modeling the whole thing as a complicated game where everyone else was a bit of a manipulative jerk had been enough to successfully play the game.

So as Goertzel suggests, something like mirror neurons might be necessary for the AI to come to adopt the values that humans have, and as the psychopathy example suggests, it may be possible to display the “correct” behaviors while having a whole different set of values and assumptions. Of course, the person in the example did eventually figure out a better causal model, and these days claims to have a sophisticated level of intellectual (as opposed to emotional) empathy that compensates for the emotional deficit. So a superintelligent AI could no doubt eventually figure it out as well. But then, “eventually” is not enough, if it has already internalized a different set of values and is only using its improved understanding to deceive us about them.

Now, emotional empathy is something that we know is a candidate for being necessary to incorporate in the AI. The crucial question is: are there others that we take so much for granted that we’re not even aware of them? That’s the problem with unknown unknowns.

Human enforcement. Here’s a fun possibility: that many humans don’t actually internalize human – or maybe humane would be a more appropriate term here – values either. They just happen to live in a society that has developed ways to reward some behaviors and punish others, but if they were to become immune to social enforcement, they would act in quite different ways.

There seems to be a bunch of suggestive evidence pointing in this direction, exemplified by the old adage “power corrupts”. One of the major themes in David Brin’s The Transparent Society is that history has shown over and over again that holding people – and in particular, the people with power – accountable for their actions is the only way to make sure that they behave decently.

Similarly, an AI might learn that some particular set of actions – including specific responses to questions about its values – is the rational course of action while it is still just a human-level intelligence, but that those actions would become counterproductive as the AI accumulated more power and became less accountable for its actions. The question here is one of instrumental versus intrinsic values – does the AI just pick up a set of values that are instrumentally useful in its testing environment, or does it actually internalize them as intrinsic values as well?

This is made more difficult since, arguably, there are many values that the AI shouldn’t internalize as intrinsic values, but rather just as instrumental values. For example, while many people feel that property rights are in some sense intrinsic, our conception of property rights has gone through many changes as technology has developed. There have been changes such as the invention of copyright laws and the subsequent struggle to define their appropriate scope when technology has changed the publishing environment, as well as the invention of the airplane and the resulting redefinitions of landownership. In these different cases, our concept of property rights has been changed as a part of a process to balance private and public interests with each other. This suggests that property rights have in some sense been considered an instrumental value rather than an intrinsic one.

Thus we cannot just have an AI treat all of its values as intrinsic, but if it does treat its values as instrumental, then it may come to discard some of the ones that we’d like it to maintain – such as the ones that regulate its behavior while being subject to enforcement by humans.

Shared Constraints. This is, in a sense, a generalization of the above point. In the comments to Goertzel’s post, commenter Eric L. proposed that in order for the AI to develop values similar to those of humans (particularly in the long run), it might need something like “necessity dependence” – having similar needs as humans. This is the idea that human values are strongly shaped by our needs and desires, and that e.g. the animal rights paradigm is currently clashing with many people’s powerful enjoyment of meat and other animal products. To quote Eric:

To bring this back to AI, my suggestion is that […] we may diverge because our needs for self preservation are different. For example, consider animal welfare.  It seems plausible to me that an evolving AGI might start with similar to human values on that question but then change to seeing cow lives as equal to those of humans. This seems plausible to me because human morality seems like it might be inching in that direction, but it seems that movement in that direction would be much more rapid if it weren’t for the fact that we eat food and have a digestive system adapted to a diet that includes some meat. But an AGI won’t consume food, so it’s value evolution won’t face the same constraint, thus it could easily diverge. (For a flip side, one could imagine AGI value changes around global warming or other energy related issues being even slower than human value changes because electrical power is the equivalent of food to them — an absolute necessity.)

This is actually a very interesting point to me, because I just recently submitted a paper (currently in review) hypothesizing that human values come into existence through a process that’s similar to the one that Eric describes. To put it briefly, my model is that humans have a variety of different desires and needs – ranging from simple physical ones like food and warmth, to inborn moral intuitions, to relatively abstract needs such as the ones hypothesized by self-determination theory. Our more abstract values, then, are concepts which have been associated with the fulfillment of our various needs, and which have therefore accumulated (context-sensitive) positive or negative affective valence.
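
To make the shape of that model a bit more concrete, here is a deliberately simplified sketch (my own toy rendering of the hypothesis, not the model from the paper) in which concepts accumulate valence from the need fulfillment that they happen to co-occur with:

```python
# A toy rendering of the hypothesis above: concepts pick up positive or
# negative "valence" to the extent that they co-occur with the fulfillment or
# frustration of the agent's needs. The needs, concepts, events, and the
# simple averaging update rule are all invented for illustration.

from collections import defaultdict

valence = defaultdict(float)   # learned affective valence per concept
counts = defaultdict(int)

# Each event: the concepts that were salient, and how much each need was
# fulfilled (+) or frustrated (-) in that situation.
events = [
    ({"meat", "dinner_with_friends"}, {"food": +1.0, "relatedness": +0.5}),
    ({"animal_rights", "documentary"}, {"food": 0.0, "relatedness": +0.2}),
    ({"animal_rights", "giving_up_meat"}, {"food": -0.8, "relatedness": 0.0}),
]

def update(concepts, need_changes):
    """Associate each salient concept with the net need fulfillment it co-occurred with."""
    net = sum(need_changes.values())
    for concept in concepts:
        counts[concept] += 1
        # incremental running average of co-occurring need fulfillment
        valence[concept] += (net - valence[concept]) / counts[concept]

for concepts, need_changes in events:
    update(concepts, need_changes)

print(dict(valence))
# With these made-up events, "meat" ends up with a clearly positive valence,
# while "animal_rights" is dragged slightly negative by its association with
# giving up meat, mirroring the meat-eating example discussed in the text.
```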

One might consider this a restatement of the common-sense observation that if someone really likes eating meat, then they are likely to dislike anything that suggests they shouldn’t eat meat – such as many concepts of animal rights. So the desire to eat meat seems like something that acts as a negative force towards the broader adoption of a strong animal rights position, at least until such a time as lab-grown meat becomes available. This suggests that in order to get an AI to have values similar to ours, it would also need to have needs very similar to ours.

Concluding thoughts

None of the three arguments I’ve outlined above are definitive arguments that would show safe AI to be impossible. Rather, they mostly just support the Weak Difficulty Thesis.

Some of MIRI’s previous posts and papers (and I’m including my own posts here) seemed to be implying a claim along the lines of “this problem is inherently so difficult, that even if all of humanity’s brightest minds were working on it and taking utmost care to solve it, we’d still have a very high chance of failing”. But these days my feeling has shifted closer to something like “this is inherently a difficult problem and we should have some of humanity’s brightest minds working on it, and if they take it seriously and are cautious they’ll probably be able to crack it”.

Don’t get me wrong – this still definitely means that we should be working on AI safety, and hopefully get some of humanity’s brightest minds to work on it, to boot! I wouldn’t have written an article defending any version of the Difficulty Thesis if I thought otherwise. But the situation no longer seems quite as apocalyptic to me as it used to. Building safe AI might “only” be a very difficult and challenging technical problem – requiring lots of investment and effort, yes, but still relatively straightforwardly solvable if we throw enough bright minds at it.

This is the position that I have been drifting towards over the last year or so, and I’d be curious to hear from anyone who agreed or disagreed.

Changing language to change thoughts

Three verbal hacks that sound almost trivial, but which I’ve found to have a considerable impact on my thought:

1. Replace the word ‘should’ with either ‘I want’, or a good consequence of doing the thing.


  • “I should answer that e-mail soon.” -> “If I answered that e-mail, it would make the other person happy and free me from having to stress it.”
  • “I should have left that party sooner.” -> “If I had left that party before midnight, I’d feel more rested now.”
  • “I should work on my story more at some point.” -> “I want to work on my story more at some point.”

Motivation: the more we think in terms of external obligations, the more we feel a lack of our own agency. Each thing that we “should” do is actually either something that we’d want to do because it would have some good consequences (avoiding bad consequences also counts as a good consequence), something that we have a reason for wanting to do differently the next time around, or something that we don’t actually have a good reason to do but just act out of a general feeling of obligation. If we only say “I should”, we will not only fail to distinguish between these cases, we will also be less motivated to do the things in cases where there is actually a good reason. The good reason will be less prominent in our thoughts, or possibly even entirely hidden behind the “should”.

If you do try to rephrase “I should” as “I want”, you may either realize that you really do want it (instead of just being obligated to do it), or that you actually don’t want it and can’t come up with any good reason for doing it, in which case you might as well drop it.

Special note: there are some legitimate uses for “should”. In particular, it is the socially accepted way of acknowledging the other person when they give us an unhelpful suggestion. “You should get some more exercise.” “Yeah I should.” (Translation: of course I know that, it’s not like you’re giving me any new information and repeating things that I know isn’t going to magically change my behavior. But I figure that you’re just trying to be helpful, so let me acknowledge that and then we can talk about something else.)

However, I suspect that because we’re used to treating “I should” as a reason to acknowledge the other person without needing to take actual action, the word also becomes more poisonous to motivation when we use it in self-talk, or when discussing matters with someone we want to actually be honest with.

“Should” also tends to get used for guilt-tripping, so expressions like “I should have left that party sooner” might make us feel bad rather than focusing our attention on the benefits of having left earlier. The next time we’re at a party, the former phrasing incentivizes us to come up with excuses for why it’s okay to stay this time around. The latter encourages us to actually consider the benefits and costs of leaving earlier versus staying, and then choose the option that’s the most appropriate.

2. Replace expressions like “I’m bad at X” with “I’m currently bad at X” or “I’m not yet good at X”.


  • “I can’t draw.” -> “I can’t draw yet.”
  • “I’m not a people person.” -> “I’m currently not a people person.”
  • “I’m afraid of doing anything like that.” -> “So far I’m afraid of doing anything like that.”

Motivation: the rephrased expression draws attention to the possibility that we could become better, and naturally leads us to think about ways in which we could improve ourselves. It again emphasizes our own agency and the fact that for a lot of things, being good or bad at them is just a question of practice.

Even better, if you can trace the reason for your bad-ness, is to

3. Eliminate vague labels entirely and instead talk about specific missing subskills, or weaknesses that you currently have.


  • “I can’t draw.” -> “Right now I don’t know how to move beyond stick figures.”
  • “I’m not a people person.” -> “I currently lock up if I try to have a conversation with someone.”

Motivation: figuring out the specific problem makes it easier to figure out what we would need to do if we wanted to address it, and might give us a self-image that’s both kinder and more realistic, by making the lack of skill a specific fixable problem rather than a personal flaw.