The Uncanny Valley

Game Developer magazine

December 2004 Page 44

[Why people react more negatively the more "human" something appears]

UNCANNY VALLEY

by Steve Theodore

STEVE THEODORE started animating on a text-only mainframe renderer and then moved on to work on games such as HALF-LIFE and COUNTER-STRIKE. He can be reached at stheodore@gdmag.com.

Our business is built on a continual flood of incremental improvements, so it's easy for us to assume that each passing year makes our job a little easier and our product a little better. Certainly anybody who ever sweated over a 4-bit palette or tried to animate an FK-only walk cycle in WaveFront is sure to be a believer in progress. In this column, we're going to explore the infuriating ways our technological miracles can actually make it harder to create characters that audiences care about.

Film animators have long known that improvements in technique don't always translate into improvements in emotional appeal. In the history of rotoscoped animation, films such as Max Fleischer's Gulliver's Travels (1939) and Ralph Bakshi's much-reviled The Lord of the Rings (1978) are classic cases of supposedly superior representational technologies that failed the emotional-bonding test. In the fine arts, hyper-realistic sculptures have always produced a distinctly cool reaction from audiences, whether the waxworks at Madame Tussauds or the silicone castings of Duane Hanson. It's no accident that the most recent big-time show featuring Hanson's work was an international exhibition called "The Uncanny."

Artists aren't the only people who've found that perfecting one's presentation of people is harder than it seems. Japanese roboticist Dr. Masahiro Mori published his observations on the way people react to humanoid robots in 1970. He started with the common-sense assumption that each progressive improvement in the appearance of a robot would elicit a warmer emotional response. After all, people find it easier to relate to C-3PO than to a six-ton, one-armed welding machine. What Mori found, though, was that the relationship is more complicated. As machines approach a more convincing human appearance, they stop being pleasantly anthropomorphic and quickly become, well, creepy. Imperfect mimicry produces something closer to animated corpses than approachable people: they aren't simply less attractive than real people, they're actively repulsive. Mori represented his conclusions in a set of graphs (see Figure 1) that have become iconic among sci-fi writers, artists, and culture pundits. He coined the phrase "the Uncanny Valley" to describe the portion of the graph in which increasing visual fidelity actually detracts from the emotional appeal of a creation.

FIGURE 1. Mori's work has been charted into three basic graphs showing the level of uncanniness in relation to movement, appearance, and overall impression. Movement: 1. industrial robot; 2. android; 3. moving corpse/Uncanny Valley; 4. prosthetic hand; 5. disabled person; 6. Bunraku puppet; 7. unhealthy person; 8. healthy person. Appearance: 1. stuffed toy; 2. Noh mask of thin man; 3. corpse/Uncanny Valley; 4. decorative robot; 5. doll. Overall: 1. toy robot; 2. Uncanny Valley; 3. Bunraku puppet.

[My note on Figure 1: the vertical scale on the left-hand side of the graph represents warmth of reaction and sense of empathy. The scale along the bottom indicates the degree of "similarity to human-ness". You would expect that as you move to the right on the scale of human appearance, the graphed line would curve smoothly upward toward a more positive reaction. Instead, the curve rises smoothly at first, then suddenly dives to a *negative* reaction, one of disgust and repulsion, before jumping back up to an even higher positive emotional score. This dive into negative numbers is what is referred to as the "valley", or the "Uncanny Valley". The writer compares this disgust and revulsion to our reactions to corpses, which are more human-like than any doll, robot, or computer game character but inspire extreme negative emotions instead of positive ones. Some people in the autistic community have asked whether the "Uncanny Valley" might explain the negative reactions that autistics inspire in non-autistics: TOO human to be accepted as harmlessly "different", too different to be accepted as fully "normal".]

[It is my argument that by trying harder and harder to simulate non-autistic ("neurotypical") behaviors, autistics are actually doing themselves harm, creating situations where the reactions of non-autistics to them actually become *worse* instead of better.]

The odd shapes of Mori's graphs are rooted in the biology of social perception. Evolutionary psychologists point out that our senses are acutely attuned to reading other people. Everyone knows how easy it can be to read the mood of a stranger from just a fleeting glimpse of her face or posture. Researchers believe that a side effect of this sensitivity is an extremely critical reaction to flaws in artistic representations of people. This isn't simply a matter of taste. On an unconscious level, a failed representation of humanity alarms us in the same way a physical or mental illness does. We react more harshly to subtle failures in realism than to obviously cartoonish images because we have begun to think of them as broken people, rather than broken pictures.

At the same time, our in-built sensitivity to the human form encourages us to project human characteristics onto distinctly non-human forms. Thus, on the lower slopes of Mori's curves, even very crude suggestions of humanity are sufficient to generate empathy. This is why very simple cartoons of the Mickey Mouse variety were popular all over the world, whereas more sophisticated graphic novel styles tend to appeal to a more localized taste. As Scott McCloud points out in Understanding Comics (Perennial Currents, 1994), it's easier to read the emotions of a stick figure than of a mediocre portrait, even though the portrait is far closer to looking real than a circle, two dots, and a line. When we know what we see is not human, we are happy to project human characteristics onto it. But when we think it may really be human, we become much more critical.

In the last decade or so, the concept of the Uncanny Valley has seeped into mainstream culture. Probably the most famous exponent of the idea is film critic Roger Ebert, who cited it in his review of Final Fantasy: The Spirits Within, a movie that strikes many as the best imaginable example of the theory. [Ebert also included an Uncanny Valley reference in his review of the Wayans Brothers' White Chicks, a strong runner-up in the Uncanny sweepstakes.] Nowadays, it's not uncommon to hear the phrase mentioned in esoteric SIGGRAPH panel debates or GDC talks. But what does all this pontification mean for the future of game characters?

LESS IS MORI

Mori concluded his initial discussion of the Uncanny Valley by suggesting that the best tool for creating empathetic, likeable robots was simplification. Rather than pushing directly toward photo-realistic creations, he advocated deliberately crude designs that stuck to the lower slopes of the realism curve. His theory has been widely influential. Robot designers from Doug Chiang to the makers of Honda's Asimo have very deliberately chosen low-fidelity, cute designs in order to help people bond with their creations. Perhaps more tellingly, most medical prosthetics today are still explicitly mechanical. Many users of artificial hands, for example, find it easier to offer an obviously mechanical gripper than a realistic-looking hand which is stiff and corpse-cold.

The obvious implication for us is that when it comes to realism, less may be more. There are many applications in which striving for realism is going to create many problems and little gain. Certainly, movies such as Super Mario Brothers and The Flintstones proved that many characters really shouldn't be real people in a foam suit or a vertex-deformed, pixel-shaded, motion-captured digital equivalent of one. Games that are amenable to non-realistic art styles may wish to stay very deliberately on the lower slopes of the Mori curve. Even games that are superficially realistic may benefit from a deliberate injection of exaggeration. The animations of CITY OF HEROES, for example, perfectly complement the subject matter. Replacing them with frame-perfect mocap [motion capture] would be a terrible step backward. So by lowering the overall level of fidelity with hammy animation or deliberately stylized renderings, you can help pull yourself back from the lip of the Valley.

DEVIL TAKE THE HINDMOST

Of course, some games have to focus on realistic humans. The market isn't clamoring for SPLINTER CELL SUPER-DEFORMED just yet. In these games the Uncanny Valley theory provides a scientific twist to the oldest tradition of commercial art: You will be judged by your screw-ups. As game graphics get more sophisticated, the best effects begin to fade into invisibility while the worst become ever more glaring. That skin tone that you lovingly recreated through hours of labor won't garner a second glance because it is just right; but an oddly proportioned nose will bring out the art critic in every programmer and game designer in your office. The Uncanny Valley effect tends to sharpen the contrast between well and poorly executed areas even more strongly because your best elements may pull you into dangerous territory where the flaws will become more damning. When your skin shaders are flawless, people will suddenly start to notice that your teeth aren't shadowing inside your mouth, or that you belong to the Hair Club for NPCs.

The only defense against the sharp eyes of the audience is an honest, critical appraisal of what you and your technology are good at. "Bullet point" technologies may look great on the back of your box, but if you have to choose between upgrading your 3-year-old animation system and adding a really slick skin shader, you may very well find that time spent on the trailing edge will be better rewarded. We've all seen characters with impressive poly counts and sophisticated shaders who run headfirst into walls, or beautifully mocapped animations that are strung together with only the crudest transitions. No amount of shader programming will lift you up if your AI or animation code is dragging you down into the depths of the Uncanny Valley. Artists, of course, aren't the only part of the team responsible for the way characters work in the game, but because artists are trained to be sensitive to nuances of appearance and action that others miss, it's often going to be their job to advocate on behalf of the characters to make sure that compromises are balanced intelligently.

NON-CONSECUTIVE TERMS

The key to navigating the Uncanny Valley can be seen in the graphs. Mori argued that movement counted far more in people's perception of humanity than physical appearance. This is more than just special pleading from a builder of servos; it's a direct result of the psychological dynamics we outlined above. A 100-percent visually realistic human can still be unnerving: what, after all, is more disturbing than a corpse? Nobody objects to a dancing mushroom (think Fantasia) with very human motions. Yet the beautifully sculpted animatronics in the Hall of Presidents at Disney World have been giving people the willies for a generation.

There is basically no point in building characters that are indistinguishable from real people in screenshots but who act like robots or sleepwalkers when they move. Our animation systems need to be more sophisticated, both technically and artistically. We've basically solved real-time playback for skeletal animations, but that's only the very beginning of what's needed to "sell" a character to an audience. We need to upgrade our animation art to the same high level we've achieved in rendering, whether that means sending all our animators to Ed Hooks's "Acting for Animators" class or hiring real actors and directors for our mocap sessions. We also need technical fixes, like realistic skin deformation, facial animation, and muscle systems. Most of all we need more sophisticated behavior modeling: our characters have to live in their worlds, not in their bounding boxes. This can be as simple as programmatically making "eye contact" with the player during conversation, or as complicated as being able to correctly pick an object up off a table without moving robotically to the same spot every time. In any case, nothing is going to establish the emotional presence of our characters more than their acting skills, and right now the tools for turning animated models into actors are just a dream.
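As a rough illustration of the simple end of that range, programmatic "eye contact" might be sketched as below. This is a minimal sketch and not taken from any shipping engine: the function names, the flat 2D coordinates, the 70-degree neck limit, and the turn-speed cap are all assumptions made for the example.

```python
import math

def look_at_yaw(npc_pos, npc_facing_deg, target_pos, max_turn_deg=70.0):
    """Head yaw (degrees, relative to the body) needed to face the target.

    Hypothetical helper: positions are (x, y) tuples on a flat plane and
    max_turn_deg is an assumed neck limit, not a measured value.
    """
    dx = target_pos[0] - npc_pos[0]
    dy = target_pos[1] - npc_pos[1]
    world_deg = math.degrees(math.atan2(dy, dx))
    # Wrap the difference into [-180, 180) so the head takes the short way round.
    rel = (world_deg - npc_facing_deg + 180.0) % 360.0 - 180.0
    # Clamp so the character never twists its neck past the assumed limit.
    return max(-max_turn_deg, min(max_turn_deg, rel))

def step_head(current_deg, desired_deg, max_speed_deg_per_s, dt):
    """Advance the head toward the desired yaw at a limited angular speed."""
    max_step = max_speed_deg_per_s * dt
    delta = desired_deg - current_deg
    delta = max(-max_step, min(max_step, delta))
    return current_deg + delta
```

Calling `look_at_yaw` once per frame and feeding its result into `step_head` would turn the head gradually rather than snapping it to the player. The speed cap is the design choice that matters here: an instant, perfectly accurate head turn is exactly the kind of robotic motion the column warns about.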

A VIEW FROM ABOVE

Contemporary graphics technology has brought us to the pass that looks down into the Uncanny Valley. We're only just getting good enough that being terrible is a real possibility. Some day soon we'll be able to make characters that are John Malkovich scary, instead of just big-fangs-and-red-eyes scary. Of course, being John Malkovich isn't something you want to do by accident, so we're all going to have to go back to the well for a more subtle and sophisticated understanding of what makes a character seem real. As each successive barrier to realism falls (as lighting, skin shaders, realistic deformation, and facial animation improve) we'll be dealing with ever subtler and ever more subliminal problems and facing ever more thankless criticism for our efforts. Games like HALF-LIFE 2 have begun to turn gamers' attention to the nuances of character and physical presence, but this work is only in its infancy. It's going to be an exciting and infuriating time.

December 28, 2004