Be careful what you ask for: Goodhart’s law affects computers too.


There’s an old joke famously played by Marty Feldman in the Mel Brooks film, Young Frankenstein, when Feldman leads Gene Wilder from a train station with the instruction to ‘walk this way,’ and then insists that Wilder walk bent with a cane as Feldman’s character Igor does himself. “This way,” he says, demonstrating.

Its a good example of how fuzzy language can be interpreted to comic effect. But it also points to a problem in talking to artificial agents, which take instructions more literally than they are sometimes intended.

A common method in machine learning is to specify a “cost function” that measures how good an action is and use this to train an agent towards desired behavior.

For example, you might specify a cost function for a “simple” task like walking with the following:

  • Moving legs

  • Forward motion

  • Symmetrical legs

But the above criteria are not enough since a robot rolling on its side while moving its legs meets all three but is certainly not “walking”.

In fact, mathematically expressing vague human ideas is actually extremely hard. Cost functions often have bugs or incentivize subtly wrong actions.

The recently published paper, "The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities" by Joel Lehman et al at Michigan State University, collects some fascinating examples of evolutionary algorithms acting in unexpected ways.

Computers (for now) always do exactly what they are told. Rather than any sort of “defiant” behavior, evolutionary algorithms produce interesting results simply due to mis-specified instructions.

In 1975, famed economist Charles Goodhart coined Goodhart’s Law which states that “when a measure becomes a target, it ceases to be a good measure.”

This is similar to the idea of unintended consequences that result from people following a law to the letter while completely violating its spirit.

One anecdote from British India illustrates the point: Delhi suffered from venomous cobras, so the government put out a bounty for dead snake skins to try to eliminate the problem.

But citizens soon realized they could breed the snakes for massive profit instead of actually hunting the cobras.

The government ended the program after they discovering the ruse, but this then induced the breeders to release their (now useless) snakes into the wild. Thus, the policy ended up worsening the cobra problem.

Usually, human norms and culture will prevent such debacles where the letter of the law (not its spirit) matters most.

If a CEO casually requests her sales team to increase revenue by a million dollars next quarter, this does not include breaking into a bank and stealing the money, even though the CEO did not explicitly forbid such actions.

But coding human norms into computer algorithms is quite difficult and rarely done. Thus, the computer will follow “the letter of the law” to the extreme, sometimes with disastrous consequences.

Is this creativity as the paper claims? Judge for yourself. We certainly call lawyers creative at times.

Evolutionary Algorithms vs Gradient Descent

The paper’s title suggests that evolutionary algorithms are the root cause of the behavior when the actual reason is a mis-specified cost function.

Evolutionary algorithms do not require the cost functions to be differentiable and thus can be more generally applied than stochastic gradient descent. 

However, optimizing the cost function with techniques like stochastic gradient descent (the most common optimizer for deep neural networks) will result in similar issues.

OpenAI trained an agent to play the racing game CoastRunners using reinforcement learning.

Instead of progressing through the racetrack, the agent hacked a higher score by looping in a short circle and collecting optional powerups.

This was a classic example of following the law to the letter (maximizing score) while totally violating the spirit (completing the racetrack).

The paper includes an array of such anecdotes.

Researchers used an evolutionary algorithm to sort numbers, but the evaluation code merely checked that the algorithm returned any sorted list instead of ensuring that it sorted the numbers that were given.

The agent returned an empty list for every input and got full marks since an empty list is in sorted order.

In another case, computer scientists and physicists collaborated to find better carbon nanostructures.

An evolutionary algorithm suggested a molecule where all the carbon atoms were stacked in the exact same position in space, something not explicitly ruled out in the simulation but impossible in the real world.

Physicists blamed the computer scientists for producing an impossible configuration, while the computer scientists blamed the physicists for a faulty simulation model. The cross-discipline collaboration collapsed shortly afterward.

Though these stories seem frivolous in hindsight, it is actually extremely hard to specify a good reward function that isn’t vulnerable to Goodhart’s law.

Implications for AI Safety

Stories above about “toy” simulations gone wrong suddenly become causes for extreme concern if agents deviate significantly from expected behavior in the real world.

In one airplane landing simulation, an evolutionary algorithm applied extremely large forces to the airplane and triggered a bug, causing the simulation to overflow and claim impossibly good results for dangerous behavior.

Containment thus seems pretty important. Never release a black box optimizer directly into the wild without having high confidence in its expected behavior.