The Tyranny of Metrics
by Jerry Z. Muller.
Princeton University Press, 2018.
Hardcover, 220 pages, $25.

Gene Callahan
One way of defining “rationalism” (when the term is understood as a flaw rather than a virtue) is as the attempt to replace experience with technique. In his important new book, historian Jerry Muller takes on a particular species of rationalism: our modern fixation on replacing expert judgment with what might be described as “the dictatorship of the quantifiable.”

Muller dubs his target “metric fixation.” In his introductory material, in his case studies, and in his conclusion, he highlights the many pitfalls of this fixation: once we restrict our focus to certain aspects of a situation that we can measure, we lose sight of important things that we can’t measure; we tempt people judged by some metric to set aside doing their best job and instead focus on winning at the metric game; and we encourage cheating to meet metric standards often seen as arbitrary and imposed from on high.

An amusing example from my own profession, software engineering, nicely illustrates the second point: for a while, there was a management fad of measuring the “performance” of software engineers by counting how many lines of code they wrote per day. This was a bizarre idea since, generally speaking, the best solution to a problem involves writing the fewest lines of code, not the most. It also led to silliness like this: if I want to set a variable x to the value five, I would naturally just write “x = 5”. But once keeping my job depends on writing lots of lines of code, I can instead write:

x = 1
x = x + 1
x = x + 1
x = x + 1
x = x + 1

Now I am “five times as productive” as if I had just written “x = 5”!
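To make the arithmetic concrete, here is a minimal Python sketch of a naive line-counting metric (Python because the snippet above is already valid Python). The functions `lines_of_code` and `run` are hypothetical illustrations, not any tool actually used by those managers; the sketch simply shows that both versions leave x equal to 5 while the padded version scores five times higher.

```python
# Hypothetical "productivity" metric: count the non-blank lines a programmer wrote.
def lines_of_code(source: str) -> int:
    """Count the non-blank lines in a piece of source code."""
    return sum(1 for line in source.splitlines() if line.strip())


def run(source: str) -> int:
    """Execute a snippet in a fresh namespace and return the final value of x."""
    namespace: dict = {}
    exec(source, namespace)
    return namespace["x"]


concise = "x = 5\n"
padded = "x = 1\nx = x + 1\nx = x + 1\nx = x + 1\nx = x + 1\n"

if __name__ == "__main__":
    for name, src in [("concise", concise), ("padded", padded)]:
        # Same result either way; only the "productivity" score differs.
        print(f"{name}: x = {run(src)}, lines of code = {lines_of_code(src)}")
```

Nothing in the metric looks at what the code does, which is exactly why it can be gamed without changing the outcome at all.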

But if metrics, while useful, fall so short of their promise, why did they become so popular and ubiquitous? Muller explains this by noting that the rise of “neophyte elites” made the new holders of elite status anxious to justify their positions: unlikely to “feel secure in their judgments,” they are “more likely to seek seemingly objective criteria by which to make decisions.” Meanwhile, “when institutional establishments came under populist attack, they too resorted to metrics as a means of defense to demonstrate their effectiveness.” Muller further contends that the “apotheosis of choice” added to the trend, as advocacy groups challenged the hegemony of experts: “The road to empowerment was paved with metrics.” And the spread of computers made tools like spreadsheets seem like an easy way to sum up an entire business without really understanding it.

The reliance on metrics in management has seemed to offer newcomers who are adept with numbers a quick way to beat out competitors with years of experience, appealing to the egalitarian spirit of our age. As education in management gradually shifted from on-the-job training to instruction by management professors who may never have managed anything more complicated than their gradebooks, abstract methods of managing (doing it “by the numbers”) were bound to gain in popularity. An early form of this focus on measurement was Taylorism:

Taylorism was based on trying to replace the implicit knowledge of the workmen with mass production methods developed, planned, monitored, and controlled by managers. “Under scientific management,” [Taylor] wrote, “the managers assume … the burden of gathering together all the traditional knowledge which in the past has been possessed by the workmen and then of classifying, tabulating, and reducing this knowledge to rules, laws, formulae … Thus all of the planning which under the old system was done by the workmen, must of necessity under the new system be done by management in accordance with the laws of science.”

Muller goes on to describe the tremendous transformation in the vision of the role of management, and in the understanding of who should become a manager, that occurred as formal education replaced apprenticeship and “management” came to be seen as mastery of a set of abstractions:

The core of managerial expertise was now defined as a distinct set of skills and techniques, focused upon a mastery of quantitative methodologies. Decisions based on numbers were viewed as scientific, since numbers were thought to imply objectivity and accuracy …

Before that, “expertise” meant the career long accumulation of knowledge of a specific field, as one progressed from rung to rung within the same institution or business.… Auto executives were “car guys”—men who had spent much of their professional life in the automotive industry. They were increasingly replaced by [Robert] McNamara-like “bean counters,” adept at calculating costs and profit margins.

[This trend] morphed into the gospel of managerialism. The role of judgment grounded in experience and a deep knowledge of context was downplayed.

Another reason for the popularity of metric fixation in business is the rise of principal-agent theory, which claimed that managers could not be trusted to act on the owners’ behalf unless their performance was tightly monitored with … metrics.

As noted at the beginning of this review, Muller’s critique of metric fixation is part of a larger case against what thinkers such as Michael Oakeshott and F. A. Hayek called “rationalism.” And Muller recognizes it as such, as illustrated, for instance, by his description of one of the “key components” of metric fixation: “the belief that it is possible and desirable to replace judgment, acquired by personal experience and talent, with numerical indicators of comparative performance based upon standardized data (metrics).” And he knows his forebears, specifically mentioning Oakeshott, Hayek, Michael Polanyi, and James C. Scott. He cites Oakeshott as criticizing rationalists for their belief that “the conduct of human affairs is a matter of applying the right formulas or recipes.” He extends Hayek’s argument against rationalistic central planning to apply also to “rationalist” managers:

Just as Soviet bloc planners set output targets for each factory to produce, so do bureaucrats set measurable performance targets for schools, hospitals, police forces, and corporations. And just as Soviet managers responded by producing shoddy goods that met the numerical targets set by their overlords, so do schools, police forces, and business find ways of fulfilling quotas with shoddy goods of their own by graduating pupils with minimal skills, or downgrading grand theft to misdemeanor-level petty larceny, or opening dummy accounts for bank clients.

Critics of Oakeshott, Hayek, and similar thinkers often mistake their critique of rationalism for a disguised defense of the political status quo. But Oakeshott himself extended the critique well beyond politics (e.g., in his essay “Rational Conduct”), and others have gone further than he did: Michael Polanyi and Paul Feyerabend applied it to science itself, Ludwig Wittgenstein to philosophy, Jane Jacobs to urban planning, James C. Scott to forestry and agriculture, Nassim Taleb to finance and to religion, I myself to software engineering, and here, Muller extends it to management and metric fixation in general. “Rationalism” is not mainly a political phenomenon, nor is it exclusively right-wing or left-wing: rather, it is an obsession with replacing experience and judgment with formal techniques, based on a mistaken understanding of what is (falsely) supposed to be a single technique called “the scientific method.” (The work of Feyerabend and Polanyi ought to have destroyed the notion of a single way of proceeding called “scientific,” and would have, but for the ideological convenience of that construct.) And attacking rationalism is not a defense of whatever circumstances happen to be in place at present: rather, these attacks hold that genuine correction of current ills must be based on experience, not abstract theories.

But back to our book: after discussing the history and theory of metric fixation, Muller turns to its contemporary practice. In his “case studies,” he addresses higher education, schools, medicine, policing, the military, business and finance, and philanthropy and foreign aid. In each domain he cites numerous examples of the deleterious effects of metric fixation, some highlights of which we will visit here.

Perhaps the most pernicious metrics used in higher education are those based on the idea that “everyone should go to college.” The original idea of the university was of a place that could educate a small intellectual elite in the disciplines of theology, philosophy, medicine, and law. Gradually that idea broadened to include mathematics and the natural sciences, but university education was still seen as a specialized sort of activity, and certainly not something that was needed to train a bank clerk or a shop manager. But a combination of democratic sentiments and an obsession with formal education, at the expense of apprenticeship, led to the idea that the more formal education, and the better “educated” our workforce, the better off we are. As a result, young people who might make excellent auto mechanics or plumbers or hairdressers are instead shoved off to college by their “guidance” counselors, whose performance is judged by measuring the percentage of graduating seniors who have been shoved off to college. Many of these students have little interest in analyzing Hamlet or proving the Fundamental Theorem of Arithmetic, but they are marched through four years of college with passing grades, only to discover that their degrees qualify them for corporate jobs paying a quarter or a third of what a skilled carpenter makes.

Another form of metric fixation that appears in universities is the desire to have some sort of “measurement” for all aspects of student learning. Courses should set out a variety of goals, and have a numeric score for how far the course went in meeting each goal. These “measurements” are then compiled, averaged, their variance measured, and so on. The joke here is that the numbers used as input are not measurements at all: professors are simply asked to pick a number between one and five, or one and ten, to indicate how close the class came to achieving the goal. Rather than being an actual measurement, of the sort performed with scales, rulers, thermometers, and so on, the “score” is simply whatever number the professor being surveyed wishes to pick! It is as though quantum physics were done by asking a bunch of physicists “How highly would you rate the attraction of leptons?” and then running calculations based on the results. In fact, what usually happens is that everyone asked to “score” these categories on a scale of 1–5 puts down 3 or 4 for almost everything: 1 or 2 would mean you are doing a bad job, and 5 would look like you are exaggerating! Professors simply write down the numbers that they think the administrators doing the “measuring” want to see. The administrators then report these “measurements” to the university board, as evidence that “We are not doing too badly, but we’ve still a ways to go.” As Muller puts it, such efforts “combine hard measures of statistical validity with weak interest in the validity of the units of measurement.”

The list of silly measurements in higher education goes on: the importance of researchers is “measured” by their “impact scores,” which count how many people cite their papers. These metrics are gamed by things like “citation circles,” in which a group of authors repeatedly cite each other’s papers. Furthermore, groundbreaking scientists like Copernicus (who fifty years after his death had roughly three followers), Gregor Mendel (whose genetic theories were recognized as important only more than a decade after his death), Alfred Wegener (who proposed continental drift fifty years before the idea became accepted), or Georges Lemaître (first proponent of the “Big Bang” theory of the origins of the universe) would not have been recognized as being of any importance, since their works would have achieved low “impact scores.”
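A toy Python sketch of why a citation circle works: if a metric simply tallies incoming citations, a small group that cites one another can outscore an outsider doing original work. The authors A through F and their citation graph are invented purely for illustration; real measures such as the h-index or journal impact factors are computed differently, but any metric that rewards raw citation counts is exposed to the same trick.

```python
from collections import Counter


def citation_counts(citations: dict[str, list[str]]) -> Counter:
    """Count how often each author is cited by someone other than themselves."""
    counts: Counter = Counter()
    for citing, cited_list in citations.items():
        for cited in cited_list:
            if cited != citing:  # ignore self-citation
                counts[cited] += 1
    return counts


# Hypothetical circle: every member cites every other member once.
circle = ["A", "B", "C", "D"]
citations = {
    **{member: [other for other in circle if other != member] for member in circle},
    "E": [],     # an outsider doing original work
    "F": ["E"],  # the only colleague who cites E
}

counts = citation_counts(citations)

if __name__ == "__main__":
    for author in sorted(citations):
        print(author, counts[author])
```

Each circle member ends up with three citations while the outsider has one, even though nothing about the quality of anyone's work entered the calculation.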

Perhaps even worse than the silliness of many of these measurements is their cost: for instance, in 2002, England spent roughly a quarter of a billion pounds compiling such metrics. Wouldn’t this money have been far better spent hiring more teachers, or building better labs, or buying students computers? As Muller notes, “The effect [of this metric fixation] is to … divert spending from the doers to the administrators—which usually suits the latter just fine.” University students are educated by professors, but the “ratio of administrators to professors … has … risen astronomically in recent decades.”

Metric fixation has also wreaked havoc in “lower” education. A prime example is the “No Child Left Behind” legislation passed during George W. Bush’s presidency. A key goal of the legislation was to close the achievement gap between black students and those of other races, which has not happened. Meanwhile, the focus on reading and mathematics scores has led to neglect of other subjects, and to teachers devoting extensive class time to preparing students for the newly mandated standardized tests. As Muller notes, “The problem does not lie in the use of standardized tests … Value-added testing, which measures the changes in student performance from year to year, has real utility … It is the emphasis placed on these tests as the major criterion for evaluating schools that creates perverse incentives …”

Metric fixation fares no better in the field of medicine. For instance, Muller describes how compensating hospitals or individual surgeons based on the survival rates of their patients gives them a motive to refuse service to the most desperate cases.

Police forces can respond to metric fixation in several counterproductive ways. If, for instance, officers are to be judged by a metric like “percentage of felony complaints leading to an arrest,” then a simple way to improve one’s “stats” is to report felonies that promise to be hard to solve as misdemeanors instead. Season two, episode two of the famed television series The Wire might have been written to illustrate the potentially destructive effect of metrics on policing: thirteen human trafficking victims are found dead in a cargo container. What ensues is a game of hot potato between the Baltimore city police, various state and county police forces, port authority police, and U.S. Customs officers, each eager to shove the case off onto some organization other than its own. The reason? Having thirteen unsolved murders on their books will make their stats look bad. Actually achieving justice for the dead women is a distinctly secondary concern.

The military, too, has been infected with metric fixation. Instead of looking to results, metric-fixated military ventures often focus on inputs: sorties undertaken, money distributed, foreign soldiers trained, or schools built. But what if the sorties aren’t helping to defeat the enemy, the money is actually flowing into the enemy’s coffers, the soldiers trained soon desert, and the schools are blown up a year after they are built? However, as Muller notes, even better metrics aren’t a silver bullet: “use of the best performance metrics demands judgment based on experience.”

Business and finance have also fallen prey to metric fixation. A major cause of the crash of 2007–2008 was a focus on numbers that could be manipulated in financial models (models showing little risk in the pyramid of derivatives being erected), while the wisdom of experienced traders was dismissed as hopelessly behind the times. And bonuses at investment banks were tied to measurable achievements, meaning that “metrics provided the means, and pay-for-performance supplied the motivation, for undue risk taking under conditions of opacity.” And Muller’s final case study demonstrates that obsessing over metrics has yielded results in philanthropy and foreign aid as poor as those in all of the other areas he previously examined.

Muller concludes with two chapters, the first classifying the harms done by an over-reliance on metrics, and so acting as a summary of the case studies that preceded it. He includes such problems as:

  • Goal displacement: where focusing on a measured goal (think standardized tests) displaces a more important one (well-educated children) because it is measured and rewarded.
  • Cost in employee time: “Those within the organization end up spending more and more time compiling data, writing reports, and attending meetings at which the data and reports are coordinated.”
  • Diminishing utility: Initial success with metrics may encourage ever greater efforts to collect ever-less-useful metrics.
  • Rule cascades: Once it is detected that people are gaming a metric regime, an organization may attempt to block such cheating with ever more elaborate systems of rules. This adds even more overhead to the measuring process.
  • Discouraging risk taking: A high-stakes venture with a big reward at the end may take years to pay off, and metrics that track short-term output make such work look unproductive. As Muller notes, “The intelligence agents who ultimately located Bin Laden worked on the problem for years. If measured at any point, their productivity would have seemed to be zero.”

Muller’s final chapter demonstrates that he does not object to the use of metrics in principle, but only to the worship of metrics as a panacea that can replace experience and judgment. He declares, “There is nothing intrinsically pernicious about counting and measuring human performance.” He describes heuristics for differentiating a judicious, helpful use of metrics from “metric fixation.” These include such advice as focusing on measuring things that are not affected by the process of measuring them, distinguishing what is worth measuring from what can be easily measured, and recognizing that the fact that some metrics are helpful does not mean that even more metrics are even more helpful. Most importantly, he points out that metrics developed and employed by practitioners themselves are less liable to abuse than are those imposed from the top down.

While some of the case studies offered here are a little skimpy (the one on philanthropy runs only four pages, and the one on the military only five), Muller nevertheless presents a very convincing argument that metric fixation does significant harm and ought to be resisted. And because he connects metric fixation to the wider case against rationalism, his work is an important contribution to the growing literature condemning rationalism as the great plague of our times.

Gene Callahan is associate industry professor of computer science at NYU. He is the author of Economics for Real People and Oakeshott on Rome and America.

Subscribe to the University Bookman