Size matters—how measurement defines our world
This article is based on his book Measurement Theory and Practice: The World through Quantification, published by Arnold in 2004.
Abstract
“Measurement is the contact of reason with nature”—Henry Margenau
We live in a world of measurements, says David Hand. We talk about the weight of ingredients in cooking, the scores of students in tests, the inflation rate, the distance to the moon, the strength with which an opinion is held, and so on and so on. In brief, we see the world through the spectacles of quantification. But, occasionally, distortions or fractures occur in the lenses of quantification which cast doubt on our understanding of the world–doubts which sometimes raise questions about the reality we believe we are perceiving.
Here is a very simple example. A few years ago, an article in The Times said: “Temperatures in London were still three times the February average, at 55°F (13°C) yesterday.” Given this, one might reasonably ask: what is the February average? That's easy enough. It is 55/3 = 18⅓°F. Or perhaps, alternatively, it is 13/3 = 4⅓°C. But wait a moment: 18⅓°F is below freezing, whereas 4⅓°C is above freezing. These cannot both be right. We appear to have a contradiction.
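The contradiction is easy to make concrete. Dividing the Fahrenheit reading by three and the Celsius reading by three pick out different physical temperatures, because the two scales differ by a shift of origin as well as a rescaling. A minimal sketch:

```python
def f_to_c(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

# "The February average" read off in each scale:
avg_f = 55 / 3          # 18.33... degrees F
avg_c = 13 / 3          # 4.33... degrees C

# Express both candidate averages in Celsius to compare them:
print(f_to_c(avg_f))    # about -7.6 degrees C: below freezing
print(avg_c)            # about  4.3 degrees C: above freezing
```

Ratios such as “three times” are not preserved under transformations that move the zero point, which is why the two seemingly equivalent calculations disagree.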
Of course, this is a particularly simple example, and readers will no doubt immediately see why the contradiction arose (even if they will not be able to decide which answer is correct). But nevertheless it raises all sorts of questions. Do other, less obvious contradictions arise, which we do not notice? What if only one of the answers is given—how do we know it is the correct one? What mathematical or statistical operations other than averaging might lead to contradictions? How should we resolve such contradictions when they do arise? And, of course, more generally, what does it mean to “measure” something? In short, what is measurement?
A key feature of measurement is that it serves to represent relationships between objects by relationships between numbers: we compare the height of the Empire State Building with the length of a (1‐foot) unit ruler—and find that one is 1250 times the other. At the simplest level, these relationships will be in terms of a single characteristic or attribute of the objects—their weight, length, intelligence, brightness or whatever. Measurement, then, establishes a mapping from the empirical system of the objects to a numerical system. The mapping of the length of sticks to numbers representing those lengths is a very simple example. We can place the ends of two parallel sticks against a wall and see which of their other ends projects further into the room. If we call the sticks A and B, then we can use the numbers x(A) and x(B) to represent their lengths, and we can choose numbers such that x(A) > x(B) whenever A▹B, where A▹B means that stick A projects further than stick B.
Of course, this mapping is not unique. Any monotonic increasing transformation of the numbers x, to numbers y, say, will also preserve the empirical relationship, so that A▹B means that y(A) > y(B).
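This non-uniqueness is easy to see concretely: applying any strictly increasing function to the assigned numbers leaves the ordering of the sticks intact. A sketch, with arbitrary illustrative lengths:

```python
import math

# Hypothetical numbers x(A), x(B), x(C) assigned to three sticks
x = {"A": 3.0, "B": 1.5, "C": 2.2}

# Any strictly monotonic increasing transformation, e.g. y = exp(x)
y = {stick: math.exp(v) for stick, v in x.items()}

# The empirical ordering (which stick projects further) survives:
order_x = sorted(x, key=x.get)
order_y = sorted(y, key=y.get)
print(order_x == order_y)   # True: same ordering under both representations
```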
In fact, we can go further with this example. There are also other empirical relationships between the sticks, and we might try to find numbers which preserve those relationships as well. For example, if we place one end of stick A against the wall, and then put stick B at the other end of A, in a straight line with it, then we can find another stick C which has the same empirical length as this combination of A and B. And we can find numbers to represent the lengths of the three sticks A, B and C such that the number assigned to C is the sum of the numbers assigned to A and B. In fact, of course, these numbers would be our usual, familiar, numerical lengths of the sticks.
But, as we all know, even these numbers are not unique. An arbitrary rescaling would yield alternative legitimate numerical representations of the lengths. For example, we can multiply by 2.54 to change the representation in inches to that in centimetres.
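Unlike a general monotonic transformation, a rescaling also preserves the additive structure captured by the end-to-end operation. A sketch, assuming illustrative lengths in inches:

```python
# Illustrative lengths in inches; C was found empirically to match A and B end to end
a_in, b_in = 10.0, 14.0
c_in = a_in + b_in

# Rescale all three to centimetres
scale = 2.54
a_cm, b_cm, c_cm = a_in * scale, b_in * scale, c_in * scale

# The additive relationship survives the rescaling:
print(abs((a_cm + b_cm) - c_cm) < 1e-9)   # True
```

A transformation such as exponentiation, by contrast, would preserve the ordering of the sticks but destroy this additivity, which is why only rescalings are legitimate here.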
The formal exploration of the use of numerical systems for representing relationships between objects began around the end of the 19th century, at a time when so much else about our understanding of the universe was changing. Atomic theories were beginning to be accepted as a description of reality, and not merely as a convenient mathematical fiction. Relativity and quantum mechanics were about to shake the very foundations of physics. And the axioms of formal theories of measurement began to be laid. In doing so, however, they also laid the foundations for a major controversy which was to dog measurement, and in particular its relationship to statistics, for most of the 20th century. The problem was that all of these earlier axiomatisations assumed as a basic starting point that there was some empirical operation, equivalent to our end‐to‐end positioning of sticks, which could be mapped to addition. That's great for length, weight (balance two rocks on one pan of a weighing scale by a single rock on the other) and other simple physical systems, but cracks appear in this approach when we try to measure more complicated things. In particular, the problem seemed especially acute in psychology: it is not at all obvious what “empirical addition” operation can be made for perceived loudness (is the loudness of sound C the same as the “combined loudness” of sounds A and B?). And as for things such as anger, pain, depression, and so on, well, things seemed hopeless.
This controversy stimulated some sharp exchanges. The physicist Norman Campbell remarked: “Why do not psychologists accept the natural and obvious conclusion that subjective measurements of loudness in numerical terms … are mutually inconsistent and cannot be the basis of measurement?”. But he did not restrict his criticisms to psychologists, also saying: “The most distinguished physicists, when they attempt logical analysis, are apt to gibber, and probably more nonsense is talked about measurement than about any other part of physics.” The psychophysicist S. S. Stevens remarked that Campbell “seems to contribute his fair share” to this nonsense. When this controversy later spilled over into whether and how measurement issues should impact on statistical analysis, others were equally confident—“This particular question admits of no doubt whatsoever” (Norman Anderson); “The height of absurdity” (John Gaito)—even if they were not always on the same side of the argument.
Psychologists reacted to the assault on the foundations of their discipline in various ways. Some ignored it. Others adopted an explicitly operational perspective: provided that a well‐defined procedure had been used to map the objects to numbers then it represented measurement. But Stevens presented a deeper analysis. He argued that measurement need not hinge on the notion of an empirical addition operation, but that other kinds of operation and relationship could be used, and that these produced different kinds of measurement scales. This led to the nominal, ordinal, interval and ratio scales with which some statisticians and all psychologists are now very familiar.
Stevens's work, carried out in the 1930s and 1940s, was fairly intuitive, but recent mathematical work has explored how many scale types there are, in terms of mappings to real numbers which preserve the relationships between objects in the system being measured. It turns out that Stevens was basically right: with a few additions, his scales are all that can exist. In an ordinal scale, for example, any strictly monotonic transformation of the numerical representation of the relationship between the objects leads to another which also preserves the relationships between objects. In a ratio scale, any rescaling transformation will do. Transformations which preserve structure in this way are called admissible transformations.
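One practical consequence of admissibility: statements made about measurements on a given scale type should remain true under the admissible transformations of that scale. For ordinal data, for instance, the median survives any strictly monotonic transformation, while the mean in general does not. A sketch, with made-up scores:

```python
import statistics

scores = [1, 2, 2, 3, 9]                 # hypothetical ordinal scores
transformed = [s ** 3 for s in scores]   # a strictly monotonic transformation

# The median commutes with the transformation: the median of the
# transformed data is the transform of the original median.
print(statistics.median(transformed) == statistics.median(scores) ** 3)  # True

# The mean enjoys no such guarantee:
print(statistics.mean(transformed) == statistics.mean(scores) ** 3)      # False
```

This is the technical core of the long argument over which statistics are “permissible” for which scale types.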
Based on these ideas, the mathematical psychologist Duncan Luce proposed a principle for scientific laws. He said that a scientific law had to satisfy two properties:
- admissible transformations of the independent variables had to lead only to admissible transformations of the dependent variables; and
- the mathematical structure of theories should be invariant under admissible transformations.
In fact, Luce's principle also stimulated considerable controversy, and Luce eventually retracted the term “principle”. But, regardless of what one calls it, it is an immensely powerful idea. It underlies the ideas of dimensional analysis widely used in physics and engineering, and also used in economics and other areas. It allows one to spot, almost at a glance, certain kinds of errors in formulae: to detect the kind of creases in the apparent fabric of reality that I mentioned at the start. For example, a textbook which will remain nameless gives the following for the probability density function of the sample variance:

However, it only takes a moment to see that the units in which this is measured are not the correct units for the density of a variance.
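The check involved is simple dimensional bookkeeping: if a random variable is measured in some unit u, its variance is measured in u², and a probability density over the variance must therefore have units u⁻², so that density times a small interval of variance is a dimensionless probability. A sketch of such bookkeeping, tracking units as exponents (the helper functions are illustrative, not from any library):

```python
# Track units as exponents of base units, e.g. {"m": 1} for metres
def mul(u1, u2):
    """Multiply two quantities' units: add exponents."""
    out = dict(u1)
    for base, exp in u2.items():
        out[base] = out.get(base, 0) + exp
    return {b: e for b, e in out.items() if e != 0}

def power(u, n):
    """Raise a quantity's units to the n-th power."""
    return {b: e * n for b, e in u.items()}

x_units = {"m": 1}                     # X measured in metres
var_units = power(x_units, 2)          # variance: m^2
density_units = power(var_units, -1)   # density of the variance: m^-2

# density * (interval of variance) must be dimensionless:
print(mul(density_units, var_units))   # {} : dimensionless, as required
```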
In fact, even when dealing with empirical addition operations, confusion can arise. An empirical operation can be mapped to addition (think of placing electrical conductors in series, and mapping to numbers called “resistance”) but it can also be mapped to other numerical operations. For example, we could map the empirical operation of placing electrical conductors in series to numbers which combine not by addition but by the operation x ⊕ y = (x⁻¹ + y⁻¹)⁻¹. The numbers resulting from this mapping are called “conductance”. Although the numbers resulting from the two mappings will be different, there must obviously be a 1–1 mapping between them (since, for example, the order relationship is preserved in both cases). In fact, as many readers will know, the relationship is given by the reciprocal transformation:
conductance = 1/resistance
This is all straightforward until one has to choose between resistance and conductance—for example, in analysing measures of galvanic skin response, as used in psychophysiology or lie detectors. Now contradictory results can be obtained when means are calculated using the two—both equally legitimate—representations. Although it may be obvious why the difference arises (expectation and non‐linear transformations do not commute), it is not at all obvious which, if either, is correct. In fact, the discovery that different researchers could draw different conclusions in this way, despite starting with the same data, led to a major controversy in the psychophysiology community in the 1960s and 1970s.
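The contradiction is easy to demonstrate: two groups of conductors can have their order reversed depending on whether one averages resistances or conductances. A sketch, with invented values:

```python
# Hypothetical resistance measurements (ohms) for two groups of subjects
group1 = [1.0, 9.0]
group2 = [4.0, 4.5]

def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)

# Averaging resistances: group1 has the larger average
print(mean(group1), mean(group2))        # 5.0  4.25

# Averaging conductances (reciprocals) instead, then expressing the
# result back as a resistance, gives the opposite ordering:
r1 = 1 / mean(1 / r for r in group1)     # 1.8
r2 = 1 / mean(1 / r for r in group2)     # about 4.24
print(r1 < r2)                           # True: the ordering has reversed
```

Both calculations are legitimate representations of the same data; which, if either, answers the scientific question is exactly the point at issue.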
Things are then further complicated by the fact that we can also map a different empirical addition operation to addition: we could place our electrical conductors in parallel and map this operation to either addition or to ⊕. This means we have two different ways of combining objects physically, each of which can be mapped to two alternative numerical operations. Quite clearly the opportunity for confusion is very substantial.
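The four combinations can be set out explicitly. With illustrative values: resistances add under series combination, conductances add under parallel combination, and each physical operation maps to the ⊕ operation in the other representation:

```python
def series_r(r1, r2):
    """Series combination, in the resistance representation: addition."""
    return r1 + r2

def parallel_r(r1, r2):
    """Parallel combination, in the resistance representation: the (+)-circle operation."""
    return 1 / (1 / r1 + 1 / r2)

r_a, r_b = 2.0, 3.0              # illustrative resistances (ohms)
g_a, g_b = 1 / r_a, 1 / r_b      # the same conductors, as conductances

# Parallel: the circle operation on resistances is plain addition on conductances
print(abs(1 / parallel_r(r_a, r_b) - (g_a + g_b)) < 1e-12)               # True

# Series: addition on resistances is the circle operation on conductances
print(abs(1 / series_r(r_a, r_b) - 1 / (1 / g_a + 1 / g_b)) < 1e-12)     # True
```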
Attributes of physical objects (length, weight, resistance and so on) are fairly well defined. And yet, as we have seen, even with these characteristics there is much scope for confusion and misunderstanding. How much more scope will there be, then, when we turn to measuring more subtle features: quality of life, the growth of the economy, customer satisfaction, the creativity of an advertising agency, the complexity of a computer program, and so on? For a start, the simple “direct” approach of looking at the relationships between objects (A▹B, A and B together balance C on the weighing scales, and so on) goes out of the window. Instead, we have to adopt an “indirect” approach, in which we relate the attribute we want to measure, but cannot observe directly, to simpler manifest variables which we can actually measure. Such relationships will be based on our understanding—our theories—about how these things are related. Latent variable models such as factor analysis are examples of this kind of approach. Note that implicit in this is a flavour of defining the attribute at the same time as deciding how to measure it. We decide what manifest variables to measure, and how to combine them to produce our overall measure of the attribute we are interested in—although numerical values of parameters and so on may be estimated from data.
In fact, we can take this idea further. We might choose to define our measurement procedure, and simultaneously our definition of the attribute being measured, on grounds of practical convenience or in terms of our aims as well as in terms of purported theoretical relationships. Thus, for example, in measuring health‐related quality of life (HRQOL), we might decide a priori to include measures of pain, feelings of worthlessness, lack of sleep and so on, with certain specified weights, simply because that is how we want to define HRQOL and that is the extent to which we wish each component measure to contribute to it. I call this the pragmatic aspect of measurement.
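Computationally, a pragmatic measure of this kind is just a weighted combination of the chosen manifest variables. A sketch, with entirely hypothetical components and weights:

```python
# Hypothetical component scores for one respondent, each on a 0-10 scale
components = {"pain": 3.0, "worthlessness": 2.0, "sleep_problems": 5.0}

# Hypothetical a priori weights: chosen, not estimated from data --
# this choice IS part of the definition of the attribute being measured
weights = {"pain": 0.5, "worthlessness": 0.25, "sleep_problems": 0.25}

hrqol = sum(weights[k] * components[k] for k in components)
print(hrqol)   # 3.25 with these illustrative numbers
```

Changing the weights changes not just the score but, on this view, the very attribute being measured.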
Measurement, then, has two rather different aspects. It has a representational aspect, in which empirical objects are mapped to numbers such that relationships between objects are represented by relationships between numbers. And it has a pragmatic aspect, in which the numbers are chosen on external grounds. Measurement procedures combine these two aspects. Measurement of attributes such as length sits well towards the representational end of the continuum, whereas measurement of HRQOL, as defined above, sits well towards the pragmatic end.
Merely because a procedure may have weak representational aspects does not mean it is inadequate. If I find that some almost entirely pragmatic measurement x is correlated with some other measurement y, then I can use x to predict y. I can use my HRQOL measure to predict risk of suicide. So, when psychologists Robyn Dawes and Tom Smith say that “it is not uncommon for psychologists and other social scientists to investigate a phenomenon at great length without knowing what they're talking about”, it does not mean that the measurements they take are useless. It merely means that the systems being studied cannot be defined sufficiently precisely to permit pure representational measurement, but necessarily have a substantial pragmatic component.
One might suspect that the pragmatic aspects are confined to the social sciences, but this is not so. Pragmatic considerations are also used to crystallise the choice of numbers in the physical sciences. Jim Baggott describes photon polarisation states thus: “Remember that we have no way of knowing the ‘actual’ signs of the phase factors because this is information that is not revealed in experiments. However, we can adopt a phase convention which, if we stick to it rigorously, will always give results that are both internally consistent and consistent with experiment.”
There are many aspects to measurement and its relationship to our understanding of the world that I have not covered in this article. One is the issue of accuracy of measurement. This can have major consequences. An error in measuring growth in the UK's construction industry led to the first three months of 2003 having a reported 2.6% fall, instead of the correct figure of a 0.5% growth. In constructing the Mars Climate Orbiter probe, at a cost of $125 million, Lockheed Martin Corporation supplied data based on pounds force while the Jet Propulsion Laboratory worked in newtons. The result of this confusion was that the probe failed, probably because it entered the Martian atmosphere too low and burned up. Many other examples, equally costly, can easily be found.
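The Mars probe failure is, at root, the same admissible-transformation point: a number is meaningless without its unit, and silently mixing representations introduces a hidden scale factor. A sketch of the kind of discrepancy involved, assuming a force conveyed in pounds-force but read as if it were in newtons (1 lbf ≈ 4.448 N):

```python
LBF_TO_N = 4.44822   # newtons per pound-force

force_lbf = 100.0                 # value supplied in pounds-force
force_n = force_lbf * LBF_TO_N    # what that value actually is, in newtons

# If the raw number is read as if it were already in newtons,
# every force is understated by a factor of about 4.4:
print(force_n / force_lbf)        # about 4.45
```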
Another issue I have not covered is the notion that the formalisation implicit in measurement “drives away the magic”. This is an anxiety which has always been with us, doubtless from the earliest undocumented attempts to understand the paths of the planets, certainly in the mid‐19th century when older physicians “thought it hardly in good taste to time the pulse with a watch”, and now all the way through to the present, when David Boyle could write, in his uncompromisingly titled book The Tyranny of Numbers: Why Counting Can't Make Us Happy: “Every time a new set of statistics comes out, I can't help feeling that some of the richness and mystery of life gets extinguished”. Well, he is right that some of the mystery goes. But my Concise Oxford Dictionary defines mystery as “A secret, hidden or inexplicable matter”. Surely it is all to the good to extinguish such things. As to whether the richness goes, on the contrary, statistics, derived from properly measured attributes, add to the richness, depth and understanding of life, deepening our appreciation of it, with the potential for making it better.
Measurement and quantification are ubiquitous. Our entire Weltanschauung is built on such notions. As Theodore Porter put it: “Karl Pearson was neither the first nor the last to worship quantification, which he regarded as integral to scientific method. Its appeal has been the appeal of impersonality, discipline, and rules. Out of such materials, science has fashioned a world.”