Why ‘General Intelligence’ Doesn’t Exist

Donald Trump took an IQ test … you’ll never guess what he scored!

Apologies. That was my attempt at clickbait.1 Now that I’ve hooked you, let’s talk about the elephant in the room. No, not Donald Trump. Let’s talk about IQ.

For as long as I can remember I’ve been skeptical of measures of ‘intelligence’. The whole procedure rubbed me the wrong way. You take a test, get a score, and find out your ‘intelligence’. Doesn’t that seem weird? In school, I took hundreds of tests. None of them claimed to measure ‘intelligence’. It was clear to me (and to everyone else) that each test measured performance on specific tasks. But IQ tests are somehow different. Rather than measure specific skills, IQ tests claim to measure something more expansive: general intelligence.

I think this claim is bullshit. The problem, as I see it, is that ‘general intelligence’ doesn’t really exist. It’s a reified concept — a vague abstraction made concrete through a series of arbitrary decisions.

To see the arbitrariness, let’s use different words. Substitute ‘intelligence’ with ‘performance’. Imagine that your friend tells you, “I just took a general performance test. I scored in the top percentile!” You’d ask, “What did you perform? Did you make a painting? Do some math? Play music? Play a video game?” It’s obvious that this ‘general performance’ test is arbitrary. Someone thought of some tasks, measured performance on these tasks, and added up the results. Presto! They (arbitrarily) measured ‘general performance’.

This arbitrariness is part of any measure that aggregates different skills. The problem is that the skills that we select will affect what we find. That’s because a person who is exceptional on one set of tasks may be average on another set. And so our aggregate measurement depends on what we include in it. This is true of ‘general performance’. And it’s true of ‘general intelligence’.

The word ‘intelligence’, however, carries a mystique that ‘performance’ does not. No one believes that ‘general performance’ exists. Yet many people think that ‘general intelligence’ lurks in the brain, waiting to be measured.

It doesn’t.

A complete (and hence, objective) measure of ‘general intelligence’ is forever beyond our reach. And if we forge ahead anyway, we’ll find that how we define intelligence affects what we find.

Speaking of ‘intelligence’

I’ll start this foray into intelligence not with psychology, but with linguistics. Language is, in many ways, a barrier to science. The problem is that everyday language is imprecise. Usually that’s a good thing. Vagueness allows us to communicate, even though our subjective experiences are different. We can talk about ‘love’, for instance, even though we each define the word differently. And we can talk about ‘intelligence’, even though the concept is poorly defined.

In everyday life, this vagueness is probably essential. Without it, we’d spend all day agreeing on definitions. But in science, vague language is ruinous. That’s because how we define concepts determines how we measure them. Without a precise definition, precise measurement is impossible. And without precise measurement, there is no science.

Take, as an example, something as simple as mass. In everyday language, we use the word ‘mass’ as a synonym for ‘weight’. Usually that’s not a problem. But if you want to do physics, you need to be more precise. Equating ‘mass’ with ‘weight’ implies that you can use a spring scale to measure ‘mass’. But that’s true only in certain circumstances.

In physics, ‘mass’ has a specific definition. It’s the resistance to acceleration.2 Now, spring scales can measure mass, but only in the correct setting. That’s because spring scales technically measure ‘force’, not ‘mass’. But the two concepts are related. According to Newton’s laws, force is proportional to mass times acceleration (F = ma). So if we know the acceleration and the force, we can infer mass. On Earth, the downward acceleration of gravity is (nearly) constant.3 That means we can use the force registered on a spring scale to measure mass. But this works only if you’re at rest. If you’re in an accelerating elevator, your bathroom scale will mislead you.

The point of this foray into physics is to highlight how measurement follows from a definition. Newton defined mass as force per unit of acceleration: m = F/a. From this precise definition follows precise measurement.
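In code, Newton’s definition is a one-line calculation. Here’s a minimal sketch in R (the numbers are illustrative, not measured):

```r
# illustrative numbers: a 70 kg person standing still on a spring scale
g     = 9.8    # downward acceleration of gravity (m/s^2)
force = 686    # upward force registered by the scale (newtons)

# Newton's definition: mass is force per unit of acceleration
mass = force / g
mass           # 70 kg
```

Note that the calculation works only because the acceleration is known. In an accelerating elevator, g would be the wrong number to divide by, and the inferred mass would be wrong.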

Back to intelligence. In the same way that we speak of ‘mass’ in colloquial terms, we also speak of ‘intelligence’. But whereas physicists have devised their own precise definition of ‘mass’ (that differs from the colloquial usage), psychologists have not devised a precise definition of ‘intelligence’. This makes its measurement problematic.

When we measure ‘intelligence’, what exactly are we quantifying? Perhaps an easier question is what are we excluding? When we measure mass, for instance, we exclude ‘color’. That’s because according to Newton’s laws, color doesn’t affect mass. So what doesn’t affect ‘intelligence’?

Most human behavior.

If you look at how IQ tests are constructed, they exclude an enormous range of human behavior. They exclude athletic ability. They exclude social and emotional savvy. They exclude artistic skill (visual, musical, and written). The list goes on.

What’s the reason for this exclusion? It stems not from any scientific concept of ‘intelligence’, but rather, from the colloquial definition of the word. In common language, great musicians are not considered ‘intelligent’. They are ‘talented’. The same is true of a host of other skilled activities. In common parlance, the word ‘intelligent’ is reserved for a specific suite of skills that we might call ‘book smarts’. A mathematician is intelligent. An artist is talented.

There’s nothing wrong with this type of distinction. In fact, it highlights an interesting aspect of human psychology. We put actions into categories and use different words to describe them. Sometimes we even use different categories when the same task is done by different objects. When a person moves through water, we say that they ‘swim’. But when a submarine does the same thing, it doesn’t ‘swim’. It ‘self propels’. Similarly, when a person does math, they ‘think’. But when a computer does math, it ‘computes’.4

This type of arbitrary distinction isn’t a problem for daily life. But it’s a problem for science. Science usually requires that we abandon colloquial definitions. They’re simply too vague and too arbitrary to be useful. That’s why physicists have their own definition of ‘mass’ that differs from the colloquial concept. But with ‘intelligence’, something weird happens. Cognitive psychologists use the colloquial concept of ‘intelligence’, which arbitrarily applies to a narrow range of human behaviors. Then they attempt to measure a universal quantity from this arbitrary definition. The result is incoherent.

Intelligence as computation

If we want to measure something, the first thing we need to do is define it precisely. So how should we define ‘intelligence’? We should define it, I believe, by turning to computer science. That’s because one of the best ways to understand our own intellect is to try to simulate it on a computer. When we do so, we realize that the concept of intelligence is quite simple. Intelligence is computation.

This simplicity doesn’t mean that intelligence is easy to replicate. We struggle, for instance, to make computers drive cars — a task that most people find mundane. But defining intelligence as ‘computation’ tells us which tasks require ‘intellect’ and which do not. Catching a ball requires intellect because for a computer to do so, it must calculate the ball’s trajectory. But the ball itself doesn’t need intellect to move on its trajectory. That’s because the laws of physics work whether you’re aware of them or not.

Having defined intelligence as computation, we immediately run into a problem. We find that ‘general intelligence’ can’t be measured. Here’s why. Our definition implies that ‘general intelligence’ is equivalent to ‘general computation’. But ‘general computation’ doesn’t exist.

To see this fact, imagine asking a software engineer to write a program that ‘generally computes’. They’d look at you quizzically. “Computes what?” they’d ask. This reaction points to something important. While we can speak of ‘computation’ in the abstract, real-world programs are always designed to solve specific problems. A computer can add 2 + 2. It can calculate π. It can even play Jeopardy. But what a computer cannot do is ‘generally compute’. The reason is simple. ‘General computation’ is unbounded. A machine that can ‘generally compute’ could solve every specific problem that exists. It could also solve every problem that will ever exist.

This unboundedness raises a giant red flag for measuring intelligence. If ‘general computation’ is unbounded, so is ‘general intelligence’. This means that neither concept can be measured objectively.

Think of it like a sentence. Suppose that your friend tells you that they’ve constructed the longest sentence possible. You know they’re wrong. Why? Because sentences are unbounded. No matter how long your friend’s sentence, you can always lengthen it with the phrase “and then …”. The same is true of ‘general computation’. If someone claims to have definitively measured ‘general computation’, you can always show that they’re wrong. How? By inventing a new problem to solve.

The same is true of ‘general intelligence’. Any measure of ‘general intelligence’ is incomplete, because we can always invent new tasks to include. This means that a definitive measure of ‘general intelligence’ is forever beyond our reach.

Impossible … but let’s do it anyway

I don’t expect the argument above to convince many cognitive psychologists to stop measuring intelligence. That’s because a general dictum in the social sciences seems to be:

If you cannot measure, measure anyhow.5

As a social scientist, I understand this dictum (although I don’t agree with it). It arises out of practicality. Many concepts in the social sciences are poorly defined. If we waited for precise definitions of everything, we’d never measure anything. The solution (for many social scientists) is to pick an arbitrary definition and run with it.

With the ‘measure anyhow’ dictum in mind, let’s forge ahead. Let’s pick an arbitrary set of tasks, measure performance on these tasks, and call the result ‘intelligence’.

Which tasks should we include? If intelligence is computation, every human task is fair game. (I can’t think of a single task that doesn’t require computation by the brain. Can you?) Let’s spell out this breadth. Any conscious activity is fair game for our intelligence test. So is any unconscious activity.

Against this vast set of behavior, think about the narrowness of IQ tests. Taking them involves sitting at a desk, reading and responding to words. That’s an astonishingly narrow set of human behavior. And yet IQ tests claim to measure ‘general intelligence’.

Variation in intelligence

That IQ tests are ‘narrow’ is an old critique that I don’t want to dwell on. Instead, I want to ask a related question. If we widened our test of intelligence, what would we find? Unfortunately, no one has ever attempted a broad test that includes the full suite of human behavior. So we don’t know what would happen. Still, we can make a prediction.

To do so, we’ll start with a rule of thumb. The narrower the task, the more performance will vary between people. Conversely, the broader the task, the less performance will vary. The consequence of this rule is that as we add more tasks to our measure of intelligence, variation in intelligence should collapse.

This prediction stems in part from our intuition about the mind. But it also stems, as I explain below, from basic mathematics.

Chess power

Back to our rule of thumb. The narrower a task, the more performance will vary between individuals.

To grasp this rule, ask yourself the following question: who is the world’s best gamer? That’s hard to know. There are many different games, and everyone is better at some than others. Now ask yourself: who is the best chess player? That’s easier to know. The best chess players — the grandmasters — stand out from the crowd.

This thought experiment suggests that abilities at specific games vary more than abilities at a wide range of games. Why is this? I suspect it’s because the rules of a specific game restrict the range of allowable behavior. This constraint emphasizes subtle differences in how we think. In everyday life, such differences are imperceptible. But games like chess bring them to the forefront. In chess, a minute cognitive difference gets amplified into a huge advantage.

This rule of thumb raises an interesting question. At ultra-narrow tasks like chess, how much does individual ability vary? Like most aspects of human performance, we don’t really know. But we can hazard a guess. And we can use our definition of intelligence to do so.

Intelligence, I’ve proposed, is computation. Taking this literally, suppose we measured chess-playing intelligence in terms of the computer power needed to defeat you. How much would this computer power vary between people? We don’t have rigorous data. But history does provide anecdotal evidence. Let’s look at the computer power needed to defeat two different men: Hubert Dreyfus and Garry Kasparov.

Hubert Dreyfus was an MIT professor of philosophy. A vocal critic of machine intelligence, Dreyfus argued bellicosely that computers would never beat humans at chess. In 1967, Dreyfus played the chess-playing computer Mac Hack VI. He lost. What is perhaps most humiliating, in hindsight, is that Mac Hack ran on a computer that today wouldn’t match a smartphone. To beat Dreyfus, Mac Hack evaluated about 16,000 chess positions per second.

Despite humiliating Dreyfus, computers like Mac Hack were no match for the best human players. Not even close. Take, as an example, chess grandmaster Garry Kasparov. In 1985, Kasparov beat thirty-two different chess-playing computers simultaneously. (As Kasparov describes it, he “walked from one machine to the next, making … moves over a period of more than five hours.”) Still, Kasparov was eventually defeated. In 1997 he lost to IBM’s Deep Blue. But what testifies to Kasparov’s astonishing ability is the computational power needed to beat him. Deep Blue could evaluate 200 million positions per second. That’s about 10,000 times more computing power than needed to beat Hubert Dreyfus.
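The ‘10,000 times’ figure is just the ratio of the two machines’ search speeds:

```r
mac_hack  = 16000    # positions per second, Mac Hack VI (1967)
deep_blue = 200e6    # positions per second, Deep Blue (1997)

# Deep Blue's search speed relative to Mac Hack's
deep_blue / mac_hack   # 12500, i.e. roughly 10,000-fold
```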

So Garry Kasparov may have been 10,000 times better at chess than Hubert Dreyfus. But was he 10,000 times more intelligent? Unlikely. The reason stems from our rule of thumb. Yes, performance can vary greatly when tasks are hyper specific. But as we broaden tasks, performance variation will decrease.

Think about it this way. At his peak, Kasparov was certainly the greatest chess player. But he was not the greatest Go player. Nor was he the greatest bridge player. So if we measured Kasparov’s intelligence at many different types of games, he would appear less exceptional. That’s because his stupendous ability at chess would be balanced by his lesser ability at other games. If we moved beyond gaming to the full range of human tasks, Kasparov’s advantage would lessen even more. The reason is simple. No one is the greatest at everything.

A central limit

When we generalize the principle that ‘no one is the greatest at everything’, something startling happens. We find that the more broadly we define ‘intelligence’, the less variation we expect to find. The reason, interestingly, has little to do with the human mind. Instead, it stems from a basic property of random numbers.

This property is described by something called the central limit theorem. As odd as it sounds, the central limit theorem is about the non-random behavior of random numbers. I’ll explain with an example. Suppose that I have a bag containing the numbers 0 to 10. From this bag, I draw a number and record it. I put the number back into the bag and draw another number, again recording it. Then I calculate the average of these numbers. Let’s try it out. Suppose I draw a 1 followed by a 7, giving an average of 4. Repeating the process, I draw an 8 followed by a 10, giving an average of 9. As expected, the numbers vary randomly, and so does the corresponding average. But according to the central limit theorem, there’s order hidden under this randomness.

Like our random numbers themselves, you’d think that the average of our sample is free to bounce around between 0 and 10. But it’s not. Variation in the average, it turns out, depends on the sample size. For a small sample, the average could indeed be anything. But for a large sample, this isn’t true. As my sample size grows, the central limit theorem tells us that the average must converge to 5. Stated differently, the more numbers I draw from the bag, the less the average of my sample is allowed to vary.6

That’s interesting, you say. But what does the central limit theorem have to do with intelligence? Here’s why it’s important. To measure someone’s ‘intelligence’, we take a set of tasks and then average their performance on each task. While seemingly benign, this act of averaging invokes the central limit theorem under the hood. And that causes something startling to happen. It means that the number of tasks included in our measure of intelligence affects the variation of intelligence.

I’ll show you a model of how this works. But first, let’s make things concrete by returning to chess wizard Garry Kasparov. Kasparov, it’s safe to say, is far better at chess than the average person — perhaps thousands of times better. So if we were to measure ‘intelligence’ solely in terms of chess performance, Kasparov would be an unmitigated genius. But as we add other tasks to our measure of intelligence, Kasparov’s genius will appear to decline. That’s because like any human, Kasparov isn’t the greatest at everything. So as we add tasks in which Kasparov is mediocre, his ‘intelligence’ begins to lessen. In other words, Kasparov’s ‘intelligence’ isn’t some definite quantity. It’s affected by how we measure intelligence!

A model of ‘general intelligence’

Let’s put this insight into a model of ‘general intelligence’. Imagine that we have a large sample of people — a veritable cross-section of humanity. We subject each person to a barrage of tests, measuring their performance on thousands of tasks. Their average performance is then their ‘intelligence’.

The problem, though, is that we have to choose which tasks to include in our measure of intelligence. In academic speak, this choice is called the ‘degrees of freedom’ problem. It’s a problem because if a researcher has too much freedom to choose their method, you can’t trust their results. Why? Because they could have cherry-picked their method to get the results they wanted.

Suppose we’re aware of this problem. To solve it, we decide not to pick just one measure of intelligence. We’ll pick many. We start by selecting a single task and using it to measure intelligence. We then measure how intelligence varies across the population. Next, we add another task to our metric, and again measure intelligence variation. We repeat until we’ve included all of the available tasks.

Before getting to the model results, one more detail. Let’s assume that individuals’ performance on different tasks is uncorrelated. This means that if Bob is exceptional at arithmetic, he can be abysmal at multiplication. Bob’s skill at different tasks is completely random. Now, this is obviously unrealistic. (I’ll revise this assumption shortly.) But I make this assumption to illustrate how the central limit theorem works in pure form. This theorem assumes that random numbers are independent of one another. Applied to intelligence, this means that individuals’ performance on different tasks is unrelated.

Figure 1 shows the results of this simple model. The horizontal axis shows the number of tasks included in our measure of intelligence. We start with just 1 task and gradually add more until we’ve included 10,000. For each set of tasks, we measure the ‘intelligence’ of every person. Finally, we measure the variation in intelligence using the Gini index. (A Gini index close to 1 indicates huge variation. A Gini index close to 0 indicates minimal variation.) Plotting this Gini on the vertical axis, we see how the variation of ‘general intelligence’ changes as we add more tasks.

Figure 1: Variation in ‘general intelligence’ decreases as more tasks are measured. Here are the results of a model in which we vary the number of tasks included in a measure of general intelligence. I’ve assumed that individuals’ performance on different tasks is uncorrelated. The vertical axis shows how the variation in general intelligence (measured using the Gini index) decreases as more tasks are added.

According to our model, variation in general intelligence collapses as we add more tasks. Intelligence starts with a Gini index of about 0.38. This represents the performance variation on each task. (I’ve chosen this value arbitrarily.) As we add more tasks, the variation in intelligence collapses. Soon it’s far below variation in standardized tests like the SAT. (SAT scores have a Gini index of about 0.11.)7
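To see this effect without running the full model in the appendix, here’s a stripped-down sketch. It assumes (as the full model does) that performance on each task is lognormally distributed, and computes the Gini index from first principles:

```r
set.seed(1)
n_people = 10^4

# performance on one task: lognormal, as in the full model
perf = function() rlnorm(n_people, 1, 0.7)

# Gini index from first principles (no packages needed)
gini = function(x) {
  x = sort(x)
  n = length(x)
  2 * sum(seq_len(n) * x) / (n * sum(x)) - (n + 1) / n
}

# 'intelligence' = average performance across n uncorrelated tasks
iq = function(n_tasks) rowMeans(replicate(n_tasks, perf()))

gini(iq(1))    # roughly 0.38: one task, lots of variation
gini(iq(100))  # roughly 0.04: variation collapses as tasks are added
```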

The takeaway from this model is that our measure of intelligence is ambiguous. There is no definitive value, but instead a huge range of values. If we include only a few tasks, ‘intelligence’ is unequally distributed. But as we add more tasks, ‘intelligence’ becomes almost uniform. This doesn’t mean that the properties of people’s intellect change. Far from it. Our results are caused by the act of measurement itself. How we define ‘intelligence’ affects how it varies.

A more realistic model

The model above comes with a big caveat. I’ve assumed that performance on different tasks is uncorrelated. This is dubious. If Bob is exceptional at arithmetic, he’s probably also exceptional at multiplication.

This correlation between related abilities is common sense. It’s also scientific fact. Performance on different parts of IQ tests tends to be correlated. If you score well on the language portion, for instance, you’ll also likely score well on the math portion. Knowledge of this correlation dates to the early 20th-century work of psychologist Charles Spearman. He found that the performance of English schoolchildren tended to correlate across seemingly unrelated subjects. This correlation between different abilities is important because it’s the main evidence for ‘general intelligence’. It suggests that underneath diverse skills lies some ‘general intellect’. Charles Spearman called it the g factor.

Given that abilities tend to correlate, let’s revise our model. We’ll again measure performance on a wide variety of tasks. But now, let’s assume that performance on ‘adjacent’ tasks is highly correlated.

Here’s how it works. Suppose task 1 is simple arithmetic and task 2 is simple multiplication. I’ll assume that performance on the two tasks is 99% correlated (meaning the correlation coefficient is 0.99). This means that if you’re great at arithmetic, you’re also great at multiplication. But I’ll go further. I’ll assume that performance on any adjacent pair of tasks is 99% correlated. Suppose that task 3 is simple division. Performance on this task is 99% correlated with performance on task 2 (multiplication). Task 4 is exponentiation. Performance on task 4 is 99% correlated with task 3 (division). This correlation between adjacent tasks goes on indefinitely. Performance on task n is always 99% correlated with performance on task n-1.

The effect of this correlation is twofold. First, it creates a broad correlation in performance across all tasks. So if you went looking for a ‘g-factor’, you’d always find it. Second, it creates a gradient of ability. So if you’re excellent at multiplication, you’re also excellent at related tasks like addition. But this excellence diffuses as we move to unrelated tasks (say cooking). This gradient, I think, is a realistic model of human abilities.

With this more realistic model, it may seem that ‘general intelligence’ is better defined. If performance between tasks is highly correlated, it seems like there really is some ‘general intellect’ waiting to be measured.

And yet there isn’t.

As Figure 2 shows, variation in ‘general intelligence’ is still a function of the number of tasks measured. When we measure few tasks, intelligence varies greatly between individuals. But as we add more tasks, this variation collapses. This pattern is an unavoidable consequence of the central limit theorem. The more random numbers we add together, the less the corresponding average varies (even when these random numbers are highly correlated).

Figure 2: Variation in ‘general intelligence’ decreases as more tasks are measured, even when performance on adjacent tasks is highly correlated. Here are the results of a second model in which we vary the number of tasks included in our measure of general intelligence. This time individuals’ performance on different tasks is highly correlated. I assume performance on adjacent tasks (meaning task n and n+1) is 99% correlated. The vertical axis shows how the variation in general intelligence (measured using the Gini index) decreases as more tasks are added.

The results of this model are unsettling. Despite strong correlation between performance on different tasks, it seems that ‘general intelligence’ is still ambiguous. It’s not a definite property of the brain. Instead, it’s a measurement artifact that we actively construct.

Multiple intelligences?

One of the long-standing criticisms of IQ tests is that they are too narrow. They measure only one ‘type’ of intelligence. The alternative, critics propose, is that many types of intelligence exist. This leads to the theory of ‘multiple intelligences’.

At first glance, such a theory seems convincing. There are many types of human abilities. Why not assign each of these abilities its own domain of ‘intelligence’ and then measure it accordingly? Sounds good, right?

While I’m sympathetic to this approach, I think it grants too much credence to orthodox measures of intelligence. It effectively says ‘you can keep your standard measure of intelligence, but we’ll add others to it’. The problem is that the arguments for ‘general intelligence’ can always be used to undermine the theory of ‘multiple intelligences’. Suppose we discover that different ‘types’ of intelligence are correlated with the ‘g-factor’ (a real finding). This suggests that intelligence isn’t multiple, but general.

What I’ve tried to show here is that even if we grant a strong correlation between different abilities, the measure of ‘general intelligence’ is still ambiguous. We can never objectively measure ‘general intelligence’ because the concept is unbounded. This means that any specific measure is incomplete, and worse still, arbitrary. We can put on a brave face and measure anyway. But doing so won’t solve the problem. Instead, we’ll find that ‘intelligence’ is circularly affected by how we’ve defined it.

Does this mean we shouldn’t measure human abilities? Of course not. Specific abilities can be measured. The trouble comes when we attempt to measure general abilities. The problem is that such abilities are fundamentally ill-defined. The sooner we realize this, the sooner we can put ‘general intelligence’ in its proper place: the trash bin of history.

Support this blog

Economics from the Top Down is where I share my ideas for how to create a better economics. If you liked this post, consider becoming a patron. You’ll help me continue my research, and continue to share it with readers like you.



[Cover image: Pixabay]

Model Code

Here’s the code for my model of intelligence. It runs in R and uses the ineq package (for the Gini index) and the data.table package (for exporting results). Use it and change it as you see fit.

The model assumes that performance on each task is lognormally distributed. You can vary this distribution by changing the parameters inside the rlnorm function. In the first model (iq_uncor), performance is completely random. But in the second model (iq_cor), performance on each task is 99% correlated with performance on the previous task. I create the correlation using the function simcor. To vary the correlation, change the value for task_cor (to any value between 0 and 1).


# packages: ineq (for Gini), data.table (for fwrite)
library(ineq)
library(data.table)

# number of tasks in IQ test
n_tasks = 10^4

# number of people
n_people = 10^4

# task correlation (model 2)
task_cor = 0.99

# distribution of performance on each task
performance = function(n_people){ rlnorm(n_people, 1, 0.7) }

# mean and standard deviation of performance on each task
perf_mean = mean(performance(10^4))
perf_sd = sd(performance(10^4))

# function to generate a random variable correlated with x
simcor = function(x, correlation) {
  n = length(x)
  y = rnorm(n)
  z = correlation * scale(x)[,1] +
      sqrt(1 - correlation^2) * scale(resid(lm(y ~ x)))[,1]
  perf_mean + perf_sd * z
}

# output vectors (Gini index of IQ)
g_uncor = rep(NA, n_tasks)
g_cor = rep(NA, n_tasks)

# loop over tasks
pb = txtProgressBar(min = 0, max = n_tasks, style = 3)

for(i in 1:n_tasks){
    if(i == 1){
        # first task
        x_uncor = performance(n_people)
        iq_uncor = x_uncor
        x_cor = performance(n_people)
        iq_cor = x_cor
    } else {
        # all other tasks
        x_uncor = performance(n_people)
        iq_uncor = iq_uncor + x_uncor
        x_cor = abs( simcor(x_cor, task_cor) )
        iq_cor = iq_cor + x_cor
    }

    # Gini index of IQ
    g_uncor[i] = Gini(iq_uncor)
    g_cor[i] = Gini(iq_cor)
    setTxtProgressBar(pb, i)
}

close(pb)

results = data.table(n_task = 1:n_tasks, g_uncor, g_cor)

# export
fwrite(results, "iq_model.csv")


  1. This kind of clickbait is all over the internet. Here’s a real example: “What is Donald Trump’s IQ? His IQ test scores will shock you”.
  2. Actually, ‘mass’ has a dual meaning in physics. Mass is ‘resistance to acceleration’ — usually called the inertial mass. But mass is also what causes gravitational pull — the gravitational mass. According to Newton’s equivalence principle, the two masses are the same. That’s why all objects accelerate uniformly in the same gravitational field.
  3. Where is the gravitational ‘acceleration’ when you’re standing (at rest) on the bathroom scale? The convention, in physics, is to treat the acceleration as what would occur if the Earth were removed from beneath your feet and you entered free fall. Since you’re not in free fall, it follows that the Earth is constantly working to stop this acceleration by applying an upward force (what physicists call the ‘normal’ force). The bathroom scale measures this upward force. Given the known acceleration when in free fall (9.8 m/s²), you can use this force to measure your mass. But only if you’re at rest.
  4. Noam Chomsky often uses this linguistic analogy when discussing artificial intelligence. Do machines think? A meaningless question, he argues:

    There is a great deal of often heated debate about these matters in the literature of the cognitive sciences, artificial intelligence, and philosophy of mind, but it is hard to see that any serious question has been posed. The question of whether a computer is playing chess, or doing long division, or translating Chinese, is like the question of whether robots can murder or airplanes can fly — or people; after all, the “flight” of the Olympic long jump champion is only an order of magnitude short of that of the chicken champion (so I’m told). These are questions of decision, not fact; decision as to whether to adopt a certain metaphoric extension of common usage.

    There is no answer to the question whether airplanes really fly (though perhaps not space shuttles). Fooling people into mistaking a submarine for a whale doesn’t show that submarines really swim; nor does it fail to establish the fact. There is no fact, no meaningful question to be answered, as all agree, in this case. The same is true of computer programs, as Turing took pains to make clear in the 1950 paper that is regularly invoked in these discussions. Here he pointed out that the question whether machines think “may be too meaningless to deserve discussion,” being a question of decision, not fact, though he speculated that in 50 years, usage may have “altered so much that one will be able to speak of machines thinking without expecting to be contradicted” — as in the case of airplanes flying (in English, at least), but not submarines swimming. Such alteration of usage amounts to the replacement of one lexical item by another one with somewhat different properties. There is no empirical question as to whether this is the right or wrong decision.

    (Chomsky in Powers and Prospects)

  5. This quote comes from Frank Knight, who was commenting on economists’ inability to measure utility. This inability didn’t stop them, however. Economists simply inverted the problem. Utility was supposed to explain prices. But prices, economists proposed, ‘revealed’ utility. Knight’s comment is quoted in Jonathan Nitzan and Shimshon Bichler’s book Capital as Power.
  6. The central limit theorem is usually stated as follows. Imagine we sample n numbers from a distribution with mean μ and standard deviation σ. The distribution of the sample mean will then have a standard deviation of \sigma/\sqrt{n} . So as n grows, the standard deviation of the sample mean converges to 0.
  7. Here’s how I estimate the Gini index for the SAT. According to College Board, the average score for the 2019 SAT was 1059 and the standard deviation was 210. That gives a coefficient of variation (the standard deviation divided by the mean) of 0.2. Next, we’ll assume that SAT scores are lognormally distributed. The coefficient of variation for a lognormal distribution is CV=\sqrt{e^{\sigma^2} - 1} , where σ is the ‘scale parameter’. Solving for σ gives: \sigma = \sqrt{\log(CV^2 + 1)} . The Gini index of the lognormal distribution is then defined as G=\text{erf}(\sigma/2) , where erf is the Gauss error function. Plugging CV = 0.2 into these equations gives a Gini index of SAT performance of 0.11.
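The shrinking standard error described in footnote 6 is easy to see by simulation. Here is a minimal sketch — in Python rather than the post’s R, and with the uniform distribution as an arbitrary choice of population:

```python
import random
import statistics

random.seed(1)

def sample_mean(n):
    """Mean of n draws from Uniform(0, 1), which has sd = 1/sqrt(12) ~ 0.289."""
    return statistics.fmean(random.random() for _ in range(n))

# the standard deviation of the sample mean shrinks like sigma / sqrt(n)
for n in [10, 100, 1000]:
    means = [sample_mean(n) for _ in range(2000)]
    print(n, round(statistics.stdev(means), 3))
```

For each tenfold increase in n, the spread of the sample mean shrinks by roughly a factor of √10, as the theorem predicts.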
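The arithmetic in footnote 7 is easy to check. A quick verification — in Python rather than the post’s R, using only the standard library and the summary statistics quoted in the footnote:

```python
from math import erf, log, sqrt

# 2019 SAT summary statistics (from footnote 7)
mean, sd = 1059, 210
cv = sd / mean  # coefficient of variation, roughly 0.2

# lognormal scale parameter implied by the CV: sigma = sqrt(log(CV^2 + 1))
sigma = sqrt(log(cv**2 + 1))

# Gini index of a lognormal distribution: G = erf(sigma / 2)
gini = erf(sigma / 2)
print(round(gini, 2))  # 0.11
```

Plugging in the College Board numbers reproduces the Gini index of about 0.11 reported in the footnote.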

Further reading

Chomsky, N. (2015). Powers and prospects: Reflections on nature and the social order. Haymarket Books.

Gardner, H. (1985). Frames of mind: The theory of multiple intelligences. Basic Books.

Gould, S. J. (1996). The mismeasure of man. W. W. Norton & Company.

Thomson, G. H. (1916). A hierarchy without a general factor. British Journal of Psychology, 8(3), 271.


  1. Wow, big topic.

    I disagree that intelligence = computation. Maybe some of what IQ tests—even academic tests—measure is computation. But they also test knowledge of the appropriate procedure or algorithm, and they may even test for the ability to derive new procedures or algorithms.

    Intelligence—insofar as it is a real quality—is less the ability to execute programs, and more the ability to write them. Of course they cannot be completely separated. But the quest for “General Artificial Intelligence” (an acknowledged poor label) is not to increase raw computation, but to devise systems which not only execute procedures, but also devise them.

    General intelligence, as vague and even misleading as that is, does not imply “unbounded” intelligence, nor infinite computation. It is a measure of the ability to deal in—to recognize, apply, and create—layers of abstraction. The more a person, or a machine, can recognize patterns and apply high-level abstractions to discover knowledge, which is in the form of either facts or procedures at a lower level of abstraction, the more intelligent they could be said to be.

    Aside from that, I do actually agree with your critique, but I think you have somewhat either misunderstood or misrepresented the computer science.

    For example, it’s not at all obvious what conclusions are appropriate to draw from the Kasparov vs Deep Blue match. To a certain degree, it is apples vs oranges. Both fruits can satisfy your hunger. Different strategies can win chess games.

    I guess your implicit point is that measuring “IQ” is a stupid waste of time. You are probably right, but it depends on the goal of the tester. What benefits IQ tests have, if any, may be vastly outweighed by the harm of the edifice of false beliefs and biased attitudes which people bring with them, and which the tests seem to promote.

    Anyway, if I had time, I should write my own blog post in response, as this topic is vast and complex. But better people are dedicated to it already.


    • Hi brillient,

      I admit that I’m far from an expert on artificial intelligence. So wading into the debate is probably not the best idea. Still, I have thought about it somewhat.

      You’re correct that using computer power to measure intelligence is crude. If we were to divide AI into three components, I’d say it is:

      1. Hardware capacity
      2. Information base
      3. Algorithm/software

      Obviously it’s important to have good hardware, but a supercomputer without a program is useless. I didn’t spell it out in the post, but an interesting test of chess skill would be to let a computer play humans while holding the software constant. Then vary some part of the hardware (by artificially limiting calculations). This is by no means a definitive test, but I think it would be interesting nonetheless.

      I’m told that modern chess-playing computers (based on neural nets) are far better than their GOFAI progenitors. So they can probably do more with less computer power. Again, this is a variable that I glossed over.

      Also, I didn’t really define ‘computation’ in the essay. I certainly don’t mean compute in the mathematical sense … as in calculate. I mean it in the broadest sense of AI. If you need AI to replicate some human behavior, that behavior counts as ‘intelligence’. Obviously this is useless for the person wanting to create AI. I use it mostly as a situating device, to escape the myopia of psychology.

      I also find it interesting that the tasks that are colloquially most associated with ‘intelligence’ are the ones that are most easily done by a computer. Most people would say that if Bob can do 4-digit multiplication in his head, Bob is ‘intelligent’. Today your pocket calculator can do the same task — indeed, beat any human at it. What computers struggle to do is mundane tasks like driving. Why?

      Here’s my theory. We associate the word ‘intelligence’ with tasks that most people find difficult. We assume that this perceived difficulty indicates that the task is actually difficult, and hence requires great intelligence. Multiplying 4-digit numbers in our heads is hard for most people, so we conclude it requires ‘intelligence’.

      This is a mistake. The perceived difficulty of a task is really about our level of adaptation to that task. If we’ve had millions of years to adapt to doing something (say, walking upright), it becomes ‘easy’. In fact, we don’t even need to think about walking — the task is instinctive.

      But the ability to do a task instinctively tells us nothing about the difficulty of the task. My understanding is that building human-like machines that walk upright is very difficult. Millions of years of evolution make us think it is easy.

      The tasks that we are not really adapted to do — say, doing calculus — feel very difficult. And yet it’s easy to design computers that can do integrals.

      To conclude this long digression, I think it’s an interesting testament to our evolution that we label certain tasks as requiring ‘intelligence’ and not others. But it says nothing of the computational difficulty of actually doing these tasks.

      In fact I’d hazard that there’s a rule of thumb here: the more intuitive some behavior is (and hence, the less most people would say it requires ‘intelligence’) the more difficult it is to get a computer to replicate the behavior.


  2. Hey Blair:

    A very clear explanation. Thanks.

    While I came to the same conclusions (though less precisely stated/understood) quite early in my research into G, something still nags at me:

    People quite often say, “wow, she’s really smart!” or “he’s not very bright,” and others seem to understand pretty clearly what they’re saying.

    Maybe this is due to the same error that gives rise to IQ tests. Maybe it’s a singularly Western idea. (Do other cultures say such things?) Maybe people never said or thought that way (in the West?) until the year X.

    And even if it is a human universal, is it a bias toward a particular set of skills/abilities?

    I’m not making any kind of claims here; just saying that I remain vexed, not sure how to think about it.




    • Hi Steve,

      I think your point echoes my argument that everyday language is imprecise, and we’re okay with that.

      If I say “Michael Jordan is a great athlete”, everyone knows what I’m talking about and will probably agree. Similarly, if I say that “Albert Einstein was intelligent,” everyone will agree.

      But when you actually start to *measure* things, you realize that these words are very imprecise. Michael Jordan is a great basketball player, no doubt. But where does he rank in terms of overall athleticism? That’s far more difficult to measure. The point is that in everyday conversation, we never actually quantify these things. But it’s in the quantification that all the problems start.

      On a side note, there’s a great talk by David Epstein on the reasons that athletes are getting better at their sports. Leaving aside technology, one big reason is body specialization. A century ago, people thought there was such a thing as a ‘general athlete’. This athlete was the typical mesomorph — average height, muscular, not too big, not too small. Turns out, however, that this ‘general athlete’ isn’t the best at any specific sport. The best swimmers have freakishly long bodies. The best marathoners have tiny calves (which take less energy to move). The best basketball players are freakishly tall. And so on. Here’s the video.

      In qualitative terms, all of these people are ‘great athletes’. On this, everyone would agree. But they’re all good at different things. When we *quantify* things, however, the implication is that everything becomes universally commensurable. Measuring ‘athleticism’ implies converting all forms of sport into one. Most people don’t realize how difficult this is. It means that there is only one correct way to rank athletes, just as there is only one correct way to rank mass. If there isn’t, that means your measure is subjective, and not really worth having.

      The same goes for intelligence. It’s something we can all speak about. And we can certainly recognize when people have disabilities and when they do not. As Nassim Taleb argues, any cognitive test can make this differentiation. But IQ tests supposedly go further: they claim to rank everyone’s intelligence. That’s an incredibly strong claim, because it implies that there is only one correct way to do it — one correct way to aggregate across all the disparate skills on earth.

      Now, I would have no problem with IQ tests if they were treated like every other test — as an arbitrary measure of ability. Nothing wrong with that. It’s the mystique of IQ that is the problem — the belief that you’re identifying some universal property of a person’s mind.


  3. Fascinating article Blair, and I’m in full agreement. Another interesting aspect of this story is the cultural differences that exist over the meaning of intelligence. IQ exams typically test skills and abilities that are highly prized in Western societies because of the economic demands imposed by capitalism (especially the obsession with quantification, hence why so many parents dream of their kids being math geniuses and entering “STEM” fields). A lot of anthropological research shows that other cultures value many different features of intelligence, including everything from emotional intelligence to the ability to perform practical tasks (i.e. “street smarts”).

    Anyway, I could say much more about this stuff, but Sternberg and his collaborators wrote a fascinating 2001 article about all of these issues, “The Predictive Value of IQ.” Here’s the pdf link:

    Click to access 09e4150d72ed6b102a000000.pdf


  4. Excellent! By chance, earlier this morning I listened for the first time to Ken Robinson’s famous TED talk, Do schools kill creativity?. https://www.ted.com/talks/sir_ken_robinson_do_schools_kill_creativity#t-1152791
    What I wondered was whether a key point in “intelligence” is learning to use tools. Using tools in your thinking allows you to achieve far more, just as tools did in human evolution, or in your ideas on energy use. On this hypothesis, a lot of education is Luddite: it concentrates on memorising lists, rather than developing the simple tools to use them effectively. The English language that you and I have used here is a good example. It is a complex set of completely arbitrary rules, where failure to memorise and follow them is regarded as evidence of a lack of intelligence.


    • Hi Michael,

      I too like that Ted talk. About tools, it’s interesting that education in the trades (where you use tools) is often considered inferior to education in academia. I’ve been on both sides of this divide. I studied music at the University of North Texas. While embedded in academia, most of the students and music professors didn’t care much for academic achievement. What mattered was your skill at your instrument. Many of the teachers had only bachelor’s degrees (and some, not even that). They were there because they played well. My drum teacher often remarked that graduate training in music was mostly about getting credentials to teach in a university. It wasn’t really about learning to play better. For that, you needed ‘real-world’ training. Go out and play music!

      When I went into grad school, the very opposite was true. School achievement became the ultimate demarcation of skill. And as a result, academics often consider themselves ‘smarter’ than everyone else (although they take pains not to say so overtly).

