Have you heard of the ‘Dunning-Kruger effect’? It’s the (apparent) tendency for unskilled people to overestimate their competence. Discovered in 1999 by psychologists Justin Kruger and David Dunning, the effect has since become famous.
And you can see why.
It’s the kind of idea that is too juicy not to be true. Everyone ‘knows’ that idiots tend to be unaware of their own idiocy. Or as John Cleese puts it:
If you’re very very stupid, how can you possibly realize that you’re very very stupid?
Of course, psychologists have been careful to make sure that the evidence replicates. And sure enough, every time you look for it, the Dunning-Kruger effect leaps out of the data. So it would seem that everything’s on sound footing.
Except there’s a problem.
The Dunning-Kruger effect also emerges from data in which it shouldn’t. For instance, if you carefully craft random data so that it does not contain a Dunning-Kruger effect, you will still find the effect. The reason turns out to be embarrassingly simple: the Dunning-Kruger effect has nothing to do with human psychology.^{1} It is a statistical artifact — a stunning example of autocorrelation.
What is autocorrelation?
Autocorrelation occurs when you correlate a variable with itself. For instance, if I measure the height of 10 people, I’ll find that each person’s height correlates perfectly with itself. If this sounds like circular reasoning, that’s because it is. Autocorrelation is the statistical equivalent of stating that 5 = 5.
When framed this way, the idea of autocorrelation sounds absurd. No competent scientist would correlate a variable with itself. And that’s true for the pure form of autocorrelation. But what if a variable gets mixed into both sides of an equation, where it is forgotten? In that case, autocorrelation is more difficult to spot.
Here’s an example. Suppose I am working with two variables, x and y. I find that these variables are completely uncorrelated, as shown in the left panel of Figure 1. So far so good.
Next, I start to play with the data. After a bit of manipulation, I come up with a quantity that I call z. I save my work and forget about it. Months later, my colleague revisits my dataset and discovers that z strongly correlates with x (Figure 1, right). We’ve discovered something interesting!
Actually, we’ve discovered autocorrelation. You see, unbeknownst to my colleague, I’ve defined the variable z to be the sum of x + y. As a result, when we correlate z with x, we are actually correlating x with itself. (The variable y comes along for the ride, providing statistical noise.) That’s how autocorrelation happens — forgetting that you’ve got the same variable on both sides of a correlation.
The Dunning-Kruger effect
Now that you understand autocorrelation, let’s talk about the Dunning-Kruger effect. Much like the example in Figure 1, the Dunning-Kruger effect amounts to autocorrelation. But instead of lurking within a relabeled variable, the Dunning-Kruger autocorrelation hides beneath a deceptive chart.^{2}
Let’s have a look.
In 1999, Dunning and Kruger reported the results of a simple experiment. They got a bunch of people to complete a skills test. (Actually, Dunning and Kruger used several tests, but that’s irrelevant for my discussion.) Then they asked each person to assess their own ability. What Dunning and Kruger (thought they) found was that the people who did poorly on the skills test also tended to overestimate their ability. That’s the ‘Dunning-Kruger effect’.
Dunning and Kruger visualized their results as shown in Figure 2. It’s a simple chart that draws the eye to the difference between two curves. On the horizontal axis, Dunning and Kruger have placed people into four groups (quartiles) according to their test scores. In the plot, the two lines show the results within each group. The grey line indicates people’s average results on the skills test. The black line indicates their average ‘perceived ability’. Clearly, people who scored poorly on the skills test are overconfident in their abilities. (Or so it appears.)
On its own, the Dunning-Kruger chart seems convincing. Add in the fact that Dunning and Kruger are excellent writers, and you have the recipe for a hit paper. On that note, I recommend that you read their article, because it reminds us that good rhetoric is not the same as good science.
Deconstructing Dunning-Kruger
Now that you’ve seen the Dunning-Kruger chart, let’s show how it hides autocorrelation. To make things clear, I’ll annotate the chart as we go.
We’ll start with the horizontal axis. In the Dunning-Kruger chart, the horizontal axis is ‘categorical’, meaning it shows ‘categories’ rather than numerical values. Of course, there’s nothing wrong with plotting categories. But in this case, the categories are actually numerical. Dunning and Kruger take people’s test scores and place them into four ranked groups. (Statisticians call these groups ‘quartiles’.)
What this ranking means is that the horizontal axis effectively plots test score. Let’s call this score x.
Next, let’s look at the vertical axis, which is marked ‘percentile’. What this means is that instead of plotting actual test scores, Dunning and Kruger plot the score’s ranking on a 100-point scale.^{3}
Now let’s look at the curves. The line labeled ‘actual test score’ plots the average percentile of each quartile’s test score (a mouthful, I know). Things seem fine, until we realize that Dunning and Kruger are essentially plotting test score (x) against itself.^{4} Noticing this fact, let’s relabel the grey line. It effectively plots x vs. x.
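The emptiness of this x-vs-x line can be checked directly. In the sketch below (assuming simple rank-based percentiles; both datasets are random, not Dunning and Kruger’s), the grey line comes out near 12.5, 37.5, 62.5 and 87.5 no matter what scores we feed in:

```python
# The grey line ('actual test score') is data independent: the average
# percentile within each score quartile is ~12.5, 37.5, 62.5, 87.5
# regardless of the underlying scores. A sketch with random data.
import random

def grey_line(scores):
    """Average percentile of each score quartile — the 'actual' line."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])       # indices, ranked
    pct = {i: 100 * (rank + 0.5) / n for rank, i in enumerate(order)}
    quartiles = [order[q * n // 4:(q + 1) * n // 4] for q in range(4)]
    return [sum(pct[i] for i in qs) / len(qs) for qs in quartiles]

random.seed(5)
uniform_scores = [random.uniform(0, 100) for _ in range(1000)]
skewed_scores = [random.expovariate(1) for _ in range(1000)]

print(grey_line(uniform_scores))  # ≈ [12.5, 37.5, 62.5, 87.5]
print(grey_line(skewed_scores))   # ≈ [12.5, 37.5, 62.5, 87.5] either way
```

Two very different distributions give the same line, which is exactly what you would expect from plotting a ranking against itself.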
Moving on, let’s look at the line labeled ‘perceived ability’. This line measures the average percentile for each group’s self-assessment. Let’s call this self-assessment y. Recalling that we’ve labeled ‘actual test score’ as x, we see that the black line plots y vs. x.
So far, nothing jumps out as obviously wrong. Yes, it’s a bit weird to plot x vs. x. But Dunning and Kruger are not claiming that this line alone is important. What’s important is the difference between the two lines (‘perceived ability’ vs. ‘actual test score’). It’s in this difference that the autocorrelation appears.
In mathematical terms, a ‘difference’ means ‘subtract’. So by showing us two diverging lines, Dunning and Kruger are (implicitly) asking us to subtract one from the other: take ‘perceived ability’ and subtract ‘actual test score’. In my notation, that corresponds to y – x.
Subtracting y – x seems fine, until we realize that we’re supposed to interpret this difference as a function of the horizontal axis. But the horizontal axis plots test score x. So we are (implicitly) asked to compare y – x to x:
Do you see the problem? We’re comparing x with the negative version of itself. That is textbook autocorrelation. It means that we can throw random numbers into x and y — numbers which could not possibly contain the Dunning-Kruger effect — and yet out the other end, the effect will still emerge.
Replicating Dunning-Kruger
To be honest, I’m not particularly convinced by the analytic arguments above. It’s only by using real data that I can understand the problem with the Dunning-Kruger effect. So let’s have a look at some real numbers.
Suppose we are psychologists who get a big grant to replicate the Dunning-Kruger experiment. We recruit 1000 people, give them each a skills test, and ask them to report a self-assessment. When the results are in, we have a look at the data.
It doesn’t look good.
When we plot individuals’ test scores against their self-assessments, the data appear completely random. Figure 7 shows the pattern. It seems that people of all abilities are equally terrible at predicting their skill. There is no hint of a Dunning-Kruger effect.
After looking at our raw data, we’re worried that we did something wrong. Many other researchers have replicated the Dunning-Kruger effect. Did we make a mistake in our experiment?
Unfortunately, we can’t collect more data. (We’ve run out of money.) But we can play with the analysis. A colleague suggests that instead of plotting the raw data, we calculate each person’s ‘self-assessment error’. This error is the difference between a person’s self-assessment and their test score. Perhaps this assessment error relates to actual test score?
We run the numbers and, to our amazement, find an enormous effect. Figure 8 shows the results. It seems that unskilled people are massively overconfident, while skilled people are overly modest.
(Our lab tech points out that the correlation is surprisingly tight, almost as if the numbers were picked by hand. But we push this observation out of mind and forge ahead.)
Buoyed by our success in Figure 8, we decide that the results may not be ‘bad’ after all. So we throw the data into the Dunning-Kruger chart to see what happens. We find that despite our misgivings about the data, the Dunning-Kruger effect was there all along. In fact, as Figure 9 shows, our effect is even bigger than the original (from Figure 2).
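The replication described above can be sketched in a few lines of code (all numbers simulated; the seed and sample size are arbitrary). Random scores and random self-assessments reproduce the two diverging lines of the Dunning-Kruger chart:

```python
# Rebuild the Dunning-Kruger chart from pure noise: 1000 simulated
# people with random test scores and random self-assessments.
import random

random.seed(7)
n = 1000
score = [random.uniform(0, 100) for _ in range(n)]   # skills test
assess = [random.uniform(0, 100) for _ in range(n)]  # self-assessment

def pct(v):
    """Rank each value on a 0-100 percentile scale."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    out = [0.0] * len(v)
    for rank, i in enumerate(order):
        out[i] = 100 * (rank + 0.5) / len(v)
    return out

score_pct, assess_pct = pct(score), pct(assess)

# Group people into quartiles by test score, then average each line.
order = sorted(range(n), key=lambda i: score[i])
quartiles = [order[q * n // 4:(q + 1) * n // 4] for q in range(4)]
actual = [sum(score_pct[i] for i in qs) / len(qs) for qs in quartiles]
perceived = [sum(assess_pct[i] for i in qs) / len(qs) for qs in quartiles]

print(actual)     # climbs ~12.5, 37.5, 62.5, 87.5 by construction
print(perceived)  # hovers near 50: the noise has no skill-confidence link
```

Because `perceived` sits near 50 everywhere while `actual` climbs by construction, the bottom quartile looks wildly overconfident and the top quartile looks modest — the Dunning-Kruger pattern, conjured from noise.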
Things fall apart
Pleased with our successful replication, we start to write up our results. Then things fall apart. Riddled with guilt, our data curator comes clean: he lost the data from our experiment and, in a fit of panic, replaced it with random numbers. Our results, he confides, are based on statistical noise.
Devastated, we return to our data to make sense of what went wrong. If we have been working with random numbers, how could we possibly have replicated the Dunning-Kruger effect? To figure out what happened, we drop the pretense that we’re working with psychological data. We relabel our charts in terms of abstract variables x and y. By doing so, we discover that our apparent ‘effect’ is actually autocorrelation.
Figure 10 breaks it down. Our dataset consists of statistical noise — two random variables, x and y, that are completely unrelated (Figure 10A). When we calculated the ‘self-assessment error’, we took the difference between y and x. Unsurprisingly, we find that this difference correlates with x (Figure 10B). But that’s because x is correlating with itself. Finally, we break down the Dunning-Kruger chart and realize that it too is based on autocorrelation (Figure 10C). It asks us to interpret the difference between y and x as a function of x. It’s the autocorrelation from panel B, wrapped in a more deceptive veneer.
The point of this story is to illustrate that the Dunning-Kruger effect has nothing to do with human psychology. It is a statistical artifact — an example of autocorrelation hiding in plain sight.
What’s interesting is how long it took for researchers to realize the flaw in Dunning and Kruger’s analysis. Dunning and Kruger published their results in 1999. But it took until 2016 for the mistake to be fully understood. To my knowledge, Edward Nuhfer and colleagues were the first to exhaustively debunk the Dunning-Kruger effect. (See their joint papers in 2016 and 2017.) In 2020, Gilles Gignac and Marcin Zajenkowski published a similar critique.
Once you read these critiques, it becomes painfully obvious that the Dunning-Kruger effect is a statistical artifact. But to date, very few people know this fact. Collectively, the three critique papers have about 90 times fewer citations than the original Dunning-Kruger article.^{5} So it appears that most scientists still think that the Dunning-Kruger effect is a robust aspect of human psychology.^{6}
No sign of Dunning-Kruger
The problem with the Dunning-Kruger chart is that it violates a fundamental principle in statistics: if you’re going to correlate two sets of data, they must be measured independently. The Dunning-Kruger chart breaks this principle by mixing test score into both axes, giving rise to autocorrelation.
Realizing this mistake, Edward Nuhfer and colleagues asked an interesting question: what happens to the Dunning-Kruger effect if it is measured in a way that is statistically valid? According to Nuhfer’s evidence, the answer is that the effect disappears.
Figure 11 shows their results. What’s important here is that people’s ‘skill’ is measured independently of their test performance and self-assessment. To measure ‘skill’, Nuhfer groups individuals by their education level, shown on the horizontal axis. The vertical axis then plots the error in people’s self-assessment. Each point represents an individual.
If the Dunning-Kruger effect were present, it would show up in Figure 11 as a downward trend in the data (similar to the trend in Figure 8). Such a trend would indicate that unskilled people overestimate their ability, and that this overestimate decreases with skill. Looking at Figure 11, there is no hint of a trend. Instead, the average assessment error (indicated by the green bubbles) hovers around zero. In other words, assessment bias is trivially small.
Although there is no hint of a Dunning-Kruger effect, Figure 11 does show an interesting pattern. Moving from left to right, the spread in self-assessment error tends to decrease with more education. In other words, professors are generally better at assessing their ability than are freshmen. That makes sense. Notice, though, that this increasing accuracy is different from the Dunning-Kruger effect, which is about systematic bias in the average assessment. No such bias exists in Nuhfer’s data.
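This pattern can be sketched with simulated numbers (the group labels and error spreads below are my assumptions, chosen to mimic the qualitative pattern described above — they are not Nuhfer’s data): mean self-assessment error stays near zero in every group, while the spread shrinks with education.

```python
# A Nuhfer-style setup: 'skill' (education level) is measured
# independently of the self-assessment error being plotted.
# Simulated data; the spreads are illustrative assumptions.
import random
from statistics import mean, stdev

random.seed(3)
spread = {"freshman": 20, "sophomore": 16, "junior": 12,
          "senior": 9, "professor": 5}

errors = {g: [random.gauss(0, s) for _ in range(200)]
          for g, s in spread.items()}

for g, e in errors.items():
    # Mean error ~ 0 in every group (no Dunning-Kruger bias);
    # only the spread (assessment accuracy) changes with education.
    print(f"{g:>10}: mean {mean(e):5.1f}, spread {stdev(e):4.1f}")
```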
Unskilled and unaware of it
Mistakes happen. So in that sense, we should not fault Dunning and Kruger for having erred. However, there is a delightful irony to the circumstances of their blunder. Here are two Ivy League professors^{7} arguing that unskilled people have a ‘dual burden’: not only are unskilled people ‘incompetent’ … they are unaware of their own incompetence.
The irony is that the situation is actually reversed. In their seminal paper, Dunning and Kruger are the ones broadcasting their (statistical) incompetence by mistaking autocorrelation for a psychological effect. In this light, the paper’s title may still be appropriate. It’s just that it was the authors (not the test subjects) who were ‘unskilled and unaware of it’.
This work is licensed under a Creative Commons Attribution 4.0 License. You can use/share it any way you want, provided you attribute it to me (Blair Fix) and link to Economics from the Top Down.
Notes
Cover image: Nevit Dilmen, altered.
The Dunning-Kruger effect tells us nothing about the people it purports to measure. But it does tell us about the psychology of social scientists, who apparently struggle with statistics.↩︎

It seems clear that Dunning and Kruger didn’t mean to be deceptive. Instead, it appears that they fooled themselves (and many others). On that note, I’m ashamed to say that I read Dunning and Kruger’s paper a few years ago and didn’t spot anything wrong. It was only after reading Jonathan Jarry’s blog post that I clued in. That’s embarrassing, because a major theme of this blog has been me pointing out how economists appeal to autocorrelation when they test their theories of value. (Examples here, here, here, here, and here.) I take solace in the fact that many scientists were similarly hoodwinked by the Dunning-Kruger chart.↩︎

The conversion to percentiles introduces a second bias (in addition to the problem of autocorrelation). By definition, percentiles have a floor (0) and a ceiling (100), and are uniformly distributed between these bounds. If you are close to the floor, it is impossible for you to underestimate your rank. Therefore, the ‘unskilled’ will appear overconfident. And if you are close to the ceiling, you cannot overestimate your rank. Therefore, the ‘skilled’ will appear too modest. See Nuhfer et al. (2016) for more details.↩︎
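This floor/ceiling bias can be simulated on its own (a sketch; all numbers are invented). Even when raw self-assessments are unbiased, converting to percentiles makes the bottom of the ranking err upward and the top err downward:

```python
# Percentile floors and ceilings manufacture apparent over- and
# under-confidence. Raw self-assessments here are unbiased: true score
# plus symmetric noise. Simulated data; the parameters are assumptions.
import random

def pct(v):
    """Rank each value on a 0-100 percentile scale."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    out = [0.0] * len(v)
    for rank, i in enumerate(order):
        out[i] = 100 * (rank + 0.5) / len(v)
    return out

random.seed(11)
n = 2000
score = [random.gauss(50, 15) for _ in range(n)]
guess = [s + random.gauss(0, 10) for s in score]   # unbiased in raw units

ps, pg = pct(score), pct(guess)
err = [g - s for s, g in zip(ps, pg)]              # error on percentile scale

bottom = [err[i] for i in range(n) if ps[i] < 10]  # lowest decile
top = [err[i] for i in range(n) if ps[i] > 90]     # highest decile
print(sum(bottom) / len(bottom))  # positive: the floor makes them look overconfident
print(sum(top) / len(top))        # negative: the ceiling makes them look modest
```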

In technical terms, Dunning and Kruger are plotting two different forms of ranking against each other — test-score ‘percentile’ against test-score ‘quartile’. What is not obvious is that this type of plot is data independent. By definition, each quartile contains 25 percentiles whose average corresponds to the midpoint of the quartile. The consequence of this truism is that the line labeled ‘actual test score’ tells us (paradoxically) nothing about people’s actual test scores.↩︎

According to Google Scholar, the three critique papers (Nuhfer 2016, 2017; Gignac and Zajenkowski 2020) have 88 citations collectively. In contrast, Dunning and Kruger (1999) has 7893 citations.↩︎

The slow dissemination of ‘debunkings’ is a common problem in science. Even when the original (flawed) papers are retracted, they often continue to accumulate citations. And then there’s the fact that critique papers are rarely published in the same journal that hosted the original paper. So a flawed article in Nature is likely to be debunked in a more obscure journal. This asymmetry is partially why I’m writing about the DunningKruger effect here. I think the critique raised by Nuhfer et al. (and Gignac and Zajenkowski) deserves to be well known.↩︎

When Dunning and Kruger published their 1999 paper, they both worked at Cornell University.↩︎
Further reading
Gignac, G. E., & Zajenkowski, M. (2020). The Dunning-Kruger effect is (mostly) a statistical artefact: Valid approaches to testing the hypothesis with individual differences data. Intelligence, 80, 101449.
Kruger, J., & Dunning, D. (1999). Unskilled and unaware of it: How difficulties in recognizing one’s own incompetence lead to inflated selfassessments. Journal of Personality and Social Psychology, 77(6), 1121.
Nuhfer, E., Cogan, C., Fleisher, S., Gaze, E., & Wirth, K. (2016). Random number simulations reveal how random noise affects the measurements and graphical portrayals of selfassessed competency. Numeracy: Advancing Education in Quantitative Literacy, 9(1).
Nuhfer, E., Fleisher, S., Cogan, C., Wirth, K., & Gaze, E. (2017). How random noise and a graphical convention subverted behavioral scientists’ explanations of selfassessment data: Numeracy underlies better alternatives. Numeracy: Advancing Education in Quantitative Literacy, 10(1).