Groping in the Dark: The Untold Side of Research

There is an exciting side of blogging that I want to explore here. Blogging can tell the story behind research. This is something you don’t get in journals.

Most scientific articles obey a formula that goes like this:

  1. Here is the question I asked.
  2. Here is how I answered the question.
  3. Here is what I found.

This formula makes the article easy to read, since we know what to expect. But it gives the reader a misleading sense of how science is done. It makes science seem like a set of procedures — a production function. Scientists ask a question, investigate it, and publish the results. Easy peasy.

But this is not how science works. In the finished paper, the clean formula hides the (many) missteps along the way. It hides the confusion that dominates scientists’ lives. The real process of doing science is more like “groping in the dark”. Here are Bichler and Nitzan describing the quest for knowledge:

[S]cientists grope in the dark. They search for cues, hints and leads. They often stumble, falling flat on their faces. Rarely do they know exactly what they are looking for. But they go on. And then, suddenly, comes a revelation.

That moment of revelation is what drives us as scientists. It’s the rush of clarity after long periods of confusion. To paraphrase Karl Marx, revelation is the ‘opiate of the scientist’.

What’s interesting is that the moment of revelation is not really about answering our questions. No, the revelation is usually about determining the right question to ask. Here’s Charles Darwin commenting on his breakthrough in explaining evolution:

Looking back, I think it was more difficult to see what the problems were than to solve them.

Here is Albert Einstein expressing the same sentiment:

The formulation of a problem is often more essential than its solution, which may be merely a matter of mathematical or experimental skill. To raise new questions, new possibilities, to regard old questions from a new angle, requires creative imagination and marks real advance in science.

Although my research pales in comparison to Darwin’s or Einstein’s, my experience mirrors their own. I have found that asking the right question is the hardest part of science. There is no formula for how to do it. It’s a creative process that remains a mystery to most of us. We’re just glad when we stumble on the right question.

With this in mind, I’d like to tell the story of how I stumbled on the relation between energy use and institution size. The finished research is here. Like most scientists, I wrote the paper to make it look like I knew what I was doing all along. But I assure you that I did not. Behind the published results was a sea of confusion.

The growth of large corporations is related to energy use

In the last blog post, I discussed the evidence that first got me interested in hierarchy. For some reason, the employment share of large corporations increases with energy use per capita.

For a long time, I was stuck on this correlation. My focus was technical. I wanted to get more data that related corporate concentration to energy use. At the time, this was a daunting task because it challenged my analytic abilities.

I began my research career as a committed Excel user. But the funny thing about Excel is that it tends to blow up when you give it a few million lines of data! So answering the questions that I was asking involved abandoning Excel. I learned to use R, which is a programming language that specializes in statistics and data analysis. I’ll write more about my journey with R in future posts. As you’ll see, I’m a bit of an R evangelist.

Back to my research, I was focused on how corporate concentration relates to energy use. I looked at US data and found a pretty tight relation. As energy use increases, the employment share of the top 200 corporations does the same:

United States (Source)

I then went back to the international data. Using my new skills in R, I looked at trends both between and within countries. Here is the result. The figure below shows the employment share of the top 25 corporations (in each country) plotted against energy use per capita. Each squiggly line shows the path through time of a specific country. As energy use increases, the employment share of large firms increases as well.

Trends Between and Within Countries (Source)

For the longest time, I was stuck on this relation. By ‘stuck’, I mean that I was thinking this: “These are interesting results! Now what do I do with them?”

The breakthrough didn’t come with new data. It came with a new question. One day something dawned on me. The people who moved to large firms had to come from somewhere! Where were they coming from? Small firms? Medium-sized firms?

As soon as this question was posed, it was obvious that I was missing another side of the story. What happened to small firms as energy use increased? After this question was posed, it was simple to answer. We look at rates of self employment. People who are self employed belong to a firm of size 1 — pretty much the smallest you can get. So when we study self-employment rates, we are looking at the employment share of very small firms.

So what happens to self-employment rates as energy use increases? It turns out that they plummet. Here is a century of US data. To show the correlation with energy, I’ve plotted self employment on a reverse scale. So going ‘up’ on the graph corresponds to a decline in the self-employment rate. As US energy use increases, self employment rates decline in lock step.

United States (Source)

The same is true when we look at international data. Self employment rates plummet as energy use increases. Here’s the trend between countries. Each squiggly line is the path through time of a single country.

Trends Between and Within Countries (Source)

This discovery was a moment of revelation for me, a dose of the scientific opiate that I crave. But it soon led to more confusion.

Something doesn’t add up here

Let’s review the results so far. As energy use increases, the employment share of large firms increases. So large firms are getting larger. At the same time, the self employment rate declines. So there are fewer and fewer small firms. This evidence all points to one conclusion: as energy use increases, firms tend to get bigger (on average).

This conclusion was ‘in the data’. But for a long time I didn’t want to see it. In fact, I thought this conclusion was wrong. Why? Because it contradicted results from Jonathan Nitzan and Shimshon Bichler. Looking at the US, Nitzan and Bichler found that the average size of US corporations decreased over the 20th century. Here is their figure from page 336 of Capital as Power:


Something doesn’t add up. My results suggested that firm size should have increased over the 20th century. But Nitzan and Bichler clearly found the opposite trend. So what gives? Were Nitzan and Bichler wrong? I didn’t think so — their analysis was impeccable. Was I wrong? More likely, but I could not find a problem in my methods.

Again, the resolution to this confusion came by clarifying the question. You see, Nitzan and Bichler were asking a different question than I was. They were asking: how has the average size of corporations changed with time? But I was asking: how has the average size of all businesses changed with time? There is a subtle difference.

A corporation is a particular legal form for a business. But it is not the only form. Many small businesses are unincorporated. This was especially true a century ago when incorporation was a new concept.

Over the 20th century, incorporation became increasingly popular. So we have two contrasting trends. On the one hand, rates of self employment declined. On the other hand, an increasing proportion of self employed people chose to incorporate their activity. As a result, if you look at average corporation size, you see a decline. More and more self-employed people incorporated, and this dragged down the average size of the corporation. This explains Nitzan and Bichler’s results. But if you look at the average size of all businesses, we should see an increase. Self employment rates declined and large firms got larger. So the average size of all businesses must be increasing. That would explain the results that were in my data.
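
To see how both trends can hold at once, here is a minimal sketch in Python. The firm counts and sizes are made up purely to illustrate the arithmetic; they are not taken from my data or from Nitzan and Bichler’s. The helper `mean_size` is a hypothetical name introduced only for this example.

```python
# Illustrative only: all counts and sizes below are hypothetical numbers,
# chosen to show how the two averages can move in opposite directions.

def mean_size(firms):
    """Average employees per firm, given (firm_size, number_of_firms) pairs."""
    total_employees = sum(size * count for size, count in firms)
    total_firms = sum(count for _, count in firms)
    return total_employees / total_firms

# Early period: many self-employed people, very few of them incorporated.
early_corporations = [(1, 10), (1000, 10)]
early_unincorporated = [(1, 980)]

# Later period: fewer self-employed people, most of them incorporated,
# and the large firms have grown.
late_corporations = [(1, 400), (2000, 10)]
late_unincorporated = [(1, 100)]

early_corp_mean = mean_size(early_corporations)
late_corp_mean = mean_size(late_corporations)
early_all_mean = mean_size(early_corporations + early_unincorporated)
late_all_mean = mean_size(late_corporations + late_unincorporated)

print(f"Average corporation size: {early_corp_mean:.1f} -> {late_corp_mean:.1f}")
print(f"Average size, all firms:  {early_all_mean:.1f} -> {late_all_mean:.1f}")
```

With these invented numbers, the average corporation shrinks while the average business grows. The only thing that changes between the two averages is which firms are counted.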

Having clarified the question I was asking, finding the answer was simple. I looked at the average size of all businesses, both incorporated and unincorporated. I found that the average size of US firms increased over the 20th century. And it did so in lock step with energy use per capita:

United States (Source)

The same is true between countries. Here is a plot of international data. Each dot is a country, with vertical bars showing uncertainty in the data. As energy use per capita increases, so does average firm size.

Trends Between Countries (Source)

What about those other institutions?

The relation between firm size and energy use was intriguing. It was so intriguing that I actually forgot that other institutions existed. You know those little things called governments? They’re kind of important. Again, revelation came when I realized I wasn’t asking an important question: is government size related to energy use?

Well, it turns out that it is. As energy use increases, governments tend to get larger. Here is US data over the last century.

United States (Source)

Look at how the employment share of US government grows until the 1970s. Then it tanks as the 1970s energy crisis takes hold. If you lived through this era (I did not) you likely remember how government policies changed. Politicians went from preaching exuberant spending to calling for restraint and cutbacks. What you probably didn’t realize is that there was something deeper going on. The way we behave and the policies we favor are somehow connected to how we consume energy. We’re only beginning to understand how this works.

Again, the US is not alone here. Across all countries, government size tends to grow with energy use. The squiggly lines in the figure below represent the path through time of different countries.

Trends Between and Within Countries (Source)

What’s interesting is that not all countries are moving up and to the right on this figure. Some are moving down and to the left. This means their energy use is declining, as is the government share of employment. The most conspicuous example of this trend is the collapse of the Soviet Union. After the collapse, energy use and government size dramatically declined in former Soviet states. I’ll talk more about this in a future post.

This isn’t the end of the research story, but it’s enough for one post. I hope you’ve enjoyed this journey into the underside of research. I’m sure other scientists who are ‘groping in the dark’ can relate to my confusion. And for non-scientists, I hope it makes you appreciate how science works. The messy process of doing science is very different from how it appears in scientific journals.




This work is licensed under a Creative Commons Attribution 4.0 License. You can use/share it any way you want, provided you attribute it to me (Blair Fix) and link to Economics from the Top Down.


  1. Some quick thoughts on the graphs.

    1. Energy usage per capita per year in your graphs is actually a measure of power per capita, because power [W, GW] = energy [J, GJ] / time [1 year]. Some people might figure that out; some won’t.
    a. Given that, electrical networks have limits on average power generation and distribution capacity. The same probably holds for fuel distribution, except that ships can move between countries.
    b. Cumulative production would be related to energy, and the individual production rate (of a factory) would be proportional to power, given the same or similar efficiencies.

    2. You might also compare levels (the cumulative sum of rates, or the integral in calculus), which have less noise and fewer fluctuations. Cumulative GDP would be somewhat of a measure of wealth. I say this because economics puts more emphasis on rates and rates of rates, and much less emphasis on levels and accumulations: who is rich, who is poor, firm size!
    a. So, if firm size is proportional to firm accumulation, it may be closely related to the integral (accumulation) of GDP, which in turn would be related to energy = accumulated power usage. That is, sum your “energy/year” and GDP*(corporate share) over the years. Revenue and net revenue (profit) add to the capital account in accounting! And companies reinvest from profit and loans to form productive capital equipment.
    b. You could use plots with more than 2 dimensions to mix and match rates, levels, and accumulations.

    3. Have you plotted the directions of the paths? Paths show the actual individual dynamics. The points would have the dates. Gnuplot can plot arrows showing the path direction using its vector capability. Another way would be to color by decade or by 5-year period. Or date labels could be inserted for every 5th or 10th year.

    4. I see you use log plots which often show relationships.

    5. You can also break down the power or energy usage by the application of that energy. If that cannot be done, the type of fuel might give a less accurate indication. Gasoline is more for commuting (to work) and diesel more for moving product and production logistics.

    6. Smaller firms probably require much less of a commute per person. The opposite holds for larger ones in a different neighborhood, with more diverse worker living locations.

    7. Power capacity could easily be caused, somewhat, by firm size, in addition to causation in the other direction. The more money collected, the more it can affect the capacity of the power companies: a bidirectional feedback. Of course, electrification of countries has been shown to be a good policy.
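
The unit point in item 1 above (energy per year is a power) can be checked with a short conversion. This is only a sketch: the figure of 300 GJ per person per year is a hypothetical input, the function name is made up for this example, and the year length assumes a Julian year of 365.25 days.

```python
# Convert an annual energy use (GJ per year) into average power (watts).
# Assumption: one year = 365.25 days (Julian year).
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def gj_per_year_to_watts(gj_per_year):
    """Average power in watts for a given energy use in gigajoules per year."""
    joules_per_year = gj_per_year * 1e9
    return joules_per_year / SECONDS_PER_YEAR

# Hypothetical example value, not taken from the post's data.
watts = gj_per_year_to_watts(300.0)
print(f"300 GJ/year is about {watts / 1000:.1f} kW of continuous power")
```

So a per capita energy use quoted per year can equally be read as an average continuous power draw per person.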

  2. I thought of something I have experienced that might be related to your data.

    Sometimes plotting two exponentially growing quantities (each a function of time) against each other on a log-log x-y plot makes them look correlated. I have done this and thought, temporarily, that I had discovered something. But what is actually going on is that they share a common variable: time, t.

    Yet do not dismiss it without being sure.

    Or it could be two exponentially growing things that depend on some other thing in common, a common variable in the two functions. Maybe firm size and energy usage are both affected by an exponentially growing population? Energy usage is definitely a function of population. (And, of course, the exponentially growing population shares the common variable time.) Firm size is also related to population. A nonlinear or exponential relation with population size could be involved in both, too.

    I think this can be frequent in economics because many quantities are growing exponentially, mostly due to exponential growth of population, and otherwise due to exponential growth of prices caused by dilution of the currency and newly created credit money, which also affects prices. Also, interest paid is a form of exponential growth inherent in economics.

    Anything that has a near-constant percent growth is an exponentially growing quantity in time, and many organizations and firms grow this way. The differential equation for this is Y'(t) = c*Y(t), or d/dt Y(t) = c*Y(t), where c is a constant, or is c(t), which is nearly constant in the real world. The solution to the differential equation is Y(t) = Y(t_initial)*exp(c*t), with t measured from t_initial. An example of this would be a bank account paying 20% interest at a continuous rate, with no money withdrawn: M(t) = 100*exp(0.2*t), where t is in years and the initial deposit is $100. Economics is full of this sort of thing: GDP of the US much of the time, population growth, price rises, energy (power) usage, labor supply, labor, etc.

    Yet in economics, if the fit is really good it might be dismissed as abnormal, because most data is much noisier. But I have found that when the fit is really good, it is often due to an equation. Sometimes, disappointingly, I learn that it is an equation that follows from an economic definition. For example, an equation with the defined quantity labor productivity, a defined efficiency of labor (p. 1, formula 1). Restated: GDP = labor productivity * L.

    • Your plots are percent on the vertical and logarithmic on the horizontal.

      The above precautions could apply for the following reasons: a percent growth is related to an exponential plotted logarithmically. Or there may be a nonlinear relationship.

      Or, most likely you have a real relationship. You might find a better relation, a third common variable, or an understanding from an expanded point of view.
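
The caution in this comment, that two exponentially growing quantities can look tightly related only because both depend on time, can be sketched as follows. The first series is the commenter’s bank account example, M(t) = 100*exp(0.2*t). The second series, its starting value, and its growth rate are hypothetical, and `pearson` is a small helper written for this example rather than a library function.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

years = range(50)

# Bank account from the comment: $100 at 20% continuous interest.
account = [100 * math.exp(0.2 * t) for t in years]

# A second, unrelated quantity (hypothetical): it has nothing in common
# with the account except that it also grows exponentially in time.
other = [7 * math.exp(0.03 * t) for t in years]

# On log-log axes, both series are straight lines in t, so they line up.
r = pearson([math.log(v) for v in account], [math.log(v) for v in other])
print(f"Correlation of the two logged series: {r:.4f}")
```

The correlation comes out essentially perfect even though the two series are causally unrelated. It says nothing about whether a given empirical relation is spurious; it only shows why a shared time trend is worth ruling out.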

  3. “but I could not find a problem in my methods.”

    There is a systematic way to detect and remove this kind of problem, every time!

    It is the consistent use of units in formulas, the unit conversion method (lime method). That is sufficient. But there is also dimensional analysis. This seems like more work initially but is actually less work, because it is necessary to avoid lots of problems that cause more work.

    Economic notation seems rather bereft of units. This is one of the reasons for problems in economics. If units are not specified in measurements and calculations, physical calculations mean absolutely nothing. If a graph’s axis is not labeled with units, it means very little; it’s wrong.

    Every good comparison should have the same units and dimensions. Both sides of every equation should have the same units and dimensions. You might have to convert units to another (usually of the same dimension) to match what you are comparing (inches to meters).

    Check out secondary and below science books where quantities are calculated. Work some problems to learn it.

    An example of a problem in economics is Q in the mythical supply-demand curve. The dimension is not a quantity, but the flow of a quantity: change in quantity over time. Example units could be 8 [gigatonnes of iron]/[year], or 8 gigatonnes of iron / year. It is the addition to (supply) or subtraction from inventory over a period of time.

    Converting Units With Conversion Factors
    Interpreting units in formulas | Mathematics I | High School Math | Khan Academy

    • Correction:

      I meant to say, (line method), but there might be other names for it today.

      I meant to say in the 6th paragraph: check out secondary science books and below, where quantities are calculated. Work some problems to learn it.

  4. More on the dimensional analysis solution:

    “Having the same units on both sides of an equation [or comparison] does not ensure that the equation is correct, but having different units on the two sides (when expressed in terms of base units) of an equation [or comparison] implies that the equation [or comparison] is wrong. ”

    Quotes to motivate economists to practice this:

    “In economics, one distinguishes between stocks and flows: a stock has units of “units” (say, widgets or dollars), while a flow is a derivative of a stock, and has units of “units/time” (say, dollars/year). ”

    “In some contexts, dimensional quantities are expressed as dimensionless quantities or percentages by omitting some dimensions. For example, debt-to-GDP ratios are generally expressed as percentages: total debt outstanding (dimension of currency) divided by annual GDP (dimension of currency) – but one may argue that in comparing a stock to a flow, annual GDP should have dimensions of currency/time (dollars/year, for instance), and thus Debt-to-GDP should have units of years, which indicates that Debt-to-GDP is the number of years needed for a constant GDP to pay the debt, if all GDP is spent on the debt and the debt is otherwise unchanged. ”

    How to Convert Units – Unit Conversion Made Easy
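
The stock-versus-flow point in the quotes above can be made concrete with a small dimension-tracking sketch. The `Quantity` class is a hypothetical helper written for this example, not an existing library, and the debt and GDP figures are made-up round numbers.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Quantity:
    """A value with exponents for two dimensions: currency and time (years)."""
    value: float
    currency: int = 0
    time: int = 0

    def __truediv__(self, other):
        # Dividing quantities subtracts the dimension exponents.
        return Quantity(self.value / other.value,
                        self.currency - other.currency,
                        self.time - other.time)

    def __add__(self, other):
        # Adding is only meaningful when the dimensions match.
        if (self.currency, self.time) != (other.currency, other.time):
            raise ValueError("cannot add quantities with different dimensions")
        return Quantity(self.value + other.value, self.currency, self.time)

# Hypothetical figures, in trillions of dollars.
debt = Quantity(30.0, currency=1)           # a stock: dollars
gdp = Quantity(25.0, currency=1, time=-1)   # a flow: dollars per year

ratio = debt / gdp
print(ratio)  # the currency exponent cancels, leaving a time exponent of 1
```

Under these assumptions the debt-to-GDP ratio carries a dimension of years, as the quote argues, and trying to compute `debt + gdp` raises an error because a stock and a flow cannot be added.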
