Many people have commented on the fact that, after adjusting for chained CPI, median income has not risen significantly since the 1970s. Tyler Cowen points to this as evidence for his theory of the "Great Stagnation", which holds that the economy grew more slowly during the latter part of the 20th century than during the earlier part.

It's important to understand what the figures in the above graph mean. At each point in time, the income distribution of the US population was measured. Percentiles were then calculated from that snapshot and plotted on the graph. This is called a cross-sectional study.

A flaw of cross-sectional studies is that the trends they measure may not exist at the level of individuals. Instead, a trend in a cross-sectional study may be the result of a change in the composition of the sample.

A simple example: suppose you want to measure whether red apples turn green. You might fill a bucket with 90% red apples and 10% green apples. Later on, you might observe that the bucket is now 15% green apples. Is this evidence that red apples turn green?

Not necessarily - someone might have dropped a few extra green apples into the bucket in between your observations. Since you don't track any individual apple in a cross-sectional study, you have no way to know for sure.

In contrast to a cross-sectional study, a longitudinal study repeatedly measures the same sample over time. For example, a longitudinal study would label 100 apples and record the color of each individual apple before and after (apple #1 started red and ended red, and so on for apple #2, etc.). This differs from a cross-sectional study because no new apples are added to the batch, and none are removed.
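To make the distinction concrete, here is a minimal sketch of the apple example in Python. The bucket sizes (100 labeled apples, 6 extra green ones dropped in later) are my own illustrative assumptions, chosen only so the green share moves from 10% to roughly 15%; no apple changes color.

```python
# Hypothetical bucket: 100 labeled apples, 90 red and 10 green.
before = {label: "red" for label in range(90)}
before.update({label: "green" for label in range(90, 100)})

# Between observations no apple changes color, but someone drops in
# 6 extra green apples (assumed number; 16/106 is roughly 15%).
after = dict(before)
after.update({label: "green" for label in range(100, 106)})


def green_share(bucket):
    """Cross-sectional measure: fraction of the bucket that is green."""
    greens = sum(1 for color in bucket.values() if color == "green")
    return greens / len(bucket)


# Cross-sectional view: compare the two snapshots as a whole.
cross_before = green_share(before)
cross_after = green_share(after)

# Longitudinal view: follow only apples present in both snapshots
# and check each one individually.
tracked = before.keys() & after.keys()
turned_green = [
    label for label in tracked
    if before[label] == "red" and after[label] == "green"
]

print(f"green share before: {cross_before:.0%}, after: {cross_after:.0%}")
print(f"tracked apples that turned green: {len(turned_green)}")
```

The snapshots show the green share rising, while the tracked apples show that not a single red apple turned green. The whole "trend" comes from the change in who is in the bucket.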

The hypothesis I'm proposing in this blogpost is this: the "Great Stagnation" is a statistical artifact of cross-sectional income measures, caused mainly by immigration.

In 2008 the Brookings Institution did a study, based on PSID data, which attempted to measure income dynamics longitudinally rather than cross-sectionally. In particular, they took as a sample a set of people who were American children in 1968 and compared their incomes to those of their parents at similar ages. The result?

Median income rose 29%.

The increase in income is not even across the population - it is concentrated at the bottom. The bottom quintile roughly doubled their income, while the top quintile experienced no income growth at all (relative to their parents at the same age).

The longitudinal data paints a very different picture from the cross-sectional data. How can we explain the difference? The best way is to try to figure out which group of people is included in the cross-sectional data but excluded from the longitudinal data.

The answer to this question is clear - anyone whose parents did not live in the US in 1968, i.e., immigrants.

Immigrants tend to occupy the lowest rungs of the economic ladder in the US, and these rungs tend to be lower than the average of the parents of US natives. This has caused income averages (both mean and median) to stagnate even though neither immigrants [1] nor natives have experienced income stagnation.
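A toy calculation shows how this can happen. The incomes below are invented purely for illustration (they are not taken from the PSID, the Brookings study, or any other source), and the group sizes are assumptions chosen to make the effect visible: native incomes rise 20%, the immigrant median also rises, yet the pooled median does not move because the low-income group grew as a share of the population.

```python
from statistics import median

# Invented incomes in thousands of dollars; NOT real data.
natives_early = [40, 50, 60, 70, 80]
immigrants_early = [20, 30]

# Later period: every native income is 20% higher, and the immigrant
# group is larger, with a higher median than before.
natives_late = [48, 60, 72, 84, 96]
immigrants_late = [25, 28, 30, 32, 52]


def summarize(natives, immigrants):
    """Median income within each group and for the pooled population."""
    return {
        "natives": median(natives),
        "immigrants": median(immigrants),
        "pooled": median(natives + immigrants),
    }


early = summarize(natives_early, immigrants_early)
late = summarize(natives_late, immigrants_late)

for group in ("natives", "immigrants", "pooled"):
    print(f"{group:>10}: {early[group]} -> {late[group]}")
```

In this made-up population both group medians rise while the pooled median stays flat. The sketch only demonstrates that the arithmetic is possible; it says nothing about whether real US data looks like this.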

Thus, I propose the hypothesis that immigrants and Simpson's Paradox are the cause of the income stagnation in the US.

[1] I have no data to back this up, but it seems intuitively obvious based on the fact that most countries that provide a sizable number of immigrants to the US (Mexico, China, the Philippines and India) are considerably poorer than the US.