# Income Inequality, Political Polarization, and Big Data

## Jun 18, 2014

A fascinating piece in a recent Washington Post draws a striking comparison between income inequality and political polarization. It suggests a high correlation between the two: the greater the inequality, the greater the polarization. That’s all well and good, but it’s meaningless unless you hypothesize a causal relationship between them. And the emergence of big data lets us chase more and more of these correlations, many of which turn out to be wild-goose chases.

Here’s a graph from the story, plotting inequality on the vertical axis and polarization on the horizontal axis, for every year from 1917 through 2011:

The graph’s added wrinkle – drawing a line between each pair of adjacent years, so you can see how the relationship between the two metrics has evolved – is worth remembering as a clever way of presenting three dimensions of data (inequality, polarization, and time) on a two-dimensional plot. There is certainly visual evidence of a correlation between inequality and polarization.

But there the value of this analysis, at least for me, ends. The accompanying article seems to be headed toward a number of unsupported, and dubious, conclusions, like the author’s assertion that “tax policy has almost certainly played a role in driving up inequality” in the last 35 years, which is factually incorrect. Most significant – though qualified by a buried, mild reminder that “all this tells us nothing about causality” – is the implication that polarization causes inequality.

The strongest visual clue of that editorial inclination is in the basic layout of the above graph: polarization is scaled along the horizontal axis (or “X-axis”) and income inequality along the vertical (or “Y-axis”). It is standard practice to graph the independent, or causal, variable along the X-axis and the dependent variable along the Y-axis. (Or, as we learned in algebra I, y = f(x).)

There’s no question that income inequality is a serious problem and getting worse. But is it caused by polarized politics? If anything, my intuition favors the converse: that income inequality is exactly the kind of large, complex, emotional issue with debatable solutions that causes, rather than is caused by, political polarization.

What does all this have to do with big data? Well, big data software enables us only to identify correlations between variables. It doesn’t tell us whether a causal relationship exists, let alone which direction the causal relationship runs. But we’ll certainly be able to find more of these correlations.
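To make that point concrete, here is a minimal sketch in Python. The numbers are made up purely for illustration (they are not the data behind the Post’s graph), and `pearson` is a helper written for this example. It computes the correlation coefficient both ways around and shows that swapping the variables changes nothing:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical, illustrative values only; NOT the Washington Post's data.
polarization = [0.50, 0.55, 0.62, 0.70, 0.81, 0.90]
inequality = [0.33, 0.35, 0.38, 0.41, 0.45, 0.48]

r_xy = pearson(polarization, inequality)  # polarization treated as "X"
r_yx = pearson(inequality, polarization)  # inequality treated as "X"

# The two coefficients are the same number: the statistic has no notion
# of which variable is the cause and which is the effect.
print(r_xy, r_yx)
```

Whichever variable you put on the X-axis, the correlation is identical. Any claim about direction has to come from somewhere other than the coefficient itself.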

So whether we’re talking business, consumer behavior, public policy, or sports, be prepared for more of these partially baked analyses.

*“Painting with Numbers” is my effort to get people to focus on making numbers understandable. I welcome your feedback and your favorite examples. Follow me on twitter at @RandallBolten.*