We live in a world of big data.
Every day, new data sets are released and new jobs are created for data scientists.
People are predisposed to trusting data because, well, it’s about numbers, right? Data is objective, free from personal influence. Three is three. I can’t just tell you that three is two. You wouldn’t believe me. You’re not that easily fooled.
But what happens when the numbers are right, but the analysis is off?
The Providence Journal recently gave us an example of using solid numbers, but failing to consider all aspects of the situation during the analysis:
The U.S. Census Bureau says Rhode Island has about 770,000 adult citizens and that 73.5 percent of them are registered to vote.
So how many voters does the state have on its rolls?
Need help? Here’s a hint: 73.5 percent of 770,000 is about 566,000.
If that’s your guess, you’re wrong.
Rhode Island’s voting list claims more than 748,000 people.
That’s 180,000 more than the Census Bureau numbers suggest belong there.
And 20 of Rhode Island’s 39 municipalities, from the largest city to the smallest town, had more registered voters than it had citizens old enough to vote.
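The arithmetic in that excerpt is easy to check for yourself. Here is a minimal sketch in Python that uses only the three figures quoted above; the variable and function names are my own, chosen for illustration, and because the Journal says the rolls hold "more than" 748,000 names, the gap it computes is a lower bound rather than an exact count.

```python
# Figures quoted in the Providence Journal excerpt; all are rounded.
ADULT_CITIZENS = 770_000          # Census Bureau: "about 770,000 adult citizens"
CENSUS_REGISTRATION_RATE = 0.735  # Census Bureau: 73.5 percent registered
VOTERS_ON_ROLLS = 748_000         # state voting list: "more than 748,000"


def expected_registered(adults: int, rate: float) -> int:
    """Registered voters implied by the Census figures, to the nearest person."""
    return round(adults * rate)


expected = expected_registered(ADULT_CITIZENS, CENSUS_REGISTRATION_RATE)

# How many more names are on the rolls than the Census figures suggest.
surplus = VOTERS_ON_ROLLS - expected

# The registration rate you would get by taking the rolls at face value.
implied_rate = VOTERS_ON_ROLLS / ADULT_CITIZENS

print(f"Expected from Census figures: {expected:,}")
print(f"On the rolls beyond that: {surplus:,}")
print(f"Registration rate the rolls imply: {implied_rate:.1%}")
```

The numbers themselves hold up: roughly 566,000 expected, and a gap of about 180,000. Taken at face value, the rolls would mean that about 97 percent of adult citizens are registered, which is the kind of figure that should prompt a question about how the list is maintained before it prompts a conclusion.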
The article goes on to explain the possible reason for this crazy disparity: It’s hard to track voters. People move and people die, and these important life events are not always recorded.
Most people assume that if you change your address with the post office or the DMV, your voter registration is automatically updated. But that’s not actually the case. Changing your postal address has nothing to do with your voter registration and, in Pennsylvania, you have to check a special box on your DMV form to change voter registration.
Didn’t know that? It’s OK. The helpful polling officials at my voting location didn’t know it either. When I asked for a voter registration change-of-address form, they told me just to change my address with the post office, and my voter registration would be updated.
You might be asking yourself at this point, ‘Why does this matter? Who cares if there are more registered voters than citizens in a given municipality? That doesn’t mean that there’s voter fraud.’
But that’s exactly why it does matter. People can look at numbers like 770,000, 73.5 percent and 748,000 and infer wrongdoing, even if there isn’t any.
Such perceived wrongdoing could spur any number of reactions, including calls for voter ID laws.
Another example of dangerous data analysis is Google Flu Trends. Here’s Google’s explanation of the project:
“We’ve found that certain search terms are good indicators of flu activity. Google Flu Trends uses aggregated Google search data to estimate current flu activity around the world in near real-time.”
Awesome! You can literally see the flu spreading in your geographic area with Google’s Flu Trends maps. Google even has years of data that lend credibility to the correlation it suggests between actual flu outbreaks and the search terms it tracks.
So what went wrong in 2012? During the 2012-13 flu season, Google Flu Trends substantially overestimated flu activity compared with figures from the Centers for Disease Control and Prevention. One explanation is that Google was relying too heavily on only one form of data collection: search terms. Another explanation is that people are hypochondriacs who, if given the chance, will WebMD themselves into a myocardial infarction.
What is the takeaway here? Are we supposed to distrust those seemingly faithful numbers we love so much?
Not necessarily. But when we look at data sets that relate to humans, we must be careful to account for that relevant variable, the subjective human, in our equations.
Reach Allie Kanik at 412-350-0264 or firstname.lastname@example.org.