From Feb. 27 to March 1, most of the PublicSource staff will be in Louisville, Ky., for a National Institute for Computer-Assisted Reporting conference, learning more about how to use electronic data to provide readers with better stories.
The story below is a behind-the-scenes look at how PublicSource reporter Emily DeMarco used public data to report a story about Pittsburgh's roads.
Finding stories about Pittsburgh's roads in electronic data
By Emily DeMarco | PublicSource | Feb. 6, 2013
This story was first published by UPLINK, a publication of NICAR.
My palms become sweaty just thinking about math. Taking on a dataset of 25,000 records? Turn on the armpits.
I’m a fellow at PublicSource. My first crack at a data story was terrifying. And incredibly empowering.
After a three-day Investigative Reporters and Editors and National Institute for Computer-Assisted Reporting bootcamp, I was hooked.
Back in the newsroom, we used records from Pittsburgh’s 311 non-emergency call center to analyze how the city was handling pothole complaints. We found that between 2006 and 2012, the amount of time it took to resolve pothole complaints was on the rise. By 2011, residents were waiting an average of three weeks, and service was inconsistent between neighborhoods.
Frankly, I had it easy. My editor was supportive and patient; colleagues helped troubleshoot; and data queens like Jaimi Dowdell, IRE’s training director, fielded my panicky phone calls and emails.
As for getting my hands on the records, I didn't initially get pushback.
The 311 call center manager emailed me spreadsheets with three years of pothole complaints, including the notes section. When I asked for the entire database, the city's spokeswoman said I needed to file a state Right-to-Know Law request.
My request was denied on grounds that releasing the records would have a chilling effect on the public’s use of the call center. But, by that point, another city official had emailed me a redacted version of the database.
It’s ridiculous, but I was shocked when I first opened the 311 files. (Why are there four different date fields? What do these abbreviations mean? How can I possibly find a human in this mess?)
At times, Microsoft Excel spreadsheets and Access database managers seemed just as horrifying. I spent too many late hours in the office sparring with the computer to get it to do what I wanted.
If you read no further, here’s my advice to newcomers to data journalism: shelve your ego and ask for help.
The rewards include sleep at night; feeling confident that flaws were corrected before going to publication; and being armed with an arsenal of bulletproofed findings to take to your public agency for comment.
People inside and outside my newsroom helped.
For a start, I needed to count workdays (i.e., Monday through Friday) to determine how long each complaint took to resolve. I found an Excel formula that excluded weekends, and PublicSource's web designer, Allie Kanik, tweaked it to exclude holidays as well.
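Excel's built-in NETWORKDAYS function handles this kind of calculation (it takes a start date, an end date and an optional range of holidays to skip). For anyone working outside Excel, here is a minimal sketch of the same workday count in Python; the holiday list is hypothetical and would need to match the city's actual observed holidays:

```python
from datetime import date, timedelta

# Hypothetical holiday list -- a real analysis would use the city's
# full calendar of observed holidays for the years in the data.
HOLIDAYS = {date(2011, 1, 17), date(2011, 5, 30), date(2011, 7, 4)}

def workdays_between(opened, resolved, holidays=HOLIDAYS):
    """Count Monday-through-Friday days from opened to resolved,
    skipping weekends and any date in the holidays set."""
    if resolved < opened:
        raise ValueError("resolved date precedes opened date")
    days = 0
    current = opened
    while current < resolved:
        current += timedelta(days=1)
        # weekday() returns 0-4 for Monday-Friday
        if current.weekday() < 5 and current not in holidays:
            days += 1
    return days

# A complaint opened Friday, Jan. 14, 2011 and resolved the following
# Friday spans a weekend plus the MLK holiday, so only 4 workdays count.
print(workdays_between(date(2011, 1, 14), date(2011, 1, 21)))  # -> 4
```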
PublicSource reporter Halle Stockton identified another major flaw. Because Pittsburgh’s streets aren’t exclusively owned by the city, she pointed out that I needed to exclude pothole complaints on county- and state-owned roads.
Other issues weren’t as cerebral.
During the first few days of working with the data, Excel refused to read the date fields properly. Cursing at the program, I sent a note to the NICAR bootcamp listserv. Fellow bootcamper Curtis Skinner and his editor, Mike Sullivan, wrote a formula that solved my problem in a snap.
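I don't have their exact formula, but the general problem -- date fields stored as text that the software won't recognize -- usually comes down to trying each format the export might contain until one parses. A rough sketch in Python, with hypothetical format strings standing in for whatever the 311 export actually used:

```python
from datetime import datetime

# Hypothetical formats -- the real 311 export's date strings may differ.
FORMATS = ("%m/%d/%Y %H:%M", "%m/%d/%Y", "%Y-%m-%d")

def parse_date(text):
    """Try each known format until one matches the text-formatted date."""
    text = text.strip()
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")

print(parse_date("02/06/2013"))  # -> 2013-02-06 00:00:00
```

Raising an error on unrecognized strings, rather than silently skipping them, makes it obvious when a record needs a closer look.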
What my analysis came down to was very, very basic math. Addition, subtraction, percentages, that sort of thing. All of my panic seemed misplaced. Slowly, bizarrely, I began to feel confident about math for the first time in my life.
Liberated, I started to identify patterns and oddities in the data.
Pittsburgh’s 311 database depends on paper records. Pothole repairs are marked as resolved on paper work orders by public works crews. At times, the information may not be entered promptly in the 311 database.
The biggest oddity was the number of unresolved complaints, all of which had to be dropped and noted in the methodology. When asked about this issue, a city official said the records with blank fields should have been wiped, like bad debt.
Plus, the location of the pothole complaints was rudimentary, at best. No latitude/longitude. (Again, paper records.) Despite those setbacks, my colleague Kanik did an incredible job of visualizing the data.
For the graphics, she started with Google Chart Tools, then used Adobe Illustrator. She did the same for the maps, preparing them in ArcGIS.
Here are some tips I learned, but underestimated, during bootcamp, led by Dowdell and Jennifer LaFleur of ProPublica:
- Document your work. I found myself writing in an almost 'See Spot run' tone. Over-simplified descriptions were crucial, because what makes sense at 9 p.m. on a Tuesday may not be as clear on Wednesday morning.
- Memorize your table’s record count. Write it on your hand. Carve it in the bathroom stall. Having that magic number in mind will help you make certain no records were lost after you run your calculations.
- Visualize it! I’m a novice at this, but even rough drawings helped me.
- Arm yourself with the calculations that didn't make it into the final edit. When city officials call looking to poke holes in your analysis, you'll want answers to their questions on hand. Print your findings for quick access!
- Don't make assumptions, though this should go without saying for reporters. Check even the most benign questions with the agency that built and uses the dataset.
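The record-count tip above can be sketched as a simple sanity check: memorize the total, and make sure every record is accounted for after each filter. The records and field names here are hypothetical stand-ins for the 311 data:

```python
# Hypothetical complaint records; the filter stands in for dropping
# complaints about county- and state-owned roads.
complaints = [
    {"id": 1, "owner": "city"},
    {"id": 2, "owner": "state"},
    {"id": 3, "owner": "city"},
]

total = len(complaints)  # the "magic number" to write on your hand
city_only = [c for c in complaints if c["owner"] == "city"]
dropped = total - len(city_only)

# Every record must land somewhere: kept + dropped == total.
assert len(city_only) + dropped == total
print(f"kept {len(city_only)}, dropped {dropped}, of {total} records")
```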
Potholes are hardly a sexy topic, but the costs of streets riddled with potholes became apparent the more I waded through the records. A child with a fractured wrist. Pock-marked streets outside elderly residents’ housing. A mother who fell while carrying her baby – twice – at the same intersection.
Math still makes me queasy, but the stories behind the numbers are worth it.