Ben Wellington, a blogger and computer science professor, recently discovered that New York City had been incorrectly issuing an estimated $1.7 million worth of parking tickets each year since 2009.
He was able to spot this pattern because the New York City Police Department published its ticket data. Only then did police officials realize they had neglected to train their beat cops when a parking law changed seven years ago.
Published data can be used by people outside government (journalists, researchers, activists, computer programmers) to discover something new about your city or state.
Even if you’re not downloading spreadsheets, you’re likely interacting with government data.
It fuels school scores on Zillow and bus routes on Google Maps.
“People don’t realize that government data is flowing and is embedded in many of the services that they rely on,” said Kevin Merritt, CEO of open data company Socrata.
Whether it’s an app in Chicago that tells you if your car has been towed or New York City officials using data to identify the source of a deadly Legionnaires’ disease outbreak, projects like these, inside and outside of governmental institutions, are not happening with state data in Pennsylvania, mostly because the data isn’t available.
Pennsylvania is one of 25 states that do not provide a robust open data website, according to a PublicSource analysis of every state’s open data website. Those states don’t publish data on a variety of subjects or in formats that can be easily analyzed.
It is also one of 10 states that do not provide data to the federal government’s open data website, according to the Pennsylvania Office of Administration.
“The bad thing is that we’re at the bottom of the barrel right now, but the good thing is that we have all of these examples to learn from,” said Erik Arneson, director of the Pennsylvania Office of Open Records.
With a strong vision and good planning, he said, there’s no reason the state can’t catch up in the next two years.
Gov. Tom Wolf signed an executive order on April 18 to create an open data website and policies on how data is collected and distributed by state agencies under the governor’s office, such as the Department of Human Services, the Department of Corrections and 30 others. State-related universities, such as the University of Pittsburgh and Penn State, independent agencies and other affiliated agencies will be encouraged, but not required, to participate.
Pennsylvania cities such as Philadelphia, Pittsburgh and Reading already have open data websites.
They’ve seen the benefits. People have built useful tools with their data. Workflow improvements have saved time and money.
But they’ve also had their share of challenges, many of the same challenges that the state will likely face. They’ve had to fight for buy-in from different departments and coax data out of closed environments.
What do we know so far about the state’s plans?
Since Wolf signed the executive order in April, the state has begun to meet with data experts in the state and to negotiate with Socrata, a company that has created more than 300 open data websites nationwide, including those for the White House, Illinois and New York City.
The Office of Administration is considering paying Socrata through an existing contract with the state’s website provider, Pennsylvania Interactive. Because the details have not been finalized, the cost of creating an open data website is unknown.
Two employees in the Office of Administration will coordinate much of the work on the open data initiative. There are no plans to hire additional staff to support it.
Paper records — a journalist’s worst nightmare
PublicSource frequently receives paper records from state agencies and does the tedious work of getting them into a format that can be analyzed.
For a project about prescribing patterns in state youth detention facilities, PublicSource received more than 1,600 pages of records. The absence of electronic records was a symptom of the lack of oversight of the powerful medications prescribed to the juveniles.
When we wanted to analyze more than 10 years of internal discipline records at the Pennsylvania State Police, those arrived as a box of paper records.
PublicSource also digitized eight years of state vaccination records for another story on the dangers of allowing unvaccinated children to attend school provisionally for up to eight months. Since PublicSource reported on this issue, the state proposed reducing the provisional attendance period to five days.
“Agencies are being asked to publish data that already exists, and any additional support is being provided by existing staff and resources,” Jeff Sheridan, the governor’s spokesman, wrote in an email.
Pennsylvania plans to launch the open data website in August with a statistics page that “highlights the progress and performance of the governor’s goals,” related to education and job creation, among others, according to Julie Snyder, the state’s director of data and digital technology in the Office of Administration.
The state plans to start by publishing datasets that are easily accessible: ones that already exist in an electronic format.
For instance, the Office of Administration collects state salaries and publishes them on PennWATCH as a PDF document of more than 4,000 pages. For years, it has provided a spreadsheet of state salaries upon request, but it doesn’t make the spreadsheet publicly available online. That could make the salary data a good candidate for inclusion in the open data website.
The executive order establishes an advisory committee that will suggest and review datasets for publication. The plan is to have academics, state officials and data privacy attorneys on the committee. The list of members is being finalized.
Arneson, who has been asked to be on the committee, said the open records office sees what records people are requesting and the problems they encounter more than any single agency.
“I think that our office can bring a very practical perspective to developing the open data initiative,” he said.
A legal advisory working group will also meet every other month to discuss issues that could result from combining various state datasets.
“That legal advisory group will have to approve the data before it gets released,” Snyder said.
Merritt said there’s a tipping point now where every government entity is realizing they need to put their data online.
“It’s a lot like what it must’ve been like in 1995 and 1996 when governments were deciding that they needed a website,” Merritt said.
“This is now absolutely mainstream,” he added.
Some PA cities are ahead of the state
Under Pittsburgh Mayor Bill Peduto’s administration, Pittsburgh has not only launched an open data website, but has also made technological advances in many areas of the city’s operations.
Laura Meixell, the city’s chief data officer, has helped to implement changes for a number of city services, including modernizing the city’s 311 municipal services system (tips can now be submitted via an app or online) and introducing the snow plow tracker. The city also purchased 10 sensor-equipped trash cans for the West End that show which cans are full and track how fast they fill up.
That allows the city to optimize the pickup route for city employees, reducing the amount of time crews spend emptying garbage cans.
As Arneson said, “Everything that you do to improve data collection winds up saving you money as well.”
The Western Pennsylvania open data website has allowed Pittsburgh and Allegheny County to work together more closely on permits and licenses.
Asbestos remediation permits used to exist only on paper and were carried from the county to the city in a manila envelope. Now the city’s permitting information is automatically populated with asbestos permits from the county because the information is on the open data portal. That reduces the time it takes to get a permit and keeps the paperwork from getting lost or buried on someone’s desk.
Philadelphia’s open data website, Open Data Philly, was unique because it started out as a community project in 2011.
“I don’t think that open data can survive without a community that uses it, talks about it, asks for it, demonstrates the usefulness of it,” said Tim Wisniewski, Philadelphia’s chief data officer.
It’s one of several community-led data projects that Philadelphia has co-opted. A Philadelphia crime map made by software developer Dave Walk lets users draw a box around an area of the city, choose a time period, and see how many and what types of crimes occurred in that area. The map, built with open data published by the city, was so good that the Philadelphia Police Department has a version of it on its website.
“By putting the data out on the web…government can benefit from their work as well,” Wisniewski said.
Having an open data website has changed how Philadelphia implements projects. As an example, the city built open data requirements into its bike-share contract.
“We don’t have to approach the system down the line and ask the vendor how to get the data out,” he said.
Bob Gradeck, project director of the Western Pennsylvania Regional Data Center, has already traveled to Harrisburg to give the state a presentation about his work, and Wisniewski plans to meet with the state soon.
Emily Shaw, a senior analyst at the Sunlight Foundation, a national nonprofit that advocates for government transparency, said it was important that the state learn lessons from Pittsburgh and Philadelphia by consulting with them.
“As long as that’s happening, that’s great news,” she said.
Culture shock
Creating a data culture, and not just a data website, can involve changing how employees work on a daily basis.
“There has to be internal organizers because the data is going to have to move in ways that it hasn’t before,” Shaw said.
In California, Shaw said, the state has done well with a system in which certain employees take ownership of a dataset and serve as ambassadors for answering questions about it.
“It can also be a somewhat more piecemeal approach where certain departments are leaders,” she said.
Wisniewski, in Philadelphia, said convincing a department to relinquish its data can be a project in and of itself.
Gradeck, also a researcher at the University of Pittsburgh, said you have to show people that the data has value outside of its intended users.
Joanne Foerster, CountyStat manager for Allegheny County, said county employees collect data specifically to administer a program. “What open data is shedding light on is that data has value for other purposes,” she said.
When asked for this story, three of the state’s largest departments — Human Services, Corrections and PennDOT — provided no information about what data they had or hoped to release as part of the open data project.
One of the major challenges for state agencies is dealing with data that’s stowed away on paper or locked in a PDF format, data that often can’t be easily analyzed by computer software.
Going back and digitizing paper records does not appear to be within the scope of the state’s work, nor has it really been a major part of the work done for the Pittsburgh and Philadelphia data websites.
One exception was the air quality data that the Allegheny County Health Department used to publish in PDF format. “It was in an electronic form, but a computer program couldn’t manipulate the data in that form,” Foerster said.
Now that data is up on the Western Pennsylvania Regional Data Center in a spreadsheet format where anyone can analyze it. “[It’s] a big change in culture,” Foerster said.
Reach Eric Holmberg at 412-515-0064 or at eholmberg@publicsource.org. Follow him on Twitter @holmberges.