The Pennsylvania Department of State publishes text files on its website listing every campaign contribution to every state race. We downloaded those files and selected the contributions to the Corbett and Wolf campaigns.
We then standardized the names and addresses. For instance, if John Smith and John A. Smith each sent in a contribution from the same address, we determined whether they were the same person and, if so, combined the amounts. We also corrected misspellings.
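As a rough illustration of that step, the Python sketch below groups contributions under a simplified matching key and sums the amounts. The key (first name, last name and a cleaned-up address, ignoring middle names and initials) is an assumption made for this example, not the rule we actually applied, and the function names and sample rows are invented. A match found this way is only a candidate and would still need to be checked by a person.

```python
from collections import defaultdict


def donor_key(name, address):
    """Build a hypothetical matching key for one donor.

    Assumption for this sketch: two records belong to the same person when
    the first name, last name and address agree after removing case,
    periods, commas and extra spaces. Middle names and initials are ignored.
    """
    parts = name.replace(".", "").replace(",", "").lower().split()
    first = parts[0] if parts else ""
    last = parts[-1] if parts else ""
    street = " ".join(address.replace(".", "").replace(",", "").lower().split())
    return (first, last, street)


def combine_contributions(rows):
    """Sum the amounts of all rows that share a donor key.

    rows is an iterable of (name, address, amount) tuples.
    """
    totals = defaultdict(float)
    for name, address, amount in rows:
        totals[donor_key(name, address)] += amount
    return dict(totals)


# Invented sample rows, for illustration only.
rows = [
    ("John Smith", "12 Main St., Harrisburg", 100.0),
    ("John A. Smith", "12 Main St, Harrisburg", 250.0),
    ("Jane Doe", "40 Oak Ave, Erie", 50.0),
]
totals = combine_contributions(rows)
```

Here the two John Smith rows fall under one key and their amounts are added together, while Jane Doe stays separate.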
We standardized the zip codes. Some people wrote five-digit zip codes and others wrote full nine-digit zips. We wanted each person to be represented by a five-digit zip code so we could easily group them.
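A minimal Python sketch of that normalization, assuming zip codes arrive as free text with or without a hyphen. The function name `normalize_zip` is made up for this example, and returning `None` for anything that is not five or nine digits is our choice here, so such records can be reviewed by hand.

```python
import re


def normalize_zip(raw):
    """Return the five-digit zip for a five- or nine-digit input, else None."""
    digits = re.sub(r"\D", "", str(raw))  # drop hyphens, spaces, other non-digits
    if len(digits) in (5, 9):
        return digits[:5]
    return None  # unexpected length: leave for manual review
```

With this, "17101-1234", "171011234" and "17101" would all be grouped under "17101".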
The original data did not include a county for each contribution. At first, we used Census lookup tables and the Census geocoder to generate counties, but that resulted in an error rate of approximately 4.5 percent. During the campaign, we used the Google Maps geocoder through the open-source mapping program QGIS.
For the app’s final update, we used a combination of geocoding resources. First, we used a Bing/MapQuest geocoder developed by Peter Aldhous, BuzzFeed science and health reporter and professor at the University of California. Any addresses that were not found using that method were fed through the Google geocoder. We believe using multiple geocoders has given us the most accurate results.
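The fallback approach can be sketched in Python as follows. This is not the code we ran: the two geocoders are stand-in functions backed by small invented lookup tables, and every name in the example is hypothetical. A real version would replace the stand-ins with calls to the actual geocoding services.

```python
def geocode_with_fallback(address, geocoders):
    """Try each (label, function) geocoder in order and return the first hit.

    Returns (label, result), or (None, None) if no geocoder finds the address.
    """
    for label, geocode in geocoders:
        try:
            result = geocode(address)
        except Exception:
            result = None  # treat a failed lookup as "not found"
        if result is not None:
            return label, result
    return None, None


# Stand-in geocoders with invented addresses, for illustration only.
PRIMARY_TABLE = {"12 Main St, Harrisburg, PA": "Dauphin"}
SECONDARY_TABLE = {"40 Oak Ave, Erie, PA": "Erie"}


def primary_geocoder(address):
    return PRIMARY_TABLE.get(address)


def secondary_geocoder(address):
    return SECONDARY_TABLE.get(address)


chain = [("primary", primary_geocoder), ("secondary", secondary_geocoder)]
```

Keeping the label of the geocoder that answered makes it possible to check afterward how many addresses each service resolved and how many were found by none.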