There are tons of map files (.kml, .shp, .geojson) out there ready for the taking. But sometimes the files you need or want just aren’t available. What to do!

If you’re like me, cursing the gods and throwing in the towel are never options. So the only thing left to do is to make your own.

This may sound daunting, but depending on the type of map files you require, it’s probably not as difficult as you might think.

I have included a download with this post so that you can follow along. Download and uncompress the folder to find the documents I’ll be referencing below.

What exactly are you trying to visualize, and why?

Having a clear idea of what you’re trying to create before you create it is crucial. Draw up a basic record layout for your map file: list the columns you need it to contain, write a definition of what each column will contain, and give an example of what the data in each column might look like.

Example file:
pa-nursing-homes-20150220-notes.txt
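If you like keeping things in code, a record layout can also live as a simple list of (column, definition, example) entries that you print into your notes file. Here's a minimal sketch of that idea; every column name and example value below is hypothetical, not taken from the example file.

```python
# A minimal sketch of a record layout kept as data. All column names,
# definitions and example values are hypothetical placeholders.
RECORD_LAYOUT = [
    # (column name, definition, example value)
    ("facility_name", "Name of the nursing home as listed by the source", "Example Manor"),
    ("street", "Street address of the facility", "123 Main St"),
    ("city", "City the facility is located in", "Pittsburgh"),
    ("zip", "Five-digit ZIP code, stored as text so leading zeros survive", "15222"),
    ("beds", "Number of licensed beds", "120"),
]


def render_layout(layout):
    """Format the record layout as one plain-text line per column."""
    lines = []
    for name, definition, example in layout:
        lines.append(f"{name}: {definition} (e.g. {example})")
    return "\n".join(lines)


print(render_layout(RECORD_LAYOUT))
```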

What resources are you going to use to get the base data you’ll need to create the map file?

If you’re following along with the example files, you might be wondering how I came up with all those columns and what they mean. Well, I didn’t come up with them. I got my base address data from the Pennsylvania Department of Health.

Most map files that you create aren’t going to be your own brainchildren. They’re going to start with some existing data that you’ve decided needs to be visualized.

So once you have your record layout drawn up, look around the Internet and see what kind of data already exists that you can pull from.

Generally, you should be looking for lists of addresses that you can geocode. You should also look for information associated with the addresses. One source might have a list of nursing home names and addresses. Another source might have a list of nursing home names with bed count and ownership attached.

Just because the data you find doesn’t come in a neat little package doesn’t mean you can’t make it work together. There are, of course, things to watch out for when you’re joining data from two different places:

  1. Are the datasets from the same time period?

  2. Are all of your sources credible?

  3. Is there conflicting data between the sources?
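Conflicts are easier to spot with a quick script than by eye. Here's a minimal sketch, assuming both sources share a facility-name column and a hypothetical "county" column that ought to agree; the names are matched loosely (ignoring case and stray spaces), which is an assumption you may want to tighten or loosen for your own data.

```python
# A minimal sketch: flag records whose shared field disagrees between two
# sources. Column names and rows are hypothetical.
def normalize(name):
    """Loose match on names: ignore surrounding spaces and letter case."""
    return name.strip().casefold()


def find_conflicts(source_a, source_b, key, field):
    """Return (key value, value in A, value in B) for matched records that disagree."""
    b_by_key = {normalize(row[key]): row for row in source_b}
    conflicts = []
    for row in source_a:
        match = b_by_key.get(normalize(row[key]))
        if match is not None and row[field] != match[field]:
            conflicts.append((row[key], row[field], match[field]))
    return conflicts


addresses = [
    {"facility_name": "Example Manor", "county": "Allegheny"},
    {"facility_name": "Sample Village", "county": "Erie"},
]
bed_counts = [
    {"facility_name": "Example Manor", "county": "Allegheny"},
    {"facility_name": "SAMPLE VILLAGE ", "county": "Crawford"},
]

print(find_conflicts(addresses, bed_counts, "facility_name", "county"))
# [('Sample Village', 'Erie', 'Crawford')]
```

Anything this turns up is a question for your sources, not something to silently "fix" in the data.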

Finding credible sources for data is a whole different topic that we won’t be going into today.

Once you find the data that fits all the needs of your record layout, you’re going to need to join it all together.

Here are a few tools you can use to join your data:

  • QGIS: QGIS is an open-source mapping program. You can load your data into QGIS and join it to other layers even if it doesn’t have geographic information.

  • csvkit: csvkit is a great, easy-to-use set of command-line tools for examining and manipulating delimited data (.csv, .tsv). It can also convert Excel files (.xls) to CSV.

  • SQL: Toss your datasets into database tables and use SQL JOINs to join them together.
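To give the SQL route a concrete shape, here's a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The table names, column names and rows are hypothetical stand-ins for an address list and a bed-count/ownership list.

```python
import sqlite3

# A minimal sketch of joining two sources with SQL. Tables, columns and rows
# are hypothetical; an in-memory SQLite database keeps it self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE addresses (facility_name TEXT, street TEXT, city TEXT)")
conn.execute("CREATE TABLE details (facility_name TEXT, beds INTEGER, ownership TEXT)")

conn.executemany(
    "INSERT INTO addresses VALUES (?, ?, ?)",
    [("Example Manor", "123 Main St", "Pittsburgh"),
     ("Sample Village", "456 Oak Ave", "Erie")],
)
conn.executemany(
    "INSERT INTO details VALUES (?, ?, ?)",
    [("Example Manor", 120, "Nonprofit")],
)

# LEFT JOIN keeps every address, even ones with no match in the second
# source; unmatched rows get NULL (None in Python) for the joined columns.
rows = conn.execute(
    """
    SELECT a.facility_name, a.street, a.city, d.beds, d.ownership
    FROM addresses AS a
    LEFT JOIN details AS d ON a.facility_name = d.facility_name
    ORDER BY a.facility_name
    """
).fetchall()

for row in rows:
    print(row)
# ('Example Manor', '123 Main St', 'Pittsburgh', 120, 'Nonprofit')
# ('Sample Village', '456 Oak Ave', 'Erie', None, None)
```

A left join is a deliberate choice here: an inner join would quietly drop Sample Village, and you'd never know it was missing a match. If you'd rather stay on the command line, csvkit's `csvjoin` does the same kind of join with its `-c` option (add `--left` for a left join).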

Once your data is joined, go back over it and make sure you got the results you expected. When you’re satisfied with the results, update your record layout to reflect all the fields you now have in your dataset.

Example file:
pa-nursing-homes-20150220-source.csv
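Two checks are worth automating after a join: which records found no match in the second source, and which columns in the joined data your record layout doesn't describe yet. Here's a minimal sketch, assuming the joined data is a list of dicts (the shape `csv.DictReader` gives you) and that an empty value marks a missing match; the column names are hypothetical.

```python
# A minimal sketch of post-join checks. Assumes rows are dicts and that
# None or "" means the second source supplied nothing. Columns are hypothetical.
def unmatched_rows(joined, joined_columns):
    """Rows where every column from the second source came back empty."""
    return [row for row in joined
            if all(row.get(col) in (None, "") for col in joined_columns)]


def missing_from_layout(joined, layout_columns):
    """Columns present in the joined data but not yet in the record layout."""
    present = set()
    for row in joined:
        present.update(row.keys())
    return sorted(present - set(layout_columns))


joined = [
    {"facility_name": "Example Manor", "city": "Pittsburgh", "beds": 120, "ownership": "Nonprofit"},
    {"facility_name": "Sample Village", "city": "Erie", "beds": None, "ownership": None},
]

print([r["facility_name"] for r in unmatched_rows(joined, ["beds", "ownership"])])
# ['Sample Village']
print(missing_from_layout(joined, ["facility_name", "city", "beds"]))
# ['ownership']
```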

Is the map file going to be a one-off or will it need to be updated?

You now need to consider what you just created. Is this a dataset that will need to be updated every so often? Unless your data deals with historic information, it should probably be updated occasionally.

But we all know the difference between what should be done and what will get done. So at the very least, you need to add a date to this data.

Head over to your record layout and write up a short description of the dataset you’ve compiled: what it contains, what sources you used, how you created it, and, very importantly, a “Current as of” date.

If you know that you’re going to be able to update this dataset, add an update schedule as well.

Example file:
pa-nursing-homes-20150220-notes.txt

Turning addresses into usable geographic data

So you’ve got a dataset with some pretty interesting information, but now what? You need to convert those address fields into something your mapping program will understand as location data.

Enter geocoding.

Here are a few geocoding options:

  • Refine geocoding: Peter Aldous has made geocoding addresses easy, legal and free with this JSON code for Refine (OpenRefine). It uses Bing and MapQuest data.

  • geocod.io: Super simple and pretty inexpensive. Uses Census Bureau data.

  • Google: Yes, Google does offer geocoding, but it’s important to read the restrictions that they put on your ability to use their data.
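Whichever service you pick, there's work on both sides of it: most geocoders want one address string per row, and the results deserve a sanity check. Here's a minimal sketch of both steps. It doesn't call any geocoding service; the column names are hypothetical, and the Pennsylvania bounding box is a rough approximation.

```python
# A minimal sketch of prepping addresses and checking geocoded results.
# Column names are hypothetical; the Pennsylvania bounds are approximate.
PA_BOUNDS = {"min_lat": 39.7, "max_lat": 42.3, "min_lon": -80.6, "max_lon": -74.6}


def one_line_address(row):
    """Join separate address columns into the single string most geocoders accept."""
    parts = [row["street"], row["city"], f'{row["state"]} {row["zip"]}']
    return ", ".join(p.strip() for p in parts if p.strip())


def looks_out_of_bounds(lat, lon, bounds=PA_BOUNDS):
    """Flag results that land outside the area your addresses should be in."""
    return not (bounds["min_lat"] <= lat <= bounds["max_lat"]
                and bounds["min_lon"] <= lon <= bounds["max_lon"])


row = {"street": "123 Main St", "city": "Pittsburgh", "state": "PA", "zip": "15222"}
print(one_line_address(row))               # 123 Main St, Pittsburgh, PA 15222
print(looks_out_of_bounds(40.44, -79.99))  # False (Pittsburgh area)
print(looks_out_of_bounds(40.44, 79.99))   # True (longitude sign flipped)
```

A bounding box won't catch a point geocoded to the wrong town inside the state, but it does catch the embarrassing ones: swapped latitude and longitude, a dropped minus sign, or an address matched to a same-named street in another state.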

Just make the map already

At this point, you should have your record layout and your dataset with latitude and longitude attached. Now you’re ready to make your map file!

I use QGIS to make my map files, but there are other options out there, like ArcGIS.
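If you just need points on a map, you can also skip the mapping program and write GeoJSON (one of the formats mentioned at the top) directly from your geocoded dataset. This is a swapped-in alternative to the QGIS shapefile route, not how the example files were made. Here's a minimal sketch, assuming each row has numeric "latitude" and "longitude" values; the other columns are hypothetical and become feature properties.

```python
import json


# A minimal sketch: geocoded rows -> GeoJSON FeatureCollection of points.
# Assumes numeric "latitude"/"longitude" columns; other columns are hypothetical.
def rows_to_geojson(rows, lat_col="latitude", lon_col="longitude"):
    features = []
    for row in rows:
        properties = {k: v for k, v in row.items() if k not in (lat_col, lon_col)}
        features.append({
            "type": "Feature",
            # GeoJSON puts longitude first, then latitude.
            "geometry": {"type": "Point", "coordinates": [row[lon_col], row[lat_col]]},
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


rows = [{"facility_name": "Example Manor", "beds": 120,
         "latitude": 40.44, "longitude": -79.99}]
collection = rows_to_geojson(rows)
print(json.dumps(collection, indent=2))
```

To save it, pass the collection to `json.dump` with a file opened as something like `pa-nursing-homes.geojson`; QGIS can open the result as a layer if you want to check it visually.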

We’re not going to go into the nitty-gritty of making the map, but here’s a little tutorial I drew up for NICAR this year.

Once you have your map file (I’m using a shapefile for this example), you’ll want to update your record layout again.

[Screenshot: the example file structure]

You should now have 3 files:

  1. A record layout.

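  2. The map file. NOTE: A shapefile is actually a set of files that share one name (.shp, .shx, .dbf and usually .prj), so your computer may show it as multiple files. If it does, I recommend compressing all of them into a single .zip file for easier file management.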

  3. The dataset you used to make the map file.

If they aren’t already in one, toss all of these files into a folder with a descriptive name.
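Zipping the shapefile pieces by hand is easy to get slightly wrong (leave out the .dbf and your attributes are gone). Here's a minimal sketch that bundles whichever shapefile parts exist into one .zip inside a descriptively named folder. The base name is hypothetical, modeled on the example files, and the demo uses empty placeholder files in a temporary directory.

```python
import tempfile
import zipfile
from pathlib import Path

# Shapefile parts that share one base name; .shp, .shx and .dbf are required,
# .prj and .cpg are common optional extras.
SHAPEFILE_PARTS = (".shp", ".shx", ".dbf", ".prj", ".cpg")


def bundle_shapefile(shp_dir, base_name, out_dir):
    """Zip every existing part of <base_name>.* from shp_dir into out_dir/<base_name>.zip."""
    shp_dir, out_dir = Path(shp_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    zip_path = out_dir / f"{base_name}.zip"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for ext in SHAPEFILE_PARTS:
            part = shp_dir / f"{base_name}{ext}"
            if part.exists():
                # arcname keeps the zip flat instead of storing full paths
                zf.write(part, arcname=part.name)
    return zip_path


# Demo with empty placeholder files; the base name is hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "exports")
    src.mkdir()
    for ext in (".shp", ".shx", ".dbf", ".prj"):
        (src / f"pa-nursing-homes-20150220{ext}").touch()
    project = Path(tmp, "pa-nursing-homes-20150220")
    zip_path = bundle_shapefile(src, "pa-nursing-homes-20150220", project)
    with zipfile.ZipFile(zip_path) as zf:
        names = sorted(zf.namelist())

print(names)
```

The record layout and source dataset then go in the same project folder next to the .zip.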

And there you go. You’ve just created a map file complete with source information, field descriptions and update path.

Alexandra Kanik is the web and interactive developer for PublicSource. You can reach her at akanik@publicsource.org or follow her on Twitter @act_rational.


Alexandra Kanik

Alexandra Kanik was a web developer and designer for PublicSource between 2011 and 2015.