Guest Blog Post: What on Earth is Placekey and How Can I Use it in Tableau?
Ken and I are incredibly excited to have Sarah Battersby and Paul Rossman join us today as guest contributors to FlerlageTwins.com. They are both brilliant and a ton of fun!
Sarah has been a
member of Tableau Research since 2014.
Sarah’s primary area of focus is cartography, with an emphasis on
cognition. Her work emphasizes how to help everyone visualize and use spatial
information more effectively – no advanced degree in geospatial required. Sarah holds a PhD in GIScience from the
University of California at Santa Barbara. She is a member of the International
Cartographic Association Commission on Map Projections, and is a Past President
of the Cartography and Geographic Information Society (CaGIS). She can identify Kevin and Ken correctly 50%
of the time. Sarah can be contacted at sbattersby@tableau.com or on Twitter
@mapsOverlord.
Paul has been in
the analytics/data science space since 2011 – or when machine learning was
called a regression model. He has been using Tableau since 2015. He dabbles in a
lot of different areas – from baseball to geospatial to tracking his own weight
loss. Paul has a master’s degree in applied mathematics from Indiana University
of Pennsylvania and an undergrad in journalism – but sometimes forgets how to
piece a sentence together (Paul’s words – not mine). Paul knew Ken before it was cool to know
Kevin. Paul can be contacted at
prossman@gmail.com or on Twitter @p7_stats.
(See, I told you they were a lot of fun!)
Thanks for having us, Kevin & Ken. As mentioned above, my name is Sarah
Battersby and I LOVE mapping. Today,
Paul Rossman and I will be sharing a bit about a new open initiative from
SafeGraph called Placekey and we will talk about how to utilize it within
Tableau.
The intention of Placekey is to provide a unique,
standard identifier for physical places (a "where") AS WELL AS a
"what" component to track the details for known points-of-interest
(POIs). The Placekey website describes it as follows:
“Placekey is a free, universal standard identifier for
any physical place, so that the data pertaining to those places can be shared
across organizations easily.
However, Placekey goes beyond just an identifier. It’s
a movement of organizations and individuals that prize access to data. Placekey
members want geospatial data that is easily joined and combined...because real
answers come from combining data from many different sources. It is a
philosophy that data should be easy to access, and data should not be hoarded.
These members believe that data, when combined, can do massive good.”
The "Why Placekey" white paper does a great
job at documenting the standard and structure of Placekey coordinates, so we
won't go into that here. We will just
focus on the fun parts of how to use Python to interact with Placekey and how
to turn the Placekey information into data that you can work with in Tableau!
If you are a data scientist & Tableau user like
Paul, you are probably often joining data about physical places from multiple datasets.
This could be address data from places like POI Factory or from ad-hoc
scrapers built to collect business locations (like this one for Starbucks). Joining location data from addresses can be
a mess!
It’s great when the physical address matches, but when
it doesn’t, which happens often in rural areas, you run into some challenges.
Using Placekey we can get around some of these problems. Placekey provides us with one unique
identifier for each business or point of interest that can be used across
datasets. This allows us to de-dupe and clean our POI data in a seamless way.
Image source: SafeGraph "Why Placekey" white paper
First, I will dig into the basics of working with
Placekey and various ways to interact with the data to give you multiple tools
in your data prep toolbox.
From there, Paul will demonstrate a real business use
case and will walk through what this looks like in a Tableau Prep workflow.
So, let’s get going!
A bit of background
To take serious advantage of Placekey and the Python
libraries that we’re using to work with the data, a good place to start is some
background reading. Here are some of our
favorites for getting up to speed… but if you want to just skip over these for
now and come back to them later we totally understand.
Placekey API documents - What is the Placekey API? What can you do with it?
Places manual - General documentation on the SafeGraph Places dataset, with a section
on Placekey
About
Uber's H3 - Uber's Hexagonal Hierarchical Spatial Index
H3 Python library - H3 Python library GitHub Repo
Tableau
TabPy (Background and basics, GitHub Repo)
The basics of working with Placekey
There are a few ways to access and work with Placekey
codes. I have put together a Jupyter Notebook with a step-by-step walkthrough that demonstrates using the Placekey Python library and accessing Placekey using the API URL.
We aren’t going to walk through the whole notebook –
you can read it on GitHub or copy it and start working with it on
your own – but we’ll highlight the basics here.
1. You’ll need
a Placekey developer account. This is
free. Placekey is free. All you have to do is register on the
Placekey Developer Portal. Once you have
an account, you will have an API credential key. You’ll need this for calling out to
Placekey. Once you have your API key you
can get serious about tapping into Placekey!
2. You need location data. Placekey lookups can be based on
various types of location information, including latitude/longitude pairs,
addresses, and POI (Point of Interest) names.
The Jupyter notebook has several different examples of valid input data
that you can check out – like this one for the Tableau HQ where we have the
street address, city, state, etc.
"address": ''' {
"query" : {
"street_address": "1621 N.
34th Street",
"city": "Seattle",
"region": "WA",
"postal_code": "98103",
"iso_country_code": "US"
}
}'''
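The other query types follow the same pattern – for example, a latitude/longitude query (the coordinates here are just illustrative) would look something like this:

"coordinates": ''' {
    "query" : {
        "latitude": 47.6507,
        "longitude": -122.3505
    }
}'''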
3. You need
to ask the API nicely to return the Placekey.
You can do that in a few ways… You can type the address in manually on
the Placekey web page:
Or you can use the API and some Python code to run
through a bunch of locations at a time. The Jupyter notebook walks through all
of the steps of checking a series of addresses and returning the Placekey for each
– here is the basic query using Requests and the resulting JSON. For each location, we have a Placekey in the
WHAT@WHERE format. For these examples,
we only used a query name for three of the queries…just in case you wondered
why the query_id was 0 on the first two.
We just didn’t name them. You
don’t have to use a name…
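If you’d like a feel for what that Requests call looks like, here is a minimal sketch of a single-address lookup, using the example address from step 2 and a placeholder API key (the full, working version is in the notebook):

import json
import requests

URL = "https://api.placekey.io/v1/placekey"
HEADERS = {"apikey": "YOUR_API_KEY", "Content-Type": "application/json"}

query = {
    "query": {
        "street_address": "1621 N. 34th Street",
        "city": "Seattle",
        "region": "WA",
        "postal_code": "98103",
        "iso_country_code": "US"
    }
}

# POST the query and read back the JSON containing the Placekey for this address
response = requests.post(URL, headers=HEADERS, data=json.dumps(query))
print(response.json())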
4. Generate
some geometry from the result. Once we
have the Placekey WHAT and WHERE components, we have everything we need to
enrich our dataset and drop the data into a map to use in Tableau (or anywhere
else…but we’ll set up the data in .hyper files for use in Tableau, because we
love Tableau).
Each Placekey WHERE code (e.g., @5x4-4b3-wc5) may look
like a bunch of nonsense, but it represents a specific location on the
earth. The code ties directly to a
single, small-ish Uber H3 hexagon.
Using the Uber H3 library we can find all sorts of
details for that location and drop them into a file to use in Tableau.
We can
find the centroid of the hexagon using the Python Placekey library. This lets you map addresses to approximate point locations.

pk_centroid = Point(pk.placekey_to_geo("@5x4-4b3-wc5"))
Or we can
get the entire polygon geometry. This
gives you nice hexagon geometry so that you can aggregate your dataset into
polygonal regions (e.g., sum of sales for all locations within this hexagonal
region).
pk_polygon = pk.placekey_to_polygon(placekey_where_code, geo_json=True)
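For reference, here is a small self-contained sketch pulling those calls together (the WHERE code is just the example from above, and note that placekey_to_geo returns a (latitude, longitude) pair):

import placekey as pk
from shapely.geometry import Point

where_code = "@5x4-4b3-wc5"  # example WHERE component from above

# Centroid of the hexagon – placekey_to_geo returns (latitude, longitude)
lat, lon = pk.placekey_to_geo(where_code)
pk_centroid = Point(lon, lat)  # shapely Points are (x, y) = (longitude, latitude)

# Full hexagon geometry, with coordinates in GeoJSON (longitude, latitude) order
pk_polygon = pk.placekey_to_polygon(where_code, geo_json=True)

# The WHERE code also maps directly to an Uber H3 index
h3_index = pk.placekey_to_h3(where_code)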
Here is the result for a few different Placekeys:
5. Write the data to a .hyper file to drop directly into Tableau. It’s nice to use Python to generate data, but that data doesn’t go straight into Tableau on its own. Fortunately, it’s easy to use the Tableau Hyper API to quickly write the results into a .hyper file. This is all done step-by-step in the Jupyter notebook, so we won’t detail it here.
It’s pretty simple – you just create a hyper file and
add two data tables – one for the original address data, and one for the
geometry. Since multiple locations can fall within the same hexagon, it makes sense
to have a separate table for the geometry so you don’t have to duplicate a bunch
of polygons and bloat the size of your .hyper file. Instead, you can
just join the data and geometry tables together in Tableau.
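As a rough sketch (the notebook has the real thing – the file, table, and column names here are hypothetical, and the hexagon is stored as WKT text just to keep things simple), the Hyper API portion looks something like this:

from tableauhyperapi import (HyperProcess, Telemetry, Connection, CreateMode,
                             TableDefinition, TableName, SqlType, Inserter)

with HyperProcess(telemetry=Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU) as hyper:
    with Connection(endpoint=hyper.endpoint,
                    database="placekey_locations.hyper",
                    create_mode=CreateMode.CREATE_AND_REPLACE) as connection:

        # Table 1: one row per original address, keyed by its Placekey
        locations = TableDefinition(TableName("locations"), [
            TableDefinition.Column("placekey", SqlType.text()),
            TableDefinition.Column("street_address", SqlType.text()),
            TableDefinition.Column("city", SqlType.text())
        ])

        # Table 2: one row per unique WHERE code, hexagon stored as WKT text
        geometry = TableDefinition(TableName("geometry"), [
            TableDefinition.Column("placekey_where", SqlType.text()),
            TableDefinition.Column("hexagon_wkt", SqlType.text())
        ])

        connection.catalog.create_table(locations)
        connection.catalog.create_table(geometry)

        # Insert a sample row into the address table
        with Inserter(connection, locations) as inserter:
            inserter.add_row(["227-222@5x4-4b3-wc5", "1621 N. 34th Street", "Seattle"])
            inserter.execute()

In Tableau, you can then relate the two tables on the WHERE portion of the Placekey so each polygon is stored only once.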
And then you can go crazy with your analytics in
Tableau:
A Tableau Prep Workflow
While it may have been informative to use a Jupyter notebook walkthrough to learn about Placekey, it is even better to have a
real-world example from a real, live data scientist. Paul Rossman will walk us through a TabPy
script to insert directly into your data prep workflow in Tableau. Take it away, Paul.
Thank you, Sarah.
To set the stage, imagine you are given two datasets of fast food
locations and you need to figure out which ones match and
which ones are in one dataset but not the other. You look at the data. Some of
the physical addresses match, but some don’t. Some have some pretty gnarly
addresses. Some of them have Rural Route addresses, others are intersections.
Where do you start?
If you are like me, then in the past you would have concatenated
the address fields and tried to do string matches. You could run it through an
address standardization API, but that only gets you so far.
Here’s where Placekey comes into “play”! With
Placekey, you can pass an address (cleansed or uncleansed) and it will find the
physical address (magically) and give you back the unique identifier for each
location. You can then join on this identifier, and now matching locations
becomes a breeze!
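To make that concrete, here is a rough pandas sketch of that join – the DataFrame names are hypothetical, and both frames are assumed to already have a "placekey" column from the lookup:

import pandas as pd

# fast_food_a and fast_food_b are hypothetical DataFrames,
# each already enriched with a "placekey" column
matched = fast_food_a.merge(fast_food_b, on="placekey",
                            how="outer", indicator=True, suffixes=("_a", "_b"))

# "_merge" tells you which locations appear in both datasets or in only one
print(matched["_merge"].value_counts())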
Okay, so let’s do it! Using a Python script (available on GitHub) and TabPy Server, we can easily clean our location data. You can
download the script to use. To explain how
it works, we’ll walk through key parts of the script here.
We’ll start by jumping to the third function called
placekey_lookup. To help make sense of
this, assume that we have a dataset that has the following fields/structure:
Working with this data, our first step is to rename
the columns so the Placekey API can read them.
Next, we clean the data to eliminate nulls and
blanks, and convert the data frame to JSON.
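As a rough pandas sketch of those two steps (the incoming column names here are made up – yours will differ):

import pandas as pd

# Rename incoming columns to the field names the Placekey API expects
df = df.rename(columns={
    "Address": "street_address",
    "City": "city",
    "State": "region",
    "Zip": "postal_code"
})
df["iso_country_code"] = "US"

# Drop rows with null or blank values in the fields we need, then convert to JSON
df = df.dropna(subset=["street_address", "city", "region", "postal_code"])
address_json = df.to_json(orient="records")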
Next, we break the file into 50-record chunks using the helper function in the
script called prepare_batches_for_API, which just sets up the properly formatted
queries for the Placekey API. Finally, we pass those chunks to the
Placekey API and parse the returned JSON, as sketched below.
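This is a simplified sketch of that batching-and-lookup loop – it isn’t the script’s exact prepare_batches_for_API helper, and it assumes the payload shape documented for the Placekey bulk endpoint:

import requests

BULK_URL = "https://api.placekey.io/v1/placekeys"
HEADERS = {"apikey": "YOUR_API_KEY", "Content-Type": "application/json"}

# One query dict per row; tag each with its row index so results can be joined back
records = df.to_dict(orient="records")
for i, record in enumerate(records):
    record["query_id"] = str(i)

placekey_results = []
for start in range(0, len(records), 50):          # 50-record chunks
    batch = {"queries": records[start:start + 50]}
    response = requests.post(BULK_URL, headers=HEADERS, json=batch)
    placekey_results.extend(response.json())      # [{"query_id": ..., "placekey": ...}, ...]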
At last, we have our original data and the matching
Placekey for each of the locations the API was able to resolve.
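Continuing the sketch above, attaching those results back to the original rows is one more merge (again assuming query_id was set from the row index):

import pandas as pd

# Build a DataFrame from the API results and join back on the row index used as query_id
pk_df = pd.DataFrame(placekey_results)
df["query_id"] = df.index.astype(str)
enriched = df.merge(pk_df, on="query_id", how="left")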
Placekey has a whole bunch of other functions, some of which you may have seen in the Jupyter Notebook that Sarah put together. When cleaning up POI locations, it’s the most robust solution for matching current businesses and past businesses that are / were located in the same place.
Well, that’s it.
We hope that our introduction helps you get started working with
Placekey in your Tableau workflow!
Thanks for reading!
Sarah & Paul
Kevin Flerlage, November 30, 2020
Twitter | LinkedIn | Tableau Public