Iron Viz: Race on America’s Farms
It’s Iron Viz season!! I didn’t participate in the contest last year so I was really excited to get involved again this year. To my knowledge, I was the first to share my Iron Viz entry for the first feeder (my brother shared his two days later). As this feeder came with lots of challenges, I wanted to talk a bit about my viz and how I created it. Along the way, I’ll give you some tips that I think may help you with yours.
The Data
Let’s start with the data. For this contest, Tableau provided a single standard data set for everyone to use. The data came from the United States 2012 Census of Agriculture and included over 200 different metrics for every county. The data has a number of challenges including:
1) All the metrics are separate columns, so it’s very wide.
2) The metrics are broken into multiple Excel worksheets.
3) Columns are named by a code. The code must be looked up on a separate worksheet.
4) Each worksheet only includes the county FIPS code, not the county name and state. That information resides on yet another worksheet.
Tip 1: This data requires some data prep
The beauty of Tableau is that we can connect to data then just start exploring, but the structure of this source data makes that pretty difficult, so before we start exploring, we’re going to need to do some data prep. To make this data really usable, we’ll need to merge the multiple worksheets, pivot the fields (make it tall and skinny instead of short and wide), give the columns more informative names, and bring in the county and state names.
I used Tableau Prep to do this. But I was not alone. Chris Love, in a blog discussing 10 potential data stories to get you started, took us step-by-step through the build of a Tableau Prep flow performing the transformations noted above. So, if you’re struggling with getting the data into a usable structure, I’d definitely recommend reading his blog. Or, if you'd like to work with a data set that's already been prepared, Sean Miller has provided a number of different versions of the data.
#DATAfam! As you saw, #IronViz is upon us & we're all using the same dataset! And if you've not taken a look yet, the dataset is in need of some prep. Click the link to find a folder containing the prepped #IronViz dataset. #NoExcuses #GoForthAndViz https://t.co/o5UFA9CBMd pic.twitter.com/S1UOUsGgzq
Sean Miller (@HipsterVizNinja), April 9, 2019
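For anyone who would rather script the reshaping than build a flow, here is a minimal Python sketch of the same steps: pivot the wide columns, swap the column codes for readable names, and attach the county and state. It is not the Tableau Prep flow described above, and every column code, metric name, and value in it is made up for illustration.

```python
# Hypothetical miniature of the source workbook: all codes and values are made up.
# One "worksheet" of metrics, keyed by county FIPS code, one column per coded metric.
crops_sheet = [
    {"FIPS": "01001", "x01": 12, "x02": None},
    {"FIPS": "01003", "x01": 7, "x02": 3},
]
# The lookup worksheet that maps column codes to readable names (names are assumptions).
variable_lookup = {"x01": "Apple farms", "x02": "Orange farms"}
# The worksheet that maps FIPS codes to county and state.
county_lookup = {
    "01001": ("Autauga", "Alabama"),
    "01003": ("Baldwin", "Alabama"),
}


def pivot_sheet(rows, variables, counties):
    """Turn wide rows (one column per metric) into tall rows (one row per county and metric)."""
    tall = []
    for row in rows:
        county, state = counties[row["FIPS"]]
        for code, value in row.items():
            if code == "FIPS" or value is None:
                continue  # skip the key column and empty cells
            tall.append({
                "FIPS": row["FIPS"],
                "County": county,
                "State": state,
                "Metric": variables[code],
                "Value": value,
            })
    return tall


tall_rows = pivot_sheet(crops_sheet, variable_lookup, county_lookup)
for r in tall_rows:
    print(r)
```

To merge several worksheets, the same function could be called once per worksheet and the resulting lists concatenated.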
Tip 2: Explore the data interactively
After building my flow and writing the data to an extract, I was able to explore the data in Tableau. I started out by building a simple county-level map, coloring by the metric value, then stepping through each metric one by one to see if anything interesting stood out. As I began to notice interesting patterns in certain geographic locations, I drilled into those further. For instance, I observed that apples were a very popular crop throughout much of the United States, so I then started to look at other fruits to see how they compared. That led to my first idea: a play on the phrase “Apples & Oranges,” where I’d analyze these two crops to see which counties produced more apples vs. those that produced more oranges, and by how much.
Then I expanded that idea to start looking at other fruits, building a small multiple map showing 5 of the more common fruits. Eventually, however, I abandoned this idea because the data was just too sparse. Many of the counties had no data for each of the fruits so I ended up with a lot of grey on my maps which indicated the lack of data.
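One way to spot this kind of sparsity before building any maps is to count how many counties actually report each metric. The sketch below assumes the data has already been pivoted into tall rows. The rows, metric names, and county list are made up for illustration.

```python
# Hypothetical tall rows: one row per county and metric, with missing values left out.
tall_rows = [
    {"FIPS": "01001", "Metric": "Apple farms", "Value": 12},
    {"FIPS": "01003", "Metric": "Apple farms", "Value": 7},
    {"FIPS": "01005", "Metric": "Apple farms", "Value": 4},
    {"FIPS": "01003", "Metric": "Orange farms", "Value": 3},
]
# Hypothetical full list of counties that should appear on the map.
all_counties = {"01001", "01003", "01005", "01007"}


def coverage_by_metric(rows, counties):
    """Return the share of counties that report a value for each metric."""
    reporting = {}
    for row in rows:
        reporting.setdefault(row["Metric"], set()).add(row["FIPS"])
    return {metric: len(fips) / len(counties) for metric, fips in reporting.items()}


coverage = coverage_by_metric(tall_rows, all_counties)
for metric, share in sorted(coverage.items()):
    print(f"{metric}: {share:.0%} of counties have data")
```

A metric with low coverage is likely to produce mostly grey maps, so a check like this can help rule ideas in or out early.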
Despite my fail with the fruit metrics, I persevered. I returned to the exploration and eventually found what I thought could be an interesting story about the racial makeup of farm operators.
The Maps
As this data set includes county-level data, maps are an obvious choice. That said, they’re not the only choice, and this data could be visualized in any number of other ways, such as spatial treemaps, bar charts, etc. But, after trying out a few different options, I decided to go with maps.
I wanted to use a conic, equal area map projection called the Albers Projection instead of Tableau’s built-in web Mercator projection simply because it had the look and feel I was looking for. But, since this isn’t built into Tableau, I would need to use polygons to draw the maps. Fortunately, I had previously used a technique created by Jake Riley and Josh Tapley, who made the polygon files publicly available (See At it again…Getting Alaska and Hawaii on the Map for the files and instructions on how to use them).
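The polygon files mentioned above already contain projected coordinates, so no math is needed to use them. For the curious, here is a rough sketch of the standard spherical Albers equal-area conic formulas. The default standard parallels (29.5 and 45.5 degrees north) and origin (23 degrees north, 96 degrees west) are values commonly used for the contiguous United States. They are an assumption here, not necessarily what those files use.

```python
import math


def albers_project(lat, lon, lat1=29.5, lat2=45.5, lat0=23.0, lon0=-96.0):
    """Project latitude/longitude (degrees) to Albers equal-area conic x/y on a unit sphere.

    lat1 and lat2 are the standard parallels; (lat0, lon0) is the projection origin.
    The defaults are assumed values often used for the contiguous US.
    """
    phi, lam = math.radians(lat), math.radians(lon)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    phi0, lam0 = math.radians(lat0), math.radians(lon0)

    n = (math.sin(phi1) + math.sin(phi2)) / 2           # cone constant
    c = math.cos(phi1) ** 2 + 2 * n * math.sin(phi1)
    rho = math.sqrt(c - 2 * n * math.sin(phi)) / n      # radius for this latitude
    rho0 = math.sqrt(c - 2 * n * math.sin(phi0)) / n    # radius for the origin latitude
    theta = n * (lam - lam0)                            # angle from the central meridian

    x = rho * math.sin(theta)
    y = rho0 - rho * math.cos(theta)
    return x, y


# Example: a point on the central meridian has x equal to zero.
print(albers_project(40.0, -96.0))
```

The output is in units of the sphere's radius. Multiplying by an Earth radius would give distances, but for drawing polygons only the relative positions matter.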
Armed with this technique, I was able to create some beautiful maps. For example, here’s one showing the percentage of farms with Spanish, Hispanic, or Latino operators.
But there are a couple of problems with this.
1) Alaska is missing.
2) In the northeast, where there are very few farms with Hispanic operators, everything just blends together because it’s all the same yellow color.
3) Hawaii’s yellow color almost blends into the background.
So, how to fix these? Let’s start with Alaska. Unlike the other 49 states, Alaska isn’t broken down by county; instead, the data shows the different “Agricultural Census Areas.” These are not part of Tableau’s internal geocoding database, so they won’t map automatically. But I figured there must be a spatial file out there somewhere, right? Yep, if you search around the USDA site, you’ll see that they have shapefiles readily available. So, I used that to map Alaska.
Okay, so technically this Alaska map is using a web Mercator projection while the rest of the US is using Albers. Ideally, both would use the same, but given the fact that we’re moving and resizing Alaska so significantly already, I didn’t think that this was too big of a deal, so I went with it.
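Moving and resizing a state like this comes down to scaling each polygon point about a fixed centre and then shifting it. The sketch below shows the idea with a made-up square. The scale factor, offsets, and centre are hypothetical, not the values used for the maps in this post.

```python
def reposition(points, center, scale, dx, dy):
    """Scale polygon points about a fixed centre, then shift them by (dx, dy)."""
    cx, cy = center
    return [
        (cx + (x - cx) * scale + dx, cy + (y - cy) * scale + dy)
        for x, y in points
    ]


# Hypothetical square standing in for one Alaska census-area outline.
outline = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
moved = reposition(outline, center=(2.0, 2.0), scale=0.5, dx=10.0, dy=-5.0)
print(moved)
```

Passing the same centre, scale, and offsets for every Alaska polygon keeps neighbouring areas lined up with each other after the move.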
What about the county borders? Well, what about just adding borders to our polygons?
The problem is that we can’t control the thickness of the borders so, for small counties, it becomes difficult to see the color at all. Fortunately, I remembered a strange little problem with how polygons render on Tableau Server and Tableau Public. Due to how HTML5 Canvas renders adjacent polygons, they end up having a very slight border around them. Well, that’s exactly what I wanted!! So I removed the borders from the polygons, published my map to the web, and voila! I got a nice thin border around each county. But those borders were white (or more accurately, they were the same as the background) and they were still hard to see on the large swaths of light yellow, so I created a second copy of the map and made it a nice shade of orange.
I then combined the two into a dual axis so that the original map would sit on top of the orange map. By doing this, those thin border lines took on the color of the orange map, which made them easier to see.
Finally, in order to help prevent Hawaii blending into the background and to give the maps a subtle pop, I added a border to the orange map. This border gets added to the outside of the polygons, which means that the map is just slightly larger than the real map, creating a nice thin border around the US as a whole. In the end, I was really happy with how the maps turned out.
Back to Data Prep
Before I move on to the rest of the build process, I want to take a step back and talk a bit more about data prep. Having decided to focus on race, I no longer needed the full data set with every metric. In fact, I wanted to trim down my data set as much as possible so that I could get the best possible performance from it. So, I decided to build an entirely new Tableau Prep data flow. I started with the Operators data from the original data set, then manually changed the field names to be more descriptive (as there were only six races, it seemed unnecessary to join back to the list of variables). I then pivoted the races so that the data was tall and skinny rather than short and wide. Next I joined in the county name information, then added in some additional information about the number of farms (from the Farms data) and the income of the farms (from the Economics data). Finally, I joined this together with my county polygon data (for creating the Albers projection), removed any unnecessary fields, and wrote it all out to a hyper file as shown below.
This gave me a nice, clean data set without any extraneous data that I didn’t need.
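As a rough illustration of what a trimmed-down table like this makes possible, the sketch below pivots a couple of race columns and divides by the number of farms to get a percentage per county and race. The field names and numbers are made up, and using total farms as the denominator is an assumption for the example rather than a description of the actual flow.

```python
# Hypothetical, made-up numbers: farms with operators of each race, one wide row per county.
operators = [
    {"FIPS": "01001", "White": 380, "Asian": 20},
    {"FIPS": "01003", "White": 900, "Asian": 19},
]
# Hypothetical total number of farms per county (standing in for the Farms data).
farm_counts = {"01001": 400, "01003": 950}


def race_shares(rows, farms):
    """Pivot the race columns into rows and add each race's percentage of the county's farms."""
    tall = []
    for row in rows:
        total = farms.get(row["FIPS"])
        if not total:
            continue  # skip counties with no farm count to avoid dividing by zero
        for race, count in row.items():
            if race == "FIPS":
                continue
            tall.append({
                "FIPS": row["FIPS"],
                "Race": race,
                "Farms": count,
                "Percent": 100 * count / total,
            })
    return tall


shares = race_shares(operators, farm_counts)
for r in shares:
    print(r)
```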
Final Layout
Okay, I’ve been rattling on for a while now, so let’s finish up by talking about the layout. I wanted to introduce the topic then dive into some details and further analysis, so I decided to create multiple dashboards and leverage dashboard buttons for navigation from one to another. To introduce the topic, I created a simple dashboard with maps for each of the six races.
But I then wanted to drill into the data a bit more and provide some additional context, so, on the next tab, I showed the same maps but added some annotations to show areas of high and low concentration of each race.
Since the map of white operators was so much bolder than the other five races, I used the third tab to do some additional analysis of farms with white operators. This tab highlighted all the counties with very large percentages of white operators as well as the handful of counties with relatively low percentages.
On the fourth tab, I explored the most and least diverse counties in the United States.
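There are many ways to score diversity, and this post does not go into the calculation. Purely as an illustration, here is one common option, the Simpson diversity index (one minus the sum of squared shares). This is a swapped-in example technique, not necessarily the measure behind the dashboard, and the counts are made up.

```python
def diversity_index(counts):
    """Simpson diversity index: 0 when one group holds everything, higher when groups are more even."""
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no operators recorded, so treat the county as having no diversity
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


# Hypothetical operator counts by race for two made-up counties.
print(diversity_index({"White": 100, "Asian": 0}))
print(diversity_index({"White": 50, "Asian": 50}))
```

Sorting counties by a score like this would give a "most diverse" and "least diverse" ranking.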
Finally, on the fifth tab, I wanted to give the user an opportunity to interact with the data themselves. Thus, this map included a slider for each of the six races, allowing you to specify different ranges and see which counties meet those criteria.
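The logic behind a set of range sliders is simple: a county stays on the map only if every race's percentage falls inside its chosen range. Here is a small sketch of that filter outside of Tableau, using made-up percentages and hypothetical names.

```python
def counties_in_ranges(shares, ranges):
    """Return FIPS codes whose percentage for every listed race falls inside its (low, high) range."""
    return [
        fips
        for fips, by_race in shares.items()
        if all(
            low <= by_race.get(race, 0.0) <= high
            for race, (low, high) in ranges.items()
        )
    ]


# Hypothetical percentages per county and race.
shares = {
    "01001": {"White": 95.0, "Asian": 1.0},
    "01003": {"White": 60.0, "Asian": 12.0},
}
# Hypothetical slider settings: at most 70% White and at least 10% Asian.
selected = counties_in_ranges(shares, {"White": (0, 70), "Asian": (10, 100)})
print(selected)
```

A race that is missing for a county is treated as 0 percent in this sketch. That is a design choice for the example, not something taken from the dashboard.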
Tip 3: Persevere
Okay, I think I’ve talked enough, but before I share the final viz, let me just share one final tip—Persevere! There is little doubt that this data is challenging, but that’s a good thing. Challenging yourself is a great way to improve your skills. So accept the challenge and keep working towards your goals. This data has some great little stories hidden within it, so keep working with it and attack each obstacle one at a time.
Alright, here’s the final viz (click on the image to see the fully interactive version). Thanks for reading, and feel free to leave me your thoughts in the comments section.
Ken Flerlage, April 25, 2019