Streaking - MLB Wins, Losses, and Streaks
One evening, I was chatting with Ken Flerlage over text. If I had to visualize what Ken and I talk about, it would look something like this:
Ken was working on a visualization and sent me the following image to see if I could guess what it represented (sort of like Ben Jones' #DataQuiz).
My guess was that it was related to population. I wasn't right, but I wasn't too far off. He told me that it was the breakdown of race in Congress. (A few weeks later, he released his Diversity in Congress viz, which included this chart. In my opinion, this is a must-see visualization; take a few minutes to check it out.)
Okay, you are probably trying to figure out where I am going with this. Well, I was immediately inspired by this simple chart. Although I am not a huge baseball fan (it's hard to be when you live near Cincinnati unless it's the mid-1970s), I thought this chart had huge potential to represent individual at-bats for selected Major League Baseball hitters.
So, with this chart in mind, I started looking for data...and looking...and looking. I could find the overall at-bats, hits, singles, doubles, triples, and home runs for each player for each game, but I could not find them in order (if anyone can find this data, please let me know). I wanted the first at-bat, the second, the third, etc., and I simply could not find it.
Although it was not what I was looking for, I did find some good data on baseball-reference.com. The data showed one team's full season, every game, the score, their record, how many games out of first they were, how many games they had won or lost in a row, and many other metrics. I decided to copy one team's full season data into an Excel spreadsheet to just see what I could find.
I took this sample data set and dropped it into Tableau. I created a bar chart showing wins as a positive figure and losses as a negative figure. I also sized the bars based on run differential (i.e. how many more runs were scored than the other team - or how many more runs the other team scored against this team). It looked something like this:
That's actually pretty cool, right? As I looked at this bar chart, I started to notice some trends. In this case, I saw a lot of blue bars in a row, which were a lot of wins in a row. I thought it would be cool to look not only at wins and losses, but also at winning and losing streaks. If you aren't aware, winning and losing streaks are a fairly common topic in baseball. With 162 games played in a season, you had better have a fair number of long winning streaks or you're simply not going to make it (like the Reds - sad face).
As I started to think about how to visualize this, I determined that a line chart would be the best. My concern, however, was real estate on the dashboard. I also thought about the feel of baseball. It's not a linear, squared off type of game. I just didn't feel like a standard bar chart and line chart were going to provide the right feel for this visualization. (As a side note, in a business environment, I would have absolutely gone with a standard line and bar chart). This is when I decided to wrap the bar chart and line chart around the logo in a circular format.
As you may know, I recently wrote about radial bar charts and therefore had spent quite some time understanding and building them. (I encourage you to read that blog post because much of this one references it.) So I decided to start with the line chart. This "radial line chart" is very similar to a radial bar chart. Generally, you have a starting radius, like a radial bar chart, and you add a value to that radius to determine your point. For this line, I was dealing with winning and losing streaks. Like the bar chart, I determined that wins would be positive and losses would be negative. One win would be +1. Two wins in a row would be +2. Five wins in a row would be +5. Losses would work in the same manner but with negative figures. So, the point would be placed at an X and Y coordinate determined by the angle (like in the radial bar chart blog) and the radius plus the win or loss streak. For example, if I determine the starting radius to be 50, then the radius for a 5 game winning streak would be 55. The radius for a 6 game losing streak would be 44. Using the same trigonometry used for a radial bar chart, I simply plotted those points and connected them via a line.
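To make the arithmetic concrete, here is a minimal Python sketch of the idea (the viz itself was built with Tableau calculations, so this is only an illustration). The function names, the "W"/"L" result codes, the evenly spaced angles, and the choice to start at 12 o'clock and move clockwise are all my assumptions; only the starting radius of 50 and the signed streak values come from the description above.

```python
import math

def signed_streaks(results):
    """Running streak per game: +n for the nth straight win, -n for the nth straight loss."""
    streaks = []
    current = 0
    for result in results:
        if result == "W":
            current = current + 1 if current > 0 else 1
        else:
            current = current - 1 if current < 0 else -1
        streaks.append(current)
    return streaks

def radial_point(index, total, value, base_radius=50.0):
    """X/Y for game `index` of `total`, at radius base_radius + value."""
    # Assumed convention: games are evenly spaced around the circle,
    # starting at 12 o'clock and moving clockwise.
    angle = 2 * math.pi * index / total
    radius = base_radius + value
    return radius * math.sin(angle), radius * math.cos(angle)

# Hypothetical sequence of results, not a real team's season.
results = ["W", "W", "L", "L", "L", "W"]
streaks = signed_streaks(results)
points = [radial_point(i, len(results), s) for i, s in enumerate(streaks)]
```

With a starting radius of 50, a streak value of +5 lands at radius 55 and a value of -6 lands at radius 44, matching the example above. Connecting the points in game order gives the radial line.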
The radial line chart looked cool and you could easily see the win streaks protruding outward and the losing streaks inward. I decided to double-encode that information by sizing the line based on the length of the streak (the absolute value of that streak, to be more precise) and to color code it based on the team colors. I'll admit, I was super excited about how it turned out:
Now this told a cool story, at least I thought it did. Even though the chart was in a radial format, you could very easily see the longer winning streaks based on the length and the size. It was also very evident that the Boston Red Sox had very few losing streaks of any size.
However, I wanted to bring in more information, namely the bar chart I had originally created. Therefore, I decided to create a radial bar chart. I will admit, this was very easy to do in Tableau after writing my radial bar chart blog post. Like the original bar chart, the length of the bars showed run differential. Again, I double-encoded this information using the width of the bar. Lastly, similar to the radial line chart, I utilized the team colors to show the differences in wins and losses. The radial bar chart looked like the following:
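As a rough illustration of the radial bar geometry, the sketch below computes the two endpoints of each bar: one on the base circle and one pushed outward (win) or inward (loss) by the run differential. This is not the Tableau implementation; the function names, the base radius of 50, and the clockwise-from-12-o'clock angle convention are assumptions made for the example.

```python
import math

def signed_run_differential(runs_scored, runs_allowed):
    """Positive margin for a win, negative margin for a loss."""
    return runs_scored - runs_allowed

def radial_bar(index, total, run_diff, base_radius=50.0):
    """Return (start, end) X/Y pairs for one game's bar."""
    # Assumed convention: evenly spaced angles, clockwise from 12 o'clock.
    angle = 2 * math.pi * index / total
    inner = base_radius
    outer = base_radius + run_diff
    start = (inner * math.sin(angle), inner * math.cos(angle))
    end = (outer * math.sin(angle), outer * math.cos(angle))
    return start, end

# Hypothetical scores, not real game results.
win_bar = radial_bar(0, 162, signed_run_differential(7, 3))
loss_bar = radial_bar(81, 162, signed_run_differential(2, 7))
```

A 7-3 win draws a bar from radius 50 out to 54, while a 2-7 loss draws one from radius 50 in to 45. The absolute run differential could also drive the bar width to double-encode the margin.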
Okay, what now? I had two charts that I liked, that showed different information, but information that was closely related. I decided to do some layering to put these charts together in a nice package. I started by placing the radial bar chart and logo on a dashboard, then overlaid the radial line chart with a transparent background. I was very careful to line up the games on the line chart with the corresponding games on the bar chart. It looked like this:
At this point, I had flashbacks to the Tableau Conference. Although it was completely unintentional, I realized that my visualization resembled Ludovic Tavernier's weather viz at the IronViz final. I sent Ludovic a message on Twitter that evening:
And he responded:
Man, I love that guy! If you are not familiar with his viz from the IronViz final, you can see it and read about how he did it in his blog post. It was amazing! Perhaps I was subconsciously inspired by his work (it was very inspiring) or I just inadvertently went down the same road as he did, but either way, I was certainly happy with the result.
Now, I will say that although Tableau now allows for transparent backgrounds, it does not allow for that back layer to be interactive. So, I lost all interactivity with the bar chart (because the line chart was on top of it). To be honest, I was perfectly okay with that. I decided to apply highlight actions to counteract this problem. The highlight actions would allow the user to select any point or points in the line chart and it would highlight the corresponding games in the bar chart...the best of both worlds.
Having developed the chart with one single team's season, I now had to do it with 30 teams! I went through baseball-reference.com and copy/pasted all the team data into a single spreadsheet and then refreshed the data source in Tableau. From there, I added a team filter to both the bar and line charts.
The bar chart was easy to clean up for each team. The only thing I needed to do was change the colors for wins and for losses. I ultimately wrote a case statement that dictated the colors for every win and every loss of every team. The line chart was different. The color schemes used in the line chart were custom diverging colors with 2 steps, one for each color. I could not find a way to write a calculation that would dictate the color for each team in that diverging color palette. One of my other concerns was how to lay out the dashboard. I thought about small multiples, but to be honest, I needed much more flexibility. Because of these two problems, I opted to make each team its own chart, with its own diverging color palette, that I could place on the dashboard however I chose, leaving me in full control.
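The win/loss color rule for the bar chart can be pictured as a simple lookup keyed on team and result. The sketch below is a Python stand-in for that idea, not the actual Tableau case statement; the team keys, the hex values, and the fallback colors are placeholders I made up for illustration and are not the colors used in the viz.

```python
# Placeholder hex values for illustration only, not the real team colors.
TEAM_COLORS = {
    "Red Sox": {"W": "#aa0000", "L": "#000066"},
    "Reds": {"W": "#cc0000", "L": "#333333"},
}

# Assumed fallback for any team without an entry.
DEFAULT_COLORS = {"W": "#1f77b4", "L": "#999999"}

def game_color(team, result):
    """Return the bar color for a team's win ("W") or loss ("L")."""
    return TEAM_COLORS.get(team, DEFAULT_COLORS)[result]

colors = [game_color("Red Sox", r) for r in ["W", "L", "W"]]
```

Each game gets exactly one discrete color, which is why this approach suited the bar chart but not the two-step diverging palette on the line chart.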
I built all the charts, both bars and lines for each team, all with different colors matching their team logos. Now it was time to add them to the dashboard. But wait, I had better figure out the layout first. There are 6 divisions in Major League Baseball each containing 5 teams. I toyed with the idea of putting all 30 teams on one dashboard, but I thought it would be too busy. I opted to use six dashboards, each representing one division and use dashboard navigation buttons (as I love to do) to navigate between them.
As far as the dashboard itself goes, I have always loved Jacob Olsufka's work...it is just so clean and perfectly designed. One viz in particular that I absolutely love is his MLB Pitcher Heatmaps. The viz is fantastic, but what I really love is the fact that he used color for his charts, while the titles, text, and everything else were gray...nothing took away from the charts themselves. I decided to apply that style to my viz as well so that the charts popped. In my opinion, it worked out very well.
From there, I simply built one dashboard by layering the charts. Once I had one built as I liked, I noted the exact locations of each element (they were all floated to allow for the layering) and then duplicated the dashboard five times for the other divisions. I then laid out each team as I did on the first dashboard.
As I reviewed the visualization, I started to notice some very interesting trends...which, after all, is what data visualization is all about. Although the reader could certainly see these trends as well, I thought it would add just a bit more to the visualization if I called out some of those trends and statistics. I originally started using annotations, but I felt they took away from the beauty. I ultimately opted for placing stars (that were actually charts) near the streak or win in question, then added a tooltip to provide the associated information. It added context and did not take away from the visual appeal.
The final visualization might be my favorite that I've ever created. Below is the main landing page showing just one of the divisions. The full visualization can be viewed here: Streaking: MLB Wins, Losses, and Streaks.
I hope you enjoyed the viz and the blog post. As always, if you have questions or comments, please feel free to contact me at any time.