March is traditionally the first seed challenge for Tableau's Iron Viz competition, which takes place at the annual conference. If you have ever played with Tableau, I would encourage you to enter the competition here. No matter your skill level, competing with some of the best Tableau artists will challenge you to push your own abilities to the limit. I didn't enter any of the seed competitions last year because I didn't think I could win. This year I've been more involved with the community than ever before, and I realize that this is more about having fun with your peers than it is about winning.
Without further ado, my finished entry is below. You can click on the image to interact with it and see who Wikipedia considers to be a "significant birth" on your birthday, or on pretty much any day from the last 200 years!
Scraping data from 365 Wikipedia pages in under 5 minutes
Wikipedia is a world of data, maybe too much data to digest! When entering this contest I didn't know what topic to choose, but I knew I wanted something with a lot of data to produce a visually appealing viz. After poking around, I stumbled on a Wikipedia page for a specific day of the year: en.wikipedia.org/wiki/January_1. For every day of the year, Wikipedia has a dedicated page. As an added bonus, every single page has a nice list of significant events, births and deaths! This was great news except for the fact that the data didn't paste exactly how I wanted it, and even if it did, I was going to have to do this over 350 times!
Enter Python
I took this challenge as a means to start learning Python. After Googling many different solutions, I happened upon one that looked very similar to my problem and was able to come up with some fairly simple code that scraped all 365 Wikipedia pages in under 5 minutes! (Leap day was left out due to incorrect formatting.) If you aren't interested in my rough description of how the code works, just download the Python to the right and try it for yourself!
How the code works
To the right is a condensed version of the code I used to pull this data. The only thing it is missing is the full list of dates. I am going to try my hand at explaining what it does line by line, and I hope it is valuable! I understand if you don't read the wall of words below; it is more to help me explain what I did. Nothing is learned if you simply run borrowed code without trying to understand what it is doing.
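For anyone who wants a starting point, here is a minimal sketch of the general approach in Python. It is not the script I actually ran: the function names, the User-Agent string, the regular expression, and the assumption that each birth entry reads like `1900 – Name, description` are all illustrative. Leap day is skipped simply by giving February 28 days.

```python
import re
import urllib.request

BASE_URL = "https://en.wikipedia.org/wiki/"

# Month names with day counts; February is 28, so leap day is skipped.
MONTHS = [
    ("January", 31), ("February", 28), ("March", 31), ("April", 30),
    ("May", 31), ("June", 30), ("July", 31), ("August", 31),
    ("September", 30), ("October", 31), ("November", 30), ("December", 31),
]


def day_page_urls():
    """Build one URL per day of the year, e.g. .../wiki/January_1."""
    return [
        f"{BASE_URL}{month}_{day}"
        for month, days in MONTHS
        for day in range(1, days + 1)
    ]


def fetch_html(url):
    """Download one page (needs network access; never called on import)."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "birthday-viz-sketch/0.1"}  # hypothetical
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Assumed entry shape: "<year> [BC] – <description>" (en dash or hyphen).
ENTRY = re.compile(r"^\s*(\d{1,4})(\s*BC)?\s*[–-]\s*(.+)$")


def parse_entry(line):
    """Split one list entry into (year, description); BC years are negative."""
    match = ENTRY.match(line)
    if match is None:
        return None
    year = int(match.group(1))
    if match.group(2):
        year = -year
    return year, match.group(3).strip()
```

In a real run you would loop over `day_page_urls()`, pass each URL to `fetch_html`, pull the text of the entries under the births heading out of the HTML, and feed each line to `parse_entry`. How you locate that list depends on the page markup at the time you scrape, so that step is left out here.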
Building the viz
This viz is an example of taking fairly simple data and turning it into something personal and interesting. The Wikipedia scraping churned out 138,133 rows of significant births, dating back to BC times, dates that Tableau's continuous axes can't even recognize! I went through many different iterations of chart types, color schemes, and dashboard designs before settling on my final design, and I'm really quite happy with how it turned out. Most of the dashboard is out-of-the-box Tableau stuff, but there are a few neat tricks I tried to fold in that you might not have seen before. So instead of going through my entire design, I just want to explain two specific elements of the dashboard.
Moving color legends
Recently the Wall Street Journal put out a viz about battling infectious diseases that garnered a lot of buzz in the community (find it here). While most viewers probably just looked at the heat maps, there was something hiding in there that I wanted to try in Tableau. Notice that as you move over a mark in their heat map, an indicator at the bottom shows where that value lands on the color legend. This little triangle moves along as you hover over any value in the heat map. I think that has value, and it is something different to try to accomplish in Tableau. While I couldn't get the motion to be quite as responsive as the WSJ's, I was able to replicate that action using Tableau. Originally I had this action turned on for my heat map, but the hover response was too slow, so I axed it. The indicator now only shows where you are relative to the birth date you enter.
Building this took two worksheets. First, I wanted to build my own color legend instead of using Tableau's. The legend simply shows the distinct birth counts for all days in the data set; the minimum is 1, and the most significant births for any given day turned out to be 23. I needed room to the left and right, so the axis was fixed from -1 to 25.
Next I wanted to build my interactive marker to highlight how your birthday relates to the rest of the dates. To make this work I needed to limit my view to a single mark: the number of births on your birthday. To do this I have a conditional date filter that limits the range to just your birthday, as declared by the dashboard's date parameter. Now that the data is limited to your birthday, I can just place Number of Records on the Columns shelf and I have my single mark. Adjust this axis to match your custom color legend, and your triangle mark should be working.
To implement this in my dashboard, I floated the triangle sheet on top of the color legend sheet and turned off the title for the color legend. Doing this produces the effect that one worksheet is doing all the work. Note: you might be able to accomplish this using one sheet and a dual axis. I just didn't get that far.
If you wanted to make this more interactive like the WSJ version, you could replace the conditional date filter with a hover action from your dashboard. By doing this you will get the triangle mark to slide along the axis. Just note that it may not perform as well as you would like it to.
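If it helps to see the arithmetic behind the two sheets outside of Tableau, here is a small Python sketch. It is only a model of what the worksheets compute, not anything Tableau runs: the sample rows and the function names are hypothetical, and the padding of 2 is inferred from fixing the axis at -1 and 25 for counts of 1 to 23.

```python
from collections import Counter
from datetime import date

# Hypothetical sample rows: (birth date, name). Real rows come from the scrape.
births = [
    (date(1985, 3, 14), "Person A"),
    (date(1985, 3, 14), "Person B"),
    (date(1985, 3, 15), "Person C"),
    (date(1990, 7, 1), "Person D"),
    (date(1990, 7, 1), "Person E"),
    (date(1990, 7, 1), "Person F"),
]


def births_per_day(rows):
    """Count significant births for each date (Number of Records per day)."""
    return Counter(day for day, _ in rows)


def legend_values(counts):
    """Distinct per-day counts: one mark on the custom color legend each."""
    return sorted(set(counts.values()))


def axis_range(values, pad=2):
    """Fixed axis with room on both sides, e.g. counts 1..23 give (-1, 25)."""
    return min(values) - pad, max(values) + pad


def marker_position(counts, birthday):
    """Where the triangle sits: the birth count on the chosen birthday."""
    return counts[birthday]  # Counter returns 0 for dates with no births


counts = births_per_day(births)
```

The legend sheet corresponds to `legend_values`, the triangle sheet to `marker_position`, and both need the same `axis_range` so the triangle lines up with the legend when one sheet is floated over the other.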
Action filters and conditional filters working together
Please, please, if you have a better way to do this, let me know!
When you open the dashboard, I want you to enter your birthday to personalize your experience. Entering your birthday will show you data about your birth year, birth month, birthday, and who Wikipedia has listed as a significant birth on that day. All of this is controlled by the birthday parameter and conditional filters. I didn't want to limit a viewer to just seeing the significant births on their birthday, so I needed a way to have the birth list be controlled by either the birthday parameter or the hover action embedded in the heat map. For the life of me, I could not figure out how to tell one sheet to take the conditional filter first and then override that condition with the hover action.
My solution involves floating two sheets on top of each other: one controlled by the conditional filter and one controlled by the hover action. The image on the left is a sheet controlled by the birthday parameter using a conditional filter. The image on the right is an exact replica of that sheet, except it is controlled by a hover action from the heat map. The dashboard action is configured to send the date from the heat map to the worksheet and to empty the table when the selection is cleared. By floating the action-controlled sheet on top of the condition-controlled sheet, I get the functionality I want. When a user enters a new birthday, the results float to the top and the birth list is exactly relevant to the user. If that user wants to look at other significant births in their birth year, they are free to do so, as activating the action brings the new results to the front.
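The precedence I am faking with the two stacked sheets can be written down in a few lines of Python. This is only a sketch of the logic, with a hypothetical function name and made-up sample rows; Tableau itself does none of this, it just draws the top sheet over the bottom one.

```python
from datetime import date


def births_to_show(rows, birthday, hovered=None):
    """Return the names to list: the hovered date wins, else the birthday.

    `hovered` is None when the hover selection is cleared, which mirrors the
    action-controlled sheet going empty and revealing the sheet underneath.
    """
    target = hovered if hovered is not None else birthday
    return [name for day, name in rows if day == target]


# Hypothetical sample rows: (birth date, name).
sample = [
    (date(1985, 3, 14), "Person A"),
    (date(1985, 3, 14), "Person B"),
    (date(1985, 6, 2), "Person C"),
]

from_parameter = births_to_show(sample, date(1985, 3, 14))
from_hover = births_to_show(sample, date(1985, 3, 14), hovered=date(1985, 6, 2))
```

Here `from_parameter` plays the role of the condition-controlled sheet and `from_hover` the action-controlled sheet that covers it while a heat map mark is hovered.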