Curtis Harris

Iron Viz - Significant Births throughout history

3/20/2015

March is traditionally the first seed challenge into Tableau's Iron Viz competition that takes place at the annual conference. If you have ever played with Tableau, I would encourage you to enter the competition here. No matter your skill level, competing with some of the best Tableau artists will challenge you to push your own abilities to the limit. I didn't enter any of the seed competitions last year because I didn't think I could win. This year I've been more involved with the community than ever before, and realize that this is more about having fun with your peers than it is about winning.

Without further ado, my finished entry is below. You can click on the image to interact with it and see who Wikipedia considers to be a "significant birth" on your birthday.. or pretty much any day from the last 200 years!
Picture


Scraping data from 365 Wikipedia pages in under 5 minutes

Wikipedia is a world of data.. maybe too much data to digest! When entering this contest I didn't know what topic to choose, but I knew I wanted something with a lot of data to produce a visually appealing viz. After poking around, I stumbled on a Wikipedia page for a specific day of the year: en.wikipedia.org/wiki/January_1. For every day of the year, Wikipedia has a dedicated page. As an added bonus, every single page has a nice list of significant events, births and deaths! This was great news except for the fact that the data didn't paste exactly how I wanted it, and even if it did, I was going to have to do this over 350 times!

Enter Python

I took this challenge as a means to start learning Python. After Googling many different solutions, I happened upon one that looked very similar to my problem and was able to come up with some fairly simple code that scraped all 365 Wikipedia pages in under 5 minutes!! (Leap day was left out due to incorrect formatting)
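The list of page names, one per day of the year, does not have to be typed by hand. Below is a minimal sketch of one way to generate it, assuming the MonthName_DayNumber pattern seen in the January_1 URL. The helper name build_wikidates and its include_leap_day flag are hypothetical and are not part of the original script, which defined the list directly.

```python
import calendar


def build_wikidates(include_leap_day=False):
    """Build page names such as 'January_1' for every day of the year.

    Hypothetical helper: the original script listed the dates by hand.
    """
    # 2015 is a common year and 2016 is a leap year; the year is only
    # used to look up how many days each month has.
    year = 2016 if include_leap_day else 2015
    wikidates = []
    for month in range(1, 13):
        days_in_month = calendar.monthrange(year, month)[1]
        for day in range(1, days_in_month + 1):
            # calendar.month_name follows the current locale; the default
            # locale yields English names like "January".
            wikidates.append(f"{calendar.month_name[month]}_{day}")
    return wikidates
```

Calling build_wikidates() skips February_29, which matches the 365 pages scraped for this project.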

If you aren't interested in my rough description of how the code works, just download the Python to the right and try it for yourself!

Download the code

ironvizbirths.py
File Size: 5 kb
File Type: py
Download File

How the code works

To the right is a condensed version of the code I used to pull this data. The only thing it is missing is the list of dates in their entirety. I am going to try my hand at explaining what this is doing line by line.. I hope it is valuable! I understand if you don't read the wall of words below, it is more to help me try and explain what I did. Nothing is learned if you simply run borrowed code without trying to understand what it is doing. 

  1. Importing the BeautifulSoup package - BeautifulSoup is used for parsing HTML documents
  2. Importing the requests package - Requests is used to get information from specified URLs.
  3. Importing the json package - JSON is used to handle JSON data.. I don't really understand it, but do know enough to translate the result set.
  4. The code begins by defining a list of strings called "wikidates", but you could name this whatever you like. In this instance we want to use the syntax at the end of the Wikipedia url MonthName_DayNumber or January_1. We are defining this list because we want Python to loop through each day of the year and gather data automatically. If we didn't do this, there would be no advantage to using Python over copy and paste.
  5. The next step is to create an empty data frame named "wikibirths". We will get to why we did this on line 20 of the code.
  6. Now we start to define our function.. this is the working area of the code. For this project I created a function called "find_births" that accepts a parameter called "wikidate" (one of the dates from our "wikidates" list). 
  7. Tell the function what URL to get data from, with the appended date as a URL parameter (a little Tableau-ish)
  8. response = requests.get(url) - this is simply defining a set of data called response that is made up of all the components of a web page. Think of looking at a page and pressing Control+U.
  9. soup = Soup(response.content) - taking that same data from the previous line and adding it to your soup. (no clue, but it works)
  10. births_span = soup.find("span", {"id": "Births"}) - this is where the real magic starts. BeautifulSoup is analyzing the response content and looking for the precise spot where the Births section starts in the page source. 
  11. The following few lines of code look for each list item (li) in the Births section of the page, strip out the HTML, transform it into plain text, and send the complete list to our empty data frame named "wikibirths". I told you I would get back to that. 
  12. Now that our function is complete, we want to tell Python to do this for every date we have defined. Line 20 of the code says, "Take a string from the wikidates data set and send it to the find_births function. When you are done running the function with that wikidate, start over with the next, and so on and so on until the list is complete." 
  13. As an added bonus, you can see I concatenated my wikidate with the response text. This gave me an extra element of data that copying and pasting couldn't accomplish.
  14. Lastly I am creating a file named wikidates.txt and sending the results from step 12 to that file in a JSON format. From there it is as easy as running the file through http://konklone.io/json/ and sending to Excel.
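The steps above can be sketched roughly as follows. This is a reconstruction under assumptions, not the downloadable file itself: the names find_births, wikidates, wikibirths and wikidates.txt come from the description above, while parse_births, scrape_all, the BASE_URL constant and the timeout value are hypothetical additions. It also assumes the Births section is marked by an element with id "Births" and ends at the next h2 heading, which may not hold for every page or for future Wikipedia markup.

```python
import json

import requests
from bs4 import BeautifulSoup

BASE_URL = "https://en.wikipedia.org/wiki/"  # the date (e.g. January_1) is appended


def parse_births(html, wikidate):
    """Return 'wikidate: entry' strings for each list item in the Births section."""
    soup = BeautifulSoup(html, "html.parser")
    births_heading = soup.find(id="Births")
    if births_heading is None:
        return []  # page has no Births section in the expected format
    births = []
    # Walk forward from the Births heading, collecting list items (li)
    # until the next top-level section heading (h2) is reached.
    for element in births_heading.find_all_next(["h2", "li"]):
        if element.name == "h2":
            break
        # get_text() drops the HTML tags; the date is prepended (step 13).
        births.append(wikidate + ": " + element.get_text().strip())
    return births


def find_births(wikidate):
    """Download one day's page and parse its Births section."""
    response = requests.get(BASE_URL + wikidate, timeout=30)
    response.raise_for_status()
    return parse_births(response.content, wikidate)


def scrape_all(wikidates, out_path="wikidates.txt"):
    """Loop over every date, gather the births, and write them out as JSON."""
    wikibirths = []
    for wikidate in wikidates:
        wikibirths.extend(find_births(wikidate))
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(wikibirths, f)
    return wikibirths
```

Splitting the parsing from the download means parse_births can be tried on a saved copy of a page before looping over all 365. A call such as scrape_all(["January_1", "January_2"]) would then fetch just those two pages and write the results to wikidates.txt.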
Picture

Building the viz

This viz is an example of taking fairly simple data and turning it into something personal and interesting. The Wikipedia scraping churned out 138,133 rows of significant births, dating back to BC times.. dates that Tableau's continuous axes can't even recognize! I went through many different iterations of chart types, color schemes, and dashboard designs before settling on my final design, and I'm really quite happy with how it turned out. Most of the dashboard is out of the box Tableau stuff, but there are a few neat tricks that I tried to fold in that you might not have seen before. So instead of going through my entire design, I just want to explain two specific elements of the dashboard.

Moving color legends

Recently the Wall Street Journal put out a viz about battling infectious diseases that garnered a lot of buzz in the community (find it here). While most viewers probably just looked at the heat maps, there was something hiding in there that I wanted to try in Tableau. Notice as you move over a mark in their heat map, there is an indicator at the bottom of where that value lands on the color legend. This little triangle moves along as you hover over any value in the heat map. I think that has value, and it is something different to try to accomplish in Tableau. While I couldn't get the motion to be quite as responsive as WSJ, I was able to replicate that action using Tableau. Originally I had this action turned on for my heat map, but the hover action response was too slow so I axed it. The indicator now only shows you where you are relative to the birth date you enter.
Picture
Picture
Building this took two worksheets. First I wanted to build my own color legend instead of using Tableau's color legend. The legend is simply showing distinct birth counts for all days in the data set, the minimum being 1 and the most significant births for any given day turned out to be 23. I needed room to the left and right so the axis was fixed to -1 and 25. 

Now I wanted to build my interactive marker to highlight your relation to the rest of the dates. To make this work I needed to limit my view to a single mark, the number of births on your birthday. To do this I have a conditional date filter that limits the range to just your birthday, as declared by the dashboard's date parameter. Now that the data is limited to your birthday, I can just place Number of Records on the Columns shelf and I have my single mark. Adjust this axis to match your custom color legend, and your triangle mark should be working. 

To implement this into my dashboard, I floated the triangle sheet on top of the color legend sheet, and turned off the title for the color legend. Doing this produces the effect that it is one worksheet doing all the work. Note.. you might be able to accomplish this using one sheet and a dual axis.. I just didn't get that far. 

If you wanted to make this more interactive like the WSJ version, you could replace the conditional date filter with a hover action from your dashboard. By doing this you will get the triangle mark to slide along the axis.. just note that it may not perform as well as you would like it to.
Picture
Picture

Action filters and conditional filters working together

Please please if you have a better way to do this let me know!!! 

When you open the dashboard, I want you to enter your birthday to personalize your experience. Entering your birthday will show you data about your birth year, birth month, birthday, and who Wikipedia has listed as a significant birth on that day. All of this is controlled by the birthday parameter and conditional filters. I didn't want to limit a viewer to just seeing the significant births on their birthday.. so I needed a way to have the birth list be controlled by the birthday parameter or by the hover action embedded into the heat map. For the life of me, I could not figure out how to tell one sheet to take the conditional filter first then override that condition by use of the hover action.

My solution involves floating two sheets on top of each other.. one controlled by the conditional filter and one controlled by the hover action. The image on the left is a sheet controlled by the birthday parameter using a conditional filter. The image on the right is an exact replica of that sheet, except it is controlled by a hover action from the heat map. The dashboard action is configured to send the date from the heat map to the worksheet, and empty the table when the selection is cleared. By floating the action controlled sheet on top of the condition controlled sheet, I get the functionality that I desire. When a user enters a new birthday, the results float to the top and the birth list is exactly relevant to the user. If that user wants to look at other significant births in their birth year, they are now free to do so as activating the action brings the new results to the front. 
Picture
Picture
Picture