aw.stats

Every once in a while the internet gifts me a little inspiration rather than the usual disappointment. I have written a few posts based on things I found around the web, like this one on Confederate monuments or this one looking at temperature trends with the ggridges package. This time, while browsing Twitter, I found this tweet by The Economist:

Democracy is in decline in Turkey and Russia. One reason for this is the erosion of civil liberties pic.twitter.com/lUDrFGQq4R
— The Economist (@TheEconomist) January 9, 2019

The Democracy Index data, from the Economist Intelligence Unit, shows a declining trend in Turkey and Russia. What struck me was the chart itself and how easy it made it to interpret both the data and its context. I enjoyed the presentation so much that I sat down and decided to make my own version of the chart.

I have long been interested in curating compelling data to use on this site. I was recently reminded of The World Bank website, so I decided to take a look at countries and their unemployment rate trends.

Getting the data

The World Bank website is a great resource for all kinds of data, and thankfully the indicator we are interested in, Unemployment, total (% of total labor force), is available for download, so grab the csv and put it somewhere you can access it. Instead of looking at every country in that file I want to subset it, and luckily I had the perfect thing lying around: a file of the 50 most populous countries in the world (as of about 2017) that we can use to winnow down the larger country list.

library(tidyverse)
library(awtools)
library(shiny)

unemploy<-read.csv('unemploy.csv',stringsAsFactors = FALSE)

pop<-read.csv('https://raw.githubusercontent.com/awhstin/Dataset-List/master/pop50.csv', stringsAsFactors = FALSE)

Now we have two dataframes in our environment. One is unemploy, the csv downloaded from The World Bank link above. The other is pop, a list of the 50 most populous countries that I have had lying around for a while and recently added to GitHub.
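Since the country filter used in this post relies on exact name matches between the two files, a quick sanity check can be worth running. The following is only a sketch on toy data: unemploy_toy and pop_toy are hypothetical stand-ins with made-up rows, and it is an assumption on my part that the real pop file might spell some countries differently than The World Bank does (which uses names such as 'Russian Federation').

```r
# Toy stand-ins for the two data frames (made-up rows, not the real files)
unemploy_toy <- data.frame(
  Country.Name = c('United States', 'Russian Federation', 'Brazil'),
  stringsAsFactors = FALSE
)
pop_toy <- data.frame(
  Country = c('United States', 'Russia', 'Brazil'),
  stringsAsFactors = FALSE
)

# Countries in the population list with no exact match in the World Bank names
unmatched <- setdiff(pop_toy$Country, unemploy_toy$Country.Name)
unmatched
```

Running the same setdiff() on the real pop$Country and unemploy$Country.Name would list any countries that the %in% filter silently drops.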

If we inspect the unemploy dataframe we see that it is in a wide format, with one column per year. We want a long format instead, with the years in a single column and one row per country and year. Luckily the tidyr::gather function can help us here. Then we keep only the years from 2010 onward and filter by the list of the 50 most populous countries.

Next we generate a grouping variable for the rates by splitting them into eight bins with ntile. Finally we generate some variables to help with our final plot; here I decided to highlight the United States values specifically.
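To make the wide-to-long step concrete, here is a minimal sketch on a toy table. The wide data frame and its values are made up for illustration; the only thing borrowed from the real data is the 'X' prefix that read.csv puts in front of column names that start with a number.

```r
library(tidyr)

# Toy wide table (made-up values): one row per country, one column per year
wide <- data.frame(
  Country.Name = c('A', 'B'),
  X2010 = c(9.6, 5.2),
  X2011 = c(8.9, 4.7),
  stringsAsFactors = FALSE
)

# Stack the year columns into a 'year' key column and a 'rate' value column
long <- gather(wide, year, rate, X2010:X2011)

# Strip the 'X' prefix so the year reads '2010' rather than 'X2010'
long$year <- gsub('X', '', long$year)
long
```

Each country now appears once per year. In newer versions of tidyr, pivot_longer is the recommended replacement for gather, though gather still works.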

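#manipulate and filter
unemploy<-unemploy %>%
  # wide to long: columns 5:62 hold the yearly values
  gather(year, rate, 5:62) %>%
  # read.csv prefixes the year column names with 'X', so strip it
  mutate(year = gsub('X', '', year),
         rate = round(rate, 3)) %>%
  # year is still a character column, so convert it before comparing
  filter(as.integer(year) >= 2010 &
         Country.Name %in% pop$Country)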

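unemployment<-unemploy %>%
  # split year and rate into 8 bins each (adds year_level and rate_level)
  mutate_at(.funs = list(level = ~ntile(., 8)),
            .vars = vars(year , rate)) %>%
  # alph drives the alpha, phil the fill, and labl the label (United States only)
  mutate(alph = ifelse(Country.Name == 'United States', 1, 0),
         phil = factor(rate_level, levels = 1:8),
         labl = ifelse(Country.Name == 'United States', rate, NA))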
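If ntile is new to you, this small sketch shows what the binning does. The rates vector is made up for illustration and is not taken from the World Bank file.

```r
library(dplyr)

# Eight made-up unemployment rates
rates <- c(3.9, 9.6, 5.2, 4.4, 8.1, 6.2, 7.0, 4.9)

# ntile() ranks the values and splits them into 8 roughly equal-sized bins;
# with eight distinct values, each one lands in its own bin
level <- ntile(rates, 8)

# The factor mirrors how 'phil' is built for the fill scale
data.frame(rate = rates, level = level, phil = factor(level, levels = 1:8))
```

The lowest rate gets level 1 and the highest gets level 8. Note that in the pipeline above the bins are computed over every country and year left after filtering, so a band tells you where a rate sits relative to that whole set, not within a single country.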

Now if we look at the unemployment dataframe we can see that it is in long format and the variables we created are there. I had some fun naming them (alph, phil, labl), so it should be pretty self-explanatory where they go in the plot!

ggplot(unemployment, aes(x = year, y = rate, group = rate, fill = phil, alpha = alph))+ 
  geom_col(position = 'fill', color = 'white', show.legend = FALSE)+
  geom_label(aes(label = labl, x = year, y = rate), 
             position='fill',
             size = 4, 
             color= '#333333', 
             family = 'IBM Plex Mono',
             show.legend = FALSE) +
  scale_alpha_continuous(range = c(.35,.95)) +
  a_plex_theme(grid = FALSE, emphasis = 'x') +
  scale_fill_brewer(palette = 'Spectral', direction = -1) +
  theme(axis.title.y = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank(),
        axis.title.x = element_blank()) +
  scale_x_discrete(position = "top") +
  labs(title='United States Unemployment Rate (2010-2017)',
       subtitle='Share of the labor force that is without work but\navailable for and seeking employment',
       caption='Data: World Bank\nhttps://data.worldbank.org/indicator/SL.UEM.TOTL.ZS')

I am pretty pleased with the result. The bands of rates create a pretty interesting effect, and geom_label works perfectly here since it mirrors the alpha. If you change the country you are looking at, or even select a few and use facet_grid, it paints an interesting picture, but I think it needs something more. With pretty much the same code as above we can make a little Shiny app that lets us select the country we are interested in.

Let’s use Shiny and get interactive with it!