aw.stats

On this website I use awtools, a light (read: not fully built) aesthetics package, for all the charts and visuals. Every once in a while I like to make tweaks, so I thought I would take a minute to show some of the edits I made. Most of the changes are to the color palettes, but there are a few spacing edits as well as tweaks to the dark theme, so why not make a few charts? To do this we are going to need some data. Not too long ago I came across an article, The Cost of Hurricane Harvey: Only One Recent Storm Comes Close, which was discussed at length on Twitter for its visualization merit. I have long been interested in the data, so it seems like a great dataset to revisit.

library(rvest)
library(ggplot2)
library(dplyr)
library(awtools) 
library(ggbeeswarm)
library(ggforce)

#Natural Disasters from the NYT article
disasters <- read.csv('https://static01.nyt.com/newsgraphics/2017/08/29/expensive-storms/79088630ae1af934d7840e104a0e3f1e8a6c7bf1/data-2.tsv', 
                      sep='\t', 
                      stringsAsFactors = FALSE)

costs <- disasters %>%
  mutate(
    Disaster=case_when(
      col2 == '#397dc2' ~ 'Hurricane',
      col2 == '#efba2b' | col2 == '#9b0e11' ~ 'Drought/Fire',
      col2 == '#699d8f' ~ 'Flooding',
      col2 == '#9d76b0' ~ 'Storm',
      col2 == '#61c6e2' ~ 'Winter Storm'
    )
  )
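
One thing worth checking after a mapping like this is whether every row actually landed in a group, since case_when returns NA for any row that matches none of the conditions. Here is a minimal sketch of that check on a toy data frame. The toy rows and the '#000000' color are made up for illustration and are not from the NYT file.

```r
library(dplyr)

# Toy stand-in for the NYT data; names and the '#000000' color are hypothetical
toy <- data.frame(
  name = c('A', 'B', 'C'),
  col2 = c('#397dc2', '#699d8f', '#000000'),
  stringsAsFactors = FALSE
)

toy_grouped <- toy %>%
  mutate(
    Disaster = case_when(
      col2 == '#397dc2' ~ 'Hurricane',
      col2 == '#699d8f' ~ 'Flooding'
    )
  )

# Any color that matched no condition shows up here as an NA group
toy_grouped %>% count(Disaster)
```

Running the same count(Disaster) on costs would show whether any colors in the real file were missed.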

Now that the data is grouped, the first question I want to answer is a simple one: what does the distribution of disasters look like? I will plot the disasters along a timeline, one row per group, so why not also use the new a_dark_theme, which is my default theme (IBM Plex Mono) with a dark background and a couple of other minor tweaks. We can size the points by the cost of the disaster and see if we notice any trends.

ggplot(costs, aes(x=year,y=Disaster,color=Disaster))+
  ggbeeswarm::geom_quasirandom(alpha=.75,aes(size=cost),groupOnX = FALSE, show.legend = FALSE)+
  a_dark_theme() +
  a_main_color() +
  labs(title='Billion Dollar Natural Disasters',
       subtitle='The most costly natural disasters from 1980 - 2017',
       y='',
       caption='Data: New York Times\nOriginal Article: "The Cost of Hurricane Harvey: Only One Recent Storm Comes Close"')

Although there may be only a slight upward trend in the number of disasters overall, there are clear increases in the Drought/Fire and Storm groups. This makes me ask what these sorts of trends do to the cost of the disasters. Maybe something like a running total of disaster cost, year over year, would show it.

We already have the data so let’s do it.

total.cost <- costs %>%
  group_by(Disaster) %>%
  arrange(Disaster,year) %>%
  mutate(total = cumsum(cost))
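
To see what the grouped cumsum is doing, here is a small sketch on made-up numbers (the toy groups, years, and costs are invented for illustration). The rows are sorted by group and year first, and then the running total restarts within each group.

```r
library(dplyr)

# Made-up costs for two groups, deliberately out of order
toy <- data.frame(
  Disaster = c('Storm', 'Hurricane', 'Storm', 'Hurricane'),
  year     = c(2001, 1992, 1995, 2005),
  cost     = c(2, 10, 3, 50),
  stringsAsFactors = FALSE
)

toy_total <- toy %>%
  group_by(Disaster) %>%
  arrange(Disaster, year) %>%       # sort so the sum runs in time order
  mutate(total = cumsum(cost)) %>%  # running total restarts per group
  ungroup()

toy_total$total
# 10 60 3 5
```

The Hurricane rows accumulate to 10 and then 60, and the Storm rows start over at 3 and then reach 5.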

Now that we have the running total of cost we need to display it. I think some sort of hybrid line chart, with points indicating the disasters themselves, would work.

#running total
ggplot(total.cost, aes(x=year,
                       y=total,
                       fill = Disaster, 
                       group = Disaster)) +   
  geom_line(size=.5,aes(color=Disaster)) +
  geom_point(aes(color=Disaster)) +
  a_plex_theme() +
  a_main_fill() +
  a_main_color() +
  labs(title='Natural Disasters and Runaway Cost',
       subtitle='Running total of cost of natural disasters (over $1 billion in estimated cost) from 1980 - 2017',
       caption='Data: New York Times\nOriginal Article: "The Cost of Hurricane Harvey: Only One Recent Storm Comes Close"')

I see that the cost of hurricanes (even if I were to remove Katrina) has been rising a bit more rapidly than the other groups. Similarly, within the groups the steps seem to have gotten larger since 2010.

The final question I want to ask of the data is a bit self-serving and was inspired by this tweet.

I am super excited to finally give ggforce the update it deserves. Read all about the new CRAN release here #rstats #ggplot2 https://t.co/9XNxmqldQT
— Thomas Lin Pedersen (@thomasp85) March 7, 2019

This release has tons of awesome new features, but the ones I wanted to try were the new mark geoms. To do this I will grab the top ten disasters by cost in each group and plot them out over the years.

top.ten<-costs %>%
  group_by(Disaster) %>%
  top_n(n = 10, wt = cost)

labels<-costs %>%
  group_by(Disaster) %>%
  top_n(n = 1, wt = cost) %>%
  filter(!Disaster %in% c('Winter Storm','Flooding'))
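
One caveat with top_n is that it keeps ties, so n = 1 can return more than one row per group, which would mean more than one label. The sketch below shows this on made-up data (the toy names and costs are invented). It also shows slice_max with with_ties = FALSE as one possible alternative, assuming dplyr 1.0.0 or later is installed; it is not what the code above uses.

```r
library(dplyr)

# Made-up data where two storms tie for the highest cost
toy <- data.frame(
  Disaster = c('Storm', 'Storm', 'Storm', 'Flooding'),
  name     = c('S1', 'S2', 'S3', 'F1'),
  cost     = c(5, 5, 1, 2),
  stringsAsFactors = FALSE
)

# top_n keeps both tied rows in the Storm group
top_with_ties <- toy %>%
  group_by(Disaster) %>%
  top_n(n = 1, wt = cost) %>%
  ungroup()

# slice_max (dplyr >= 1.0.0) can be told to keep exactly one row per group
top_single <- toy %>%
  group_by(Disaster) %>%
  slice_max(cost, n = 1, with_ties = FALSE) %>%
  ungroup()

nrow(top_with_ties)  # 3
nrow(top_single)     # 2
```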

Now we can use the new geom_mark_circle to label a few.

#top ten by group, with the costliest labeled
ggplot(top.ten, aes(x=year, y=cost, color = Disaster, group = Disaster)) +   
  geom_point() +
  geom_mark_circle(data=labels, aes(label=name,
                                    color=Disaster, 
                                    fill=NA),
                   label.family = 'IBM Plex Mono',
                   label.fontsize = 8,
                   label.fill = NA,
                   label.fontface = 'plain') +
  a_plex_theme() +
  a_main_fill() +
  a_main_color() +
  labs(title='Most Costly Disasters',
       subtitle='Top ten natural disasters (over $1 billion in estimated cost) by group 1980 - 2017',
       caption='Data: New York Times\nOriginal Article: "The Cost of Hurricane Harvey: Only One Recent Storm Comes Close"')