How did we get here? Three different Premier League stories.

Found in [R , Premier League] by @awhstin on

This season of the English Premier League has been nothing short of fantastic. Even though Manchester City has run away with the title (playing beautiful football in the process), pretty much all the other positions in the table are up for grabs. As an ardent Arsenal fan, it hasn’t been my favorite season, with Arsenal currently sitting in sixth, but for other clubs it has been a banner year. One of the clubs having a great year is Burnley, who were promoted for the 16-17 season, finished in the bottom half that season with 40 points, and now find themselves seventh, having already surpassed last year’s points total. So the question I want to ask is: how did we get here? What path did Arsenal and Burnley each take throughout their time in England’s football leagues? Joining us for the ride is Leicester, who shocked the football world by winning the Premier League in the 15-16 season and now find themselves in eighth.

One of my favorite datasets to work with comes from the engsoccerdata package, which contains complete soccer datasets. The one we will be focusing on is england, which contains the results of every game in the top four tiers of English soccer from 1888 to 2017. (We will add the clubs’ current 2018 positions later.)

Before we start: this season I have barely been keeping up with my Premier League predictions, but if you want to check that page out I would appreciate it!

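library(engsoccerdata) #devtools::install_github('jalapic/engsoccerdata')
library(dplyr) #%>%, mutate, group_by, summarise, arrange, filter
library(tidyr) #gather
library(ggplot2) #for the static plot
library(plotly) #for the final interactive viz
library(awtools) #Optional - just for the theme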

Now that we have the packages loaded we are going to take some shortcuts and calculate the tables for each division and season.

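team<-england %>%
  gather(team,club,3:4) %>% #one row per club per match (home and visitor)
  mutate(win=ifelse(team=='home' & result == 'H',3,
                    ifelse(team=='visitor' & result == 'A',3,
                           ifelse(result=='D',1,0)))) #assumed completion: 1 for a draw, 0 for a loss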

tables<-team %>%
  group_by(division,Season,club) %>%
  summarise(points=sum(win)) %>%
  arrange(division,Season,desc(points)) %>%
  group_by(Season) %>% 
  mutate(position = row_number())

What happened there is that for every result we calculated the points each club earned from it. Using dplyr we ordered, grouped, and operated within those structures to add a position, or id, for each team in a season. I picture it like this, using the current table as an example: Manchester City are leading the Premier League, so they get position 1, and Wolverhampton Wanderers (Wolves) are leading the Championship, which is the next league down. There are 20 teams in the Premier League, so Wolves would have a position of 21.
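To make the position idea concrete, here is a minimal sketch on a made-up set of four matches. The clubs A to D, the toy results, and the names matches, long, and toy_tables are all hypothetical. It also swaps in pivot_longer and case_when for gather and the nested ifelse, purely for readability; the logic is meant to mirror the steps above, not replace them.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy results, shaped like the relevant columns of `england`
matches <- tibble(
  Season   = 2018,
  home     = c("A", "B", "C", "D"),
  visitor  = c("B", "A", "D", "C"),
  division = c("1", "1", "2", "2"),
  result   = c("H", "D", "A", "D")
)

# One row per club per match, then 3 points for a win, 1 for a draw, 0 for a loss
long <- matches %>%
  pivot_longer(c(home, visitor), names_to = "team", values_to = "club") %>%
  mutate(points = case_when(
    team == "home"    & result == "H" ~ 3,
    team == "visitor" & result == "A" ~ 3,
    result == "D"                     ~ 1,
    TRUE                              ~ 0
  ))

# Sum points per club, sort by division then points, and number the rows
# within each season so positions continue from one division into the next
toy_tables <- long %>%
  group_by(division, Season, club) %>%
  summarise(points = sum(points), .groups = "drop") %>%
  arrange(division, Season, desc(points)) %>%
  group_by(Season) %>%
  mutate(position = row_number()) %>%
  ungroup()

print(toy_tables)
```

Club D tops division 2 with 4 points, yet it ends up with position 3 because the two division 1 clubs are numbered first. That is the same effect as Wolves landing on position 21.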

Now that we have that order we can get the specific teams I mentioned earlier: Arsenal, Burnley, and Leicester.

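teams<-tables %>%
  filter(club %in% c('Arsenal','Burnley','Leicester City')) %>%
  ungroup()

#add current data (assumed: Season 2018 with the positions quoted above; other columns become NA)
current<-data.frame(Season=2018,
                    club=c('Arsenal','Burnley','Leicester City'),
                    position=c(6,7,8))
teams<-bind_rows(teams,current)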

Now that we have our specific team data, we can plot the timeline. We could use straight ggplot2 to plot this and make a really good looking static chart, but one great thing about the plotly package for R is how easily it takes your ggplots and translates them into the plotly.js framework. Just add plotly::ggplotly.

timeline<-ggplot(teams, aes(Season, position, color = club))+
  geom_point(alpha=.35, cex=.75)+
  scale_x_continuous(position=c('top'), breaks = seq(1888, 2018, by=10))+ 
  scale_color_manual(values=c("#D80919", "#91d2f2", "#FAB92C"))+
  labs(x='Season', y='Position (Division 1-4)')

#display it with ggplotly
ggplotly(timeline, tooltip = c("Season","club","position"))

There you have it. Three teams who find themselves next to one another in the 2018 season. One is a top flight team that is struggling. Another is a team that, like a phoenix, has risen all the way from Division 4 to find itself in the Premier League top 10. And the last has meandered between the top two leagues, felling giants, losing easy ones, and doing the unthinkable, only to forget about it the very next season.