Watching the Premier League this year has been full of ups and downs (especially for an Arsenal fan): one week is packed with goals and the next is a true nil-nil bore. During those duller games, or in any general downtime, I find myself diving into the history behind the teams I am watching or listening to football podcasts. During a recent match between Arsenal and Manchester United, and again post-match, I was listening to a podcast that discussed the history of these clubs, both in the context of their matches against each other and their traditions in general. One thing that kept coming up with regard to Manchester United (besides the fact that they were sad they lost) was that the club has a very strong history of winning things, but also that it seems to be THE English club.
The winning part is definitely true: everywhere I looked, Manchester United is described as the winningest club (still doesn’t sound right) in many different regards. I decided to combine these two thoughts and ask ‘which club is the most domestic or the most continental’ in terms of wins. So, in a roundabout way that may have been too meandering for some of you, this post is the result of that exercise.
We don’t really need much by way of packages: just tidyverse for data manipulation, awtools purely for plot aesthetics (so you can use whatever you want instead), and plotly for the interactive visual.
library(tidyverse)
library(awtools)
library(plotly)
This was a quick little exercise, and getting the data was not all that difficult, just tedious, so I decided to add it to my dataset list to make it easy to play with. All of the competition data was gathered from Wikipedia, from each competition’s page. Couple that data with the always great engsoccerdata and we pretty much have everything we need.
team_data <- read.csv('https://raw.githubusercontent.com/jalapic/engsoccerdata/master/data-raw/england_club_data.csv', stringsAsFactors = FALSE)
competitions <- read.csv('https://raw.githubusercontent.com/awhstin/Dataset-List/master/epl_winningestteams.csv', stringsAsFactors = FALSE)
Next we bring everything together by first joining the competitions data to the team data, then using the total wins to grab the top 25 clubs. Finally, we add some fields that will help arrange the data how we want it in the final plot.
#join
team_comps <- competitions %>%
  left_join(., team_data)

totals <- team_comps %>%
  group_by(team) %>%
  tally() %>%
  top_n(25)

team_comps <- team_comps %>%
  inner_join(., totals) %>%
  arrange(-n, year) %>%
  group_by(team) %>%
  mutate(id = row_number()) %>%
  ungroup()
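To see what the count, top-N filter, and per-club row numbering do, here is a minimal sketch on a made-up data frame. The `wins` data and its values are hypothetical, not the real dataset, and `slice_max()` stands in for `top_n()` (it assumes dplyr 1.0.0 or later).

```r
library(dplyr)

# Hypothetical toy data: one row per trophy won (not the real dataset)
wins <- data.frame(
  team = c("A", "A", "A", "B", "B", "C"),
  comp = c("League", "Cup", "League", "Cup", "League", "Cup"),
  year = c(2001, 2003, 2005, 2002, 2004, 2006),
  stringsAsFactors = FALSE
)

# Count trophies per club and keep the top 2 clubs by that count
totals <- wins %>%
  count(team) %>%          # same result as group_by(team) %>% tally()
  slice_max(n, n = 2)      # keeps ties by default, as top_n() does

# Keep only rows for those clubs, then number each club's trophies in year order
ranked <- wins %>%
  inner_join(totals, by = "team") %>%
  arrange(desc(n), year) %>%
  group_by(team) %>%
  mutate(id = row_number()) %>%
  ungroup()

ranked
```

Spelling out `by = "team"` avoids relying on whichever column names the two tables happen to share, which is what a join without `by` does.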
Now we just create the ggplot2 object to feed to plotly. There is the handy text argument available, which you can leverage to add our manually created tooltip to the object passed to the final viz.
Finally here! We create the custom mouse-over labels and pass our ggplot object to ggplotly.
t <- ggplot(team_comps, aes(reorder(team, n), 1, color = comp, text = paste0(team, ' won the ', comp, ' in ', year))) +
  coord_flip() +
  geom_bar(stat = 'identity', aes(group = -id, fill = comp), width = 1, color = 'white', size = 1) +
  a_scale_fill() +
  a_plex_theme(plot_title_size = 27.5, grid = FALSE) +
  labs(x = NULL, y = NULL, fill = NULL) +
  theme(panel.grid.major.x = element_line(color = "#dedede", linetype = 'dotted'))

ggplotly(t, tooltip = c('text'))
Winningest English Clubs
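If you want to try the tooltip trick without awtools or the real data, the sketch below isolates it. The `toy` data frame is made up for illustration, and `geom_col()` is used here as shorthand for `geom_bar(stat = 'identity')`.

```r
library(ggplot2)
library(plotly)

# Hypothetical toy data, not the real competition list
toy <- data.frame(
  team = c("A", "A", "B"),
  comp = c("League", "Cup", "League"),
  year = c(2001, 2003, 2002)
)

# `text` is not drawn by the geom; ggplotly() picks it up for the hover label
p <- ggplot(toy, aes(team, 1, fill = comp,
                     text = paste0(team, " won the ", comp, " in ", year))) +
  geom_col(color = "white")

# tooltip = "text" restricts the hover label to the custom string
w <- ggplotly(p, tooltip = "text")
w
```

Without the `tooltip` argument, the hover label would also list the other mapped aesthetics, which gets noisy for stacked one-unit bars like these.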
I have seen quite a few visualizations like this for specific competitions, but I have not seen one that includes lots of competitions (though I haven’t looked too hard), so this was truly a fun exercise to put together. The first thing I notice is that Liverpool seems to have a greater proportion of their wins coming from both domestic and continental competitions, whereas Manchester United does not. Maybe they are THE English club? There seem to be some other clubs in contention as well.
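One way to put numbers on that kind of eyeball comparison is to compute each club’s share of wins by competition type. This is only a sketch: the `wins` data is made up, and the `scope` column is an assumed label that you would have to add yourself, since I am not claiming the dataset has one.

```r
library(dplyr)

# Hypothetical toy data; `scope` is an assumed label, not a known column in the dataset
wins <- data.frame(
  team  = c("A", "A", "A", "A", "B", "B"),
  scope = c("domestic", "domestic", "domestic", "continental",
            "domestic", "continental")
)

# Share of each club's wins that fall in each scope
shares <- wins %>%
  count(team, scope) %>%
  group_by(team) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()

shares
```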