How many is too many? A look at goals in the Premier League

Found in [R, data visualization, soccer] by @awhstin

One consistent thread in most of what I hear or read about this 2020 Premier League season is that it is mad: mad results and mad goals. Frankly, I agree, and it is hard to feel otherwise after watching results like Aston Villa beating Liverpool 7-2. I also believed the second piece, about the ‘mad’ (read: lots of) goals, especially since the first 0-0 draw of the season only just happened while I was writing this. That seems wild because, for the last couple of seasons at least, there has been a 0-0 draw in the first week, so anecdotally these feelings about the season seem mostly warranted.

So why not try to put some numbers behind those feelings? Well, mostly the second one. And why not test out a new (to me) package, echarts4r? This R package, by @jdatap on Twitter, lets you use the ECharts JavaScript library from R, and it can be found at echarts4r. Another note: when messing around with EPL data, the engsoccerdata package by @jalapic has to be where you start.

library(tidyverse)
library(formattable)
library(lubridate)
library(echarts4r)
# Install engsoccerdata from GitHub if you don't have it yet:
# devtools::install_github('jalapic/engsoccerdata')
library(engsoccerdata)

Once you have those packages installed and loaded, we can grab the data we want. There are two main pieces: the first is the historical dataset england, and the second is the current season, which we can get from the function england_current.

raw_data <- england
current_games <- england_current()
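
Since the next step stacks these two tables with bind_rows, it can be worth a quick check that their columns line up first. A minimal sketch on made-up stand-ins: historical, current, the extra column and the col_class helper are all hypothetical names for illustration, not part of engsoccerdata.

```r
library(dplyr)

# Hypothetical stand-ins for `england` and `england_current()`
historical <- tibble(Date = as.Date("2019-08-09"), Season = 2019, totgoal = 5)
current    <- tibble(Date = as.Date("2020-09-12"), Season = 2020, totgoal = 3,
                     extra = "x")

# First class of every column, as a named character vector
col_class <- function(df) vapply(df, function(x) class(x)[1], character(1))

# Columns that exist in only one table (bind_rows fills these with NA)
only_in_current <- setdiff(names(current), names(historical))

# Shared columns whose classes differ (bind_rows may refuse to combine these)
shared     <- intersect(names(historical), names(current))
mismatched <- shared[col_class(historical)[shared] != col_class(current)[shared]]

only_in_current
mismatched
```

With the real data you would pass raw_data and current_games instead of the toy tables.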

Now that we have those datasets, we can transform and massage the data into something to work with. Once all the data is together, we can summarize the total goals by week (because I play a lot of fantasy football, weeks are my default) and get a running total for the previous seasons and the current one.

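# Stack historical and current results, keep the top flight,
# and limit to the current season plus the five before it
pl_all <- raw_data %>%
  bind_rows(., current_games) %>%
  filter(division == 1 &
         Season >= max(Season) - 5) %>%
  # week() is the calendar week of the year (1-53), not the matchweek
  mutate(week = week(Date))

# Total goals per season and week, then a running total within each season
pl_totalgoals <- pl_all %>%
  group_by(Season, week) %>%
  dplyr::summarize(totgoal = sum(totgoal)) %>%
  ungroup() %>%
  group_by(Season) %>%
  dplyr::mutate(
    count = row_number(),
    run = cumsum(totgoal),
    Season = as.character(Season))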
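
One thing to keep in mind with week(): it returns the calendar week of the year, so within a season that runs from August to May, sorting by it puts the January fixtures ahead of the August ones, and count follows that order rather than the fixture calendar. Below is a minimal sketch of a season-relative week index on made-up fixtures, in case you want the counter to start at the season's first match. The names toy, cal_week and season_week are invented here, the Date column is assumed to be of class Date, and this is not what produced the numbers in this post.

```r
library(dplyr)
library(lubridate)

# Made-up fixtures for one season that kicks off in August 2019
toy <- tibble(
  Season  = 2019,
  Date    = as.Date(c("2019-08-10", "2019-08-17", "2019-12-28",
                      "2020-01-04", "2020-01-11")),
  totgoal = c(3, 1, 4, 2, 5)
)

toy_weeks <- toy %>%
  group_by(Season) %>%
  mutate(
    # Calendar week of the year: restarts at 1 every January
    cal_week    = week(Date),
    # Weeks elapsed since the season's first match, starting at 1
    season_week = as.integer(Date - min(Date)) %/% 7L + 1L
  ) %>%
  ungroup()

# Sorting by calendar week leads with a January match ...
first_by_cal_week <- arrange(toy_weeks, cal_week)$Date[1]
# ... while sorting by season week leads with the opening match
first_by_season_week <- arrange(toy_weeks, season_week)$Date[1]

toy_weeks
```

Swapping something like season_week in for week in the group_by() above would be one way to use it, though the totals would then differ from the ones shown below.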

Great! I usually like to take a first glance at the data, so I will use the handy function tibble to quickly see how the totals over the last few seasons compare with the current one. There have only been 5 weeks in the current season, so that is where I will cut it off.

cur_stats <- pl_totalgoals %>%
  filter(count<=5) %>%
  group_by(Season) %>%
  dplyr::summarize(Total=sum(totgoal)) %>%
  tibble()

cur_stats
## # A tibble: 6 x 2
##   Season Total
##   <chr>  <int>
## 1 2015     133
## 2 2016     142
## 3 2017     141
## 4 2018     169
## 5 2019     132
## 6 2020     171
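
To put those totals in perspective, here is a quick bit of arithmetic on the numbers printed above: the average of the five earlier seasons, how far 2020 sits above it, and the gap to the next-highest season. The vector is typed in by hand from the output, and totals, previous_mean, pct_above and gap_to_next are just illustrative names.

```r
# Totals copied by hand from the cur_stats output above
totals <- c("2015" = 133, "2016" = 142, "2017" = 141,
            "2018" = 169, "2019" = 132, "2020" = 171)

previous <- totals[names(totals) != "2020"]

# Average of the five earlier seasons over the same window
previous_mean <- mean(previous)
previous_mean
# 143.4

# How far 2020 sits above that average, in percent
pct_above <- (totals[["2020"]] / previous_mean - 1) * 100
round(pct_above, 1)
# 19.2

# Gap to the next-highest season (2018)
gap_to_next <- totals[["2020"]] - max(previous)
gap_to_next
# 2
```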

Interesting: this season's total is higher than any of the previous five, but only just, at 171 goals against 169 in 2018. I think it is time for a more nuanced look at this data by week using the echarts4r package. I like to visualize facets of the Premier League as cumulative sums or running totals because, for points and goals, a team is usually measured across the whole season, so that is exactly what I will do here: a look at the running total of goals in the Premier League by week for the last few seasons.

The echarts4r documentation is very helpful in putting this together, so as you play around with this, keep some of these links/materials handy:

  • echarts4r: main page with tutorials, examples and more
  • Echarts JS original: the original library
  • Theme Builder: there are a lot of built-in themes, but to help me understand the different elements this theme builder was super helpful!


e_common(
  font_family = "IBM Plex Mono",
  theme = NULL
)

max <- list(
  name = "Max",
  type = "max"
)


pl_totalgoals %>% 
  e_charts(x = count) %>% 
  e_line(serie = run, smooth=TRUE) %>% 
  e_title("How many is too many?", "Running total of goals by week in the Premier League") %>%  # Add title & subtitle
  e_theme_custom('{"color":["#f7dc05",
            "#3d98d3",
            "#ec0b88",
            "#5e35b1",
            "#f9791e",
            "#3dd378",
            "#787464",
            "#c6c6c6",
            "#baa9d0",
            "#009688"]}') %>%
  e_legend(right = 0)  %>%
  e_tooltip(trigger = "axis") %>%
  e_highlight(series_name = "2020") %>%
  e_mark_point(serie='2020', data=max) %>% 
  e_datazoom() %>% 
  e_zoom(
    dataZoomIndex = 0,
    start = 0,
    end = 25
  ) 
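
If the full chart is a lot to take in at once, a stripped-down version may help show the core idea: echarts4r draws one line per group of a grouped data frame, which is why pl_totalgoals is grouped by Season before it reaches e_charts. This is a minimal sketch on made-up numbers; toy_goals and toy_chart are invented names, and the goal counts are not real data.

```r
library(dplyr)
library(echarts4r)

# Made-up weekly goal totals for two seasons
toy_goals <- tibble(
  Season  = rep(c("2019", "2020"), each = 4),
  count   = rep(1:4, times = 2),
  totgoal = c(20, 25, 30, 22, 23, 44, 28, 41)
) %>%
  group_by(Season) %>%                # one series per season
  mutate(run = cumsum(totgoal))       # running total within each season

toy_chart <- toy_goals %>%
  e_charts(x = count) %>%             # week counter on the x axis
  e_line(serie = run) %>%             # running total as a line
  e_tooltip(trigger = "axis")         # show all series at the hovered week

toy_chart
```

From there, the title, theme, legend and zoom calls in the full chart are layered on one pipe step at a time.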

This current season is higher than the last few, but only time will tell whether it continues on its current trajectory. Interestingly, the 2019 season also seems to have skyrocketed past previous years at some point. There is a lot more to learn and investigate within the engsoccerdata package, and every time I use it I spiral down a lot of different rabbit holes. The author of the package also did his own recent look at the current goal trend in 2020, which you can see here!