
Ebb & flow: using Google Trends data to explore the opioid epidemic

Found in [R, data visualization, google] by @awhstin

The news of Purdue's recent guilty plea and $8 billion settlement thrust a story I had unfortunately lost touch with back into the spotlight: the opioid epidemic. For parts of 2017 and 2018 it was front and center in the American news cycle, but it had all but dropped away until the settlement news came out.

With everything happening in 2020, I could not keep up with every news story, so I took some time to re-educate myself on where the opioid epidemic stands now. A recent article from The Upshot (7/15/20), 'In Shadow of Pandemic, U.S. Drug Overdose Deaths Resurge to Record', describes the trend being exacerbated by the coronavirus and notes that 2020 looks to be one of the worst years yet. If things are not getting better, where does this story sit in the public eye? Surely, I thought, others were more on top of it than I was.

I decided to look at some data, namely Google Trends. Luckily, there is a super easy-to-use R package for this: gtrendsR.

Here are a couple of details about the Trends data:

Google Trends normalizes search data to make comparisons between terms easier. Search results are normalized to the time and location of a query by the following process:
* Each data point is divided by the total searches of the geography and time range it represents to compare relative popularity. Otherwise, places with the most search volume would always be ranked highest.
* The resulting numbers are then scaled on a range of 0 to 100 based on a topic’s proportion to all searches on all topics.
* Different regions that show the same search interest for a term don’t always have the same total search volumes.
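To make those steps concrete, here is a toy sketch with made-up weekly numbers (nothing here comes from Google). The helper `trends_index` is hypothetical: it divides a term's searches by total searches, then rescales so the highest point in the range becomes 100, which is how Google describes a value of 100 (peak popularity). Region B has ten times the traffic of region A yet gets an identical index, which is the caveat in the last bullet. Google's real pipeline also samples and rounds data, which this ignores.

```r
# Hypothetical helper mirroring the first two steps above (toy numbers only)
trends_index <- function(term_searches, total_searches) {
  share <- term_searches / total_searches   # step 1: relative popularity
  round(100 * share / max(share))           # step 2: peak of the range = 100
}

# Region A: weekly searches for one term, and total searches in those weeks
region_a <- trends_index(c(120, 180, 90, 300), c(1e6, 1.2e6, 0.9e6, 1.5e6))

# Region B: ten times the traffic, same relative popularity each week
region_b <- trends_index(c(1200, 1800, 900, 3000), c(1e7, 1.2e7, 0.9e7, 1.5e7))

print(region_a)                        # 60 75 50 100
print(identical(region_a, region_b))   # TRUE
```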

Finally, here is how I read these three searches and what I take each one to mean:

* opioid symptoms: I think this search relates to use, with people looking up the symptoms of opioid overuse.
* opioid treatment: I believe this search reflects people who know someone with a problem and are looking for help.
* opioid epidemic: I believe this one gauges public interest in the epidemic as a whole.

Of course, there are probably large holes in these assumptions, but you have to start somewhere. I chose these three to try to separate the issue itself, captured by the symptoms and treatment searches, from the news, captured by the opioid epidemic search. Sadly, my expectation is that searches for symptoms and treatment will still be fairly high while public interest in the epidemic will be fairly low.

code

The first step is to pull in the three trend queries separately. The gtrendsR::gtrends function makes this pretty straightforward: each call below asks for the last five years of web searches within the US, keeping only the interest-over-time data. The function's documentation lists the other available options.

library(tidyverse)
library(gtrendsR)
library(echarts4r)

# last five years of US web searches, interest over time only
symptoms <- gtrends("opioid symptoms",
                    gprop = "web",
                    time = "today+5-y",
                    geo = "US",
                    onlyInterest = TRUE)

treatment <- gtrends("opioid treatment",
                     gprop = "web",
                     time = "today+5-y",
                     geo = "US",
                     onlyInterest = TRUE)

epidemic <- gtrends("opioid epidemic",
                    gprop = "web",
                    time = "today+5-y",
                    geo = "US",
                    onlyInterest = TRUE)
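One thing worth keeping in mind: because each term is fetched in its own gtrends call, each series is scaled 0 to 100 against its own peak, so a 50 for "opioid symptoms" and a 50 for "opioid epidemic" need not mean similar search volumes. The toy sketch below uses made-up shares for two hypothetical terms to show the difference. If a shared scale is wanted, gtrendsR accepts a vector of up to five keywords in a single call, for example `gtrends(c("opioid symptoms", "opioid treatment", "opioid epidemic"), geo = "US", time = "today+5-y", onlyInterest = TRUE)`, and Google then scales all of them against the highest point among them.

```r
# Toy relative-popularity shares for two hypothetical terms (made-up numbers)
share_a <- c(2, 4, 8)   # a heavily searched term
share_b <- c(1, 1, 2)   # a rarely searched term

# Separate queries: each series is scaled against its own peak
separate <- list(a = 100 * share_a / max(share_a),
                 b = 100 * share_b / max(share_b))

# One combined query: both series are scaled against the joint peak
joint_peak <- max(share_a, share_b)
combined <- list(a = 100 * share_a / joint_peak,
                 b = 100 * share_b / joint_peak)

print(separate$b)  # 50 50 100: reaches 100 just like term a does
print(combined$b)  # 12.5 12.5 25: a quarter of term a's peak
```

Within-term shapes are unaffected either way, so statements like "this term trailed off" survive separate queries; cross-term comparisons of level are the part that needs a shared scale.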

Once we have the data, it is just a matter of isolating the part we want, namely the interest-over-time data, and combining the three queries.

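#combine: symptoms[1] is a one-element list holding interest_over_time, so
#data.frame() prefixes its columns (interest_over_time.date, interest_over_time.hits)
sym <- symptoms[1] %>%
  data.frame(stringsAsFactors = FALSE) %>%
  mutate(origin = 'symptoms')

trt <- treatment[1] %>%
  data.frame(stringsAsFactors = FALSE) %>%
  mutate(origin = 'treatment')

epi <- epidemic[1] %>%
  data.frame(stringsAsFactors = FALSE) %>%
  mutate(origin = 'epidemic')

#stack the three queries into one long data frame
interest <- sym %>%
  bind_rows(trt, epi)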
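The same combine step can also be written once for all three queries. Below is a base-R sketch that runs offline, so the gtrends results are replaced by hypothetical mock objects (`mock_result` is made up and only imitates the `interest_over_time` element). Pulling `$interest_over_time` directly keeps plain `date` and `hits` column names, so chart code would then refer to `date` and `hits` rather than the prefixed names. The sketch also assumes `hits` may come back as character when Google reports "<1", and treats that as 0, which is my choice rather than anything gtrendsR does. In the tidyverse, `bind_rows(list(symptoms = ..., treatment = ..., epidemic = ...), .id = "origin")` does the tagging and stacking in one call.

```r
# Hypothetical stand-ins for gtrends() results: only interest_over_time is mocked
mock_result <- function(hits) {
  list(interest_over_time = data.frame(
    date = as.Date("2020-01-05") + 7 * (seq_along(hits) - 1),
    hits = hits,
    stringsAsFactors = FALSE
  ))
}
results <- list(symptoms  = mock_result(c(40, 42, 45)),
                treatment = mock_result(c(70, 68, "<1")),  # character hits
                epidemic  = mock_result(c(20, 15, 12)))

# Pull interest_over_time out of each result and tag it with its query name
pieces <- Map(function(res, name) {
  df <- res$interest_over_time
  # assumption: treat "<1" as 0 so hits is numeric for every query
  df$hits <- as.numeric(sub("<1", "0", df$hits, fixed = TRUE))
  df$origin <- name
  df
}, results, names(results))

# Stack into one long data frame with columns date, hits, origin
interest_long <- do.call(rbind, unname(pieces))
print(interest_long)
```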

Finally, it comes down to how to visualize the data. I am using the echarts4r package because I have enjoyed experimenting with it recently, but which type of chart? In my head I picture Google Trends data as a fluid stream of consciousness, with topics moving in and out of the public eye, so I landed on a river (or stream) chart.

# global font for echarts4r charts in this session
e_common(
  font_family = "IBM Plex Mono",
  theme = NULL
)

interest %>%
  group_by(origin) %>%                                # one river band per query
  e_charts(x = interest_over_time.date) %>%
  e_river(interest_over_time.hits, rm_y = TRUE) %>%
  e_theme_custom('{"color":[
            "#784fca",
            "#67dc95",
            "#ca784f"]}') %>%                         # one colour per band
  e_tooltip(trigger = "axis") %>%
  e_legend(right = 0, type = 'scroll') %>%
  e_title("Google Trends: The Opioid Epidemic",
          'A detailed look at Google searches for opioid symptoms, treatment and epidemic.')
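The chart is easiest to read by eye, but one way to put rough numbers on what it shows is to compare each query's average hits before and after a cut-off date. The helper `compare_periods`, the cut-off, and the toy data below are all hypothetical; with `interest` as built above, the prefixed date and hits columns would first need renaming (and `hits` converting to numeric, and the date to `Date`, if needed). Because each term was queried on its own scale, the `change` column is best compared within a term rather than across terms.

```r
# Hypothetical helper: mean hits per query before and after a cut-off date.
# Expects a data frame with origin, date (Date) and numeric hits columns.
compare_periods <- function(df, cutoff) {
  period <- factor(ifelse(df$date < cutoff, "before", "after"),
                   levels = c("before", "after"))
  # matrix of means: one row per origin, one column per period
  means <- tapply(df$hits, list(df$origin, period), mean)
  data.frame(origin = rownames(means),
             before = unname(means[, "before"]),
             after  = unname(means[, "after"]),
             change = unname(means[, "after"] - means[, "before"]),
             stringsAsFactors = FALSE)
}

# Made-up data shaped like the combined interest data
toy <- data.frame(
  origin = rep(c("epidemic", "symptoms"), each = 4),
  date   = rep(as.Date(c("2017-06-04", "2017-12-03",
                         "2019-06-02", "2019-12-01")), 2),
  hits   = c(80, 100, 30, 20, 40, 45, 50, 55),
  stringsAsFactors = FALSE
)

res <- compare_periods(toy, cutoff = as.Date("2019-01-01"))
print(res)  # epidemic falls by 65, symptoms rises by 10
```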


Again, I do not believe my assumptions about these trends are perfect. At first glance, though, my theory seems to hold: searches about the epidemic peaked and have since trailed off, while searches for symptoms and treatment have essentially continued at pace or even grown.

As I mentioned at the beginning, the story of the opioid epidemic fell off my radar, and in a year like 2020 I think that happens to a lot of people with a lot of different stories. One point I want to make is that when news stories inevitably move in and out of our attention, it is important to read and form opinions of our own. The ability to test those opinions has never been better (in my opinion), and alongside your favorite news sources, this sort of independent opinion creates a conduit for real, honest discourse on important topics. That is something I think we can all agree we need more of.