Visualise data

An introduction to ggplot2 through visualising numeric data.

We will start with the gapminder data set and first look at its contents:

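library(tidyverse)  # provides glimpse(), read_csv(), the pipe and ggplot2
library(gapminder)  # provides the gapminder data frame
glimpse(gapminder)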
## Rows: 1,704
## Columns: 6
## $ country   <fct> Afghanistan, Afghanistan, Afghanistan, Afghanistan, Afgha...
## $ continent <fct> Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asi...
## $ year      <int> 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992, 199...
## $ lifeExp   <dbl> 28.801, 30.332, 31.997, 34.020, 36.088, 38.438, 39.854, 4...
## $ pop       <int> 8425333, 9240934, 10267083, 11537966, 13079460, 14880372,...
## $ gdpPercap <dbl> 779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 786.113...

Scatter plots and multiple panels using facet_wrap()
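
As a minimal sketch of what this heading describes, the code below draws life expectancy against GDP per capita from gapminder and splits the plot into one panel per continent with facet_wrap(). The choice of variables, the log scale and the name p_scatter are assumptions made for illustration; they are not taken from the original material.

```r
library(ggplot2)
library(gapminder)  # provides the gapminder data frame

# Scatter plot of life expectancy against GDP per capita.
# p_scatter is an illustrative name, not one used in the original text.
p_scatter <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, colour = continent)) +
  geom_point(alpha = 0.4) +        # fixed transparency, set outside aes()
  scale_x_log10() +                # GDP per capita is strongly right-skewed
  facet_wrap(vars(continent)) +    # one panel per continent
  theme_minimal() +
  theme(legend.position = "none") +
  labs(x = "GDP per capita (log scale)", y = "Life expectancy (years)")

# Typing p_scatter at the console draws the plot.
```

Because every panel shares the same axes by default, the continents can be compared directly; passing scales = "free" to facet_wrap() would give each panel its own axes instead.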

Animating changes
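
One way to animate such a plot is with the gganimate package. The source does not say which package it uses, so treat the sketch below as an assumption: it moves the gapminder scatter plot through time with transition_time() and shows the current year in the title. The name anim is illustrative.

```r
library(ggplot2)
library(gapminder)
library(gganimate)  # assumption: the original text does not name its animation package

# Each frame shows one point per country; the frames step through the years.
anim <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp,
                              size = pop, colour = continent)) +
  geom_point(alpha = 0.6) +
  scale_x_log10() +
  theme_minimal() +
  labs(title = "Year: {frame_time}",          # gganimate fills in the current year
       x = "GDP per capita (log scale)",
       y = "Life expectancy (years)") +
  transition_time(year) +                     # animate over the year column
  ease_aes("linear")                          # move points at constant speed

# Typing anim at the console renders the animation (a renderer such as gifski is needed).
```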

Racing bars! We will create a simple bar graph showing the evolution of GDP per capita for the top 8 countries.
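
A rough sketch of how such a chart could be built follows. It assumes that "top 8" means the eight countries with the highest GDP per capita in each year, and that gganimate is used for the animation; neither detail is confirmed by the source, and the names top8 and p_race are illustrative.

```r
library(dplyr)
library(ggplot2)
library(gapminder)
library(gganimate)  # assumption: animation package not named in the original text

# Keep the 8 countries with the highest GDP per capita in each year
# and rank them within that year (1 = highest).
top8 <- gapminder %>%
  group_by(year) %>%
  slice_max(gdpPercap, n = 8, with_ties = FALSE) %>%
  mutate(rank = rank(-gdpPercap, ties.method = "first")) %>%
  ungroup()

# Horizontal bars placed by rank; group = country lets gganimate
# follow each country's bar as its rank changes between years.
p_race <- ggplot(top8, aes(x = rank, y = gdpPercap,
                           fill = continent, group = country)) +
  geom_col(alpha = 0.8) +
  geom_text(aes(y = 0, label = country), hjust = 0) +  # country name at the start of each bar
  coord_flip() +
  scale_x_reverse() +                                  # rank 1 at the top
  theme_minimal() +
  labs(title = "GDP per capita, {closest_state}", x = NULL, y = "GDP per capita") +
  transition_states(year, transition_length = 4, state_length = 1)

# Typing p_race at the console renders the animation.
```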

IMDB movie ratings: Scatterplots and relationships

For this section, we will use a sample of movies released since 2000 with data from IMDB. We have data on movies from the following six genres:

  • Action
  • Adventure
  • Comedy
  • Drama
  • Animation
  • Documentary
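
# Read the movie data; here::here() builds the path relative to the project root
imdb <- read_csv(here::here("data", "movies.csv"))

# Keep the six genres of interest, for movies released from 2000 onwards
imdb_short <- imdb %>% 
  filter(genre %in% c("Action", "Adventure", "Comedy", "Drama", "Animation", "Documentary"),
         year >= 2000)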
glimpse(imdb_short)
## Rows: 1,762
## Columns: 11
## $ title               <chr> "Avatar", "Jurassic World", "The Avengers", "Th...
## $ genre               <chr> "Action", "Action", "Action", "Action", "Action...
## $ director            <chr> "James Cameron", "Colin Trevorrow", "Joss Whedo...
## $ year                <dbl> 2009, 2015, 2012, 2008, 2015, 2012, 2004, 2013,...
## $ duration            <dbl> 178, 124, 173, 152, 141, 164, 93, 146, 151, 103...
## $ gross               <dbl> 760505847, 652177271, 623279547, 533316061, 458...
## $ budget              <dbl> 2.37e+08, 1.50e+08, 2.20e+08, 1.85e+08, 2.50e+0...
## $ cast_facebook_likes <dbl> 4834, 8458, 87697, 57802, 92000, 106759, 1148, ...
## $ votes               <dbl> 886204, 418214, 995415, 1676169, 462669, 114433...
## $ reviews             <dbl> 3777, 1934, 2425, 5312, 1752, 3514, 688, 1208, ...
## $ rating              <dbl> 7.9, 7.0, 8.1, 9.0, 7.5, 8.5, 7.2, 7.6, 7.3, 8....
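
To illustrate the scatterplots this section's title refers to, here is a minimal sketch that plots gross earnings against budget and overlays a straight-line fit. The helper plot_gross_vs_budget() is hypothetical and not part of the original material; it only assumes a data frame with the budget, gross and genre columns listed in the glimpse() output.

```r
library(ggplot2)

# Hypothetical helper (not from the original text): scatter plot of gross
# against budget, coloured by genre, with one overall linear trend line.
plot_gross_vs_budget <- function(movies) {
  ggplot(movies, aes(x = budget, y = gross)) +
    geom_point(aes(colour = genre), alpha = 0.4) +
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "grey30") +
    scale_x_log10() +   # both variables are right-skewed, so log scales
    scale_y_log10() +   # spread the points out more evenly
    theme_minimal() +
    labs(x = "Budget (log scale)", y = "Gross (log scale)")
}
```

Calling plot_gross_vs_budget(imdb_short) would draw the plot for the filtered sample; swapping in other numeric columns such as votes and rating follows the same pattern.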

IMDB movie ratings: Boxplots, violin plots

Let us consider how movie ratings vary by genre. How can we visualise the distribution of ratings within each genre?

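# Boxplot of ratings for each genre
ggplot(imdb_short,
       aes(x = rating, y = genre, fill = genre)) +
  geom_boxplot(alpha = 0.2) +  # alpha is a fixed setting, so it goes in the geom, not inside aes()
  theme_minimal() +
  theme(legend.position = "none")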

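# Violin plot of the same data, showing the shape of each distribution
ggplot(imdb_short,
       aes(x = rating, y = genre, fill = genre)) +
  geom_violin(alpha = 0.2) +  # alpha is a fixed setting, so it goes in the geom, not inside aes()
  theme_minimal() +
  theme(legend.position = "none")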

Multiple panels using facet_wrap() and facet_grid()

# One row of panels per year and one column per genre
imdb_short %>% 
  filter(genre %in% c("Action", "Comedy", "Drama"),
         year >= 2010) %>% 
  ggplot(aes(x = rating, fill = genre)) +
  geom_boxplot(alpha = 0.2) +  # fixed transparency belongs in the geom, not inside aes()
  theme_minimal() +
  theme(legend.position = "none") +
  facet_grid(
    rows = vars(year),
    cols = vars(genre)
  )

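# Rows are now three equal-width budget bands created by cut(budget, 3)
imdb_short %>% 
  filter(genre %in% c("Action", "Comedy", "Drama"),
         year >= 2010) %>% 
  ggplot(aes(x = rating, fill = genre)) +
  geom_boxplot(alpha = 0.2) +  # fixed transparency belongs in the geom, not inside aes()
  theme_minimal() +
  theme(legend.position = "none") +
  facet_grid(
    rows = vars(cut(budget, 3)),
    cols = vars(genre)
  )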
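
The heading above also mentions facet_wrap(), which lays panels out by a single variable and wraps them onto several rows. As a minimal sketch, the hypothetical helper below (plot_rating_by_genre() is not from the original material) draws one histogram of ratings per genre; it only assumes a data frame with rating and genre columns.

```r
library(ggplot2)

# Hypothetical helper (not from the original text): histogram of ratings,
# with one panel per genre arranged by facet_wrap().
plot_rating_by_genre <- function(movies) {
  ggplot(movies, aes(x = rating, fill = genre)) +
    geom_histogram(binwidth = 0.5, alpha = 0.6) +
    facet_wrap(vars(genre), ncol = 3) +   # wrap the panels into 3 columns
    theme_minimal() +
    theme(legend.position = "none") +
    labs(x = "IMDB rating", y = "Number of movies")
}
```

Calling plot_rating_by_genre(imdb_short) would give a 2 x 3 layout for the six genres. facet_grid() suits two crossed variables, as in the examples above, while facet_wrap() is usually the better fit for a single variable with many levels.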