Worked examples

This section contains worked examples, most of them with fully annotated R code that you can use as a reference.

Exploratory Data Analysis

Before we start any analysis, we must learn to import a dataset, understand the variables it contains, visualise it, and manipulate it in a systematic way. This may seem tedious, but it is an essential step that should precede any statistical analysis.
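The four steps above (import, understand, visualise, manipulate) can be sketched in a few lines of base R. This is only a minimal illustration, not the code used in the worked examples: the built-in `mtcars` dataset stands in for "our data", and the names `csv_path`, `cars` and `mpg_by_gearbox` are made up for this sketch.

```r
# Assumption: the built-in mtcars dataset stands in for a real data file.
# We write it to a temporary CSV so that there is something to import.
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = FALSE)

# 1. Import the dataset
cars <- read.csv(csv_path)

# 2. Understand the variables it contains
str(cars)       # type and first few values of every column
summary(cars)   # min, quartiles, mean and max of every column

# 3. Visualise it
hist(cars$mpg, main = "Fuel efficiency", xlab = "Miles per gallon")
plot(cars$wt, cars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# 4. Manipulate it: recode a 0/1 column as a labelled factor,
#    then compute the mean mpg within each group
cars$am <- factor(cars$am, levels = c(0, 1),
                  labels = c("automatic", "manual"))
mpg_by_gearbox <- aggregate(mpg ~ am, data = cars, FUN = mean)
print(mpg_by_gearbox)
```

With a real project, the first two lines would be replaced by the path to your own file; the remaining steps stay the same in spirit.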

Data modelling

Data modelling allows us to go beyond analysing individual variables. We try to understand and learn from our data and, as such, the most important equation is

\[ \text{Data} = \text{Model} + \text{Error} \]
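One way to see this decomposition concretely is to fit a simple model and check that the fitted values (the model part) plus the residuals (the error part) give back the observed data. The sketch below assumes the built-in `mtcars` dataset and a straight-line model of `mpg` on `wt`; both choices, and the variable names, are for illustration only.

```r
# Assumption: a simple linear model on the built-in mtcars dataset
fit <- lm(mpg ~ wt, data = mtcars)

model_part <- fitted(fit)      # what the model predicts for each car
error_part <- residuals(fit)   # what the model leaves unexplained

# Data = Model + error, observation by observation
reconstructed <- model_part + error_part
all.equal(unname(reconstructed), mtcars$mpg)

# Compare the three pieces side by side for the first few cars
head(data.frame(data  = mtcars$mpg,
                model = model_part,
                error = error_part))
```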

We start with an Exploratory Data Analysis (EDA) for modelling and then cover three approaches:

Exploratory Data Analysis (EDA) for modelling: Having imported and cleaned our data, we start looking at summary statistics and plots that will be useful in framing our modelling approach.
Testing for differences in mean values across samples: How do we know whether there is a statistically significant difference between two groups A and B, e.g., between those who took a drug and those who took a placebo? Or whether the percentage of people who approve of Donald Trump is lower than the percentage who disapprove of him?
Fitting a linear regression model to understand which variables are associated with a numerical variable \(Y\). Our main interest is to use the model first for explanation and then for prediction: we try to explain the effect that specific explanatory variables \(X\) have on \(Y\).
Fitting a binary classification model, where the difference is that the outcome variable \(Y\) is binary (0/1). Again, we want to use the model primarily for explanation, e.g., what is the effect of different explanatory variables \(X\) on the probability that someone with Covid-19 will die?
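As a rough orientation, each of the three approaches corresponds to a standard base R function: `t.test()` for a difference in means, `lm()` for linear regression, and `glm()` with `family = binomial` for a binary outcome. The sketch below uses the built-in `mtcars` dataset purely as a stand-in; the choice of variables and the object names are assumptions made for illustration, not the worked examples themselves.

```r
# Assumption: mtcars stands in for real data.
# am is 0/1 (automatic/manual), mpg is fuel efficiency, wt is weight.

# 1. Difference in means between two groups (Welch two-sample t-test):
#    is mean mpg different for automatic and manual cars?
mean_test <- t.test(mpg ~ am, data = mtcars)
mean_test$p.value   # a small p-value suggests a real difference in means
mean_test$conf.int  # confidence interval for the difference in means

# 2. Linear regression for a numerical outcome Y (here mpg),
#    explained by weight and horsepower
linear_fit <- lm(mpg ~ wt + hp, data = mtcars)
coef(linear_fit)    # estimated effect of each X on Y, holding the other fixed

# 3. Binary classification via logistic regression:
#    the outcome Y (here am) is 0/1
logit_fit <- glm(am ~ wt, data = mtcars, family = binomial)
coef(logit_fit)     # effects on the log-odds scale

# Predicted probability that a (hypothetical) car weighing 3,000 lbs is manual
predict(logit_fit, newdata = data.frame(wt = 3), type = "response")
```

In both regression models the coefficients are the main object of interest when the goal is explanation; `predict()` is what you would reach for when the goal shifts to prediction.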

Last updated on August 14, 2020