Handling dates/times
The lubridate package is one of the most useful tools for handling dates and times in R. Your data may store dates as character strings (class <chr>), and you need to convert them to Date objects before you can do any date arithmetic or analysis.
Using lubridate to convert strings to Date objects
Let us look at an example: We will define Christmas as a string in various formats and will then try to convert it to a Date object so we can manipulate it.
library(lubridate)
today <- Sys.Date()
today
## [1] "2020-08-25"
class(today)
## [1] "Date"
date1 <- "25-12-2020"
date2 <- "12-25-2020"
date3 <- "2020-12-25"
date4 <- "Dec 25, 2020"
class(date1)
## [1] "character"
class(date4)
## [1] "character"
# dmy: day-month-year
xmas1 <- lubridate::dmy(date1)
class(xmas1)
## [1] "Date"
# mdy: month-day-year
xmas2 <- lubridate::mdy(date2)
class(xmas2)
## [1] "Date"
# ymd: year-month-day, the ISO 8601 standard
# https://en.wikipedia.org/wiki/ISO_8601
xmas3 <- lubridate::ymd(date3)
class(xmas3)
## [1] "Date"
# mdy: month-day-year; month names and punctuation are handled too
xmas4 <- lubridate::mdy(date4)
class(xmas4)
## [1] "Date"
# once we have it as a Date object, we can do calculations...
xmas1 - today
## Time difference of 122 days
# ... but these calculations will not work if the date is a string (character)
date1 - today
## Error in `-.Date`(date1, today): can only subtract from "Date" objects
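The parser you pick has to match the order of the components in the string. As a minimal sketch (the variable names `ok` and `bad` and the ambiguous example string are illustrative, not part of the original example), the code below shows that a mismatched parser returns NA with a warning rather than stopping with an error, and that an ambiguous string parses under both orders but gives different dates.

```r
library(lubridate)

# Same string, parsed under two different assumptions about component order
ok  <- dmy("25-12-2020")                     # day-month-year: parses
bad <- suppressWarnings(mdy("25-12-2020"))   # month-day-year: there is no month 25

ok
is.na(bad)   # the failed parse is NA, not an error

# An ambiguous string parses under both orders, but to different dates
april_3 <- dmy("03-04-2020")   # 3 April 2020
march_4 <- mdy("03-04-2020")   # 4 March 2020
april_3 - march_4

# Once parsed, individual components can be extracted
year(ok)
month(ok)
day(ok)
```

Because a wrong parser can fail silently in this way, it is worth checking the result for NA values and looking at a few parsed dates before continuing.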
On your own
We’ll use data from http://www.tfl.gov.uk to analyse usage of the London Bike Sharing scheme. The data has already been downloaded for you and is stored, together with weather information, in a CSV (comma-separated values) file.
library(tidyverse)  # provides read_csv() and glimpse()
bike <- read_csv(here::here("data", "londonBikes.csv"))
glimpse(bike)
## Rows: 3,439
## Columns: 14
## $ date <chr> "01-01-11", "02-01-11", "03-01-11", "04-01-11", "05-0...
## $ bikes_hired <dbl> 4555, 6250, 7262, 13430, 13757, 9595, 9294, 9338, 105...
## $ season <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ max_temp <dbl> 7.2, 4.0, 2.9, NA, 7.1, NA, 10.8, 10.4, 7.2, 8.9, 8.3...
## $ min_temp <dbl> NA, NA, NA, 0.3, 3.8, NA, 1.0, NA, NA, -1.9, NA, 3.7,...
## $ avg_temp <dbl> 5.6, 2.9, 1.4, 2.7, 5.6, 4.1, 6.1, 6.9, 3.1, 4.3, 5.8...
## $ avg_humidity <dbl> 84, 79, 80, 87, 84, 92, 92, 82, 79, 87, 82, 89, 89, 8...
## $ avg_pressure <dbl> 1025, 1028, 1024, 1013, 1000, 996, 999, 997, 1012, 10...
## $ avg_windspeed <dbl> 10, 8, 6, 6, 19, 5, 11, 23, 16, 14, 16, 16, 23, 24, 2...
## $ rainfall_mm <dbl> 0.0, 0.5, 0.0, 0.0, 0.0, 0.5, 11.4, 13.0, 1.0, 0.0, 7...
## $ rain <lgl> TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, FAL...
## $ fog <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE,...
## $ thunderstorm <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALS...
## $ snow <lgl> FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE,...
The date column is a character string (<chr>), with values such as 01-01-11, 02-01-11, 03-01-11, meaning the 1st, 2nd, 3rd of January 2011, and so on. In other words, the format of the string is dmy, or day-month-year, with a two-digit year.
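One way to approach the conversion is sketched below. It does not read the CSV; instead it uses a small stand-in tibble, `bike_sample` (a hypothetical name), holding the first three values of date and bikes_hired from the glimpse() output. The new column names `year`, `month` and `day_of_week` are likewise illustrative choices.

```r
library(lubridate)
library(dplyr)

# Hypothetical stand-in for the first rows of `bike`;
# values are copied from the glimpse() output
bike_sample <- tibble(
  date        = c("01-01-11", "02-01-11", "03-01-11"),
  bikes_hired = c(4555, 6250, 7262)
)

bike_sample <- bike_sample %>%
  mutate(
    date        = dmy(date),                # <chr> to <date>; "11" is read as 2011
    year        = year(date),               # calendar year as a number
    month       = month(date),              # month number, 1 to 12
    day_of_week = wday(date, label = TRUE)  # ordered factor of weekday names
  )

glimpse(bike_sample)
```

The same mutate() call should carry over to the full `bike` data frame, after which date can be used for filtering, grouping by month or weekday, and plotting over time.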