Eurostat Data

Eurostat Data with the eurostat package

The eurostat package provides access to well over 9000 datasets from the Eurostat. It may seem a challenging task to find the correct dataset, but you are essentially looking for the code that describes the dataset. We an get a table of contents, namely all of the codes contained in the eurostat database.

library(eurostat)
library(fpp2) # for time series decomposition
library(seasonal)
library(tmap) #mapping eurostat data

# Get Eurostat data listing
# Function get_eurostat_toc() downloads a table of contents of eurostat datasets. 
# The values in column ‘code’ should be used to download a selected dataset.
toc <- get_eurostat_toc()

# Check the first 20 rows 
head(toc, 20) %>% 
  kable()
title code type last update of data last table structure change data start data end values
Database by themes data folder NA NA NA NA NA
General and regional statistics general folder NA NA NA NA NA
European and national indicators for short-term analysis euroind folder NA NA NA NA NA
Business and consumer surveys (source: DG ECFIN) ei_bcs folder NA NA NA NA NA
Consumer surveys (source: DG ECFIN) ei_bcs_cs folder NA NA NA NA NA
Consumers - monthly data ei_bsco_m dataset 28.05.2021 28.05.2021 1980M01 2021M05 NA
Consumers - quarterly data ei_bsco_q dataset 28.05.2021 29.04.2021 1990Q1 2021Q2 NA
Business surveys - NACE Rev. 2 activity (source: DG ECFIN) ei_bcs_bs folder NA NA NA NA NA
Industry - monthly data ei_bsin_m_r2 dataset 28.05.2021 28.05.2021 1980M01 2021M05 NA
Industry - quarterly data ei_bsin_q_r2 dataset 28.05.2021 29.04.2021 1980Q1 2021Q2 NA
Construction - monthly data ei_bsbu_m_r2 dataset 28.05.2021 03.06.2021 1980M01 2021M05 NA
Construction - quarterly data ei_bsbu_q_r2 dataset 28.05.2021 29.04.2021 1981Q1 2021Q2 NA
Retail sale - monthly data ei_bsrt_m_r2 dataset 28.05.2021 28.05.2021 1984M01 2021M05 NA
Sentiment indicators - monthly data ei_bssi_m_r2 dataset 28.05.2021 18.06.2021 1980M01 2021M05 NA
Services - monthly data ei_bsse_m_r2 dataset 28.05.2021 18.06.2021 1988M01 2021M05 NA
Services - quarterly data ei_bsse_q_r2 dataset 28.05.2021 29.04.2021 2001Q2 2021Q2 NA
Euro-zone Business Climate Indicator - monthly data ei_bsci_m_r2 dataset 28.05.2021 28.05.2021 1985M01 2021M05 NA
Financial services - monthly data ei_bsfs_m dataset 28.05.2021 03.06.2021 2006M04 2021M05 NA
Financial services - quarterly data ei_bsfs_q dataset 28.05.2021 03.06.2021 2007Q3 2021Q2 NA
Employment expectations indicator ei_bsee_m_r2 dataset 28.05.2021 28.05.2021 1980M01 2021M05 NA

House Price Index (HPI)

The Eurostat House Price Index (HPI) measures price changes of all residential properties purchased by households (flats, detached houses, terraced houses, etc.), both new and existing, independently of their final use and their previous owners. First, we node that the code id for this dataset is teicp270. Once we know the relevant code id, we can download eurostat data using the get_eurostat(id) function.

hpi <- get_eurostat(id="teicp270")
glimpse(hpi)
## Rows: 1,256
## Columns: 5
## $ indic  <chr> "TOTAL", "TOTAL", "TOTAL", "TOTAL", "TOTAL", "TOTAL", "TOTAL", ~
## $ unit   <chr> "I15_NSA", "I15_NSA", "I15_NSA", "I15_NSA", "I15_NSA", "I15_NSA~
## $ geo    <chr> "AT", "BE", "BG", "CY", "CZ", "DE", "DK", "EA", "EA19", "EE", "~
## $ time   <date> 2018-01-01, 2018-01-01, 2018-01-01, 2018-01-01, 2018-01-01, 20~
## $ values <dbl> 117.5, 107.7, 120.7, 103.1, 125.0, 118.3, 114.0, 111.3, 111.3, ~
head(hpi,40) %>% 
  kable()
indic unit geo time values
TOTAL I15_NSA AT 2018-01-01 117.5
TOTAL I15_NSA BE 2018-01-01 107.7
TOTAL I15_NSA BG 2018-01-01 120.7
TOTAL I15_NSA CY 2018-01-01 103.1
TOTAL I15_NSA CZ 2018-01-01 125.0
TOTAL I15_NSA DE 2018-01-01 118.3
TOTAL I15_NSA DK 2018-01-01 114.0
TOTAL I15_NSA EA 2018-01-01 111.3
TOTAL I15_NSA EA19 2018-01-01 111.3
TOTAL I15_NSA EE 2018-01-01 115.2
TOTAL I15_NSA ES 2018-01-01 115.0
TOTAL I15_NSA EU 2018-01-01 112.4
TOTAL I15_NSA EU27_2020 2018-01-01 112.2
TOTAL I15_NSA EU28 2018-01-01 112.4
TOTAL I15_NSA FI 2018-01-01 102.7
TOTAL I15_NSA FR 2018-01-01 105.4
TOTAL I15_NSA HR 2018-01-01 109.4
TOTAL I15_NSA HU 2018-01-01 138.7
TOTAL I15_NSA IE 2018-01-01 127.3
TOTAL I15_NSA IS 2018-01-01 138.8
TOTAL I15_NSA IT 2018-01-01 98.6
TOTAL I15_NSA LT 2018-01-01 119.8
TOTAL I15_NSA LU 2018-01-01 116.8
TOTAL I15_NSA LV 2018-01-01 126.0
TOTAL I15_NSA MT 2018-01-01 111.2
TOTAL I15_NSA NL 2018-01-01 119.9
TOTAL I15_NSA NO 2018-01-01 113.6
TOTAL I15_NSA PL 2018-01-01 109.5
TOTAL I15_NSA PT 2018-01-01 125.6
TOTAL I15_NSA RO 2018-01-01 116.1
TOTAL I15_NSA SE 2018-01-01 113.2
TOTAL I15_NSA SI 2018-01-01 118.0
TOTAL I15_NSA SK 2018-01-01 119.6
TOTAL I15_NSA TR 2018-01-01 130.9
TOTAL I15_NSA UK 2018-01-01 113.6
TOTAL PCH_Q1_NSA AT 2018-01-01 0.8
TOTAL PCH_Q1_NSA BE 2018-01-01 0.0
TOTAL PCH_Q1_NSA BG 2018-01-01 0.9
TOTAL PCH_Q1_NSA CY 2018-01-01 -2.0
TOTAL PCH_Q1_NSA CZ 2018-01-01 2.2

Typically, the downloaded data has codes and abbreviations for all of the variables, but we can use label_eurostat to get a more verbose description.

house_price_index_data <-  hpi %>% 
  label_eurostat()

head(house_price_index_data,40) %>% 
  kable()
indic unit geo time values
Total Index, 2015=100 (NSA) Austria 2018-01-01 117.5
Total Index, 2015=100 (NSA) Belgium 2018-01-01 107.7
Total Index, 2015=100 (NSA) Bulgaria 2018-01-01 120.7
Total Index, 2015=100 (NSA) Cyprus 2018-01-01 103.1
Total Index, 2015=100 (NSA) Czechia 2018-01-01 125.0
Total Index, 2015=100 (NSA) Germany (until 1990 former territory of the FRG) 2018-01-01 118.3
Total Index, 2015=100 (NSA) Denmark 2018-01-01 114.0
Total Index, 2015=100 (NSA) Euro area (EA11-1999, EA12-2001, EA13-2007, EA15-2008, EA16-2009, EA17-2011, EA18-2014, EA19-2015) 2018-01-01 111.3
Total Index, 2015=100 (NSA) Euro area - 19 countries (from 2015) 2018-01-01 111.3
Total Index, 2015=100 (NSA) Estonia 2018-01-01 115.2
Total Index, 2015=100 (NSA) Spain 2018-01-01 115.0
Total Index, 2015=100 (NSA) European Union (EU6-1958, EU9-1973, EU10-1981, EU12-1986, EU15-1995, EU25-2004, EU27-2007, EU28-2013, EU27-2020) 2018-01-01 112.4
Total Index, 2015=100 (NSA) European Union - 27 countries (from 2020) 2018-01-01 112.2
Total Index, 2015=100 (NSA) European Union - 28 countries (2013-2020) 2018-01-01 112.4
Total Index, 2015=100 (NSA) Finland 2018-01-01 102.7
Total Index, 2015=100 (NSA) France 2018-01-01 105.4
Total Index, 2015=100 (NSA) Croatia 2018-01-01 109.4
Total Index, 2015=100 (NSA) Hungary 2018-01-01 138.7
Total Index, 2015=100 (NSA) Ireland 2018-01-01 127.3
Total Index, 2015=100 (NSA) Iceland 2018-01-01 138.8
Total Index, 2015=100 (NSA) Italy 2018-01-01 98.6
Total Index, 2015=100 (NSA) Lithuania 2018-01-01 119.8
Total Index, 2015=100 (NSA) Luxembourg 2018-01-01 116.8
Total Index, 2015=100 (NSA) Latvia 2018-01-01 126.0
Total Index, 2015=100 (NSA) Malta 2018-01-01 111.2
Total Index, 2015=100 (NSA) Netherlands 2018-01-01 119.9
Total Index, 2015=100 (NSA) Norway 2018-01-01 113.6
Total Index, 2015=100 (NSA) Poland 2018-01-01 109.5
Total Index, 2015=100 (NSA) Portugal 2018-01-01 125.6
Total Index, 2015=100 (NSA) Romania 2018-01-01 116.1
Total Index, 2015=100 (NSA) Sweden 2018-01-01 113.2
Total Index, 2015=100 (NSA) Slovenia 2018-01-01 118.0
Total Index, 2015=100 (NSA) Slovakia 2018-01-01 119.6
Total Index, 2015=100 (NSA) Turkey 2018-01-01 130.9
Total Index, 2015=100 (NSA) United Kingdom 2018-01-01 113.6
Total Percentage change q/q-1 (NSA) Austria 2018-01-01 0.8
Total Percentage change q/q-1 (NSA) Belgium 2018-01-01 0.0
Total Percentage change q/q-1 (NSA) Bulgaria 2018-01-01 0.9
Total Percentage change q/q-1 (NSA) Cyprus 2018-01-01 -2.0
Total Percentage change q/q-1 (NSA) Czechia 2018-01-01 2.2

We note that our dataframe contains both the value of the index (unit = I15_NSA), as well as the percentage change (unit = PCH_Q1_NSA). We will select the I15_NSA index, a few countries and the EU-28 index, and plot the evolution of house prices over time.

hpi_data <- hpi %>% 
  
  # choose the UK, France, Poland, Spain, Portugal, Germany, Italy, and the EU28
  filter(geo %in%  c("UK", "FR", "PL", "ES","PT", "DE","IT","EU28") ) %>%  
  
  # choose value of the index (unit =   `I15_NSA`) 
    filter(unit == "I15_NSA")

ggplot(hpi_data, aes(x=time, y=values, group=geo, colour=geo))+
  geom_point()+
  geom_line()+
  theme_bw()+
  labs(
    title= "House price index in the EU (2015 = 100)",
    x = "Time",
    y = "Housing Price Index", 
    caption = "Source: Eurostat, code id = teicp270"
  )

Tourism Seasonality in the Meditteranean

The eurostat database has a dedicated tourism section. I wanted to check monthly nights spent at hotels– the relevant code id = tour_occ_nim in the four Mediterranean countries, Portugal, Spain, Italy, and Greece since 2000.

The code below downloads the data and plots time series plots for all countries.

# create a dataframe tourism_data that contains the Eurostat data for
# code id = "tour_occ_nim", namely value of monthly nights spent at hotels
tourism_data <- get_eurostat(id="tour_occ_nim")

med_tourism <-  tourism_data %>%   
  
  # choose Portugal, Spain, Italy, and Greece
  filter(geo %in%  c("PT", "ES", "IT", "EL" ) ) %>%
  
  #use label_eurostat to get verbose descriptions of codes
  label_eurostat() %>% 
  
  # choose number of total hotel accommodations since Jan 1, 2000
  filter (c_resid == "Total", 
          nace_r2 == "Hotels and similar accommodation", 
          unit == "Number",
          time >= "2000-01-01") %>% 
  
  # express values in million of nights
  mutate(values = values/1000000) 

ggplot(med_tourism, aes(x=time, y=values, group=geo, colour=geo))+
  geom_point()+
  geom_line()+
  geom_smooth(se=FALSE)+
  facet_wrap(~geo)+
  theme_bw()+
  labs(title="Hotel stays in the Medditeranean, 2000-present", 
       y= "Millions of nights spent in hotels",
       x = "Year",
       caption = "Source: Eurostat, code = tour_occ_nim")+
  theme(legend.position="none")

All countries exhibit the same seasonal pattern: there is a peak in July-August, and the minimum number is around December-January.

Look at the impact of Covid-19 on all countries!

#first define **ts** (time series ) objects; one for each country  

portugal_tourism <- med_tourism %>% 
  
  #select the country you are interested in, in this case Portugal
  filter (geo == "Portugal") %>% 
  
  #sort by time in ascending order, so  earliest observation is first
  arrange(time) %>%
  
  #we just want to keep the values 
  select(values) %>% 
  
  #time series (ts) starts Jan 2000 and has monthlyfrequency (12 months/yr)
  ts(start=2000, frequency = 12) 



spain_tourism <- med_tourism %>% 
  filter (geo == "Spain") %>% 
  arrange(time) %>% 
  select(values) %>% 
  ts(start=2000, frequency = 12)

italy_tourism <- med_tourism %>% 
  filter (geo == "Italy") %>% 
  arrange(time) %>% 
  select(values) %>%   
  ts(start=2000, frequency = 12)

greece_tourism <- med_tourism %>% 
  filter (geo == "Greece") %>% 
  arrange(time) %>% 
  select(values) %>%   
  ts(start=2000, frequency = 12)


#Season plot for Spain and Greece: the seasonal pattern is consistent since 2000
ggseasonplot(spain_tourism, year.labels=TRUE, year.labels.left=TRUE) +
  labs(
    title = "Seasonal plot: Hotel stays in Spain",
    y = "Millions of nights spent in hotels"
  )+
    theme_bw()

ggseasonplot(greece_tourism, year.labels=TRUE, year.labels.left=TRUE) +
  labs(
    title = "Seasonal plot: Hotel stays in Greece",
    y = "Millions of nights spent in hotels"
  )+
  theme_bw()

An interesting question is which country has the greatest seasonality distortion, namely, how much bigger is the summer peak from the winter bottom. For this we produce a subseries plot, one that emphasises the seasonal patterns and where the data for each season are collected together in separate mini time plots. The horizontal lines indicate the means for each month. This form of plot enables the underlying seasonal pattern to be seen clearly, and also shows the changes in seasonality over time. It is especially useful in identifying changes within particular seasons.

ggsubseriesplot(portugal_tourism)+
  labs(
    title = "Seasonal subseries plot: Hotel stays in Portugal 2000-present",
    subtitle = "Horizontal lines indicate monthly averages",
    y = "Millions of nights spent in hotels", 
    caption = "Source:Eurostat"
  )+
  theme_bw()

ggsubseriesplot(spain_tourism)+
  labs(
    title = "Seasonal subseries plot: Hotel stays in Spain 2000-present",
    subtitle = "Horizontal lines indicate monthly averages",
    y = "Millions of nights spent in hotels", 
    caption = "Source:Eurostat"
  )+
  theme_bw()

ggsubseriesplot(italy_tourism)+
  labs(
    title = "Seasonal subseries plot: Hotel stays in Italy 2000-present",
    subtitle = "Horizontal lines indicate monthly averages",
    y = "Millions of nights spent in hotels", 
        caption = "Source:Eurostat"
  )+
  theme_bw()

ggsubseriesplot(greece_tourism)+
  labs(
    title = "Seasonal subseries plot: Hotel stays in Greece 2000-present",
    subtitle = "Horizontal lines indicate monthly averages",
    y = "Millions of nights spent in hotels", 
    caption = "Source:Eurostat"
  )+
  theme_bw()

Visually, the approximate ratio of max:min averages for each of the four Mediterranean countries is as follows:

  • Portugal 6:2 = 3
  • Spain 39:13 = 3
  • Italy 43:10 = 4.3
  • Greece 13.5:1= 13.5

Disposable income of private households by NUTS 2 regions

Using the eurostat data, we can create maps of, e.g., disposable income at a regional level. NUTS or Nomenclature of Territorial Units for Statistics is a geocode standard for referencing subdivisions (regions, counties, districts, etc.) within a country.

We will work with the Disposable income of private households by NUTS 2 regions database

income_data <- get_eurostat(id="tgs00026") %>% 
  select(geo,time,values) %>% 
  dplyr::mutate(cat = cut_to_classes(values))


income_2016 <- income_data %>% 
  filter(time == "2016-01-01")

# Download geospatial data from GISCO
geodata <- get_eurostat_geospatial(output_class = "sf",
                                   resolution = "60",
                                   nuts_level = 2,
                                   year = 2016) 


map_data <- inner_join(geodata, income_2016)


ggplot(data=map_data) + geom_sf(aes(fill=cat),color="dim grey", size=.1) + 
  scale_fill_brewer(palette = "Accent") +
  guides(fill = guide_legend(reverse=T, title = "euro")) +
  labs(title="Disposable household income in 2016",
       caption="(C) EuroGeographics for the administrative boundaries 
                Map produced in R with a help from Eurostat-package <github.com/ropengov/eurostat/>") +
  theme_light() + theme(legend.position=c(.8,.8)) +
  coord_sf(xlim=c(-12,44), ylim=c(35,70))

Acknowledgments