Eurostat Data
Eurostat Data with the eurostat package
The eurostat package provides access to well over 9000 datasets from the Eurostat. It may seem a challenging task to find the correct dataset, but you are essentially looking for the code that describes the dataset. We an get a table of contents, namely all of the codes contained in the eurostat database.
library(eurostat)
library(fpp2) # for time series decomposition
library(seasonal)
library(tmap) #mapping eurostat data
# Get Eurostat data listing
# Function get_eurostat_toc() downloads a table of contents of eurostat datasets.
# The values in column ‘code’ should be used to download a selected dataset.
toc <- get_eurostat_toc()
# Check the first 20 rows
head(toc, 20) %>%
kable()
| title | code | type | last update of data | last table structure change | data start | data end | values |
|---|---|---|---|---|---|---|---|
| Database by themes | data | folder | NA | NA | NA | NA | NA |
| General and regional statistics | general | folder | NA | NA | NA | NA | NA |
| European and national indicators for short-term analysis | euroind | folder | NA | NA | NA | NA | NA |
| Business and consumer surveys (source: DG ECFIN) | ei_bcs | folder | NA | NA | NA | NA | NA |
| Consumer surveys (source: DG ECFIN) | ei_bcs_cs | folder | NA | NA | NA | NA | NA |
| Consumers - monthly data | ei_bsco_m | dataset | 28.05.2021 | 28.05.2021 | 1980M01 | 2021M05 | NA |
| Consumers - quarterly data | ei_bsco_q | dataset | 28.05.2021 | 29.04.2021 | 1990Q1 | 2021Q2 | NA |
| Business surveys - NACE Rev. 2 activity (source: DG ECFIN) | ei_bcs_bs | folder | NA | NA | NA | NA | NA |
| Industry - monthly data | ei_bsin_m_r2 | dataset | 28.05.2021 | 28.05.2021 | 1980M01 | 2021M05 | NA |
| Industry - quarterly data | ei_bsin_q_r2 | dataset | 28.05.2021 | 29.04.2021 | 1980Q1 | 2021Q2 | NA |
| Construction - monthly data | ei_bsbu_m_r2 | dataset | 28.05.2021 | 03.06.2021 | 1980M01 | 2021M05 | NA |
| Construction - quarterly data | ei_bsbu_q_r2 | dataset | 28.05.2021 | 29.04.2021 | 1981Q1 | 2021Q2 | NA |
| Retail sale - monthly data | ei_bsrt_m_r2 | dataset | 28.05.2021 | 28.05.2021 | 1984M01 | 2021M05 | NA |
| Sentiment indicators - monthly data | ei_bssi_m_r2 | dataset | 28.05.2021 | 18.06.2021 | 1980M01 | 2021M05 | NA |
| Services - monthly data | ei_bsse_m_r2 | dataset | 28.05.2021 | 18.06.2021 | 1988M01 | 2021M05 | NA |
| Services - quarterly data | ei_bsse_q_r2 | dataset | 28.05.2021 | 29.04.2021 | 2001Q2 | 2021Q2 | NA |
| Euro-zone Business Climate Indicator - monthly data | ei_bsci_m_r2 | dataset | 28.05.2021 | 28.05.2021 | 1985M01 | 2021M05 | NA |
| Financial services - monthly data | ei_bsfs_m | dataset | 28.05.2021 | 03.06.2021 | 2006M04 | 2021M05 | NA |
| Financial services - quarterly data | ei_bsfs_q | dataset | 28.05.2021 | 03.06.2021 | 2007Q3 | 2021Q2 | NA |
| Employment expectations indicator | ei_bsee_m_r2 | dataset | 28.05.2021 | 28.05.2021 | 1980M01 | 2021M05 | NA |
House Price Index (HPI)
The Eurostat House Price Index (HPI) measures price changes of all residential properties purchased by households (flats, detached houses, terraced houses, etc.), both new and existing, independently of their final use and their previous owners. First, we node that the code id for this dataset is teicp270. Once we know the relevant code id, we can download eurostat data using the get_eurostat(id) function.
hpi <- get_eurostat(id="teicp270")
glimpse(hpi)
## Rows: 1,256
## Columns: 5
## $ indic <chr> "TOTAL", "TOTAL", "TOTAL", "TOTAL", "TOTAL", "TOTAL", "TOTAL", ~
## $ unit <chr> "I15_NSA", "I15_NSA", "I15_NSA", "I15_NSA", "I15_NSA", "I15_NSA~
## $ geo <chr> "AT", "BE", "BG", "CY", "CZ", "DE", "DK", "EA", "EA19", "EE", "~
## $ time <date> 2018-01-01, 2018-01-01, 2018-01-01, 2018-01-01, 2018-01-01, 20~
## $ values <dbl> 117.5, 107.7, 120.7, 103.1, 125.0, 118.3, 114.0, 111.3, 111.3, ~
head(hpi,40) %>%
kable()
| indic | unit | geo | time | values |
|---|---|---|---|---|
| TOTAL | I15_NSA | AT | 2018-01-01 | 117.5 |
| TOTAL | I15_NSA | BE | 2018-01-01 | 107.7 |
| TOTAL | I15_NSA | BG | 2018-01-01 | 120.7 |
| TOTAL | I15_NSA | CY | 2018-01-01 | 103.1 |
| TOTAL | I15_NSA | CZ | 2018-01-01 | 125.0 |
| TOTAL | I15_NSA | DE | 2018-01-01 | 118.3 |
| TOTAL | I15_NSA | DK | 2018-01-01 | 114.0 |
| TOTAL | I15_NSA | EA | 2018-01-01 | 111.3 |
| TOTAL | I15_NSA | EA19 | 2018-01-01 | 111.3 |
| TOTAL | I15_NSA | EE | 2018-01-01 | 115.2 |
| TOTAL | I15_NSA | ES | 2018-01-01 | 115.0 |
| TOTAL | I15_NSA | EU | 2018-01-01 | 112.4 |
| TOTAL | I15_NSA | EU27_2020 | 2018-01-01 | 112.2 |
| TOTAL | I15_NSA | EU28 | 2018-01-01 | 112.4 |
| TOTAL | I15_NSA | FI | 2018-01-01 | 102.7 |
| TOTAL | I15_NSA | FR | 2018-01-01 | 105.4 |
| TOTAL | I15_NSA | HR | 2018-01-01 | 109.4 |
| TOTAL | I15_NSA | HU | 2018-01-01 | 138.7 |
| TOTAL | I15_NSA | IE | 2018-01-01 | 127.3 |
| TOTAL | I15_NSA | IS | 2018-01-01 | 138.8 |
| TOTAL | I15_NSA | IT | 2018-01-01 | 98.6 |
| TOTAL | I15_NSA | LT | 2018-01-01 | 119.8 |
| TOTAL | I15_NSA | LU | 2018-01-01 | 116.8 |
| TOTAL | I15_NSA | LV | 2018-01-01 | 126.0 |
| TOTAL | I15_NSA | MT | 2018-01-01 | 111.2 |
| TOTAL | I15_NSA | NL | 2018-01-01 | 119.9 |
| TOTAL | I15_NSA | NO | 2018-01-01 | 113.6 |
| TOTAL | I15_NSA | PL | 2018-01-01 | 109.5 |
| TOTAL | I15_NSA | PT | 2018-01-01 | 125.6 |
| TOTAL | I15_NSA | RO | 2018-01-01 | 116.1 |
| TOTAL | I15_NSA | SE | 2018-01-01 | 113.2 |
| TOTAL | I15_NSA | SI | 2018-01-01 | 118.0 |
| TOTAL | I15_NSA | SK | 2018-01-01 | 119.6 |
| TOTAL | I15_NSA | TR | 2018-01-01 | 130.9 |
| TOTAL | I15_NSA | UK | 2018-01-01 | 113.6 |
| TOTAL | PCH_Q1_NSA | AT | 2018-01-01 | 0.8 |
| TOTAL | PCH_Q1_NSA | BE | 2018-01-01 | 0.0 |
| TOTAL | PCH_Q1_NSA | BG | 2018-01-01 | 0.9 |
| TOTAL | PCH_Q1_NSA | CY | 2018-01-01 | -2.0 |
| TOTAL | PCH_Q1_NSA | CZ | 2018-01-01 | 2.2 |
Typically, the downloaded data has codes and abbreviations for all of the variables, but we can use label_eurostat to get a more verbose description.
house_price_index_data <- hpi %>%
label_eurostat()
head(house_price_index_data,40) %>%
kable()
| indic | unit | geo | time | values |
|---|---|---|---|---|
| Total | Index, 2015=100 (NSA) | Austria | 2018-01-01 | 117.5 |
| Total | Index, 2015=100 (NSA) | Belgium | 2018-01-01 | 107.7 |
| Total | Index, 2015=100 (NSA) | Bulgaria | 2018-01-01 | 120.7 |
| Total | Index, 2015=100 (NSA) | Cyprus | 2018-01-01 | 103.1 |
| Total | Index, 2015=100 (NSA) | Czechia | 2018-01-01 | 125.0 |
| Total | Index, 2015=100 (NSA) | Germany (until 1990 former territory of the FRG) | 2018-01-01 | 118.3 |
| Total | Index, 2015=100 (NSA) | Denmark | 2018-01-01 | 114.0 |
| Total | Index, 2015=100 (NSA) | Euro area (EA11-1999, EA12-2001, EA13-2007, EA15-2008, EA16-2009, EA17-2011, EA18-2014, EA19-2015) | 2018-01-01 | 111.3 |
| Total | Index, 2015=100 (NSA) | Euro area - 19 countries (from 2015) | 2018-01-01 | 111.3 |
| Total | Index, 2015=100 (NSA) | Estonia | 2018-01-01 | 115.2 |
| Total | Index, 2015=100 (NSA) | Spain | 2018-01-01 | 115.0 |
| Total | Index, 2015=100 (NSA) | European Union (EU6-1958, EU9-1973, EU10-1981, EU12-1986, EU15-1995, EU25-2004, EU27-2007, EU28-2013, EU27-2020) | 2018-01-01 | 112.4 |
| Total | Index, 2015=100 (NSA) | European Union - 27 countries (from 2020) | 2018-01-01 | 112.2 |
| Total | Index, 2015=100 (NSA) | European Union - 28 countries (2013-2020) | 2018-01-01 | 112.4 |
| Total | Index, 2015=100 (NSA) | Finland | 2018-01-01 | 102.7 |
| Total | Index, 2015=100 (NSA) | France | 2018-01-01 | 105.4 |
| Total | Index, 2015=100 (NSA) | Croatia | 2018-01-01 | 109.4 |
| Total | Index, 2015=100 (NSA) | Hungary | 2018-01-01 | 138.7 |
| Total | Index, 2015=100 (NSA) | Ireland | 2018-01-01 | 127.3 |
| Total | Index, 2015=100 (NSA) | Iceland | 2018-01-01 | 138.8 |
| Total | Index, 2015=100 (NSA) | Italy | 2018-01-01 | 98.6 |
| Total | Index, 2015=100 (NSA) | Lithuania | 2018-01-01 | 119.8 |
| Total | Index, 2015=100 (NSA) | Luxembourg | 2018-01-01 | 116.8 |
| Total | Index, 2015=100 (NSA) | Latvia | 2018-01-01 | 126.0 |
| Total | Index, 2015=100 (NSA) | Malta | 2018-01-01 | 111.2 |
| Total | Index, 2015=100 (NSA) | Netherlands | 2018-01-01 | 119.9 |
| Total | Index, 2015=100 (NSA) | Norway | 2018-01-01 | 113.6 |
| Total | Index, 2015=100 (NSA) | Poland | 2018-01-01 | 109.5 |
| Total | Index, 2015=100 (NSA) | Portugal | 2018-01-01 | 125.6 |
| Total | Index, 2015=100 (NSA) | Romania | 2018-01-01 | 116.1 |
| Total | Index, 2015=100 (NSA) | Sweden | 2018-01-01 | 113.2 |
| Total | Index, 2015=100 (NSA) | Slovenia | 2018-01-01 | 118.0 |
| Total | Index, 2015=100 (NSA) | Slovakia | 2018-01-01 | 119.6 |
| Total | Index, 2015=100 (NSA) | Turkey | 2018-01-01 | 130.9 |
| Total | Index, 2015=100 (NSA) | United Kingdom | 2018-01-01 | 113.6 |
| Total | Percentage change q/q-1 (NSA) | Austria | 2018-01-01 | 0.8 |
| Total | Percentage change q/q-1 (NSA) | Belgium | 2018-01-01 | 0.0 |
| Total | Percentage change q/q-1 (NSA) | Bulgaria | 2018-01-01 | 0.9 |
| Total | Percentage change q/q-1 (NSA) | Cyprus | 2018-01-01 | -2.0 |
| Total | Percentage change q/q-1 (NSA) | Czechia | 2018-01-01 | 2.2 |
We note that our dataframe contains both the value of the index (unit = I15_NSA), as well as the percentage change (unit = PCH_Q1_NSA). We will select the I15_NSA index, a few countries and the EU-28 index, and plot the evolution of house prices over time.
hpi_data <- hpi %>%
# choose the UK, France, Poland, Spain, Portugal, Germany, Italy, and the EU28
filter(geo %in% c("UK", "FR", "PL", "ES","PT", "DE","IT","EU28") ) %>%
# choose value of the index (unit = `I15_NSA`)
filter(unit == "I15_NSA")
ggplot(hpi_data, aes(x=time, y=values, group=geo, colour=geo))+
geom_point()+
geom_line()+
theme_bw()+
labs(
title= "House price index in the EU (2015 = 100)",
x = "Time",
y = "Housing Price Index",
caption = "Source: Eurostat, code id = teicp270"
)

Tourism Seasonality in the Meditteranean
The eurostat database has a dedicated tourism section.
I wanted to check monthly nights spent at hotels– the relevant code id = tour_occ_nim in the four Mediterranean countries, Portugal, Spain, Italy, and Greece since 2000.
The code below downloads the data and plots time series plots for all countries.
# create a dataframe tourism_data that contains the Eurostat data for
# code id = "tour_occ_nim", namely value of monthly nights spent at hotels
tourism_data <- get_eurostat(id="tour_occ_nim")
med_tourism <- tourism_data %>%
# choose Portugal, Spain, Italy, and Greece
filter(geo %in% c("PT", "ES", "IT", "EL" ) ) %>%
#use label_eurostat to get verbose descriptions of codes
label_eurostat() %>%
# choose number of total hotel accommodations since Jan 1, 2000
filter (c_resid == "Total",
nace_r2 == "Hotels and similar accommodation",
unit == "Number",
time >= "2000-01-01") %>%
# express values in million of nights
mutate(values = values/1000000)
ggplot(med_tourism, aes(x=time, y=values, group=geo, colour=geo))+
geom_point()+
geom_line()+
geom_smooth(se=FALSE)+
facet_wrap(~geo)+
theme_bw()+
labs(title="Hotel stays in the Medditeranean, 2000-present",
y= "Millions of nights spent in hotels",
x = "Year",
caption = "Source: Eurostat, code = tour_occ_nim")+
theme(legend.position="none")

All countries exhibit the same seasonal pattern: there is a peak in July-August, and the minimum number is around December-January.
Look at the impact of Covid-19 on all countries!
#first define **ts** (time series ) objects; one for each country
portugal_tourism <- med_tourism %>%
#select the country you are interested in, in this case Portugal
filter (geo == "Portugal") %>%
#sort by time in ascending order, so earliest observation is first
arrange(time) %>%
#we just want to keep the values
select(values) %>%
#time series (ts) starts Jan 2000 and has monthlyfrequency (12 months/yr)
ts(start=2000, frequency = 12)
spain_tourism <- med_tourism %>%
filter (geo == "Spain") %>%
arrange(time) %>%
select(values) %>%
ts(start=2000, frequency = 12)
italy_tourism <- med_tourism %>%
filter (geo == "Italy") %>%
arrange(time) %>%
select(values) %>%
ts(start=2000, frequency = 12)
greece_tourism <- med_tourism %>%
filter (geo == "Greece") %>%
arrange(time) %>%
select(values) %>%
ts(start=2000, frequency = 12)
#Season plot for Spain and Greece: the seasonal pattern is consistent since 2000
ggseasonplot(spain_tourism, year.labels=TRUE, year.labels.left=TRUE) +
labs(
title = "Seasonal plot: Hotel stays in Spain",
y = "Millions of nights spent in hotels"
)+
theme_bw()

ggseasonplot(greece_tourism, year.labels=TRUE, year.labels.left=TRUE) +
labs(
title = "Seasonal plot: Hotel stays in Greece",
y = "Millions of nights spent in hotels"
)+
theme_bw()

An interesting question is which country has the greatest seasonality distortion, namely, how much bigger is the summer peak from the winter bottom. For this we produce a subseries plot, one that emphasises the seasonal patterns and where the data for each season are collected together in separate mini time plots. The horizontal lines indicate the means for each month. This form of plot enables the underlying seasonal pattern to be seen clearly, and also shows the changes in seasonality over time. It is especially useful in identifying changes within particular seasons.
ggsubseriesplot(portugal_tourism)+
labs(
title = "Seasonal subseries plot: Hotel stays in Portugal 2000-present",
subtitle = "Horizontal lines indicate monthly averages",
y = "Millions of nights spent in hotels",
caption = "Source:Eurostat"
)+
theme_bw()

ggsubseriesplot(spain_tourism)+
labs(
title = "Seasonal subseries plot: Hotel stays in Spain 2000-present",
subtitle = "Horizontal lines indicate monthly averages",
y = "Millions of nights spent in hotels",
caption = "Source:Eurostat"
)+
theme_bw()

ggsubseriesplot(italy_tourism)+
labs(
title = "Seasonal subseries plot: Hotel stays in Italy 2000-present",
subtitle = "Horizontal lines indicate monthly averages",
y = "Millions of nights spent in hotels",
caption = "Source:Eurostat"
)+
theme_bw()

ggsubseriesplot(greece_tourism)+
labs(
title = "Seasonal subseries plot: Hotel stays in Greece 2000-present",
subtitle = "Horizontal lines indicate monthly averages",
y = "Millions of nights spent in hotels",
caption = "Source:Eurostat"
)+
theme_bw()

Visually, the approximate ratio of max:min averages for each of the four Mediterranean countries is as follows:
- Portugal 6:2 = 3
- Spain 39:13 = 3
- Italy 43:10 = 4.3
- Greece 13.5:1= 13.5
Disposable income of private households by NUTS 2 regions
Using the eurostat data, we can create maps of, e.g., disposable income at a regional level. NUTS or Nomenclature of Territorial Units for Statistics is a geocode standard for referencing subdivisions (regions, counties, districts, etc.) within a country.
We will work with the Disposable income of private households by NUTS 2 regions database
income_data <- get_eurostat(id="tgs00026") %>%
select(geo,time,values) %>%
dplyr::mutate(cat = cut_to_classes(values))
income_2016 <- income_data %>%
filter(time == "2016-01-01")
# Download geospatial data from GISCO
geodata <- get_eurostat_geospatial(output_class = "sf",
resolution = "60",
nuts_level = 2,
year = 2016)
map_data <- inner_join(geodata, income_2016)
ggplot(data=map_data) + geom_sf(aes(fill=cat),color="dim grey", size=.1) +
scale_fill_brewer(palette = "Accent") +
guides(fill = guide_legend(reverse=T, title = "euro")) +
labs(title="Disposable household income in 2016",
caption="(C) EuroGeographics for the administrative boundaries
Map produced in R with a help from Eurostat-package <github.com/ropengov/eurostat/>") +
theme_light() + theme(legend.position=c(.8,.8)) +
coord_sf(xlim=c(-12,44), ylim=c(35,70))

Acknowledgments
- This page is derived in part from Tutorial (vignette) for the
eurostatR package.