2  Packages

One of the best features of R is the large number of packages contributed by the statistical community. The list of all packages available on CRAN is impressive and worth a look. You can find R packages for almost any statistical method out there, and many statisticians provide R packages to accompany their research. Some packages also provide additional functionality for R or include datasets.

2.1 The xts package

Let us look at a time series specific package: the xts package. It can be installed using the install.packages() function.

The xts package provides the class xts, which has certain advantages over ts. A ts object can specify the frequency of a time series only as a fixed number of observations per year (1 for yearly, 4 for quarterly, 12 for monthly data). This scheme is convenient for regular macroeconomic time series but impractical for daily data (leap year problem), high-frequency data, or irregularly collected data. An xts object is much more flexible: we manually assign a specific time index to each observation in the time series.
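To see what "frequency" means for a ts object, the following minimal base R sketch inspects the time index that R stores internally. It needs no additional packages and reuses the illustrative numbers from this section.

```r
# A quarterly ts object: frequency = 4 means four observations per year
myts <- ts(c(66, 76, 55, 12, 4, 66, 8, 99), start = 2020, frequency = 4)

frequency(myts)  # number of observations per year
time(myts)       # time index stored as fractional years (2020.00, 2020.25, ...)
start(myts)      # first period as (year, quarter)
end(myts)        # last period as (year, quarter)
```

Because the index is just an evenly spaced grid of fractional years, there is no natural way to represent calendar dates with gaps or unequal spacing.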

Once installed, the package only has to be loaded at the beginning of a new R session, which is done with the command library(xts).

library(xts)
myts = ts(c(66, 76, 55, 12, 4, 66, 8, 99), start = 2020, frequency = 4)
as.xts(myts)  # convert a ts object into an xts object
        [,1]
2020 Q1   66
2020 Q2   76
2020 Q3   55
2020 Q4   12
2021 Q1    4
2021 Q2   66
2021 Q3    8
2021 Q4   99
# we may assign irregular time points:
dates = seq.Date(as.Date("2023-01-01"), by = 7, length.out = 7)
dates[8] = as.Date("2023-03-01")
my.xts = xts(myts, dates)
plot(my.xts)
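Since every observation in an xts object carries its own date, we can inspect the spacing between observations and select observations by date. The following is a small sketch with four illustrative dates and values (not the data from above); it assumes that the xts package is installed.

```r
library(xts)

# Four observations: three weekly ones and one after a longer gap
dates <- as.Date(c("2023-01-01", "2023-01-08", "2023-01-15", "2023-03-01"))
x <- xts(c(66, 76, 55, 12), order.by = dates)

index(x)        # the time index attached to the observations
diff(index(x))  # gaps between consecutive observations in days
x["2023-01"]    # select all observations from January 2023
```

The date-string subsetting in the last line is one of the conveniences that a plain ts object does not offer.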

2.2 Data packages

For teaching, I have created the package teachingdata, which contains some current datasets. The package is not available on CRAN (a package must meet specific quality standards and pass a review process to be accepted there), but I have created a GitHub repository to make it accessible. To install a package from a GitHub repository, we need the package remotes and its function install_github().

install.packages("remotes")
remotes::install_github("ottosven/teachingdata")

Let’s have a closer look at the data from the teachingdata package.

library(teachingdata)
data(package = "teachingdata")
plot(gdp, main = "Quarterly GDP Germany")
plot(infl, main = "Monthly CPI inflation rate Germany")
plot(temp, main = "Average temperature Germany")
plot(covidcases$date, covidcases$GER, type = "l",
     main = "Incidence number of reported Covid-19 infections Germany")
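Before plotting, it is often helpful to check what kind of objects a data package provides. The following sketch uses only base R functions and the object names gdp and covidcases from above. The guard with requireNamespace() is an addition for illustration, so that the code does nothing if teachingdata is not installed.

```r
# Check whether the teachingdata package is available on this machine
has_teachingdata <- requireNamespace("teachingdata", quietly = TRUE)

if (has_teachingdata) {
  library(teachingdata)
  print(class(gdp))             # what kind of object is gdp?
  str(covidcases)               # column names and types of covidcases
  print(head(covidcases$date))  # first few dates
}
```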

2.3 The tidyverse

The tidyverse is a collection of packages that lets you import, manipulate, explore, visualize, and model data in a harmonized and consistent way.

Installing the tidyverse package:

install.packages("tidyverse")

In this lecture, we will mainly use R to deepen our understanding of the statistical and econometric methods covered and to apply them in illustrative examples. For this purpose, base R is entirely sufficient. However, the tidyverse has become the state of the art for applied work with large datasets and is especially recommended for data management and visualization.

To give you a flavor of the tidyverse, let us briefly discuss the ggplot2 and tibble packages, which are part of the tidyverse.

Nice plots can be produced using the R package ggplot2. Let’s plot the iris dataset, which ships with base R.

library(tidyverse)  # loads ggplot2, tibble, and the other core tidyverse packages
class(iris) # iris is a data.frame
[1] "data.frame"
iris |> 
  ggplot(aes(x = Sepal.Length, y = Petal.Length, color = Species)) + 
  geom_point()
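A ggplot is built up in layers joined by +, so labels and themes can be added in the same way as geom_point(). The following is a minimal sketch of one possible extension; the title and axis labels are chosen here for illustration, and only the ggplot2 package is assumed to be installed.

```r
library(ggplot2)

# Same scatter plot as above, with a title, axis labels, and a lighter theme
p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point() +
  labs(title = "Iris: sepal vs. petal length",
       x = "Sepal length (cm)",
       y = "Petal length (cm)") +
  theme_minimal()

p  # printing the object draws the plot
```

Storing the plot in an object such as p makes it easy to add further layers later or to save it with ggsave().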

A data.frame in the tidyverse is called a tibble. A tibble is sometimes more flexible and convenient for manipulating and printing data. Let’s transform the iris data frame into a tibble.

iris.tbl = as_tibble(iris) 
iris.tbl # iris.tbl is a tibble
# A tibble: 150 × 5
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
 1          5.1         3.5          1.4         0.2 setosa 
 2          4.9         3            1.4         0.2 setosa 
 3          4.7         3.2          1.3         0.2 setosa 
 4          4.6         3.1          1.5         0.2 setosa 
 5          5           3.6          1.4         0.2 setosa 
 6          5.4         3.9          1.7         0.4 setosa 
 7          4.6         3.4          1.4         0.3 setosa 
 8          5           3.4          1.5         0.2 setosa 
 9          4.4         2.9          1.4         0.2 setosa 
10          4.9         3.1          1.5         0.1 setosa 
# ℹ 140 more rows
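Besides the compact printing, one practical difference concerns subsetting. Selecting a single column with [ , ] silently turns a data.frame into a vector, whereas a tibble always stays a tibble. A short sketch, assuming that the tibble package is installed:

```r
library(tibble)

iris.tbl <- as_tibble(iris)

class(iris[, 1])      # data.frame: the result is a plain numeric vector
class(iris.tbl[, 1])  # tibble: the result is still a tibble with one column

# To extract a vector from a tibble, use $ or [[ ]]
head(iris.tbl$Sepal.Length)
head(iris.tbl[["Sepal.Length"]])
```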

As an extension, a tsibble object is a tibble with an additional time series structure: it contains an index variable that holds each observation’s time index. Let us convert the covidcases data into a tsibble. To visualize a tsibble, we also need the fable package.

In a tsibble object, we can define so-called key variables, which define the subjects or individuals measured over time. Key variables also allow easy processing of panel data in R.
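The roles of index and key can be illustrated with a small toy panel. The data below are hypothetical and made up purely for illustration (two regions, three days each); the sketch assumes that the tsibble package is installed.

```r
library(tsibble)

# Hypothetical toy panel: two regions observed on three consecutive days
toy <- data.frame(
  date   = rep(as.Date("2023-01-01") + 0:2, times = 2),
  region = rep(c("A", "B"), each = 3),
  value  = c(10, 12, 11, 20, 19, 23)
)

# date is the time index, region identifies the individual series
toy.tsibble <- as_tsibble(toy, index = date, key = region)
toy.tsibble

key_vars(toy.tsibble)   # name of the key variable
index_var(toy.tsibble)  # name of the index variable
n_keys(toy.tsibble)     # number of distinct series in the panel
```

Each combination of key and index must be unique, which is exactly the structure of panel data: one observation per individual and time point.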

In the covidcases example, the key variable identifies the federal state, and the time series are the incidence numbers. Since a simultaneous display of the curves of all federal states would produce a very cluttered plot, we select only Germany as a whole, Nordrhein-Westfalen, and Berlin. In the tidyverse, the individual steps can be written as a sequence of operations chained with the pipe operator |> (other pipes such as %>% do a similar job).

library(tsibble)  # provides as_tsibble()
library(fable)    # needed for autoplot() on a tsibble
covid.tsibble = as_tsibble(covidcases, index=date) |>
  pivot_longer(-date, names_to = "state", values_to = "incidence") |>
  filter(state %in% c("GER", "NW", "BE"))
covid.tsibble
# A tsibble: 1,689 x 3 [1D]
# Key:       state [3]
   date       state incidence
   <date>     <chr>     <dbl>
 1 2021-09-11 BE         83.5
 2 2021-09-11 NW        103. 
 3 2021-09-11 GER        82.7
 4 2021-09-12 BE         84.3
 5 2021-09-12 NW        101. 
 6 2021-09-12 GER        80.1
 7 2021-09-13 BE         83.7
 8 2021-09-13 NW         99.3
 9 2021-09-13 GER        81.8
10 2021-09-14 BE         84.9
# ℹ 1,679 more rows
covid.tsibble |>
  autoplot(incidence) + theme(axis.title.x=element_blank())
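The pipe operator |> used above passes the result on its left as the first argument to the function on its right. The following base R sketch (it requires R 4.1 or newer, where the native pipe was introduced) shows that a piped expression is simply another way of writing a nested function call.

```r
# Nested call: has to be read from the inside out
n_nested <- nrow(subset(iris, Species == "setosa"))

# Same computation with the native pipe: reads from left to right
n_piped <- iris |>
  subset(Species == "setosa") |>
  nrow()

identical(n_nested, n_piped)  # both approaches give the same result
```

With several steps, the piped version is usually easier to read because the operations appear in the order in which they are carried out.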

For an introduction to the tidyverse and to learn more about the packages and functions used above, have a look at the book R for Data Science. To learn more about visualizing and analyzing time series data using the tsibble and fable packages, I recommend the textbook Forecasting: Principles and Practice.