Previously, we described the essentials of R programming and provided quick start guides for importing data into R. The traditional R base functions read.table(), read.delim() and read.csv() import data into R as a data frame. However, the most modern R package readr provides several functions (read_delim(), read_tsv() and read_csv()), which are faster than R base functions and import data into R as a tbl_df (pronounced as tibble diff).
tbl_df object is a data frame providing a nicer printing method, useful when working with large data sets.
Preleminary tasks
Launch RStudio as described here: Running RStudio and setting up your working directory
Installing and loading tibble package
# Installing
install.packages("tibble")
# Loading
library("tibble")
Create a new tibble
To create a new tibble from combining multiple vectors, use the function data_frame():
# Create
friends_data <- data_frame(
name = c("Nicolas", "Thierry", "Bernard", "Jerome"),
age = c(27, 25, 29, 26),
height = c(180, 170, 185, 169),
married = c(TRUE, FALSE, TRUE, TRUE)
)
# Print
friends_data
Source: local data frame [4 x 4]
name age height married
<chr> <dbl> <dbl> <lgl>
1 Nicolas 27 180 TRUE
2 Thierry 25 170 FALSE
3 Bernard 29 185 TRUE
4 Jerome 26 169 TRUE
Compared to the traditional data.frame(), the modern data_frame():
- never converts string as factor
- never changes the names of variables
- never create row names
Convert your data as a tibble
Note that, if you use the readr package to import your data into R, then you dont need to do this step. readr imports already data as tbl_df.
To convert a traditional data as a tibble use the function as_data_frame() [in tibble package], which works on data frames, lists, matrices and tables:
library("tibble")
# Loading data
data("iris")
# Class of iris
class(iris)
[1] "data.frame"
# Print the frist 6 rows
head(iris, 6)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
# Convert iris data to a tibble
my_data <- as_data_frame(iris)
class(my_data)
[1] "tbl_df" "tbl" "data.frame"
# Print my data
my_data
Source: local data frame [150 x 5]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <fctr>
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5.0 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
.. ... ... ... ... ...
Note that, only the first 10 rows are displayed
In the situation where you want to turn a tibble back to a data frame, use the function as.data.frame(my_data).
Advantages of tibbles compared to data frames
Tibbles have nice printing method that show only the first 10 rows and all the columns that fit on the screen. This is useful when you work with large data sets.
- When printed, the data type of each column is specified (see below):
: for double : for factor : for character : for logical
my_data
Source: local data frame [150 x 5]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <fctr>
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5.0 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
.. ... ... ... ... ...
Its possible to change the default printing appearance as follow:
- Change the maximum and the minimum rows to print: options(tibble.print_max = 20, tibble.print_min = 6)
- Always show all rows: options(tibble.print_max = Inf)
- Always show all columns: options(tibble.width = Inf)
- Subsetting a tibble will always return a tibble. You dont need to use drop = FALSE compared to traditional data.frames.
Summary
Create a tibble: data_frame()
Convert your data to a tibble: as_data_frame()
- Change default printing appearance of a tibble: options(tibble.print_max = 20, tibble.print_min = 6)
Infos
This analysis has been performed using R (ver. 3.2.3).