Quantcast
Channel: Easy Guides
Viewing all articles
Browse latest Browse all 183

Tibble Data Format in R: Best and Modern Way to Work with Your Data

$
0
0



Previously, we described the essentials of R programming and provided quick start guides for importing data into R. The traditional R base functions read.table(), read.delim() and read.csv() import data into R as a data frame. However, the most modern R package readr provides several functions (read_delim(), read_tsv() and read_csv()), which are faster than R base functions and import data into R as a tbl_df (pronounced as “tibble diff”).

tbl_df object is a data frame providing a nicer printing method, useful when working with large data sets.


In this article, we’ll present the tibble R package, developed by Hadley Wickham. The tibble R package provides easy to use functions for creating tibbles, which is a modern rethinking of data frames.


tibble data format: tbl_df

Preleminary tasks

Launch RStudio as described here: Running RStudio and setting up your working directory

Installing and loading tibble package

# Installing
install.packages("tibble")

# Loading
library("tibble")

Create a new tibble

To create a new tibble from combining multiple vectors, use the function data_frame():

# Create
friends_data <- data_frame(
  name = c("Nicolas", "Thierry", "Bernard", "Jerome"),
  age = c(27, 25, 29, 26),
  height = c(180, 170, 185, 169),
  married = c(TRUE, FALSE, TRUE, TRUE)
)

# Print
friends_data
Source: local data frame [4 x 4]

     name   age height married
    <chr> <dbl>  <dbl>   <lgl>
1 Nicolas    27    180    TRUE
2 Thierry    25    170   FALSE
3 Bernard    29    185    TRUE
4  Jerome    26    169    TRUE

Compared to the traditional data.frame(), the modern data_frame():

  • never converts string as factor
  • never changes the names of variables
  • never create row names


Convert your data as a tibble

Note that, if you use the readr package to import your data into R, then you don’t need to do this step. readr imports already data as tbl_df.

To convert a traditional data as a tibble use the function as_data_frame() [in tibble package], which works on data frames, lists, matrices and tables:

library("tibble")

# Loading data
data("iris")
# Class of iris
class(iris)
[1] "data.frame"
# Print the frist 6 rows
head(iris, 6)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
# Convert iris data to a tibble
my_data <- as_data_frame(iris)
class(my_data)
[1] "tbl_df"     "tbl"        "data.frame"
# Print my data
my_data
Source: local data frame [150 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl>  <fctr>
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
..          ...         ...          ...         ...     ...

Note that, only the first 10 rows are displayed

In the situation where you want to turn a tibble back to a data frame, use the function as.data.frame(my_data).

Advantages of tibbles compared to data frames

  1. Tibbles have nice printing method that show only the first 10 rows and all the columns that fit on the screen. This is useful when you work with large data sets.

  2. When printed, the data type of each column is specified (see below):
    • : for double
    • : for factor
    • : for character
    • : for logical
my_data
Source: local data frame [150 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl>  <fctr>
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
..          ...         ...          ...         ...     ...

It’s possible to change the default printing appearance as follow:


  • Change the maximum and the minimum rows to print: options(tibble.print_max = 20, tibble.print_min = 6)
  • Always show all rows: options(tibble.print_max = Inf)
  • Always show all columns: options(tibble.width = Inf)


  1. Subsetting a tibble will always return a tibble. You don’t need to use drop = FALSE compared to traditional data.frames.

Summary


  • Create a tibble: data_frame()

  • Convert your data to a tibble: as_data_frame()

  • Change default printing appearance of a tibble: options(tibble.print_max = 20, tibble.print_min = 6)


Infos

This analysis has been performed using R (ver. 3.2.3).


Viewing all articles
Browse latest Browse all 183

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>