- Preliminary tasks
- Install and load dplyr package
- dplyr::mutate(): Add new variables by preserving existing ones
- dplyr::transmute(): Make new variables by dropping existing ones
- Use mutate() and transmute() programmatically inside a function:
- transform(): R base function to compute and add new variables
- Summary
- Related articles
- Infos
Previously, we described the essentials of R programming and provided quick start guides for importing data into R as well as converting your data into the tibble data format, which is the modern conventional way to work with your data. We also described crucial steps to reshape your data with R for easier analyses.
This article describes how to compute and add new variables to a data frame using the following dplyr functions:
- mutate(): Computes and adds new variable(s). Preserves existing variables. It's similar to the R base function transform().
- transmute(): Computes new variable(s). Drops existing variables.
Figure adapted from RStudio data wrangling cheatsheet
Preliminary tasks
Launch RStudio as described here: Running RStudio and setting up your working directory
Prepare your data as described here: Best practices for preparing your data and save it in an external tab-delimited .txt file or a .csv file
Import your data into R as described here: Fast reading of data from txt|csv files into R: readr package.
Here, we'll use the R built-in iris data set, which we start by converting to a tibble data frame (tbl_df). A tibble is a modern rethinking of the data frame that provides a nicer printing method, which is useful when working with large data sets.
# Create my_data: the four numeric iris columns (drop column 5, Species)
my_data <- iris[, -5]
# Convert to a tibble (as_tibble() replaces the deprecated as_data_frame())
library("tibble")
my_data <- as_tibble(my_data)
# Print
my_data
Source: local data frame [150 x 4]
Sepal.Length Sepal.Width Petal.Length Petal.Width
<dbl> <dbl> <dbl> <dbl>
1 5.1 3.5 1.4 0.2
2 4.9 3.0 1.4 0.2
3 4.7 3.2 1.3 0.2
4 4.6 3.1 1.5 0.2
5 5.0 3.6 1.4 0.2
6 5.4 3.9 1.7 0.4
7 4.6 3.4 1.4 0.3
8 5.0 3.4 1.5 0.2
9 4.4 2.9 1.4 0.2
10 4.9 3.1 1.5 0.1
.. ... ... ... ...
Install and load dplyr package
- Install dplyr:
install.packages("dplyr")
- Load dplyr:
library("dplyr")
dplyr::mutate(): Add new variables by preserving existing ones
- Add a new column (sepal_by_petal_l) while preserving existing ones:
mutate(my_data,
  sepal_by_petal_l = Sepal.Length / Petal.Length
)
Source: local data frame [150 x 5]
Sepal.Length Sepal.Width Petal.Length Petal.Width sepal_by_petal_l
(dbl) (dbl) (dbl) (dbl) (dbl)
1 5.1 3.5 1.4 0.2 3.642857
2 4.9 3.0 1.4 0.2 3.500000
3 4.7 3.2 1.3 0.2 3.615385
4 4.6 3.1 1.5 0.2 3.066667
5 5.0 3.6 1.4 0.2 3.571429
6 5.4 3.9 1.7 0.4 3.176471
7 4.6 3.4 1.4 0.3 3.285714
8 5.0 3.4 1.5 0.2 3.333333
9 4.4 2.9 1.4 0.2 3.142857
10 4.9 3.1 1.5 0.1 3.266667
.. ... ... ... ... ...
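mutate() can also add several derived columns in one call. The following is a minimal sketch, assuming dplyr is installed and the data are built from iris as in the preliminary tasks. The object name my_data_ratios is introduced only for illustration.

```r
library(dplyr)

# Same starting data as above: the four numeric iris columns
my_data <- iris[, -5]

# Add two ratio columns in a single mutate() call
my_data_ratios <- mutate(my_data,
  sepal_by_petal_l = Sepal.Length / Petal.Length,
  sepal_by_petal_w = Sepal.Width / Petal.Width
)

# The four original columns are kept, followed by the two new ones
names(my_data_ratios)
```

The original columns keep their positions and the new columns are appended on the right.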
dplyr::transmute(): Make new variables by dropping existing ones
- Compute new columns (sepal_by_petal_*) and drop the existing ones:
transmute(my_data,
  sepal_by_petal_l = Sepal.Length / Petal.Length,
  sepal_by_petal_w = Sepal.Width / Petal.Width
)
Source: local data frame [150 x 2]
sepal_by_petal_l sepal_by_petal_w
(dbl) (dbl)
1 3.642857 17.50000
2 3.500000 15.00000
3 3.615385 16.00000
4 3.066667 15.50000
5 3.571429 18.00000
6 3.176471 9.75000
7 3.285714 11.33333
8 3.333333 17.00000
9 3.142857 14.50000
10 3.266667 31.00000
.. ... ...
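transmute() drops every column that is not named in the call, so an existing column can be kept by listing it explicitly. A minimal sketch, assuming dplyr is installed. The object name sepal_only is hypothetical and used only for illustration.

```r
library(dplyr)

# Same starting data as above: the four numeric iris columns
my_data <- iris[, -5]

# Listing Sepal.Length by name keeps it next to the new column
sepal_only <- transmute(my_data,
  Sepal.Length,
  sepal_by_petal_l = Sepal.Length / Petal.Length
)

# Only the two columns named in the call remain
names(sepal_only)
```

This is handy when an identifier or a reference measurement has to travel along with the derived values.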
Use mutate() and transmute() programmatically inside a function:
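There are three ways to quote inputs that dplyr understands. They are used with the standard-evaluation verbs mutate_() and transmute_(), which have been deprecated since dplyr 0.7.0 but still illustrate the idea:
- With a formula: ~Sepal.Length.
- With quote(): quote(Sepal.Length).
- As a string: "Sepal.Length".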
# Use a formula
mutate_(my_data,
  sepal_by_petal_l = ~Sepal.Length / Petal.Length,
  sepal_by_petal_w = ~Sepal.Width / Petal.Width
)
# Or use quote()
transmute_(my_data,
  sepal_by_petal_l = quote(Sepal.Length / Petal.Length),
  sepal_by_petal_w = quote(Sepal.Width / Petal.Width)
)
# Or use strings
transmute_(my_data,
  sepal_by_petal_l = "Sepal.Length/Petal.Length",
  sepal_by_petal_w = "Sepal.Width/Petal.Width"
)
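With recent dplyr versions (1.0.0 or later is assumed here), the same computation can be wrapped in a function using tidy evaluation instead of the underscore verbs. The sketch below is one possible approach, not the method used in this article. The helpers add_ratio() and add_ratio_chr() and their argument names are hypothetical. The {{ }} operator forwards bare column names, and the .data pronoun looks up columns given as strings.

```r
library(dplyr)

# Hypothetical helper: takes bare (unquoted) column names
add_ratio <- function(data, numerator, denominator, name = "ratio") {
  # {{ }} forwards the caller's column names into mutate();
  # "{name}" := builds the new column name from a string
  mutate(data, "{name}" := {{ numerator }} / {{ denominator }})
}

# Hypothetical helper: takes column names as strings
add_ratio_chr <- function(data, numerator, denominator, name = "ratio") {
  # .data[[ ]] looks up a column of the data by its name
  mutate(data, "{name}" := .data[[numerator]] / .data[[denominator]])
}

my_data <- iris[, -5]
res1 <- add_ratio(my_data, Sepal.Length, Petal.Length,
                  name = "sepal_by_petal_l")
res2 <- add_ratio_chr(my_data, "Sepal.Width", "Petal.Width",
                      name = "sepal_by_petal_w")
```

The bare-name version reads like an ordinary mutate() call, while the string version is convenient when column names come from a configuration file or user input.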
transform(): R base function to compute and add new variables
dplyr::mutate() works similarly to the R base function transform(), except that in mutate() you can refer to variables you've just created. This is not possible in transform().
my_data2 <- transform(my_data, neg_sepal_length = -Sepal.Length)
head(my_data2)
Sepal.Length Sepal.Width Petal.Length Petal.Width neg_sepal_length
1 5.1 3.5 1.4 0.2 -5.1
2 4.9 3.0 1.4 0.2 -4.9
3 4.7 3.2 1.3 0.2 -4.7
4 4.6 3.1 1.5 0.2 -4.6
5 5.0 3.6 1.4 0.2 -5.0
6 5.4 3.9 1.7 0.4 -5.4
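The difference between the two functions can be checked directly. In the sketch below, which assumes dplyr is installed, the column name abs_sepal_length and the objects res and msg are introduced only for illustration. The transform() call is expected to fail provided no object called neg_sepal_length exists in the calling environment, so it is wrapped in tryCatch() to capture the error message.

```r
library(dplyr)

my_data <- iris[, -5]

# mutate(): the second expression uses the column created by the first
res <- mutate(my_data,
  neg_sepal_length = -Sepal.Length,
  abs_sepal_length = abs(neg_sepal_length)
)

# transform(): all expressions are evaluated against the original columns,
# so neg_sepal_length is not visible yet and the call raises an error
msg <- tryCatch(
  transform(my_data,
    neg_sepal_length = -Sepal.Length,
    abs_sepal_length = abs(neg_sepal_length)
  ),
  error = function(e) conditionMessage(e)
)
msg
```

With transform(), the same result requires two successive calls, one per new column.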
Summary
- dplyr::mutate(iris, sepal = 2*Sepal.Length): Computes and appends new variable(s).
- dplyr::transmute(iris, sepal = 2*Sepal.Length): Makes new variable(s) and drops existing ones.
- transform(iris, sepal = 2*Sepal.Length): R base function similar to mutate().
Infos
This analysis has been performed using R (ver. 3.2.4).