Previously, we described the essentials of R programming and provided quick start guides for importing data into R and for converting your data into the tibble format, a modern alternative to the classic data frame. We also described the crucial steps for reshaping your data with R for easier analyses.
Preliminary tasks
Launch RStudio as described here: Running RStudio and setting up your working directory
Prepare your data as described here: Best practices for preparing your data, and save it in an external tab-delimited .txt file or a .csv file
Import your data into R as described here: Fast reading of data from txt|csv files into R: readr package.
Here, we'll use the R built-in iris data set, which we start by converting to a tibble data frame (tbl_df). A tibble is a modern rethinking of the data frame that provides a nicer printing method: only the first rows and the columns that fit on screen are shown. This is useful when working with large data sets.
# Create my_data
my_data <- iris
# Convert to a tibble
library("tibble")
my_data <- as_tibble(my_data)
# Print
my_data
Source: local data frame [150 x 5]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <fctr>
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5.0 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
.. ... ... ... ... ...
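To confirm that the conversion worked, you can inspect the class and dimensions of the object. The snippet below is a minimal sketch that assumes a recent version of the tibble package, in which as_tibble() is the current name of the conversion function and is_tibble() tests for a tibble.

```r
library("tibble")

# Convert the built-in iris data frame to a tibble
my_data <- as_tibble(iris)

# TRUE if the object is a tibble
is_tibble(my_data)

# A tibble is still a data frame, so base R functions keep working
class(my_data)
dim(my_data)
```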
Install and load the dplyr package for renaming columns
- Install dplyr:
install.packages("dplyr")
- Load dplyr:
library("dplyr")
Renaming columns with dplyr::rename()
- Rename the column Sepal.Length to sepal_length and Sepal.Width to sepal_width:
rename(my_data, sepal_length = Sepal.Length,
sepal_width = Sepal.Width)
Source: local data frame [150 x 5]
sepal_length sepal_width Petal.Length Petal.Width Species
(dbl) (dbl) (dbl) (dbl) (fctr)
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5.0 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
.. ... ... ... ... ...
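Note that rename() returns a modified copy and leaves my_data itself untouched, so the result has to be assigned if you want to keep it. The following sketch illustrates this; the object name renamed is only an illustration.

```r
library("dplyr")

my_data <- tibble::as_tibble(iris)

# Syntax: new_name = old_name. The result is stored in a new object.
renamed <- rename(my_data,
                  sepal_length = Sepal.Length,
                  sepal_width = Sepal.Width)

# The original column names are unchanged
names(my_data)

# The copy carries the new names
names(renamed)
```

To overwrite the original instead, assign the result back to the same name: my_data <- rename(my_data, ...).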
Renaming columns with dplyr::select()
select() can also be used to rename variables, as follows:
select(my_data, sepal_length = Sepal.Length,
sepal_width = Sepal.Width)
Source: local data frame [150 x 2]
sepal_length sepal_width
(dbl) (dbl)
1 5.1 3.5
2 4.9 3.0
3 4.7 3.2
4 4.6 3.1
5 5.0 3.6
6 5.4 3.9
7 4.6 3.4
8 5.0 3.4
9 4.4 2.9
10 4.9 3.1
.. ... ...
Note that select() keeps only the variables you mention. To keep all the variables, use the function rename() instead.
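If you prefer to stay with select(), one possible workaround is to add the dplyr selection helper everything(), which appends all remaining columns after the renamed ones. This is a sketch rather than a recommendation; the object name reordered is only an illustration.

```r
library("dplyr")

# Rename two columns and keep the others by adding everything()
reordered <- select(iris,
                    sepal_length = Sepal.Length,
                    sepal_width = Sepal.Width,
                    everything())

# The renamed columns come first, followed by the remaining ones
names(reordered)
```

Because select() places columns in the order they are listed, this approach can also change the column order, whereas rename() never does.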
Renaming columns with R base functions
To rename the column Sepal.Length to sepal_length, the procedure is as follows:
- Get the column names using the function names() or colnames()
- Change the column name where the name equals "Sepal.Length"
# Get column names
colnames(my_data)
[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
# Rename the columns named "Sepal.Length" and "Sepal.Width"
names(my_data)[names(my_data) == "Sepal.Length"] <- "sepal_length"
names(my_data)[names(my_data) == "Sepal.Width"] <- "sepal_width"
my_data
Source: local data frame [150 x 5]
sepal_length sepal_width Petal.Length Petal.Width Species
(dbl) (dbl) (dbl) (dbl) (fctr)
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5.0 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
.. ... ... ... ... ...
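It's also possible to rename a column by its index in the names vector, as follows. This depends on the column order, so matching by name is safer when the order may change.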
names(my_data)[1] <- "sepal_length"
names(my_data)[2] <- "sepal_width"
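When several columns need renaming, the name-matching approach can be wrapped in a small function. The helper below, rename_cols(), is hypothetical (it is not part of base R or dplyr) and is only a sketch: it takes a named character vector of the form c(old_name = "new_name") and stops with an error if an old name is not found.

```r
# Hypothetical helper: rename columns from a lookup vector c(old = "new")
rename_cols <- function(df, lookup) {
  # Position of each old name in the current column names
  idx <- match(names(lookup), names(df))
  if (anyNA(idx)) {
    stop("Unknown column(s): ",
         paste(names(lookup)[is.na(idx)], collapse = ", "))
  }
  # Replace the matched names and return the modified copy
  names(df)[idx] <- unname(lookup)
  df
}

df <- rename_cols(iris, c(Sepal.Length = "sepal_length",
                          Sepal.Width = "sepal_width"))
names(df)
```

Failing on an unknown name is a design choice made here: a plain names(x)[names(x) == "..."] assignment silently does nothing when the name is misspelled.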
Summary
Infos
This analysis has been performed using R (ver. 3.2.3).