Previously, we described the essentials of R programming and provided quick start guides for importing data into R and for converting your data to the tibble format, a modern way to work with data frames. We then described crucial steps for reshaping your data with R for easier analyses, and provided quick start guides for subsetting data frame rows based on logical criteria.

Here, you will learn how to subset data frame columns (i.e., variables) by name using the function select() [in dplyr package].

Subsetting Columns of a Data Frame in R

Preliminary tasks

Launch RStudio as described here: Running RStudio and setting up your working directory
Prepare your data as described here: Best practices for preparing your data and save it in an external .txt tab or .csv files
Import your data into R as described here: Fast reading of data from txt|csv files into R: readr package.

Here, we'll use the R built-in iris data set, which we start by converting to a tibble (tbl_df). A tibble is a modern rethinking of the data frame that provides a nicer printing method, which is useful when working with large data sets.

# Create my_data
my_data <- iris

# Convert to a tibble (as_tibble() replaces the deprecated as_data_frame())
library("tibble")
my_data <- as_tibble(my_data)

# Print
my_data

Source: local data frame [150 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl>  <fctr>
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
..          ...         ...          ...         ...     ...

Install and load dplyr package

Install dplyr:

install.packages("dplyr")

Load dplyr:

library("dplyr")

Select columns by position

Select columns 1 to 2:

my_data[, 1:2]

Select columns 1 and 3 but not 2:

my_data[, c(1, 3)]
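One detail worth checking when subsetting by position: with a tibble, single-bracket subsetting always returns a tibble, even for a single column, whereas a base data frame simplifies a single column to a vector unless drop = FALSE is given. The following is a minimal sketch, assuming the tibble package is installed; the names df and tb are used only for illustration.

```r
library("tibble")

df <- iris               # base data frame
tb <- as_tibble(iris)    # tibble version of the same data

# Base data frame: a single column selected by position becomes a vector
is.data.frame(df[, 1])                 # FALSE

# Keep the data frame structure explicitly
is.data.frame(df[, 1, drop = FALSE])   # TRUE

# Tibble: the result is always a tibble
is_tibble(tb[, 1])                     # TRUE

# Columns 1 and 3 by position
names(tb[, c(1, 3)])                   # "Sepal.Length" "Petal.Length"
```

This is one reason the tutorial converts iris to a tibble first: code that expects a data frame back does not break when only one column is requested.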

Select columns by names

Select columns by names: Sepal.Length and Petal.Length

select(my_data, Sepal.Length, Petal.Length)

Source: local data frame [150 x 2]

   Sepal.Length Petal.Length
          (dbl)        (dbl)
1           5.1          1.4
2           4.9          1.4
3           4.7          1.3
4           4.6          1.5
5           5.0          1.4
6           5.4          1.7
7           4.6          1.4
8           5.0          1.5
9           4.4          1.4
10          4.9          1.5
..          ...          ...

Select all columns from Sepal.Length to Petal.Length

select(my_data, Sepal.Length:Petal.Length)

Source: local data frame [150 x 3]

   Sepal.Length Sepal.Width Petal.Length
          (dbl)       (dbl)        (dbl)
1           5.1         3.5          1.4
2           4.9         3.0          1.4
3           4.7         3.2          1.3
4           4.6         3.1          1.5
5           5.0         3.6          1.4
6           5.4         3.9          1.7
7           4.6         3.4          1.4
8           5.0         3.4          1.5
9           4.4         2.9          1.4
10          4.9         3.1          1.5
..          ...         ...          ...

There are several special functions that can be used inside select(): starts_with(), ends_with(), contains(), matches(), one_of(), etc.

# Select columns whose names start with "Petal"
select(my_data, starts_with("Petal"))

# Select columns whose names end with "Width"
select(my_data, ends_with("Width"))

# Select columns whose names contain "etal"
select(my_data, contains("etal"))

# Select columns whose names match a regular expression
select(my_data, matches(".t."))

# Select the variables provided in a character vector
select(my_data, one_of(c("Sepal.Length", "Petal.Length")))
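To see exactly which columns each helper picks, it can help to print only the resulting column names. The sketch below assumes dplyr 1.0.0 or later, where all_of() and any_of() are available as the recommended successors to one_of(); the vector name cols and the column name "not_a_column" are made up for illustration.

```r
library("dplyr")

my_data <- tibble::as_tibble(iris)

# Inspect only the names of the columns each helper selects
names(select(my_data, starts_with("Petal")))  # Petal.Length, Petal.Width
names(select(my_data, ends_with("Width")))    # Sepal.Width, Petal.Width
names(select(my_data, matches(".t.")))        # the four measurement columns ("Species" has no "t")

# Column names stored in a character vector
cols <- c("Sepal.Length", "Petal.Length")

# all_of() is strict: it raises an error if any name is missing from the data
names(select(my_data, all_of(cols)))

# any_of() is lenient: names that do not exist are skipped
names(select(my_data, any_of(c(cols, "not_a_column"))))
```

Choosing between all_of() and any_of() is a matter of whether a missing column should stop the analysis or be ignored.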

Drop columns

To remove a column from a data frame, prefix its name with a minus sign (-).

Dropping Sepal.Length and Petal.Length:

select(my_data, -Sepal.Length, -Petal.Length)

Dropping columns from Sepal.Length to Petal.Length:

select(my_data, -(Sepal.Length:Petal.Length))

Source: local data frame [150 x 2]

   Petal.Width Species
         (dbl)  (fctr)
1          0.2  setosa
2          0.2  setosa
3          0.2  setosa
4          0.2  setosa
5          0.2  setosa
6          0.4  setosa
7          0.3  setosa
8          0.2  setosa
9          0.2  setosa
10         0.1  setosa
..         ...     ...

Dropping columns whose name starts with Petal:

select(my_data, -starts_with("Petal"))

Source: local data frame [150 x 3]

   Sepal.Length Sepal.Width Species
          (dbl)       (dbl)  (fctr)
1           5.1         3.5  setosa
2           4.9         3.0  setosa
3           4.7         3.2  setosa
4           4.6         3.1  setosa
5           5.0         3.6  setosa
6           5.4         3.9  setosa
7           4.6         3.4  setosa
8           5.0         3.4  setosa
9           4.4         2.9  setosa
10          4.9         3.1  setosa
..          ...         ...     ...

If you want to drop columns by position, the syntax is as follows:

# Drop column 1
my_data[, -1]

# Drop columns 1 to 3
my_data[, -(1:3)]

# Drop columns 1 and 3 but not 2
my_data[, -c(1, 3)]
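Dropping by name and dropping by position should lead to the same columns when the names and positions refer to the same variables. A quick check, sketched here under the assumption that dplyr is installed (the object names by_name, by_position and by_select are illustrative only):

```r
library("dplyr")

my_data <- tibble::as_tibble(iris)

# Drop Sepal.Length and Petal.Length by name
by_name <- select(my_data, -Sepal.Length, -Petal.Length)

# Drop the same columns by position with base subsetting
by_position <- my_data[, -c(1, 3)]

# select() also accepts negative positions
by_select <- select(my_data, -c(1, 3))

names(by_name)                                  # "Sepal.Width" "Petal.Width" "Species"
identical(names(by_name), names(by_position))   # TRUE
identical(names(by_name), names(by_select))     # TRUE
```

Dropping by name is usually more robust, because positions shift silently when columns are added or reordered.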

Use select() programmatically inside an R function

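dplyr uses non-standard evaluation (NSE), which is great for interactive use and saves typing. Behind the scenes, NSE was powered by the lazyeval package in the dplyr versions this tutorial was written for.

select() is best suited for interactive use. In those dplyr versions, the function select_() is used when calling from a function, in which case the input must be quoted. Note that select_() and the other underscore verbs are deprecated as of dplyr 0.7.0.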

There are three ways to quote inputs that dplyr understands:

With a formula, ~Sepal.Length.
With quote(), quote(Sepal.Length).
As a string, "Sepal.Length".

For example, you can select the column Sepal.Length by typing the following R code:

select_(my_data, ~Sepal.Length)

Or, by using this:

select_(my_data, "Sepal.Length")

It's also possible to use functions inside select_(). The R package lazyeval is required. It can be installed as follows:

install.packages("lazyeval")

Use lazyeval package to interpret functions inside select_():

# Select column names that match ".t."
select_(my_data, lazyeval::interp(~matches(x), x = ".t."))

# Select column names that start with "Petal"
select_(my_data, lazyeval::interp(~starts_with(x), x = "Petal"))

# Dropping columns: Sepal.Length and Sepal.Width
select_(my_data, quote(-Sepal.Length), quote(-Sepal.Width))

# Or use this
select_(my_data, .dots = list(quote(-Petal.Length), quote(-Petal.Width)))
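With current dplyr releases, the same tasks can be written without select_() or lazyeval. The following is a sketch only, assuming dplyr 1.0.0 or later; the helper functions keep_columns(), drop_columns() and select_cols() are hypothetical names introduced for illustration, not part of dplyr.

```r
library("dplyr")

my_data <- tibble::as_tibble(iris)

# Hypothetical helper: keep columns whose names are given as strings
keep_columns <- function(data, cols) {
  select(data, all_of(cols))
}

# Hypothetical helper: drop columns whose names are given as strings
drop_columns <- function(data, cols) {
  select(data, -all_of(cols))
}

# Hypothetical helper: forward an unquoted selection using the embrace operator {{ }}
select_cols <- function(data, cols) {
  select(data, {{ cols }})
}

names(keep_columns(my_data, c("Sepal.Length", "Petal.Length")))
names(drop_columns(my_data, c("Sepal.Length", "Sepal.Width")))
names(select_cols(my_data, starts_with("Petal")))
```

The string-based helpers suit column names that arrive as data (for example, from a configuration file), while the embrace version lets callers keep the interactive select() syntax, including helpers such as starts_with().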

Summary

Select columns by position: my_data[, 1:2]
Select columns by name: dplyr::select(my_data, Sepal.Length, Petal.Length)
Drop columns: dplyr::select(my_data, -Sepal.Length, -Petal.Length)
Helper functions: starts_with(), ends_with(), contains(), matches(), one_of()
- dplyr::select(my_data, starts_with("Petal"))
- dplyr::select(my_data, ends_with("Length"))

Infos

This analysis has been performed using R (ver. 3.2.3).
