
Reordering Data Frame Rows in R




Previously, we described the essentials of R programming and provided quick start guides for importing data into R, as well as for converting your data into a tibble data format, which is a modern and convenient way to work with your data. We also described crucial steps to reshape your data with R for easier analyses.


Here, you'll learn how to reorder (i.e., sort) the rows of your data table by the values of one or more columns (i.e., variables). This can be done using either the R base function order() or the modern function arrange() [in the dplyr package]. We recommend dplyr::arrange() because it requires less typing.


Reordering Data Frame Rows by Variables in R

Preliminary tasks

  1. Launch RStudio as described here: Running RStudio and setting up your working directory

  2. Prepare your data as described here: Best practices for preparing your data and save it in an external tab-delimited .txt or .csv file

  3. Import your data into R as described here: Fast reading of data from txt|csv files into R: readr package.

Here, we'll use the built-in R data set iris, which we start by converting to a tibble data frame (tbl_df). A tibble is a modern rethinking of the data frame that provides a nicer printing method, which is useful when working with large data sets.

# Create my_data
my_data <- iris

# Convert to a tibble
library("tibble")
my_data <- as_tibble(my_data)  # as_data_frame() is deprecated in recent tibble versions

# Print
my_data
Source: local data frame [150 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl>  <fctr>
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
..          ...         ...          ...         ...     ...

Install and load the dplyr package

  • Install dplyr
install.packages("dplyr")
  • Load dplyr:
library("dplyr")

Reorder rows with dplyr::arrange()


The dplyr function arrange() can be used to reorder (sort) rows by one or more variables.


  • Reorder rows by Sepal.Length in ascending order
arrange(my_data, Sepal.Length)
Source: local data frame [150 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          (dbl)       (dbl)        (dbl)       (dbl)  (fctr)
1           4.3         3.0          1.1         0.1  setosa
2           4.4         2.9          1.4         0.2  setosa
3           4.4         3.0          1.3         0.2  setosa
4           4.4         3.2          1.3         0.2  setosa
5           4.5         2.3          1.3         0.3  setosa
6           4.6         3.1          1.5         0.2  setosa
7           4.6         3.4          1.4         0.3  setosa
8           4.6         3.6          1.0         0.2  setosa
9           4.6         3.2          1.4         0.2  setosa
10          4.7         3.2          1.3         0.2  setosa
..          ...         ...          ...         ...     ...
  • Reorder rows by Sepal.Length in descending order. Use the function desc():
arrange(my_data, desc(Sepal.Length))
Source: local data frame [150 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
          (dbl)       (dbl)        (dbl)       (dbl)    (fctr)
1           7.9         3.8          6.4         2.0 virginica
2           7.7         3.8          6.7         2.2 virginica
3           7.7         2.6          6.9         2.3 virginica
4           7.7         2.8          6.7         2.0 virginica
5           7.7         3.0          6.1         2.3 virginica
6           7.6         3.0          6.6         2.1 virginica
7           7.4         2.8          6.1         1.9 virginica
8           7.3         2.9          6.3         1.8 virginica
9           7.2         3.6          6.1         2.5 virginica
10          7.2         3.2          6.0         1.8 virginica
..          ...         ...          ...         ...       ...

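Instead of using the function desc(), you can prefix the sorting variable with a minus sign to indicate descending order, as follows. Note that the minus sign only works for numeric variables; for factor or character columns, use desc().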

arrange(my_data, -Sepal.Length)
  • Reorder rows by multiple variables: Sepal.Length and Sepal.Width
arrange(my_data, Sepal.Length, Sepal.Width)
Source: local data frame [150 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          (dbl)       (dbl)        (dbl)       (dbl)  (fctr)
1           4.3         3.0          1.1         0.1  setosa
2           4.4         2.9          1.4         0.2  setosa
3           4.4         3.0          1.3         0.2  setosa
4           4.4         3.2          1.3         0.2  setosa
5           4.5         2.3          1.3         0.3  setosa
6           4.6         3.1          1.5         0.2  setosa
7           4.6         3.2          1.4         0.2  setosa
8           4.6         3.4          1.4         0.3  setosa
9           4.6         3.6          1.0         0.2  setosa
10          4.7         3.2          1.3         0.2  setosa
..          ...         ...          ...         ...     ...
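The examples above sort every variable in the same direction. As a minimal sketch (assuming the dplyr and tibble packages are installed), directions can also be mixed within a single arrange() call, and desc() can be applied to a factor column such as Species, where the minus sign is not meaningful:

```r
# Minimal sketch: mixing sort directions with dplyr::arrange()
library("dplyr")
library("tibble")

my_data <- as_tibble(iris)

# Species in descending order of its factor levels
# (virginica, versicolor, setosa), then Sepal.Length
# in ascending order within each species
res <- arrange(my_data, desc(Species), Sepal.Length)
head(res)
```

desc() sorts a factor by its level order, not alphabetically by label, so reordering the factor levels beforehand changes the result.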

If the data contain missing values (NA), arrange() always places them at the end, even when sorting in descending order.
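The iris data set has no missing values, so the sketch below uses a small hypothetical data frame, df, to illustrate where NA rows end up. It also shows that base order() places NA last by default and accepts na.last = FALSE to put them first:

```r
library("dplyr")

# Hypothetical example data with one missing value
df <- data.frame(id = 1:4, value = c(3, NA, 1, 2))

asc  <- arrange(df, value)        # NA row comes last
desc_sorted <- arrange(df, desc(value))  # NA row still comes last

# Base R equivalents
order(df$value)                   # NA index last (na.last = TRUE by default)
order(df$value, na.last = FALSE)  # NA index first
```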

dplyr::arrange() is the dplyr equivalent of the R base function order(), but it requires less typing.

Reorder rows with R base function order()

  • Reorder rows by Sepal.Length in ascending order
my_data[order(my_data$Sepal.Length), , drop = FALSE]
  • Reorder rows by Sepal.Length in descending order. Use the additional argument decreasing = TRUE:
row_order <- order(my_data$Sepal.Length, decreasing = TRUE)
my_data[row_order, , drop = FALSE]
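order() also accepts several sorting keys. The sketch below (assuming dplyr and tibble are installed) sorts by Sepal.Length and then Sepal.Width, checks that the result matches arrange() column by column, and mixes directions by negating a numeric key:

```r
library("dplyr")
library("tibble")

my_data <- as_tibble(iris)

# Ascending by Sepal.Length, ties broken by ascending Sepal.Width
row_order <- order(my_data$Sepal.Length, my_data$Sepal.Width)
base_sorted <- my_data[row_order, , drop = FALSE]

dplyr_sorted <- arrange(my_data, Sepal.Length, Sepal.Width)

# Compare the two results column by column
all(mapply(identical, base_sorted, dplyr_sorted))

# Mixed directions: negating a numeric key sorts it in descending order
row_order2 <- order(my_data$Sepal.Length, -my_data$Sepal.Width)
sorted2 <- my_data[row_order2, , drop = FALSE]
sorted2
```

Negation only works for numeric keys; for a factor key in order(), one option is -xtfrm(my_data$Species).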

Summary


To reorder rows by the values of one or more columns, use the function arrange() [in the dplyr package] or the R base function order().


Infos

This analysis has been performed using R (ver. 3.2.3).

