
ggpubr R Package: ggplot2-Based Publication Ready Plots



Why ggpubr?

ggplot2 by Hadley Wickham is an excellent and flexible package for elegant data visualization in R. However, the default plots require some formatting before they can be submitted for publication. Furthermore, the syntax for customizing a ggplot can be opaque, which raises the level of difficulty for researchers without advanced R programming skills.

The ‘ggpubr’ package provides easy-to-use functions for creating and customizing ‘ggplot2’-based publication-ready plots.

Installation and loading

  • Install from CRAN as follows:
install.packages("ggpubr")
  • Or, install the latest version from GitHub as follows:
# Install
if(!require(devtools)) install.packages("devtools")
devtools::install_github("kassambara/ggpubr")
  • Load ggpubr as follows:
library(ggpubr)

Getting started

See the online documentation (http://www.sthda.com/english/rpkgs/ggpubr) for a complete list.

Density and histogram plots

  1. Create some data
set.seed(1234)
wdata = data.frame(
   sex = factor(rep(c("F", "M"), each=200)),
   weight = c(rnorm(200, 55), rnorm(200, 58)))
head(wdata, 4)
##   sex   weight
## 1   F 53.79293
## 2   F 55.27743
## 3   F 56.08444
## 4   F 52.65430
  2. Density plot with mean lines and marginal rug
# Change outline and fill colors by groups ("sex")
# Use custom palette
ggdensity(wdata, x = "weight",
   add = "mean", rug = TRUE,
   color = "sex", fill = "sex",
   palette = c("#00AFBB", "#E7B800"))


Note that:

  1. the argument palette is used for coloring or filling by groups. Allowed values include:
    • “grey” for grey color palettes;
    • brewer palettes e.g. “RdBu”, “Blues”, …; run RColorBrewer::display.brewer.all() to see all brewer palettes.
    • or custom color palettes e.g. c(“blue”, “red”) or c(“#00AFBB”, “#E7B800”);
    • and scientific journal palettes from ggsci R package, e.g.: “npg”, “aaas”, “lancet”, “jco”, “ucscgb”, “uchicago”, “simpsons” and “rickandmorty”.
  2. the argument add can be used to add mean or median lines to density and to histogram plots. Allowed values are: “mean” and “median”.
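
As a small illustration of the two arguments described above, the sketch below draws median lines instead of mean lines and uses the “jco” journal palette. It regenerates the wdata data frame so that it can be run on its own; the object name p_median is only illustrative.

```r
library(ggpubr)

# Same simulated data as above
set.seed(1234)
wdata <- data.frame(
   sex = factor(rep(c("F", "M"), each = 200)),
   weight = c(rnorm(200, 55), rnorm(200, 58)))

# Median lines (add = "median") and a ggsci journal palette (palette = "jco")
p_median <- ggdensity(wdata, x = "weight",
   add = "median", rug = TRUE,
   color = "sex", fill = "sex",
   palette = "jco")
p_median
```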


  3. Histogram plot with mean lines and marginal rug
# Change outline and fill colors by groups ("sex")
# Use custom color palette
gghistogram(wdata, x = "weight",
   add = "mean", rug = TRUE,
   color = "sex", fill = "sex",
   palette = c("#00AFBB", "#E7B800"))

If you want to create the above histogram with the standard ggplot2 functions, the syntax is considerably more complex for beginners (see the R script below). The ggpubr package is a wrapper around ggplot2 functions that makes your life easier and lets you quickly produce a publication-ready plot.

# ggplot2 standard syntax for creating histogram
# +++++++++++++++++++++++++++++++++++++
# Compute group mean
library("dplyr")
mu <- wdata %>%
group_by(sex) %>%
summarise(grp.mean = mean(weight))
# Plot
ggplot(data = wdata, aes(weight)) +
  geom_histogram(aes(color = sex, fill = sex),
                 position = "identity", alpha = 0.5)+
  geom_vline(data = mu, aes(xintercept=grp.mean, color = sex),
             linetype="dashed", size=1) +
  scale_color_manual(values = c("#00AFBB", "#E7B800"))+
  scale_fill_manual(values = c("#00AFBB", "#E7B800"))+
  theme_classic()+
  theme(
    axis.text.x = element_text(size = 12, colour = "black",face = "bold"),
    axis.text.y = element_text(size = 12, colour = "black",face = "bold"),
    axis.line.x = element_line(colour = "black", size = 1),
    axis.line.y = element_line(colour = "black", size = 1),
    legend.position = "bottom"
    )

Box plots, violin plots, dot plots and strip charts

  1. Load data
data("ToothGrowth")
df <- ToothGrowth
head(df, 4)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
  2. Box plots with jittered points
# Change outline colors by groups: dose
# Use custom color palette
# Add jitter points and change the shape by groups
 ggboxplot(df, x = "dose", y = "len",
    color = "dose", palette =c("#00AFBB", "#E7B800", "#FC4E07"),
    add = "jitter", shape = "dose")


Note that, when using ggpubr functions for drawing box plots, violin plots, dot plots, strip charts, bar plots, line plots or error plots, the argument add can be used for adding another plot element (e.g.: dot plot or error bars).

In this case, allowed values for the argument add are one or the combination of: “none”, “dotplot”, “jitter”, “boxplot”, “mean”, “mean_se”, “mean_sd”, “mean_ci”, “mean_range”, “median”, “median_iqr”, “median_mad”, “median_range”; see ?desc_statby for more details.
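
To illustrate “one or the combination of”, here is a minimal sketch that passes two values to add at once. It assumes the same ToothGrowth data; the object name p_combo is illustrative only.

```r
library(ggpubr)

df <- ToothGrowth

# Combine two elements: jittered points plus mean +/- sd error bars
p_combo <- ggviolin(df, x = "dose", y = "len", fill = "dose",
   palette = c("#00AFBB", "#E7B800", "#FC4E07"),
   add = c("jitter", "mean_sd"))
p_combo
```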


  3. Violin plots with box plots inside
# Change fill color by groups: dose
# add boxplot with white fill color
ggviolin(df, x = "dose", y = "len", fill = "dose",
   palette = c("#00AFBB", "#E7B800", "#FC4E07"),
   add = "boxplot", add.params = list(fill = "white"))

  4. Dot plots with summary statistics
# Change outline and fill colors by groups: dose
# Add mean + sd
ggdotplot(df, x = "dose", y = "len", color = "dose", fill = "dose", 
          palette = c("#00AFBB", "#E7B800", "#FC4E07"),
          add = "mean_sd", add.params = list(color = "gray"))


Recall that, possible summary statistics include “boxplot”, “mean”, “mean_se”, “mean_sd”, “mean_ci”, “mean_range”, “median”, “median_iqr”, “median_mad”, “median_range”; see ?desc_statby for more details.
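
The summary statistics behind these options can also be inspected directly with desc_statby(). The sketch below is only an illustration: the object name len_stats is arbitrary, and the column names selected at the end are assumed to match the output of the installed ggpubr version.

```r
library(ggpubr)

# Descriptive statistics of "len" for each level of "dose"
len_stats <- desc_statby(ToothGrowth, measure.var = "len", grps = "dose")

# Keep a few of the computed columns for display
len_stats[, c("dose", "mean", "sd", "median")]
```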


  5. Strip chart with summary statistics
# Change points size
# Change point colors and shapes by groups: dose
# Use custom color palette
 ggstripchart(df, "dose", "len",  size = 2, shape = "dose",
   color = "dose", palette = c("#00AFBB", "#E7B800", "#FC4E07"),
   add = "mean_sd")

Bar plots

  1. Basic plot with labels outside
# Data
df2 <- data.frame(dose=c("D0.5", "D1", "D2"),
   len=c(4.2, 10, 29.5))
print(df2)
##   dose  len
## 1 D0.5  4.2
## 2   D1 10.0
## 3   D2 29.5
# Change outline and fill colors by groups: dose
# Use custom color palette
# Add labels
 ggbarplot(df2, x = "dose", y = "len",
   fill = "dose", color = "dose",
   palette = c("#00AFBB", "#E7B800", "#FC4E07"),
   label = TRUE)


  • Use lab.pos = “in”, to put labels inside bars
  • Use lab.col, to change label colors


  2. Bar plot with multiple groups
# Create some data
df3 <- data.frame(supp=rep(c("VC", "OJ"), each=3),
   dose=rep(c("D0.5", "D1", "D2"),2),
   len=c(6.8, 15, 33, 4.2, 10, 29.5))
print(df3)
##   supp dose  len
## 1   VC D0.5  6.8
## 2   VC   D1 15.0
## 3   VC   D2 33.0
## 4   OJ D0.5  4.2
## 5   OJ   D1 10.0
## 6   OJ   D2 29.5
# Plot "len" by "dose" and change color by a second group: "supp"
# Add labels inside bars
ggbarplot(df3, x = "dose", y = "len",
  fill = "supp", color = "supp", palette = c("#00AFBB", "#E7B800"),
  label = TRUE, lab.col = "white", lab.pos = "in")

  3. Bar plot visualizing the mean of each group with error bars
# Data: the ToothGrowth data set will be used.
df <- ToothGrowth
head(df, 10)
##     len supp dose
## 1   4.2   VC  0.5
## 2  11.5   VC  0.5
## 3   7.3   VC  0.5
## 4   5.8   VC  0.5
## 5   6.4   VC  0.5
## 6  10.0   VC  0.5
## 7  11.2   VC  0.5
## 8  11.2   VC  0.5
## 9   5.2   VC  0.5
## 10  7.0   VC  0.5
# Visualize the mean of each group
# Change point and outline colors by groups: dose
# Add jitter points and errors (mean_se)
ggbarplot(df, x = "dose", y = "len", color = "dose",
          palette = c("#00AFBB", "#E7B800", "#FC4E07"),
          add = c("mean_se", "jitter"))

Line plots

  1. Line plots with multiple groups
# Plot "len" by "dose" and
# Change line types and point shapes by a second group: "supp"
# Change color by groups "supp"
ggline(df3, x = "dose", y = "len",
  linetype = "supp", shape = "supp",
  color = "supp",  palette = c("#00AFBB", "#E7B800"))

  2. Line plot visualizing the mean of each group with error bars
# Visualize the mean of each group: dose
# Change colors by a second group: supp
# Add jitter points and errors (mean_se)
ggline(df, x = "dose", y = "len", 
       color = "supp", 
       palette = c("#00AFBB", "#E7B800", "#FC4E07"),
       add = c("mean_se", "jitter"))

Pie chart

  1. Create some data
df4 <- data.frame(
  group = c("Male", "Female", "Child"),
  value = c(25, 25, 50))
head(df4)
##    group value
## 1   Male    25
## 2 Female    25
## 3  Child    50
  2. Pie chart
# Change fill color by group
# set outline line color to white
# Use custom color palette
# Show group names and value as labels
labs <- paste0(df4$group, " (", df4$value, "%)")
ggpie(df4, x = "value", fill = "group", color = "white",
   palette = c("#00AFBB", "#E7B800", "#FC4E07"),
   label = labs, lab.pos = "in", lab.font = "white")

Scatter plots

  1. Load and prepare data
data("mtcars")
df5 <- mtcars
df5$cyl <- as.factor(df5$cyl) # grouping variable
df5$name = rownames(df5) # for point labels
head(df5[, c("wt", "mpg", "cyl")], 3)
##                  wt  mpg cyl
## Mazda RX4     2.620 21.0   6
## Mazda RX4 Wag 2.875 21.0   6
## Datsun 710    2.320 22.8   4
  2. Scatter plots with regression line and confidence interval
ggscatter(df5, x = "wt", y = "mpg",
   color = "black", shape = 21, size = 4, # Points color, shape and size
   add = "reg.line",  # Add regression line
   add.params = list(color = "blue", fill = "lightgray"), # Customize reg. line
   conf.int = TRUE, # Add confidence interval
   cor.coef = TRUE # Add correlation coefficient
   )


Note that, when using ggpubr functions for drawing scatter plots, allowed values for the argument add are one of “none”, “reg.line” (for adding linear regression line) or “loess” (for adding local regression fitting).
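
As an illustration of the other allowed value, the following sketch replaces the linear regression line with a loess fit. It reloads mtcars so that it runs on its own; the object name p_loess is illustrative.

```r
library(ggpubr)

df5 <- mtcars

# Local regression (loess) fit instead of a linear regression line
p_loess <- ggscatter(df5, x = "wt", y = "mpg",
   add = "loess",
   add.params = list(color = "blue", fill = "lightgray"),
   conf.int = TRUE)
p_loess
```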


  3. Scatter plot with concentration ellipses and labels
# Change point colors and shapes by groups: cyl
# Use custom palette
# Add concentration ellipses with mean points (barycenters)
# Add marginal rug
# Add label and use repel = TRUE to avoid label overplotting
ggscatter(df5, x = "wt", y = "mpg",
   color = "cyl", shape = "cyl",
   palette = c("#00AFBB", "#E7B800", "#FC4E07"),
   ellipse = TRUE, mean.point = TRUE,
   rug = TRUE, label = "name", font.label = 10, repel = TRUE)


Note that, it’s possible to change the ellipse type by using the argument ellipse.type. Possible values are ‘convex’, ‘confidence’ or types supported by ggplot2::stat_ellipse() including one of c(“t”, “norm”, “euclid”).
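
For instance, a convex hull around each group can be sketched as follows. The data preparation is repeated so that the snippet is self-contained, and the object name p_hull is illustrative.

```r
library(ggpubr)

df5 <- mtcars
df5$cyl <- as.factor(df5$cyl) # grouping variable

# Draw a convex hull around each group instead of the default ellipse
p_hull <- ggscatter(df5, x = "wt", y = "mpg",
   color = "cyl", shape = "cyl",
   palette = c("#00AFBB", "#E7B800", "#FC4E07"),
   ellipse = TRUE, ellipse.type = "convex")
p_hull
```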


Cleveland’s dot plots

# Change colors by  group cyl
ggdotchart(df5, x = "mpg", label = "name",
   group = "cyl", color = "cyl",
   palette = c("#00AFBB", "#E7B800", "#FC4E07") )

ggpar(): customize ggplot easily

The function ggpar() [in ggpubr] can be used to simply and easily customize any ggplot2-based graph. The graphical parameters that can be changed using ggpar() include:

  • Main titles, axis labels and legend titles
  • Legend position and appearance
  • Colors
  • Axis limits
  • Axis transformations: log and sqrt
  • Axis ticks
  • Themes
  • Rotate a plot

Note that all the arguments accepted by the function ggpar() can also be passed directly to the plotting functions in the ggpubr package.
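
A minimal sketch of this shortcut, assuming the ToothGrowth data used throughout (the object name p_direct is illustrative):

```r
library(ggpubr)

df <- ToothGrowth

# xlab, ylab and legend are ggpar() arguments, passed straight to ggboxplot()
p_direct <- ggboxplot(df, x = "dose", y = "len", color = "dose",
   xlab = "Dose (mg)", ylab = "Teeth length",
   legend = "right")
p_direct
```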

We start by creating a basic box plot colored by groups as follows:

df <- ToothGrowth
p <- ggboxplot(df, x = "dose", y = "len",
               color = "dose")
print(p)

Main titles, axis labels and legend titles

# Change title texts and fonts
ggpar(p, main = "Plot of length \n by dose",
      xlab ="Dose (mg)", ylab = "Teeth length",
      legend.title = "Dose (mg)",
      font.main = c(14,"bold.italic", "red"),
      font.x = c(14, "bold", "#2E9FDF"),
      font.y = c(14, "bold", "#E7B800"))

# Hide titles
ggpar(p, xlab = FALSE, ylab = FALSE)


Note that,

  1. font.main, font.x and font.y are vectors of length 3 giving the size (e.g.: 14), the style (e.g.: “plain”, “bold”, “italic”, “bold.italic”) and the color (e.g.: “red”) of the main title, xlab and ylab, respectively. For example, font.x = c(14, “bold”, “red”). Use font.x = 14 to change only the font size, or font.x = “bold” to change only the font face.
  2. you can use \n to split a long title into multiple lines.
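
The shorter forms of the font arguments can be tried as in the sketch below. It rebuilds the basic box plot so that it runs on its own; the object name p_font is illustrative.

```r
library(ggpubr)

p <- ggboxplot(ToothGrowth, x = "dose", y = "len", color = "dose")

# Change only the size of the x axis title
# and only the face of the y axis title
p_font <- ggpar(p, font.x = 14, font.y = "bold")
p_font
```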


Legend position and appearance

ggpar(p,
 legend = "right", legend.title = "Dose (mg)",
 font.legend = c(10, "bold", "red"))


Note that, the legend argument is a character vector specifying legend position. Allowed values are one of c(“top”, “bottom”, “left”, “right”, “none”). The default is the “bottom” position. To remove the legend, use legend = “none”. The legend position can also be specified using a numeric vector c(x, y), whose values should be between 0 and 1: c(0,0) corresponds to the “bottom left” and c(1,1) to the “top right” position.


Color palettes

As mentioned above, the argument palette is used to change group color palettes. Allowed values include:

  • Custom color palettes e.g. c(“blue”, “red”) or c(“#00AFBB”, “#E7B800”);
  • “grey” for grey color palettes;
  • brewer palettes e.g. “RdBu”, “Blues”, …; run RColorBrewer::display.brewer.all() to see all brewer palettes.
  • and scientific journal palettes from ggsci R package, e.g.: “npg”, “aaas”, “lancet”, “jco”, “ucscgb”, “uchicago”, “simpsons” and “rickandmorty”.
# Use custom color palette
ggpar(p, palette = c("#00AFBB", "#E7B800", "#FC4E07"))

# Use brewer palette
ggpar(p, palette = "Dark2" )

# Use grey palette
ggpar(p, palette = "grey")
   

# Use scientific journal palette from ggsci package
# Allowed values: "npg", "aaas", "lancet", "jco", 
#   "ucscgb", "uchicago", "simpsons" and "rickandmorty".
ggpar(p, palette = "npg") # nature

Axis limits and scales

The following arguments can be used:


  • xlim, ylim: a numeric vector of length 2, specifying x and y axis limits (minimum and maximum values), respectively. e.g.: ylim = c(0, 50).
  • xscale, yscale: x and y axis scale, respectively. Allowed values are one of c(“none”, “log2”, “log10”, “sqrt”); e.g.: yscale=“log2”.
  • format.scale: logical value. If TRUE, axis tick mark labels will be formatted when xscale or yscale = “log2” or “log10”.


# Change y axis limits
ggpar(p, ylim = c(0, 50))

# Change y axis scale to log2
ggpar(p, yscale = "log2")

# Format axis scale
ggpar(p, yscale = "log2", format.scale = TRUE)

Axis ticks: customize tick marks and labels

The following arguments can be used:


  • ticks: logical value. Default is TRUE. If FALSE, hide axis tick marks.
  • tickslab: logical value. Default is TRUE. If FALSE, hide axis tick labels.
  • font.tickslab: Font style (size, face, color) for tick labels, e.g.: c(14, “bold”, “red”).
  • xtickslab.rt, ytickslab.rt: Rotation angle of x and y axis tick labels, respectively. Default value is 0.
  • xticks.by, yticks.by: numeric value controlling x and y axis breaks, respectively. For example, if yticks.by = 5, a tick mark is shown every 5 units. Default value is NULL.


# Axis tick labels style: "plain", "italic", "bold" or "bold.italic"
# Rotation angle = 45
ggpar(p, font.tickslab = c(12, "bold", "#2E9FDF"),
      xtickslab.rt = 45, ytickslab.rt = 45)

# Hide ticks and tickslab
ggpar(p, ticks = FALSE, tickslab = FALSE)
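
The yticks.by argument listed above can be tried in the same way. This is a minimal, self-contained sketch; the object name p_ticks is illustrative.

```r
library(ggpubr)

p <- ggboxplot(ToothGrowth, x = "dose", y = "len", color = "dose")

# Show a y axis tick mark every 5 units
p_ticks <- ggpar(p, yticks.by = 5)
p_ticks
```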

Themes

The R package ggpubr contains two main functions for changing the default ggplot theme to a publication ready theme:

  • theme_pubr(): change the theme to a publication ready theme
  • labs_pubr(): Format only plot labels to a publication ready style

theme_pubr() will produce plots with bold axis labels, bold tick mark labels and legend at the bottom leaving extra space for the plotting area.


The argument ggtheme can be used in any ggpubr plotting function to change the plot theme. The default value is theme_pubr(), the publication-ready theme. Allowed values include the official ggplot2 themes: theme_gray(), theme_bw(), theme_minimal(), theme_classic(), theme_void(), etc. It’s also possible to use the “+” operator to add a theme.


# Gray theme
p + theme_gray()

# Minimal theme
p + theme_minimal()

# Format only plot labels to a publication ready style
# by using the function labs_pubr()
p + theme_minimal() + labs_pubr(base_size = 16)
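
The examples above add the theme with “+”. As a complement, the sketch below sets it through the ggtheme argument at plot creation; the object name p_theme is illustrative.

```r
library(ggpubr)

# Set the theme through the ggtheme argument rather than with "+"
p_theme <- ggboxplot(ToothGrowth, x = "dose", y = "len", color = "dose",
   ggtheme = theme_minimal())
p_theme
```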

Rotate a plot

  • Create some data
set.seed(1234)
wdata = data.frame(
   sex = factor(rep(c("F", "M"), each=200)),
   weight = c(rnorm(200, 55), rnorm(200, 58)))
  • Create a density plot and change plot orientation
# Basic density plot
p <- ggdensity(wdata, x = "weight") + theme_gray()
p

# Horizontal plot
ggpar(p, orientation = "horizontal" ) + theme_gray()

# y axis reversed
ggpar(p, orientation = "reverse" ) + theme_gray()

More

See the online documentation (http://www.sthda.com/english/rpkgs/ggpubr) for a complete list.

Infos

This analysis has been performed using R software (ver. 3.2.4) and ggpubr (ver. 0.1.0.999)

