Why ggpubr?
ggplot2 by Hadley Wickham is an excellent and flexible package for elegant data visualization in R. However, the default plots require some formatting before they can be sent for publication. Furthermore, the syntax for customizing a ggplot is opaque, which raises the level of difficulty for researchers without advanced R programming skills.
The ‘ggpubr’ package provides some easy-to-use functions for creating and customizing ‘ggplot2’-based, publication-ready plots.
Getting started
See the online documentation (http://www.sthda.com/english/rpkgs/ggpubr) for a complete list of ggpubr functions.
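The examples below assume that ggpubr is installed and loaded. A minimal setup sketch, assuming installation from CRAN (the CRAN version may be newer than the one listed at the end of this document):

```r
# Install ggpubr from CRAN only if it is not already available
if (!requireNamespace("ggpubr", quietly = TRUE)) {
  install.packages("ggpubr")
}
# Loading ggpubr also attaches ggplot2, which ggpubr depends on
library(ggpubr)
```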
Density and histogram plots
- Create some data
set.seed(1234)
wdata = data.frame(
   sex = factor(rep(c("F", "M"), each=200)),
   weight = c(rnorm(200, 55), rnorm(200, 58)))
head(wdata, 4)
##   sex   weight
## 1   F 53.79293
## 2   F 55.27743
## 3   F 56.08444
## 4   F 52.65430
- Density plot with mean lines and marginal rug
# Change outline and fill colors by groups ("sex")
# Use custom palette
ggdensity(wdata, x = "weight",
          add = "mean", rug = TRUE,
          color = "sex", fill = "sex",
          palette = c("#00AFBB", "#E7B800"))
Note that:
- the argument palette is used for coloring or filling by groups. Allowed values include:
- “grey” for grey color palettes;
- brewer palettes e.g. “RdBu”, “Blues”, …; run RColorBrewer::display.brewer.all() to see all brewer palettes.
- or custom color palettes e.g. c(“blue”, “red”) or c(“#00AFBB”, “#E7B800”);
- and scientific journal palettes from ggsci R package, e.g.: “npg”, “aaas”, “lancet”, “jco”, “ucscgb”, “uchicago”, “simpsons” and “rickandmorty”.
- the argument add can be used to add mean or median lines to density and to histogram plots. Allowed values are: “mean” and “median”.
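As a sketch of the other allowed values described above, the same density plot can draw median lines and use the ggsci "npg" journal palette instead of a custom one (this assumes the ggsci package, which ggpubr relies on for journal palettes, is installed):

```r
library(ggpubr)

# Same simulated data as above
set.seed(1234)
wdata <- data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = c(rnorm(200, 55), rnorm(200, 58)))

# Median lines instead of mean lines; Nature Publishing Group ("npg") palette
p_median <- ggdensity(wdata, x = "weight",
                      add = "median", rug = TRUE,
                      color = "sex", fill = "sex",
                      palette = "npg")
print(p_median)
```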
- Histogram plot with mean lines and marginal rug
# Change outline and fill colors by groups ("sex")
# Use custom color palette
gghistogram(wdata, x = "weight",
            add = "mean", rug = TRUE,
            color = "sex", fill = "sex",
            palette = c("#00AFBB", "#E7B800"))
Creating the above histogram with standard ggplot2 functions requires syntax that is considerably more complex for beginners (see the R script below). The ggpubr package wraps ggplot2 functions to make this easier and to quickly produce a publication-ready plot.
# ggplot2 standard syntax for creating histogram
# +++++++++++++++++++++++++++++++++++++
library("ggplot2")
library("dplyr")
# Compute group mean
mu <- wdata %>%
  group_by(sex) %>%
  summarise(grp.mean = mean(weight))
# Plot
ggplot(data = wdata, aes(weight)) +
  geom_histogram(aes(color = sex, fill = sex),
                 position = "identity", alpha = 0.5) +
  geom_vline(data = mu, aes(xintercept = grp.mean, color = sex),
             linetype = "dashed", size = 1) +
  scale_color_manual(values = c("#00AFBB", "#E7B800")) +
  scale_fill_manual(values = c("#00AFBB", "#E7B800")) +
  theme_classic() +
  theme(
    axis.text.x = element_text(size = 12, colour = "black", face = "bold"),
    axis.text.y = element_text(size = 12, colour = "black", face = "bold"),
    axis.line.x = element_line(colour = "black", size = 1),
    axis.line.y = element_line(colour = "black", size = 1),
    legend.position = "bottom"
  )
Box plots, violin plots, dot plots and strip charts
- Load data
data("ToothGrowth")
df <- ToothGrowth
head(df, 4)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
- Box plots with jittered points
# Change outline colors by groups: dose
# Use custom color palette
# Add jitter points and change the shape by groups
ggboxplot(df, x = "dose", y = "len",
          color = "dose", palette = c("#00AFBB", "#E7B800", "#FC4E07"),
          add = "jitter", shape = "dose")
Note that when using ggpubr functions to draw box plots, violin plots, dot plots, strip charts, bar plots, line plots or error plots, the argument add can be used to add another plot element (e.g. a dot plot or error bars).
In this case, allowed values for the argument add are one or a combination of: “none”, “dotplot”, “jitter”, “boxplot”, “mean”, “mean_se”, “mean_sd”, “mean_ci”, “mean_range”, “median”, “median_iqr”, “median_mad”, “median_range”; see ?desc_statby for more details.
- Violin plots with box plots inside
# Change fill color by groups: dose
# add boxplot with white fill color
ggviolin(df, x = "dose", y = "len", fill = "dose",
         palette = c("#00AFBB", "#E7B800", "#FC4E07"),
         add = "boxplot", add.params = list(fill = "white"))
- Dot plots with summary statistics
# Change outline and fill colors by groups: dose
# Add mean + sd
ggdotplot(df, x = "dose", y = "len", color = "dose", fill = "dose",
          palette = c("#00AFBB", "#E7B800", "#FC4E07"),
          add = "mean_sd", add.params = list(color = "gray"))
- Strip chart with summary statistics
# Change points size
# Change point colors and shapes by groups: dose
# Use custom color palette
ggstripchart(df, "dose", "len", size = 2, shape = "dose",
             color = "dose", palette = c("#00AFBB", "#E7B800", "#FC4E07"),
             add = "mean_sd")
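The "mean_sd" summaries drawn by the dot plot and strip chart above can be checked numerically with desc_statby(), the function referenced for the add options. A minimal sketch on the same ToothGrowth data (only a few of the returned statistic columns are shown):

```r
library(ggpubr)

df <- ToothGrowth
# Descriptive statistics of "len" for each dose group; the mean and sd
# columns correspond to the "mean_sd" summaries plotted above
stats <- desc_statby(df, measure.var = "len", grps = "dose")
stats[, c("dose", "length", "mean", "sd")]
```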
Bar plots
- Basic plot with labels outside
# Data
df2 <- data.frame(dose=c("D0.5", "D1", "D2"),
                  len=c(4.2, 10, 29.5))
print(df2)
##   dose  len
## 1 D0.5  4.2
## 2   D1 10.0
## 3   D2 29.5
# Change outline and fill colors by groups: dose
# Use custom color palette
# Add labels
ggbarplot(df2, x = "dose", y = "len",
          fill = "dose", color = "dose",
          palette = c("#00AFBB", "#E7B800", "#FC4E07"),
          label = TRUE)
- Use lab.pos = “in” to put labels inside bars
- Use lab.col to change label colors
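Applied to the single-group bar plot above, the two label options could be combined as in this short sketch:

```r
library(ggpubr)

df2 <- data.frame(dose = c("D0.5", "D1", "D2"),
                  len = c(4.2, 10, 29.5))
# Labels placed inside the bars (lab.pos) and drawn in white (lab.col)
p_in <- ggbarplot(df2, x = "dose", y = "len",
                  fill = "dose", color = "dose",
                  palette = c("#00AFBB", "#E7B800", "#FC4E07"),
                  label = TRUE, lab.pos = "in", lab.col = "white")
print(p_in)
```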
- Bar plot with multiple groups
# Create some data
df3 <- data.frame(supp=rep(c("VC", "OJ"), each=3),
                  dose=rep(c("D0.5", "D1", "D2"),2),
                  len=c(6.8, 15, 33, 4.2, 10, 29.5))
print(df3)
##   supp dose  len
## 1   VC D0.5  6.8
## 2   VC   D1 15.0
## 3   VC   D2 33.0
## 4   OJ D0.5  4.2
## 5   OJ   D1 10.0
## 6   OJ   D2 29.5
# Plot "len" by "dose" and change color by a second group: "supp"
# Add labels inside bars
ggbarplot(df3, x = "dose", y = "len",
          fill = "supp", color = "supp", palette = c("#00AFBB", "#E7B800"),
          label = TRUE, lab.col = "white", lab.pos = "in")
- Bar plot visualizing the mean of each group with error bars
# Data: the ToothGrowth data set will be used.
df <- ToothGrowth
head(df, 10)
##     len supp dose
## 1   4.2   VC  0.5
## 2  11.5   VC  0.5
## 3   7.3   VC  0.5
## 4   5.8   VC  0.5
## 5   6.4   VC  0.5
## 6  10.0   VC  0.5
## 7  11.2   VC  0.5
## 8  11.2   VC  0.5
## 9   5.2   VC  0.5
## 10  7.0   VC  0.5
# Visualize the mean of each group
# Change point and outline colors by groups: dose
# Add jitter points and error bars (mean_se)
ggbarplot(df, x = "dose", y = "len", color = "dose",
          palette = c("#00AFBB", "#E7B800", "#FC4E07"),
          add = c("mean_se", "jitter"))
Line plots
- Line plots with multiple groups
# Plot "len" by "dose" and
# change line types and point shapes by a second group: "supp"
# Change color by groups: "supp"
ggline(df3, x = "dose", y = "len",
       linetype = "supp", shape = "supp",
       color = "supp", palette = c("#00AFBB", "#E7B800"))
- Line plot visualizing the mean of each group with error bars
# Visualize the mean of each group: dose
# Change colors by a second group: supp
# Add jitter points and error bars (mean_se)
ggline(df, x = "dose", y = "len",
       color = "supp",
       palette = c("#00AFBB", "#E7B800", "#FC4E07"),
       add = c("mean_se", "jitter"))
Pie chart
- Create some data
df4 <- data.frame(
  group = c("Male", "Female", "Child"),
  value = c(25, 25, 50))
head(df4)
##    group value
## 1   Male    25
## 2 Female    25
## 3  Child    50
- Pie chart
# Change fill color by group
# set outline line color to white
# Use custom color palette
# Show group names and value as labels
labs <- paste0(df4$group, " (", df4$value, "%)")
ggpie(df4, x = "value", fill = "group", color = "white",
      palette = c("#00AFBB", "#E7B800", "#FC4E07"),
      label = labs, lab.pos = "in", lab.font = "white")
Scatter plots
- Load and prepare data
data("mtcars")
df5 <- mtcars
df5$cyl <- as.factor(df5$cyl) # grouping variable
df5$name = rownames(df5) # for point labels
head(df5[, c("wt", "mpg", "cyl")], 3)
##                  wt  mpg cyl
## Mazda RX4     2.620 21.0   6
## Mazda RX4 Wag 2.875 21.0   6
## Datsun 710    2.320 22.8   4
- Scatter plots with regression line and confidence interval
ggscatter(df5, x = "wt", y = "mpg",
          color = "black", shape = 21, size = 4,  # Points color, shape and size
          add = "reg.line",                         # Add regression line
          add.params = list(color = "blue", fill = "lightgray"),  # Customize reg. line
          conf.int = TRUE,                          # Add confidence interval
          cor.coef = TRUE                           # Add correlation coefficient
          )
- Scatter plot with concentration ellipses and labels
# Change point colors and shapes by groups: cyl
# Use custom palette
# Add concentration ellipses with mean points (barycenters)
# Add marginal rug
# Add label and use repel = TRUE to avoid label overplotting
ggscatter(df5, x = "wt", y = "mpg",
          color = "cyl", shape = "cyl",
          palette = c("#00AFBB", "#E7B800", "#FC4E07"),
          ellipse = TRUE, mean.point = TRUE,
          rug = TRUE, label = "name", font.label = 10, repel = TRUE)
Cleveland’s dot plots
# Change colors by group cyl
ggdotchart(df5, x = "mpg", label = "name",
           group = "cyl", color = "cyl",
           palette = c("#00AFBB", "#E7B800", "#FC4E07"))
ggpar(): customize ggplot easily
The function ggpar() [in ggpubr] can be used to easily customize any ggplot2-based graph. The graphical parameters that can be changed using ggpar() include:
- Main titles, axis labels and legend titles
- Legend position and appearance
- Colors
- Axis limits
- Axis transformations: log and sqrt
- Axis ticks
- Themes
- Rotate a plot
Note that all the arguments accepted by the function ggpar() can also be passed directly to the plotting functions in the ggpubr package.
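For example, palette, legend and axis-limit settings can be given to ggboxplot() itself rather than applied afterwards with ggpar(). A minimal sketch on the ToothGrowth data, assuming the ggsci "jco" palette is available:

```r
library(ggpubr)

df <- ToothGrowth
# ggpar() arguments (palette, legend, ylim) passed straight to ggboxplot()
p_direct <- ggboxplot(df, x = "dose", y = "len", color = "dose",
                      palette = "jco", legend = "right",
                      ylim = c(0, 40))
print(p_direct)
```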
We start by creating a basic box plot colored by groups as follows:
df <- ToothGrowth
p <- ggboxplot(df, x = "dose", y = "len",
               color = "dose")
print(p)
Main titles, axis labels and legend titles
# Change title texts and fonts
ggpar(p, main = "Plot of length \n by dose",
      xlab = "Dose (mg)", ylab = "Teeth length",
      legend.title = "Dose (mg)",
      font.main = c(14, "bold.italic", "red"),
      font.x = c(14, "bold", "#2E9FDF"),
      font.y = c(14, "bold", "#E7B800"))
# Hide titles
ggpar(p, xlab = FALSE, ylab = FALSE)
Note that:
- font.main, font.x and font.y are vectors of length 3 indicating, respectively, the size (e.g.: 14), the style (e.g.: “plain”, “bold”, “italic”, “bold.italic”) and the color (e.g.: “red”) of the main title, xlab and ylab. For example, font.x = c(14, “bold”, “red”). Use font.x = 14 to change only the font size, or font.x = “bold” to change only the font face.
- you can use \n to split a long title into multiple lines.
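The single-value shortcuts described above can be sketched as follows, reusing the basic box plot: font.x changes only the x-label size, font.y only the y-label face, and \n splits the title over two lines.

```r
library(ggpubr)

df <- ToothGrowth
p <- ggboxplot(df, x = "dose", y = "len", color = "dose")
# A single number changes only the size; a single style changes only the face
p_font <- ggpar(p, main = "Plot of length\nby dose",
                xlab = "Dose (mg)",
                font.x = 14, font.y = "bold")
print(p_font)
```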
Legend position and appearance
ggpar(p,
      legend = "right", legend.title = "Dose (mg)",
      font.legend = c(10, "bold", "red"))
Color palettes
As mentioned above, the argument palette is used to change group color palettes. Allowed values include:
- Custom color palettes e.g. c(“blue”, “red”) or c(“#00AFBB”, “#E7B800”);
- “grey” for grey color palettes;
- brewer palettes e.g. “RdBu”, “Blues”, …; run RColorBrewer::display.brewer.all() to see all brewer palettes.
- and scientific journal palettes from ggsci R package, e.g.: “npg”, “aaas”, “lancet”, “jco”, “ucscgb”, “uchicago”, “simpsons” and “rickandmorty”.
# Use custom color palette
ggpar(p, palette = c("#00AFBB", "#E7B800", "#FC4E07"))
# Use brewer palette
ggpar(p, palette = "Dark2" )
# Use grey palette
ggpar(p, palette = "grey")
# Use scientific journal palette from ggsci package
# Allowed values: "npg", "aaas", "lancet", "jco",
# "ucscgb", "uchicago", "simpsons" and "rickandmorty".
ggpar(p, palette = "npg") # nature
Axis limits and scales
The following arguments can be used:
- xlim, ylim: a numeric vector of length 2, specifying x and y axis limits (minimum and maximum values), respectively. e.g.: ylim = c(0, 50).
- xscale, yscale: x and y axis scale, respectively. Allowed values are one of c(“none”, “log2”, “log10”, “sqrt”); e.g.: yscale=“log2”.
- format.scale: logical value. If TRUE, axis tick mark labels will be formatted when xscale or yscale = “log2” or “log10”.
# Change y axis limits
ggpar(p, ylim = c(0, 50))
# Change y axis scale to log2
ggpar(p, yscale = "log2")
# Format axis scale
ggpar(p, yscale = "log2", format.scale = TRUE)
Axis ticks: customize tick marks and labels
The following arguments can be used:
- ticks: logical value. Default is TRUE. If FALSE, hide axis tick marks.
- tickslab: logical value. Default is TRUE. If FALSE, hide axis tick labels.
- font.tickslab: Font style (size, face, color) for tick labels, e.g.: c(14, “bold”, “red”).
- xtickslab.rt, ytickslab.rt: Rotation angle of x and y axis tick labels, respectively. Default value is 0.
- xticks.by, yticks.by: numeric value controlling x and y axis breaks, respectively. For example, if yticks.by = 5, a tick mark is shown every 5 units. Default value is NULL.
# Axis tick labels style: "plain", "italic", "bold" or "bold.italic"
# Rotation angle = 45
ggpar(p, font.tickslab = c(12, "bold", "#2E9FDF"),
      xtickslab.rt = 45, ytickslab.rt = 45)
# Hide ticks and tickslab
ggpar(p, ticks = FALSE, tickslab = FALSE)
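The yticks.by argument listed above is not used in the examples; a minimal sketch of setting the y-axis breaks on the same box plot:

```r
library(ggpubr)

df <- ToothGrowth
p <- ggboxplot(df, x = "dose", y = "len", color = "dose")
# Place a y-axis tick mark every 5 units
p_ticks <- ggpar(p, yticks.by = 5)
print(p_ticks)
```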
Themes
The R package ggpubr contains two main functions for changing the default ggplot theme to a publication ready theme:
- theme_pubr(): change the theme to a publication ready theme
- labs_pubr(): Format only plot labels to a publication ready style
theme_pubr() produces plots with bold axis labels, bold tick mark labels and the legend at the bottom, leaving extra space for the plotting area.
# Gray theme
p + theme_gray()
# Minimal theme
p + theme_minimal()
# Format only plot labels to a publication ready style
# by using the function labs_pubr()
p + theme_minimal() + labs_pubr(base_size = 16)
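theme_pubr() itself is added to a plot in the same way as the other themes. A short sketch on the basic box plot (the exact appearance may vary between ggpubr versions):

```r
library(ggpubr)

df <- ToothGrowth
p <- ggboxplot(df, x = "dose", y = "len", color = "dose")
# Apply the ggpubr publication-ready theme described above
p_pubr <- p + theme_pubr()
print(p_pubr)
```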
Rotate a plot
- Create some data
set.seed(1234)
wdata = data.frame(
   sex = factor(rep(c("F", "M"), each=200)),
   weight = c(rnorm(200, 55), rnorm(200, 58)))
- Create a density plot and change plot orientation
# Basic density plot
p <- ggdensity(wdata, x = "weight") + theme_gray()
p
# Horizontal plot
ggpar(p, orientation = "horizontal" ) + theme_gray()
# y axis reversed
ggpar(p, orientation = "reverse" ) + theme_gray()
More
See the online documentation (http://www.sthda.com/english/rpkgs/ggpubr) for a complete list of ggpubr functions.
Infos
This analysis was performed using R software (ver. 3.2.4) and ggpubr (ver. 0.1.0.999).