As illustrated in my previous article, correspondence analysis (CA) is used to analyse the contingency table formed by two categorical variables.
This article describes how to perform correspondence analysis using the MASS package.
Required packages
The MASS (for computing CA) and factoextra (for CA visualization) packages are used.
These packages can be installed as follows:
install.packages("MASS")
# install.packages("devtools")
devtools::install_github("kassambara/factoextra")
Note that factoextra version >= 1.0.1 is required for this tutorial. If it's already installed on your computer, you should re-install it to get the most recent version.
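If you are not sure which version is installed, a quick check along the following lines can help. This is only a sketch: the variable names (required, installed, needs_update) are made up for illustration, and the threshold is simply the 1.0.1 mentioned above.

```r
# Minimum factoextra version stated in this tutorial
required <- "1.0.1"

if (requireNamespace("factoextra", quietly = TRUE)) {
  # packageVersion() returns a version object that can be compared to a string
  installed <- as.character(packageVersion("factoextra"))
  needs_update <- packageVersion("factoextra") < required
} else {
  # Package not installed at all
  installed <- NA_character_
  needs_update <- TRUE
}

if (needs_update) {
  message("factoextra is missing or older than ", required, ": (re)install it.")
} else {
  message("factoextra ", installed, " is recent enough.")
}
```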
Load MASS and factoextra
library("MASS")
library("factoextra")
Data format
We'll use the housetasks data set [in factoextra].
data(housetasks)
head(housetasks)
Wife Alternating Husband Jointly
Laundry 156 14 2 4
Main_meal 124 20 5 4
Dinner 77 11 7 13
Breakfeast 82 36 15 7
Tidying 53 11 1 57
Dishes 32 24 4 53
The data is a contingency table containing 13 household tasks and how they are shared within the couple:
- rows are the different tasks
- values are the frequencies with which each task is done:
  - by the wife only
  - alternately
  - by the husband only
  - or jointly
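Before running CA, it can be useful to look at the margins and row profiles of the table. The sketch below rebuilds only the six rows printed by head() above as a matrix (the name tasks6 is ours, not part of factoextra), so that it runs with base R alone. With the full data you would work on housetasks directly.

```r
# The six rows shown by head(housetasks), typed in by hand
tasks6 <- matrix(
  c(156, 14,  2,  4,
    124, 20,  5,  4,
     77, 11,  7, 13,
     82, 36, 15,  7,
     53, 11,  1, 57,
     32, 24,  4, 53),
  nrow = 6, byrow = TRUE,
  dimnames = list(
    c("Laundry", "Main_meal", "Dinner", "Breakfeast", "Tidying", "Dishes"),
    c("Wife", "Alternating", "Husband", "Jointly")
  )
)

# Row and column totals (the margins of the contingency table)
row_totals <- rowSums(tasks6)
col_totals <- colSums(tasks6)

# Row profiles: share of each category within a task (each row sums to 1)
row_profiles <- prop.table(tasks6, margin = 1)
print(round(row_profiles, 2))

# Chi-square test of independence between tasks and categories.
# Some expected counts are small in this subset, hence suppressWarnings().
chi <- suppressWarnings(chisq.test(tasks6))
print(chi$p.value)
```

A small p-value suggests that rows and columns are associated, which is the situation where a CA map is worth drawing.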
Correspondence analysis (CA)
The function corresp() [in MASS package] can be used. A simplified format is:
corresp(x, nf = 1)
- x : a data frame, matrix or table (contingency table)
- nf : number of dimensions to be included in the output
Example of usage:
res.ca <- corresp(housetasks, nf = 3)
The output of the function corresp() is an object of class correspondence, structured as a list including:
names(res.ca)
[1] "cor" "rscore" "cscore" "Freq"
- cor: the canonical correlations, i.e. the square roots of the eigenvalues (one per retained dimension)
- rscore, cscore: the row and column scores
- Freq: the initial contingency table
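To make the structure of the result concrete, here is a minimal sketch that runs corresp() on the six rows printed earlier (rebuilt by hand as tasks6, a name chosen for this example) and inspects each component. A table with 4 columns has at most 3 non-trivial dimensions, which is why nf = 3 is used.

```r
library(MASS)

# The six rows shown by head(housetasks), typed in by hand
tasks6 <- matrix(
  c(156, 14,  2,  4,
    124, 20,  5,  4,
     77, 11,  7, 13,
     82, 36, 15,  7,
     53, 11,  1, 57,
     32, 24,  4, 53),
  nrow = 6, byrow = TRUE,
  dimnames = list(
    c("Laundry", "Main_meal", "Dinner", "Breakfeast", "Tidying", "Dishes"),
    c("Wife", "Alternating", "Husband", "Jointly")
  )
)

fit <- corresp(tasks6, nf = 3)

class(fit)        # "correspondence"
names(fit)        # "cor" "rscore" "cscore" "Freq"
length(fit$cor)   # one canonical correlation per dimension
dim(fit$rscore)   # rows x dimensions
dim(fit$cscore)   # columns x dimensions
```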
Interpretation of CA outputs
For the interpretation of the results, read this article: Correspondence Analysis in R: The Ultimate Guide for the Analysis, the Visualization and the Interpretation.
Eigenvalues and scree plot
The proportion of inertia explained by the principal axes can be obtained using the function get_eigenvalue() [in factoextra] as follows:
eigenvalues <- get_eigenvalue(res.ca)
eigenvalues
eigenvalue variance.percent cumulative.variance.percent
Dim.1 0.5428893 48.69222 48.69222
Dim.2 0.4450028 39.91269 88.60491
Dim.3 0.1270484 11.39509 100.00000
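The same kind of table can be derived by hand from the cor component, since the eigenvalues are the squared canonical correlations. The sketch below does this on the six-row subset printed earlier (tasks6 and eig_table are names chosen for this example), so the numbers will differ from the full-data output above. It also checks the standard CA identity that the total inertia equals the chi-square statistic divided by the grand total.

```r
library(MASS)

# The six rows shown by head(housetasks), typed in by hand
tasks6 <- matrix(
  c(156, 14,  2,  4,
    124, 20,  5,  4,
     77, 11,  7, 13,
     82, 36, 15,  7,
     53, 11,  1, 57,
     32, 24,  4, 53),
  nrow = 6, byrow = TRUE,
  dimnames = list(
    c("Laundry", "Main_meal", "Dinner", "Breakfeast", "Tidying", "Dishes"),
    c("Wife", "Alternating", "Husband", "Jointly")
  )
)

fit <- corresp(tasks6, nf = 3)

# Eigenvalues (principal inertias) are the squared canonical correlations
eig <- fit$cor^2
pct <- 100 * eig / sum(eig)

eig_table <- data.frame(
  eigenvalue = eig,
  variance.percent = pct,
  cumulative.variance.percent = cumsum(pct),
  row.names = paste0("Dim.", seq_along(eig))
)
print(eig_table)

# Total inertia = chi-square statistic / grand total
chi <- suppressWarnings(chisq.test(tasks6))
total_inertia <- unname(chi$statistic) / sum(tasks6)
print(all.equal(sum(eig), total_inertia))
```

Note that the percentages are relative to the dimensions kept in the output. They add up to the full inertia here only because nf = 3 covers every non-trivial dimension of a 4-column table.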
The function fviz_screeplot() [in factoextra package] can be used to draw the scree plot (the percentages of inertia explained by the CA dimensions):
fviz_screeplot(res.ca)
Read more about eigenvalues and screeplot: Eigenvalues data visualization
Biplot of row and column variables
You can use the standard R function biplot(res.ca), or the function fviz_ca_biplot() [in factoextra package] to draw a nice-looking plot:
fviz_ca_biplot(res.ca)
# Change the theme
fviz_ca_biplot(res.ca) +
theme_minimal()
Read more about fviz_ca_biplot(): fviz_ca_biplot
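If factoextra is not available, the biplot() method that MASS provides for correspondence objects gives a quick, if plainer, view. The sketch below uses the six rows printed earlier (rebuilt by hand as tasks6, a name chosen for this example). The method needs at least two dimensions, so nf must be 2 or more.

```r
library(MASS)

# The six rows shown by head(housetasks), typed in by hand
tasks6 <- matrix(
  c(156, 14,  2,  4,
    124, 20,  5,  4,
     77, 11,  7, 13,
     82, 36, 15,  7,
     53, 11,  1, 57,
     32, 24,  4, 53),
  nrow = 6, byrow = TRUE,
  dimnames = list(
    c("Laundry", "Main_meal", "Dinner", "Breakfeast", "Tidying", "Dishes"),
    c("Wife", "Alternating", "Husband", "Jointly")
  )
)

fit <- corresp(tasks6, nf = 2)

# Symmetric biplot (the default): rows and columns on the first two dimensions
biplot(fit)

# Alternative scaling that emphasizes the row points
biplot(fit, type = "rows")
```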
Row variables
The function get_ca_row() [in factoextra] is used to extract the results for row variables. This function returns a list containing the coordinates, the cos2, the contributions and the inertia of the row variables. The function fviz_ca_row() [in factoextra] is used to visualize only the row points.
row <- get_ca_row(res.ca)
row
Correspondence Analysis - Results for rows
===================================================
Name Description
1 "$coord" "Coordinates for the rows"
2 "$cos2" "Cos2 for the rows"
3 "$contrib" "contributions of the rows"
4 "$inertia" "Inertia of the rows"
# Coordinates
head(row$coord)
Dim.1 Dim.2 Dim.3
Laundry -0.9918368 -0.4953220 -0.31672897
Main_meal -0.8755855 -0.4901092 -0.16406487
Dinner -0.6925740 -0.3081043 -0.20741377
Breakfeast -0.5086002 -0.4528038 0.22040453
Tidying -0.3938084 0.4343444 -0.09421375
Dishes -0.1889641 0.4419662 0.26694926
# Visualize row variables only
fviz_ca_row(res.ca) +
theme_minimal()
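To see what these quantities mean, the sketch below computes row coordinates, contributions and cos2 by hand from a corresp() fit, using the standard CA formulas. It is not taken from the factoextra source; we only assume that get_ca_row() reports the same quantities. It runs on the six rows printed earlier (tasks6 and the row_* names are chosen for this example), and the sign of each axis is arbitrary, so a whole column of coordinates may be flipped compared with other software.

```r
library(MASS)

# The six rows shown by head(housetasks), typed in by hand
tasks6 <- matrix(
  c(156, 14,  2,  4,
    124, 20,  5,  4,
     77, 11,  7, 13,
     82, 36, 15,  7,
     53, 11,  1, 57,
     32, 24,  4, 53),
  nrow = 6, byrow = TRUE,
  dimnames = list(
    c("Laundry", "Main_meal", "Dinner", "Breakfeast", "Tidying", "Dishes"),
    c("Wife", "Alternating", "Husband", "Jointly")
  )
)

fit <- corresp(tasks6, nf = 3)

# Row masses: share of each task in the grand total
row_mass <- rowSums(tasks6) / sum(tasks6)

# Principal coordinates: standard scores scaled by the canonical correlations
row_coord <- sweep(fit$rscore, 2, fit$cor, `*`)
colnames(row_coord) <- paste0("Dim.", seq_along(fit$cor))

# Contribution (in %) of each row to the inertia of each dimension
row_contrib <- 100 * sweep(row_mass * row_coord^2, 2, fit$cor^2, `/`)

# cos2: quality of representation of each row on each dimension.
# Dividing by the row sums is valid because all 3 dimensions are retained.
row_cos2 <- row_coord^2 / rowSums(row_coord^2)

print(round(row_coord, 3))
print(round(row_contrib, 1))
print(round(row_cos2, 3))
```

Each column of row_contrib sums to 100 and each row of row_cos2 sums to 1, which is a convenient sanity check when reading these tables.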
Column variables
The result for columns gives the same information as described for rows.
col <- get_ca_col(res.ca)
# Coordinates
head(col$coord)
Dim.1 Dim.2 Dim.3
Wife -0.83762154 -0.3652207 -0.19991139
Alternating -0.06218462 -0.2915938 0.84858939
Husband 1.16091847 -0.6019199 -0.18885924
Jointly 0.14942609 1.0265791 -0.04644302
# Visualize column variables only
fviz_ca_col(res.ca) +
theme_minimal()
References and further reading
- Correspondence Analysis in R: The Ultimate Guide for the Analysis, the Visualization and the Interpretation
- Correspondence Analysis using ade4 and factoextra
- Oleg Nenadic and Michael Greenacre. Correspondence Analysis in R, with Two- and Three-dimensional Graphics: The ca Package. Journal of Statistical Software, May 2007. http://www.jstatsoft.org/v20/i03/paper
Infos
This analysis has been performed using R software (ver. 3.1.2), FactoMineR (ver. ) and factoextra (ver. 1.0.2)