Description
Subset and summarize the results of Principal Component Analysis (PCA), Correspondence Analysis (CA) and Multiple Correspondence Analysis (MCA) functions from several packages.
The function facto_summarize() [in factoextra package] is used.
Install and load factoextra
The devtools package is required for installation because factoextra is hosted on GitHub.
# install.packages("devtools")
devtools::install_github("kassambara/factoextra")
Load factoextra:
library("factoextra")
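The installation steps above can be wrapped so that a package is installed only when it is missing. The sketch below is one possible way to do this; `install_if_missing()` is a hypothetical helper written for illustration and is not part of devtools or factoextra.

```r
# Hypothetical helper (not part of any package): install `pkg` only if
# it cannot be loaded. `installer` is any function taking the package name.
install_if_missing <- function(pkg, installer = utils::install.packages) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    installer(pkg)
  }
  # Report (invisibly) whether the package is available afterwards
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# Usage, commented out because it needs network access:
# install_if_missing("devtools")
# install_if_missing(
#   "factoextra",
#   installer = function(pkg) devtools::install_github("kassambara/factoextra")
# )
```

Passing the installer as a function keeps the GitHub-specific call in one place while reusing the same availability check.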
Usage
facto_summarize(X, element, result = c("coord", "cos2", "contrib"),
axes = 1:2, select = NULL)
Arguments
Argument | Description |
---|---|
X | an object of class PCA, CA or MCA [FactoMineR]; prcomp or princomp [stats]; dudi, pca, coa or acm [ade4]; ca [ca package]. |
element | the element to summarize. Allowed values are "row" and "col" for CA; "var" and "ind" for PCA or MCA. |
result | the result to be extracted for the element. Possible values are any combination of "coord", "cos2" and "contrib". |
axes | a numeric vector specifying the axes of interest. The default, 1:2, selects axes 1 and 2. |
select | a selection of variables. Allowed values are NULL or a list containing the arguments name, cos2 or contrib. Default is list(name = NULL, cos2 = NULL, contrib = NULL). In the examples on this page, name is a character vector of element names, cos2 = 0.6 keeps elements with cos2 >= 0.6, and contrib = 5 keeps the top 5 contributing elements. |
Details
If length(axes) > 1, then the columns contrib and cos2 correspond to the total contributions and total cos2 over the specified axes. In this case, the column coord is calculated as x^2 + y^2 + ..., where x, y, ... are the coordinates of the points on the specified axes.
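The sum-of-squares rule for coord can be illustrated in base R without factoextra. The sketch below is not the package's implementation. It assumes a prcomp() fit on the built-in USArrests data (chosen only so the example runs on its own) and takes the variable coordinates to be the loadings scaled by the component standard deviations, which for a scaled PCA are the variable-component correlations.

```r
# Base-R sketch, not the factoextra implementation.
pca <- prcomp(USArrests, scale. = TRUE)

# Variable coordinates: each loading column multiplied by its component's sdev
var_coord <- sweep(pca$rotation, 2, pca$sdev, FUN = "*")

axes <- 1:2
# Total "coord" over the chosen axes: x^2 + y^2 (+ ... for more axes)
coord_total <- rowSums(var_coord[, axes, drop = FALSE]^2)
print(round(coord_total, 4))
```

With all axes included, each variable's total equals 1 for a scaled PCA, so the value for a subset of axes can be read as the share of that variable captured by those axes.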
Value
A data frame containing the element names, the coordinates on each requested axis, and the (total) coord, cos2 and contrib values for the specified axes.
Examples
Principal component analysis
A principal component analysis (PCA) is performed using the built-in R function prcomp() and the decathlon2 data [in factoextra]:
data(decathlon2)
decathlon2.active <- decathlon2[1:23, 1:10]
res.pca <- prcomp(decathlon2.active, scale. = TRUE)
# Summarize variables on axes 1:2
facto_summarize(res.pca, "var", axes = 1:2)[,-1]
Dim.1 Dim.2 coord cos2 contrib
X100m -0.850625692 0.17939806 0.7557477 0.7557477 75.57477
Long.jump 0.794180641 -0.28085695 0.7096035 0.7096035 70.96035
Shot.put 0.733912733 -0.08540412 0.5459218 0.5459218 54.59218
High.jump 0.610083985 0.46521415 0.5886267 0.5886267 58.86267
X400m -0.701603377 -0.29017826 0.5764507 0.5764507 57.64507
X110m.hurdle -0.764125197 0.02474081 0.5844994 0.5844994 58.44994
Discus 0.743209016 -0.04966086 0.5548258 0.5548258 55.48258
Pole.vault -0.217268042 -0.80745110 0.6991827 0.6991827 69.91827
Javeline 0.428226639 -0.38610928 0.3324584 0.3324584 33.24584
X1500m 0.004278487 -0.78448019 0.6154275 0.6154275 61.54275
# Select the top 5 contributing variables
facto_summarize(res.pca, "var", axes = 1:2,
select = list(contrib = 5))[,-1]
Dim.1 Dim.2 coord cos2 contrib
X100m -0.850625692 0.1793981 0.7557477 0.7557477 75.57477
Long.jump 0.794180641 -0.2808570 0.7096035 0.7096035 70.96035
Pole.vault -0.217268042 -0.8074511 0.6991827 0.6991827 69.91827
X1500m 0.004278487 -0.7844802 0.6154275 0.6154275 61.54275
High.jump 0.610083985 0.4652142 0.5886267 0.5886267 58.86267
# Select variables with cos2 >= 0.6
facto_summarize(res.pca, "var", axes = 1:2,
select = list(cos2 = 0.6))[,-1]
Dim.1 Dim.2 coord cos2 contrib
X100m -0.850625692 0.1793981 0.7557477 0.7557477 75.57477
Long.jump 0.794180641 -0.2808570 0.7096035 0.7096035 70.96035
Pole.vault -0.217268042 -0.8074511 0.6991827 0.6991827 69.91827
X1500m 0.004278487 -0.7844802 0.6154275 0.6154275 61.54275
# Select by names
facto_summarize(res.pca, "var", axes = 1:2,
select = list(name = c("X100m", "Discus", "Javeline")))[,-1]
Dim.1 Dim.2 coord cos2 contrib
X100m -0.8506257 0.17939806 0.7557477 0.7557477 75.57477
Discus 0.7432090 -0.04966086 0.5548258 0.5548258 55.48258
Javeline 0.4282266 -0.38610928 0.3324584 0.3324584 33.24584
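The three selection modes used above (top n by contrib, a cos2 threshold, and selection by name) can be mimicked in base R on an ordinary data frame. The sketch below copies the cos2 and contrib values from the variable summary printed above. `select_rows()` is a hypothetical helper written for illustration, and treating cos2 purely as a ">=" threshold is an assumption based on the examples, not a description of the package's internals.

```r
# Values copied from the variable summary printed in the PCA example above
vars <- data.frame(
  name = c("X100m", "Long.jump", "Shot.put", "High.jump", "X400m",
           "X110m.hurdle", "Discus", "Pole.vault", "Javeline", "X1500m"),
  cos2 = c(0.7557477, 0.7096035, 0.5459218, 0.5886267, 0.5764507,
           0.5844994, 0.5548258, 0.6991827, 0.3324584, 0.6154275),
  contrib = c(75.57477, 70.96035, 54.59218, 58.86267, 57.64507,
              58.44994, 55.48258, 69.91827, 33.24584, 61.54275),
  stringsAsFactors = FALSE
)

# Hypothetical helper mimicking select = list(name, cos2, contrib)
select_rows <- function(df, name = NULL, cos2 = NULL, contrib = NULL) {
  # Keep only the requested names
  if (!is.null(name)) df <- df[df$name %in% name, , drop = FALSE]
  # Keep rows at or above the cos2 threshold
  if (!is.null(cos2)) df <- df[df$cos2 >= cos2, , drop = FALSE]
  # Keep the `contrib` rows with the largest contribution
  if (!is.null(contrib)) {
    df <- df[order(df$contrib, decreasing = TRUE), , drop = FALSE]
    df <- head(df, contrib)
  }
  df
}

top5  <- select_rows(vars, contrib = 5)
good  <- select_rows(vars, cos2 = 0.6)
named <- select_rows(vars, name = c("X100m", "Discus", "Javeline"))
print(top5$name)
```

Applied to these values, the helper returns the same variables as the three facto_summarize() calls above.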
# Summarize individuals on axes 1:2
facto_summarize(res.pca, "ind", axes = 1:2)[,-1]
Dim.1 Dim.2 coord cos2 contrib
SEBRLE 0.1912074 -1.5541282 2.4518746 0.5050034 10.660324
CLAY 0.7901217 -2.4204156 6.4827039 0.5057178 28.185669
BERNARD -1.3292592 -1.6118687 4.3650507 0.4871654 18.978481
YURKOV -0.8694134 0.4328779 0.9432630 0.1199355 4.101143
ZSIVOCZKY -0.1057450 2.0233632 4.1051806 0.5779938 17.848611
McMULLEN 0.1185550 0.9916237 0.9973729 0.1543704 4.336404
MARTINEAU -2.3923532 1.2849234 7.3743818 0.5205607 32.062530
HERNU -1.8910497 -1.1784614 4.9648401 0.5543447 21.586261
BARRAS -1.7744575 0.4125321 3.3188820 0.6495490 14.429922
NOOL -2.7770058 1.5726757 10.1850700 0.6469840 44.282913
BOURGUIGNON -4.4137335 -1.2635770 21.0776704 0.9301572 91.642045
Sebrle 3.4514485 -1.2169193 13.3933893 0.7593400 58.232127
Clay 3.3162243 -1.6232908 13.6324164 0.8523470 59.271375
Karpov 4.0703560 0.7983510 17.2051623 0.8138146 74.805053
Macey 1.8484623 2.0638828 7.6764252 0.8165181 33.375762
Warners 1.3873514 -0.2819083 2.0042163 0.2662078 8.713984
Zsivoczky 0.4715533 0.9267436 1.0812163 0.2190667 4.700940
Hernu 0.2763118 1.1657260 1.4352654 0.4666709 6.240284
Bernard 1.3672590 1.4780354 4.0539857 0.6274807 17.626025
Schwarzl -0.7102777 -0.6584251 0.9380181 0.2170229 4.078340
Pogorelov -0.2143524 -0.8610557 0.7873639 0.1337231 3.423321
Schoenbeck -0.4953166 -1.3000530 1.9354762 0.5291161 8.415114
Barras -0.3158867 0.8193681 0.7711485 0.1466237 3.352820
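For individuals, the total cos2 is usually defined as the squared coordinates on the chosen axes divided by the squared distance to the origin over all axes. The base-R sketch below shows that definition on a prcomp() fit of the built-in USArrests data (chosen only so the example runs on its own). Whether factoextra computes cos2 in exactly this way for every supported backend is an assumption.

```r
# Base-R sketch of the usual cos2 definition for individuals
pca <- prcomp(USArrests, scale. = TRUE)
ind_coord <- pca$x                      # coordinates of individuals on all axes

axes <- 1:2
dist2 <- rowSums(ind_coord^2)           # squared distance to the origin, all axes
coord_total <- rowSums(ind_coord[, axes, drop = FALSE]^2)  # x^2 + y^2
cos2_total <- coord_total / dist2       # share of the distance captured by `axes`

print(head(round(cbind(coord = coord_total, cos2 = cos2_total), 3)))
```

A cos2 close to 1 means the individual is well represented on the selected axes; a value near 0 means most of its distance from the origin lies on other axes.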
Correspondence Analysis
The function CA() [in the FactoMineR package] is used:
# Install and load FactoMineR to compute CA
# install.packages("FactoMineR")
library("FactoMineR")
data("housetasks")
res.ca <- CA(housetasks, graph = FALSE)
# Summarize row variables on axes 1:2
facto_summarize(res.ca, "row", axes = 1:2)[,-1]
Dim.1 Dim.2 coord cos2 contrib
Laundry -0.9918368 0.4953220 1.2290841 0.9245395 12.403601
Main_meal -0.8755855 0.4901092 1.0068569 0.9739621 8.833091
Dinner -0.6925740 0.3081043 0.5745869 0.9303433 3.558222
Breakfeast -0.5086002 0.4528038 0.4637054 0.9051733 3.722406
Tidying -0.3938084 -0.4343444 0.3437401 0.9748275 2.404604
Dishes -0.1889641 -0.4419662 0.2310416 0.7642703 1.497001
Shopping -0.1176813 -0.4033171 0.1765136 0.8113088 1.214543
Official 0.2266324 0.2536132 0.1156819 0.1194711 0.636781
Driving 0.7417696 0.6534143 0.9771724 0.7672477 7.788243
Finances 0.2707669 -0.6178684 0.4550760 0.9973464 2.948600
Insurance 0.6470759 -0.4737832 0.6431778 0.8848140 5.126245
Repairs 1.5287787 0.8642647 3.0841176 0.9326072 29.178865
Holidays 0.2524863 -1.4350066 2.1229933 0.9921522 19.477003
# Summarize column variables on axes 1:2
facto_summarize(res.ca, "col", axes = 1:2)[,-1]
Dim.1 Dim.2 coord cos2 contrib
Wife -0.83762154 0.3652207 0.83499601 0.9543242 28.72693
Alternating -0.06218462 0.2915938 0.08889388 0.1098815 1.29467
Husband 1.16091847 0.6019199 1.71003929 0.9795683 37.35808
Jointly 0.14942609 -1.0265791 1.07619274 0.9979998 31.40952
Multiple Correspondence Analysis
The function MCA() [in the FactoMineR package] is used:
library(FactoMineR)
data(poison)
res.mca <- MCA(poison, quanti.sup = 1:2,
quali.sup = 3:4, graph=FALSE)
# Summarize variables on axes 1:2
res <- facto_summarize(res.mca, "var", axes = 1:2)
head(res)
name Dim.1 Dim.2 coord cos2 contrib
Nausea_n Nausea_n 0.2673909 0.12139029 0.08623348 0.3090033 0.6128991
Nausea_y Nausea_y -0.9581506 -0.43498187 1.10726185 0.3090033 2.1962218
Vomit_n Vomit_n 0.4790279 -0.40919465 0.39690803 0.5953620 2.1649529
Vomit_y Vomit_y -0.7185419 0.61379197 0.89304306 0.5953620 3.2474293
Abdo_n Abdo_n 1.3180221 -0.03574501 1.73845988 0.8457372 5.1722773
Abdo_y Abdo_y -0.6411999 0.01738946 0.41143974 0.8457372 2.5162430
# Summarize individuals on axes 1:2
res <- facto_summarize(res.mca, "ind", axes = 1:2)
head(res)
name Dim.1 Dim.2 coord cos2 contrib
1 1 -0.4525811 -0.26415072 0.2746052 0.46457063 0.4992822
2 2 0.8361700 -0.03193457 0.7002000 0.55670644 1.2730909
3 3 -0.4481892 0.13538726 0.2192032 0.59815656 0.3985513
4 4 0.8803694 -0.08536230 0.7823370 0.75476958 1.4224310
5 5 -0.4481892 0.13538726 0.2192032 0.59815656 0.3985513
6 6 -0.3594324 -0.43604390 0.3193260 0.06143111 0.5805927
Infos
This analysis was performed using R software (ver. 3.1.2) and factoextra (ver. 1.0.2).