1 Required packages
Three R packages are required for this chapter:
- cluster and e1071 for computing fuzzy clustering
- factoextra for visualizing clusters
install.packages("cluster")
install.packages("e1071")
install.packages("factoextra")
2 Concept of fuzzy clustering
In K-means or PAM clustering, the data is divided into distinct clusters, where each element is assigned to exactly one cluster. This type of clustering is also known as hard clustering or non-fuzzy clustering. Unlike K-means, fuzzy clustering is considered a soft clustering method, in which each element has a degree of membership in every cluster. In other words, each element has a set of membership coefficients corresponding to its degree of belonging to each cluster.
Points close to the center of a cluster may belong to it to a higher degree than points on the edge of the cluster. The degree to which an element belongs to a given cluster is a numerical value in [0, 1].
The fuzzy c-means (FCM) algorithm is one of the most widely used fuzzy clustering algorithms. It was developed by Dunn in 1973 and improved by Bezdek in 1981. It is frequently used in pattern recognition.
3 Algorithm of fuzzy clustering
The FCM algorithm is very similar to the k-means algorithm; the aim is to minimize the objective function defined as follows (a small R sketch of this criterion is given after the definitions):
\[ \sum\limits_{i=1}^n \sum\limits_{j=1}^k u_{ij}^m \lVert x_i - \mu_j \rVert^2 \]
Where,
- \(u_{ij}\) is the degree to which an observation \(x_i\) belongs to cluster j
- \(\mu_j\) is the center of cluster j
- \(m\) is the fuzzifier.
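To make the criterion concrete, here is a minimal R sketch, not taken from any package, that evaluates the objective for a data matrix x, a membership matrix u and a set of centers; the function name fcm_objective and its arguments are illustrative only.
# Hypothetical helper: value of the FCM objective for a data matrix x (n x p),
# a membership matrix u (n x k), cluster centers (k x p) and fuzzifier m
fcm_objective <- function(x, u, centers, m = 2) {
  x <- as.matrix(x)
  total <- 0
  for (j in seq_len(nrow(centers))) {
    # squared Euclidean distance of every observation to center j
    d2 <- rowSums(sweep(x, 2, centers[j, ])^2)
    total <- total + sum(u[, j]^m * d2)
  }
  total
}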
It can be seen that FCM differs from k-means by the use of the membership values \(u_{ij}\) and the fuzzifier \(m\).
The membership \(u_{ij}\) is defined as follows:
\[ u_{ij} = \frac{1}{\sum\limits_{l=1}^k \left( \frac{\lVert x_i - \mu_j \rVert}{\lVert x_i - \mu_l \rVert}\right)^{\frac{2}{m-1}}} \]
The degree of belonging, \(u_{ij}\), is inversely related to the distance from \(x_i\) to the cluster center.
The parameter \(m\) is a real number greater than 1 (\(1.0 < m < \infty\)) and it defines the level of cluster fuzziness. A value of \(m\) close to 1 gives a cluster solution that becomes increasingly similar to a hard clustering such as k-means, whereas a value of \(m\) close to infinity leads to complete fuzziness.
A commonly used choice is m = 2.0 (Hathaway and Bezdek 2001).
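As an illustration of this update, the sketch below recomputes the membership matrix from a data matrix and the current centers, assuming Euclidean distances; fcm_memberships is a hypothetical name, not a function from any package.
# Hypothetical helper: membership matrix u (n x k) for data x (n x p),
# current centers (k x p) and fuzzifier m; assumes no observation
# coincides exactly with a center (which would give a zero distance)
fcm_memberships <- function(x, centers, m = 2) {
  x <- as.matrix(x)
  k <- nrow(centers)
  # d[i, j] = Euclidean distance from observation i to center j
  d <- sapply(seq_len(k), function(j) sqrt(rowSums(sweep(x, 2, centers[j, ])^2)))
  u <- matrix(0, nrow(x), k)
  for (j in seq_len(k)) {
    # u_ij = 1 / sum_l (d_ij / d_il)^(2 / (m - 1))
    u[, j] <- 1 / rowSums((d[, j] / d)^(2 / (m - 1)))
  }
  u
}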
In fuzzy clustering, the centroid of a cluster is the mean of all points, weighted by their degree of belonging to the cluster (a matching R sketch is given after the definitions):
\[ \mu_j = \frac{\sum\limits_{i=1}^n u_{ij}^m x_i}{\sum\limits_{i=1}^n u_{ij}^m} \]
Where,
- \(\mu_j\) is the centroid of cluster j
- \(u_{ij}\) is the degree to which an observation \(x_i\) belongs to cluster j
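A matching sketch for the centroid update, again with illustrative names only; a real analysis would rely on fanny() or cmeans() described below.
# Hypothetical helper: weighted cluster centers (k x p) from data x (n x p)
# and a membership matrix u (n x k)
fcm_centers <- function(x, u, m = 2) {
  w <- u^m
  # each center is the mean of all observations, weighted by u_ij^m
  t(w) %*% as.matrix(x) / colSums(w)
}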
The algorithm of fuzzy clustering can be summarized as follows (a minimal R sketch of the loop is given after the list):
- Specify the number of clusters k (chosen by the analyst)
- Randomly assign to each point coefficients for being in the clusters.
- Repeat until the maximum number of iterations (given by maxit) is reached, or until the algorithm has converged (that is, the change in the coefficients between two iterations is no more than \(\epsilon\), the given sensitivity threshold):
- Compute the centroid of each cluster, using the formula above.
- For each point, compute its coefficients of being in the clusters, using the formula above.
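Putting the two update steps together, a bare-bones version of this loop could look as follows. It reuses the hypothetical helpers fcm_centers() and fcm_memberships() sketched above and is meant only to illustrate the algorithm, not to replace fanny() or cmeans().
# Minimal sketch of the FCM loop (illustrative only)
fcm_sketch <- function(x, k, m = 2, maxit = 100, eps = 1e-6) {
  x <- as.matrix(x)
  # random initial membership matrix, with rows normalized to sum to 1
  u <- matrix(runif(nrow(x) * k), nrow(x), k)
  u <- u / rowSums(u)
  for (it in seq_len(maxit)) {
    centers <- fcm_centers(x, u, m)
    u_new <- fcm_memberships(x, centers, m)
    # stop when the memberships change by no more than eps
    if (max(abs(u_new - u)) <= eps) { u <- u_new; break }
    u <- u_new
  }
  list(centers = centers, membership = u,
       cluster = max.col(u))  # closest hard clustering
}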
Like k-means, the algorithm minimizes intra-cluster variance, but it has the same problems as k-means: the minimum found is a local minimum, and the results depend on the initial choice of weights. Hence, different initializations may lead to different results.
Using a mixture of Gaussians together with the expectation-maximization algorithm is a more statistically formalized method that includes some of these ideas, in particular partial membership in classes.
3.1 R functions for fuzzy clustering
3.1.1 fanny(): Fuzzy analysis clustering
The function fanny() [in the cluster package] can be used to compute fuzzy clustering. FANNY stands for fuzzy analysis clustering. A simplified format is:
fanny(x, k, memb.exp = 2, metric = "euclidean",
stand = FALSE, maxit = 500)
- x: A data matrix, data frame, or dissimilarity matrix
- k: The desired number of clusters to be generated
- memb.exp: The membership exponent (strictly larger than 1) used in the fit criterion. It is also known as the fuzzifier
- metric: The metric to be used for calculating dissimilarities between observations
- stand: Logical; if TRUE, the measurements in x are standardized before calculating the dissimilarities
- maxit: Maximal number of iterations
The function fanny() returns an object including the following components:
- membership: matrix containing the degree to which each observation belongs to a given cluster. Column names are the clusters and rows are observations
- coeff: Dunn's partition coefficient F(k) of the clustering, where k is the number of clusters. F(k) is the sum of all squared membership coefficients, divided by the number of observations. Its value is between 1/k and 1. The normalized form of the coefficient is also given. It is defined as \((F(k) - 1/k) / (1 - 1/k)\), and ranges between 0 and 1. A low value of Dunn's coefficient indicates a very fuzzy clustering, whereas a value close to 1 indicates a near-crisp clustering.
- clustering: the clustering vector containing the nearest crisp grouping of observations
A subset of USArrests data is used in the following example:
library(cluster)
set.seed(123)
# Load the data
data("USArrests")
# Subset of USArrests
ss <- sample(1:50, 20)
df <- scale(USArrests[ss,])
# Compute fuzzy clustering
res.fanny <- fanny(df, 4)
# Cluster plot using fviz_cluster()
# You can also use: clusplot(res.fanny)
library(factoextra)
fviz_cluster(res.fanny, frame.type = "norm",
frame.level = 0.68)
# Silhouette plot
fviz_silhouette(res.fanny, label = TRUE)
## cluster size ave.sil.width
## 1 1 4 0.52
## 2 2 6 0.10
## 3 3 6 0.41
## 4 4 4 0.04
The result of the fanny() function can be printed as follows:
print(res.fanny)
## Fuzzy Clustering object of class 'fanny' :
## m.ship.expon. 2
## objective 6.052789
## tolerance 1e-15
## iterations 215
## converged 1
## maxit 500
## n 20
## Membership coefficients (in %, rounded):
## [,1] [,2] [,3] [,4]
## Iowa 75 11 7 7
## Rhode Island 26 32 21 21
## Maryland 8 19 37 37
## Tennessee 10 24 33 33
## Utah 23 36 20 20
## Arizona 10 23 34 34
## Mississippi 16 25 29 29
## Wisconsin 65 15 10 10
## Virginia 17 37 23 23
## Maine 63 15 11 11
## Texas 8 25 33 33
## Louisiana 9 22 35 35
## Montana 41 26 17 17
## Michigan 8 20 36 36
## Arkansas 19 30 25 25
## New York 9 24 34 34
## Florida 10 21 35 35
## Alaska 15 24 31 31
## Hawaii 27 34 20 20
## New Jersey 16 37 23 23
## Fuzzyness coefficients:
## dunn_coeff normalized
## 0.31337355 0.08449807
## Closest hard clustering:
## Iowa Rhode Island Maryland Tennessee Utah
## 1 2 3 4 2
## Arizona Mississippi Wisconsin Virginia Maine
## 3 4 1 2 1
## Texas Louisiana Montana Michigan Arkansas
## 3 4 1 3 2
## New York Florida Alaska Hawaii New Jersey
## 3 3 4 2 2
##
## Available components:
## [1] "membership" "coeff" "memb.exp" "clustering" "k.crisp"
## [6] "objective" "convergence" "diss" "call" "silinfo"
## [11] "data"
The different components can be extracted using the code below:
# Membership coefficient
res.fanny$membership
## [,1] [,2] [,3] [,4]
## Iowa 0.75234997 0.1056742 0.07098791 0.07098791
## Rhode Island 0.26129280 0.3198982 0.20940449 0.20940449
## Maryland 0.07559096 0.1906031 0.36690296 0.36690296
## Tennessee 0.10351700 0.2444743 0.32600436 0.32600436
## Utah 0.23177048 0.3631831 0.20252321 0.20252321
## Arizona 0.09505979 0.2329621 0.33598906 0.33598906
## Mississippi 0.15957721 0.2511123 0.29465525 0.29465525
## Wisconsin 0.65274007 0.1530047 0.09712764 0.09712764
## Virginia 0.16856415 0.3654879 0.23297397 0.23297397
## Maine 0.62818484 0.1532966 0.10925930 0.10925930
## Texas 0.08407125 0.2465250 0.33470188 0.33470188
## Louisiana 0.09152177 0.2159634 0.34625741 0.34625741
## Montana 0.40788012 0.2556886 0.16821562 0.16821562
## Michigan 0.07811792 0.1957270 0.36307753 0.36307753
## Arkansas 0.19473888 0.2992279 0.25301662 0.25301662
## New York 0.08723572 0.2392572 0.33675356 0.33675356
## Florida 0.09725070 0.2073927 0.34767830 0.34767830
## Alaska 0.14688036 0.2428630 0.30512830 0.30512830
## Hawaii 0.26945561 0.3356724 0.19743602 0.19743602
## New Jersey 0.16160093 0.3720897 0.23315470 0.23315470
# Visualize the membership matrix using corrplot (requires the corrplot package)
library(corrplot)
corrplot(res.fanny$membership, is.corr = FALSE)
# Dunn's partition coefficient
res.fanny$coeff
## dunn_coeff normalized
## 0.31337355 0.08449807
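As a quick sanity check, not part of the original output, the unnormalized coefficient can be recomputed directly from the membership matrix, following the definition given above; it should match the dunn_coeff value shown above.
# F(k): sum of all squared membership coefficients, divided by n
sum(res.fanny$membership^2) / nrow(res.fanny$membership)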
# Observation groups
res.fanny$clustering
## Iowa Rhode Island Maryland Tennessee Utah
## 1 2 3 4 2
## Arizona Mississippi Wisconsin Virginia Maine
## 3 4 1 2 1
## Texas Louisiana Montana Michigan Arkansas
## 3 4 1 3 2
## New York Florida Alaska Hawaii New Jersey
## 3 3 4 2 2
3.1.2 cmeans(): Fuzzy c-means clustering
It is also possible to use the function cmeans() [in the e1071 package] for computing fuzzy clustering. A simplified format is:
cmeans(x, centers, iter.max = 100, dist = "euclidean", m = 2)
- x: A data matrix where columns are variables and rows are observations
- centers: Number of clusters or initial values for cluster centers
- iter.max: Maximum number of iterations
- dist: Possible values are "euclidean" or "manhattan"
- m: A number greater than 1 giving the degree of fuzzification
The function cmeans() returns an object of class fclust which is a list containing the following components:
- centers: the final cluster centers
- size: the number of data points in each cluster of the closest hard clustering
- cluster: a vector of integers containing the indices of the clusters to which the data points are assigned in the closest hard clustering, as obtained by assigning each point to the (first) cluster with maximal membership
- iter: the number of iterations performed
- membership: a matrix with the membership values of the data points to the clusters
- withinerror: the value of the objective function
set.seed(123)
library(e1071)
cm <- cmeans(df, 4)
cm
## Fuzzy c-means clustering with 4 clusters
##
## Cluster centers:
## Murder Assault UrbanPop Rape
## 1 0.6290005 0.9705484 0.5006389 0.8647698
## 2 0.8560350 0.3375298 -0.7294688 0.2002994
## 3 -1.2101485 -1.2476750 -0.7277747 -1.1534135
## 4 -0.7314218 -0.6647441 1.0032068 -0.3335272
##
## Memberships:
## 1 2 3 4
## Iowa 0.005939255 0.009155372 0.96585947 0.01904590
## Rhode Island 0.104616576 0.098854401 0.20500209 0.59152694
## Maryland 0.697459281 0.227720539 0.02731256 0.04750762
## Tennessee 0.078024194 0.872296030 0.02111342 0.02856636
## Utah 0.049301432 0.044484100 0.08442894 0.82178552
## Arizona 0.740498081 0.118781050 0.03988867 0.10083220
## Mississippi 0.179555100 0.624367937 0.10296383 0.09311313
## Wisconsin 0.024017906 0.033630983 0.83136508 0.11098604
## Virginia 0.155690387 0.395730684 0.19167059 0.25690834
## Maine 0.021165990 0.034336946 0.89152511 0.05297195
## Texas 0.545608753 0.240753676 0.05410235 0.15953522
## Louisiana 0.275003950 0.617629141 0.04197257 0.06539434
## Montana 0.062161310 0.135620851 0.66557661 0.13664123
## Michigan 0.848927329 0.096168273 0.01784963 0.03705477
## Arkansas 0.131803310 0.565593614 0.18039386 0.12220922
## New York 0.694179984 0.131927283 0.04157413 0.13231860
## Florida 0.711655719 0.173670792 0.03979837 0.07487512
## Alaska 0.369474028 0.381553979 0.11356564 0.13540635
## Hawaii 0.064103932 0.066647766 0.14874490 0.72050340
## New Jersey 0.082015921 0.059546923 0.05743425 0.80100291
##
## Closest hard clustering:
## Iowa Rhode Island Maryland Tennessee Utah
## 3 4 1 2 4
## Arizona Mississippi Wisconsin Virginia Maine
## 1 2 3 2 3
## Texas Louisiana Montana Michigan Arkansas
## 1 2 3 1 2
## New York Florida Alaska Hawaii New Jersey
## 1 1 2 4 4
##
## Available components:
## [1] "centers" "size" "cluster" "membership" "iter"
## [6] "withinerror" "call"
fviz_cluster(list(data = df, cluster = cm$cluster), frame.type = "norm",
             frame.level = 0.68)
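As an optional check, not shown in the original chapter, the closest hard clusterings produced by fanny() and cmeans() can be compared with a simple cross-tabulation (cluster labels are arbitrary, so only the pattern of agreement matters):
# Cross-tabulate the hard assignments of fanny() and cmeans()
table(fanny = res.fanny$clustering, cmeans = cm$cluster)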
4 Infos
This analysis has been performed using R software (ver. 3.2.4)
- J. C. Dunn (1973). A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters. Journal of Cybernetics 3: 32-57.
- J. C. Bezdek (1981). Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New York.
- Tariq Rashid: Clustering.