Fuzzy clustering (soft clustering, soft k-means) is a type of clustering (grouping) of data points into clusters that are defined as fuzzy sets. The membership of each data point in a fuzzy cluster (set) is not defined in terms of Boolean true or false (0 or 1), but as a real value from the interval [0, 1] (the degree of membership). Consequently, each data point can belong to more than one cluster, with various degrees. For a given number $C$ of clusters, we denote the membership of a data point $x \in \mathbb{R}^d$ in the cluster $C_k$, $k = 1, \dots, C$, as $w_k(x)$. The centroid $c_k$ of the $k$-th cluster is the average of all points $x$, each weighted by its degree of membership in the cluster raised to the power $m$:

$$c_k = \frac{\sum_{x} w_k(x)^m \, x}{\sum_{x} w_k(x)^m}$$

where $m$ is the fuzziness parameter that controls how fuzzy the clusters are: larger values of $m$ make the clusters fuzzier.

To calculate Fuzzy C-Means Clustering, first specify the number of clusters C and the dimension d of the space $\mathbb{R}^d$ that the data points belong to. By default, the fuzziness parameter m is set to 2, the most commonly used value, though you can change it to another positive integer value (in fuzzy c-means, m must be greater than 1). Finally, specify the input data points whose clustering you want to calculate. Please note:
We have limited the dimension d in the "Manual Entry" tab to 10 because of limited screen space.
After you enter a valid value in the rightmost cell of the last row of the table, a new empty row (data point) is added.
Rows (data points) whose cells are all empty are ignored in the calculation.
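The centroid formula is one half of the usual fuzzy c-means iteration, which alternates between recomputing centroids from memberships and memberships from centroids. The following is a minimal pure-Python sketch of that textbook iteration, offered for illustration only and not as this calculator's internal code. The membership update $w_k(x) = 1 / \sum_l (\lVert x - c_k \rVert / \lVert x - c_l \rVert)^{2/(m-1)}$, the Euclidean distance, the random initialization, the stopping tolerance, and the function names are all assumptions taken from the standard algorithm.

```python
import math
import random


def update_centroids(points, weights, m):
    """c_k = sum_x w_k(x)^m * x / sum_x w_k(x)^m, computed for every cluster k."""
    n_clusters = len(weights[0])
    dim = len(points[0])
    centroids = []
    for k in range(n_clusters):
        powered = [w[k] ** m for w in weights]  # w_k(x)^m for every point x
        total = sum(powered)
        centroids.append(
            [sum(p * x[j] for p, x in zip(powered, points)) / total for j in range(dim)]
        )
    return centroids


def update_memberships(points, centroids, m):
    """Textbook FCM step: w_k(x) = 1 / sum_l (|x - c_k| / |x - c_l|)^(2 / (m - 1))."""
    exponent = 2.0 / (m - 1.0)  # requires m > 1
    weights = []
    for x in points:
        dists = [math.dist(x, c) for c in centroids]  # Euclidean distances
        if min(dists) == 0.0:
            # x sits exactly on one or more centroids: split its membership
            # evenly among those clusters to avoid dividing by zero.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            weights.append([h / sum(hits) for h in hits])
            continue
        weights.append([1.0 / sum((dk / dl) ** exponent for dl in dists) for dk in dists])
    return weights


def fuzzy_c_means(points, n_clusters, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Alternate centroid and membership updates until the memberships settle."""
    rng = random.Random(seed)
    # Random initial memberships, normalized so that each point's row sums to 1.
    weights = []
    for _ in points:
        row = [rng.random() for _ in range(n_clusters)]
        total = sum(row)
        weights.append([v / total for v in row])
    for _ in range(max_iter):
        centroids = update_centroids(points, weights, m)
        new_weights = update_memberships(points, centroids, m)
        change = max(
            abs(a - b) for old, new in zip(weights, new_weights) for a, b in zip(old, new)
        )
        weights = new_weights
        if change < tol:
            break
    # Recompute the centroids from the final memberships.
    return update_centroids(points, weights, m), weights


if __name__ == "__main__":
    pts = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
    centroids, memberships = fuzzy_c_means(pts, n_clusters=2)
    print(centroids)
    print(memberships)
```

Because the initial memberships are random, fuzzy c-means can settle in a local optimum; fixing `seed` only makes a run reproducible, it does not guarantee the best clustering.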
Calculate Fuzzy C-Means Clustering of the data provided in the file to be uploaded.
Input File format:
File must be an ASCII file in CSV (comma separated values) format.
File can contain multiple sets of data points whose clusterings you want to find.
The first line of each data set begins with the keyword FUZZY_C_MEANS (case-insensitive). Anything after the keyword on that line is ignored.
The next line must contain the number of clusters (an integer), optionally followed by the fuzziness parameter (comma separated). If the fuzziness parameter is omitted, the default value of 2 is used.
The third line must contain the number of data points, followed (comma separated) by the dimension of the space that the data points belong to.
Finally, the data points are specified, one data point per line, with its coordinates comma separated.
Whitespace and empty lines are ignored.
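A reader of this input format could be sketched as follows. The function name `parse_input`, the dictionary keys it returns, reading the fuzziness parameter as a float (the text above describes it as an integer), and the error handling are all hypothetical choices made for illustration.

```python
def parse_input(text):
    """Parse every FUZZY_C_MEANS data set in `text` (the CSV file contents).

    Returns a list of dicts with the keys "clusters", "m", "n", "d" and "points".
    """
    # Whitespace and empty lines are ignored, so strip all whitespace first.
    lines = ["".join(line.split()) for line in text.splitlines()]
    lines = [line for line in lines if line]
    datasets = []
    i = 0
    while i < len(lines):
        # The keyword is case-insensitive; anything after it on this line is ignored.
        if not lines[i].upper().startswith("FUZZY_C_MEANS"):
            raise ValueError(f"expected FUZZY_C_MEANS, got {lines[i]!r}")
        # Second line: number of clusters, optionally followed by m (default 2).
        header = lines[i + 1].split(",")
        clusters = int(header[0])
        m = float(header[1]) if len(header) > 1 and header[1] else 2.0
        # Third line: number of data points n and dimension d.
        n, d = (int(v) for v in lines[i + 2].split(","))
        points = []
        for row in lines[i + 3 : i + 3 + n]:
            coords = [float(v) for v in row.split(",")]
            if len(coords) != d:
                raise ValueError(f"expected {d} coordinates, got {row!r}")
            points.append(coords)
        if len(points) != n:
            raise ValueError(f"expected {n} data points, found {len(points)}")
        datasets.append({"clusters": clusters, "m": m, "n": n, "d": d, "points": points})
        i += 3 + n
    return datasets
```

Stripping all whitespace up front keeps the per-line logic simple, at the cost of also removing spaces from the ignored text after the keyword, which is harmless here.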
Output File format:
Output File is an ASCII file in CSV (comma separated values) format.
For each data set in the input file, there is a corresponding result set whose first line contains the keyword SOLUTION.
The second line of the result set contains the keyword CLUSTERS, followed by the number of clusters C.
Next come C lines, each containing the centroid of one cluster.
After that comes a line containing the keyword MEMBERSHIP, followed by the number of data points n.
Finally comes an n ⨯ C table of membership values: one row for each of the n data points and one column for each of the C clusters.
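The output layout can likewise be sketched as a small formatter. The names `format_solution` and `format_output`, writing each keyword and its count on one comma-separated line (for example `CLUSTERS,2`), and the fixed six-decimal number formatting are assumptions; the calculator's actual formatting may differ.

```python
def format_solution(centroids, memberships, digits=6):
    """Format one result set in the layout described above (assumed separators)."""

    def row(values):
        # One CSV line of numbers with a fixed number of decimals (assumed format).
        return ",".join(f"{v:.{digits}f}" for v in values)

    lines = ["SOLUTION", f"CLUSTERS,{len(centroids)}"]
    lines.extend(row(c) for c in centroids)        # C centroid lines
    lines.append(f"MEMBERSHIP,{len(memberships)}")
    lines.extend(row(w) for w in memberships)      # n x C membership table
    return "\n".join(lines) + "\n"


def format_output(results):
    """Concatenate one result set per input data set, in input order.

    `results` is a list of (centroids, memberships) pairs.
    """
    return "".join(format_solution(c, w) for c, w in results)


if __name__ == "__main__":
    print(format_solution([[0.4, 0.0], [2.0, 0.0]], [[1.0, 0.0], [0.5, 0.5]], digits=2))
```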