Cluster Analysis:
Cluster analysis groups objects
(observations, events) based on the information found in the data describing
the objects or their relationships. The goal is that the objects in a group
will be similar (or related) to one another and different from (or unrelated to)
the objects in other groups. The greater the similarity (or homogeneity) within
a group, and the greater the difference between groups, the “better” or more
distinct the clustering. What constitutes a cluster is not precisely
defined, and in many applications clusters are not well separated from
one another. Nonetheless, most cluster analysis seeks a crisp
classification of the data into non-overlapping groups. Fuzzy clustering is an
exception: it allows an object to belong partially to several groups.
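The difference between a crisp and a fuzzy assignment can be illustrated with a small sketch. The data points, the two cluster centers, and the function names below are hypothetical, chosen only for illustration. The membership formula is the one commonly used in fuzzy c-means with fuzzifier `m`, which is an assumption here and not something the text above prescribes.

```python
# Illustrative 1-D data and two hypothetical cluster centers (assumed values).
points = [1.0, 1.5, 4.0, 9.0, 9.5]
centers = [1.0, 9.0]


def crisp_assignment(x, centers):
    """Return the index of the nearest center: each object gets exactly one group."""
    return min(range(len(centers)), key=lambda j: abs(x - centers[j]))


def fuzzy_membership(x, centers, m=2.0):
    """Return membership degrees (summing to 1) in the style of fuzzy c-means."""
    dists = [abs(x - c) for c in centers]
    zeros = dists.count(0.0)
    if zeros:
        # An object lying exactly on a center belongs fully to that center.
        return [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
    exponent = 2.0 / (m - 1.0)
    # Closer centers receive larger weights; normalising makes them sum to 1.
    weights = [d ** -exponent for d in dists]
    total = sum(weights)
    return [w / total for w in weights]


for x in points:
    print(x, crisp_assignment(x, centers), fuzzy_membership(x, centers))
```

In the crisp case the point 4.0 is simply placed in the first group, while the fuzzy memberships record that it also belongs, to a lesser degree, to the second.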
Cluster analysis is a classification of
objects from the data, where by classification we mean a labeling of objects
with class (group) labels. As such, clustering does not use previously assigned
class labels, except perhaps for verification of how well the clustering
worked. Thus, cluster analysis is distinct from pattern recognition or the
areas of statistics known as discriminant analysis and decision analysis, which
seek to find rules for classifying objects given a set of pre-classified
objects.
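As a rough illustration of this point, the sketch below clusters a handful of values without looking at any class labels, and only afterwards compares the result with held-out labels. The data, the labels, the simple 1-D k-means routine, and the purity score are all assumptions made for the example; many other clustering methods and validation measures exist.

```python
from collections import Counter


def kmeans_1d(values, k, iterations=20):
    """Plain Lloyd-style k-means on 1-D data; no class labels are used."""
    # Deterministic start: spread the centers over the sorted values (an assumption).
    ordered = sorted(values)
    centers = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]
    assignment = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to its nearest center.
        assignment = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Move each center to the mean of its members.
        for j in range(k):
            members = [v for v, a in zip(values, assignment) if a == j]
            if members:
                centers[j] = sum(members) / len(members)
    return assignment


def purity(assignment, true_labels):
    """Fraction of objects that fall in the majority class of their cluster."""
    correct = 0
    for cluster in set(assignment):
        labels = [t for a, t in zip(assignment, true_labels) if a == cluster]
        correct += Counter(labels).most_common(1)[0][1]
    return correct / len(assignment)


values = [1.0, 1.2, 0.8, 8.0, 8.3, 7.9]
held_out_labels = ["a", "a", "a", "b", "b", "b"]  # used only for verification

clusters = kmeans_1d(values, k=2)           # the labels play no role here
score = purity(clusters, held_out_labels)   # the labels are consulted only now
print(clusters, score)
```

A discriminant-analysis approach would instead take the pre-classified objects as input and learn a rule from them, which is the distinction drawn above.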
While cluster analysis can be useful in the previously mentioned
areas, either directly or as a preliminary means of finding classes, there is
much more to these areas than cluster analysis. For example, the decision of
what features to use when representing objects is a key activity of fields such
as pattern recognition. Cluster analysis typically takes the features as given
and proceeds from there. Thus, cluster analysis, while a useful tool in many
areas (as described later), is normally only part of a solution to a larger
problem which typically involves other steps and techniques.