Density based and Grid based
clustering
Density-based clustering with DBSCAN.
In
density-based clustering clusters are defined as areas of higher density than
the remainder of the data set. Objects in these sparse areas - that are
required to separate clusters - are usually considered to be noise and border
points.
The
most popular density based clustering method is DBSCAN. In contrast to many newer methods, it features a
well-defined cluster model called "density-reach ability". Similar to
linkage based clustering; it is based on connecting points within certain
distance thresholds. However, it only connects points that satisfy a density
criterion, in the original variant defined as a minimum number of other objects
within this radius. A cluster consists of all density-connected objects (which
can form a cluster of an arbitrary shape, in contrast to many other methods) plus
all objects that are within these objects' range. Another interesting property
of DBSCAN is that its complexity is fairly low - it requires a linear number of
range queries on the database - and that it will discover essentially the same
results (it is deterministic
for core and noise points, but not for border points) in each run, therefore
there is no need to run it multiple times. OPTICS is a generalization of DBSCAN that removes the need to
choose an appropriate value for the range parameter
,
and produces a hierarchical result related to that of linkage clustering.
DeLi-Clu Density-Link-Clustering combines ideas from single-linkage clustering and OPTICS, eliminating the
parameter
entirely and offering performance improvements over OPTICS by using an R-tree index.
The
key drawback of DBSCAN
and OPTICS is that they expect some kind of density drop to detect
cluster borders. Moreover they can not detect intrinsic cluster structures
which are prevalent in the majority of real life data. A variation of DBSCAN, EnDBSCAN efficiently detects such kinds of structures. On data sets
with, for example, overlapping Gaussian distributions - a common use case in
artificial data - the cluster borders produced by these algorithms will often
look arbitrary, because the cluster density decreases continuously. On a data set
consisting of mixtures of Gaussians, these algorithms are nearly always
outperformed by methods such as EM clustering that are able to precisely model
this kind of data.
density-based clustering examplesDensity-based clustering with DBSCAN.
DBSCAN assumes clusters of similar density, and may have problems
separating nearby clusters
OPTICS is a DBSCAN variant that handles different densities much
better
No comments:
Post a Comment