Cluster Analysis
How DBSCAN Works
DBSCAN operates based on the idea of density reachability. It defines clusters as dense regions of data points separated by areas of lower density. Two key parameters govern its behavior:
- Epsilon (ε): the radius within which you search for neighboring points;
- Minimum number of points (MinPts): the minimum number of points required within the ε-radius to form a dense region (including the point itself).
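The two parameters can be illustrated with a small neighborhood query. This is a minimal sketch in plain Python, assuming Euclidean distance and points stored as tuples; the function name `region_query` and the sample data are illustrative choices, not part of the course material.

```python
import math

def region_query(points, index, eps):
    """Return the indices of all points within distance eps of points[index].

    The point itself is included, matching the MinPts convention above.
    """
    center = points[index]
    return [j for j, p in enumerate(points) if math.dist(center, p) <= eps]

# Hypothetical sample data: three nearby points and one far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0)]
min_pts = 3

neighbors = region_query(points, 0, eps=1.0)
# The neighborhood of point 0 is dense if it holds at least min_pts points.
is_dense = len(neighbors) >= min_pts
print(neighbors, is_dense)
```

A larger ε or a smaller MinPts makes more neighborhoods count as dense, so both values usually need tuning together.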
DBSCAN classifies points into three categories:
- Core points: a point is a core point if it has at least MinPts points within its ε-radius (counting the point itself);
- Border points: a point is a border point if it has fewer than MinPts points within its ε-radius but is reachable from a core point (i.e., within the ε-radius of a core point);
- Noise points: a point that is neither a core point nor a border point is considered a noise point.
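The three categories can be sketched directly from their definitions. The snippet below is an illustrative brute-force version, assuming Euclidean distance; `classify_points` and the sample coordinates are hypothetical names and data chosen for this example.

```python
import math

def classify_points(points, eps, min_pts):
    """Label each point as 'core', 'border', or 'noise'."""
    n = len(points)
    # Neighborhood of each point, including the point itself.
    neighborhoods = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # Core points have at least min_pts points in their neighborhood.
    core = {i for i in range(n) if len(neighborhoods[i]) >= min_pts}

    labels = []
    for i in range(n):
        if i in core:
            labels.append("core")
        elif any(j in core for j in neighborhoods[i]):
            # Not dense itself, but within eps of some core point.
            labels.append("border")
        else:
            labels.append("noise")
    return labels

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (2.0, 0.0), (8.0, 8.0)]
labels = classify_points(points, eps=1.0, min_pts=3)
print(labels)
```

With these values, the points at (0, 0) and (1, 0) each have three points in their neighborhood and are core points, their sparser neighbors are border points, and the isolated point at (8, 8) is noise.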
Algorithm
1. Start with an arbitrary unvisited point;
2. Find all points within its ε-radius;
3. If the point has at least MinPts points within its ε-radius, mark it as a core point and start a new cluster, which expands recursively by adding all directly density-reachable points;
4. If the number of points within its ε-radius is less than MinPts, mark the point as a border point (if it's within the ε-radius of a core point) or a noise point (if it's not);
5. Repeat steps 1-4 until all points are visited.
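The steps above can be sketched as a short, unoptimized implementation. This is one possible version rather than a reference implementation: it assumes Euclidean distance, uses a queue instead of recursion to expand clusters, and provisionally labels sparse points as noise, relabeling them as border points if a core point later reaches them. The names `dbscan`, `NOISE`, and `UNVISITED` are illustrative.

```python
import math
from collections import deque

NOISE = -1        # label for noise points
UNVISITED = None  # marker for points not yet processed

def dbscan(points, eps, min_pts):
    """Return one cluster label per point (0, 1, ...), or NOISE."""
    n = len(points)
    labels = [UNVISITED] * n

    def region_query(i):
        # All points within eps of point i, including i itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster_id = -1
    for i in range(n):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(i)
        if len(neighbors) < min_pts:
            # Provisional: may become a border point later.
            labels[i] = NOISE
            continue
        # Point i is a core point, so it starts a new cluster.
        cluster_id += 1
        labels[i] = cluster_id
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                # Previously sparse point reached from a core point: border.
                labels[j] = cluster_id
                continue
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(j)
            if len(j_neighbors) >= min_pts:
                # j is also a core point, so keep expanding through it.
                queue.extend(j_neighbors)
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10),
          (5, 5)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

Each neighborhood query here scans every point, so the sketch runs in quadratic time; practical implementations typically use a spatial index to speed up the ε-radius search.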
Imagine a scatter plot of data points. DBSCAN would start by picking a point. If it finds enough neighbors within its ε-radius, it marks it as a core point and starts forming a cluster. It then expands this cluster by checking the neighbors of the core point and their neighbors, and so on. Points that are close to a core point but don't have enough neighbors themselves are marked as border points. Points that are isolated are identified as noise.