pub trait Dbscan<T>where
T: GeoFloat,{
// Required method
fn dbscan(&self, epsilson: T, min_samples: usize) -> Vec<Option<usize>>;
}Expand description
Perform DBSCAN (Density-Based Spatial Clustering of Applications with Noise) clustering on a set of points.
Based on: Ester, M., Kriegel, H., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), pp 226-231.
DBSCAN is a density-based clustering algorithm that groups together points that are closely packed together (points with many nearby neighbours), marking as outliers points that lie alone in low-density regions.
§Parameters
epsilon: The maximum distance between two points for one to be considered as in the neighbourhood of the other. This is not a maximum bound on the distances of points within a cluster. This is the most important DBSCAN parameter to choose appropriately for your datasetmin_samples: The number of points (or total weight) in a neighbourhood for a point to be considered as a core point. This includes the point itself. Larger values lead to more conservative clusters.
§Returns
A vector of cluster labels, one for each input point, in the same order as the input points:
Some(cluster_id): The point belongs to cluster with IDcluster_id(starting from 0)None: The point is considered noise (doesn’t belong to any cluster)
§Algorithm
The algorithm works as follows:
- For each unvisited point, find all points within distance
epsilon(the point’s neighbourhood) - If the neighbourhood contains at least
min_samplespoints (including the point itself), start a new cluster:- Add all points in the neighbourhood to the cluster
- For each newly added point, if it’s also a core point (has >=
min_samplesneighbours), add its neighbours to the cluster (expansion)
- If the neighbourhood contains fewer than
min_samplespoints, mark the point as noise (it may be reassigned later if it’s in another point’s neighbourhood) - Repeat until all points have been visited
§Notes
This implementation uses an R-tree spatial index for efficient neighbourhood queries, and should be O(n log n) for typical cases.
§Examples
§Basic clustering with MultiPoint
use geo::{Dbscan, MultiPoint, point};
let points = MultiPoint::new(vec![
point!(x: 0.0, y: 0.0),
point!(x: 1.0, y: 0.0),
point!(x: 0.0, y: 1.0),
point!(x: 10.0, y: 10.0),
point!(x: 11.0, y: 10.0),
point!(x: 10.0, y: 11.0),
]);
let labels = points.dbscan(2.0, 2);
// Points 0, 1, 2 form one cluster
assert_eq!(labels[0], Some(0));
assert_eq!(labels[1], Some(0));
assert_eq!(labels[2], Some(0));
// Points 3, 4, 5 form another cluster
assert_eq!(labels[3], Some(1));
assert_eq!(labels[4], Some(1));
assert_eq!(labels[5], Some(1));§Detecting noise points
use geo::{Dbscan, point, Point};
let points = vec![
point!(x: 0.0, y: 0.0),
point!(x: 1.0, y: 0.0),
point!(x: 0.0, y: 1.0),
point!(x: 100.0, y: 100.0), // outlier
];
let labels = points.dbscan(2.0, 2);
// First three points form a cluster
assert_eq!(labels[0], Some(0));
assert_eq!(labels[1], Some(0));
assert_eq!(labels[2], Some(0));
// Last point is noise
assert_eq!(labels[3], None);§Using with a Vec of Points
use geo::{Dbscan, point};
let points = vec![
point!(x: 0.0, y: 0.0),
point!(x: 1.0, y: 1.0),
point!(x: 1.0, y: 0.0),
point!(x: 0.0, y: 1.0),
];
let labels = points.dbscan(1.5, 3);
// All points form a single cluster
assert!(labels.iter().all(|&label| label == Some(0)));