[go: up one dir, main page]

Dbscan

Trait Dbscan 

Source
pub trait Dbscan<T>
where T: GeoFloat,
{ // Required method fn dbscan(&self, epsilson: T, min_samples: usize) -> Vec<Option<usize>>; }
Expand description

Perform DBSCAN (Density-Based Spatial Clustering of Applications with Noise) clustering on a set of points.

Based on: Ester, M., Kriegel, H., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), pp 226-231.

DBSCAN is a density-based clustering algorithm that groups together points that are closely packed together (points with many nearby neighbours), marking as outliers points that lie alone in low-density regions.

§Parameters

  • epsilon: The maximum distance between two points for one to be considered as in the neighbourhood of the other. This is not a maximum bound on the distances of points within a cluster. This is the most important DBSCAN parameter to choose appropriately for your dataset
  • min_samples: The number of points (or total weight) in a neighbourhood for a point to be considered as a core point. This includes the point itself. Larger values lead to more conservative clusters.

§Returns

A vector of cluster labels, one for each input point, in the same order as the input points:

  • Some(cluster_id): The point belongs to cluster with ID cluster_id (starting from 0)
  • None: The point is considered noise (doesn’t belong to any cluster)

§Algorithm

The algorithm works as follows:

  1. For each unvisited point, find all points within distance epsilon (the point’s neighbourhood)
  2. If the neighbourhood contains at least min_samples points (including the point itself), start a new cluster:
    • Add all points in the neighbourhood to the cluster
    • For each newly added point, if it’s also a core point (has >= min_samples neighbours), add its neighbours to the cluster (expansion)
  3. If the neighbourhood contains fewer than min_samples points, mark the point as noise (it may be reassigned later if it’s in another point’s neighbourhood)
  4. Repeat until all points have been visited

§Notes

This implementation uses an R-tree spatial index for efficient neighbourhood queries, and should be O(n log n) for typical cases.

§Examples

§Basic clustering with MultiPoint

use geo::{Dbscan, MultiPoint, point};

let points = MultiPoint::new(vec![
    point!(x: 0.0, y: 0.0),
    point!(x: 1.0, y: 0.0),
    point!(x: 0.0, y: 1.0),
    point!(x: 10.0, y: 10.0),
    point!(x: 11.0, y: 10.0),
    point!(x: 10.0, y: 11.0),
]);

let labels = points.dbscan(2.0, 2);

// Points 0, 1, 2 form one cluster
assert_eq!(labels[0], Some(0));
assert_eq!(labels[1], Some(0));
assert_eq!(labels[2], Some(0));

// Points 3, 4, 5 form another cluster
assert_eq!(labels[3], Some(1));
assert_eq!(labels[4], Some(1));
assert_eq!(labels[5], Some(1));

§Detecting noise points

use geo::{Dbscan, point, Point};

let points = vec![
    point!(x: 0.0, y: 0.0),
    point!(x: 1.0, y: 0.0),
    point!(x: 0.0, y: 1.0),
    point!(x: 100.0, y: 100.0), // outlier
];

let labels = points.dbscan(2.0, 2);

// First three points form a cluster
assert_eq!(labels[0], Some(0));
assert_eq!(labels[1], Some(0));
assert_eq!(labels[2], Some(0));

// Last point is noise
assert_eq!(labels[3], None);

§Using with a Vec of Points

use geo::{Dbscan, point};

let points = vec![
    point!(x: 0.0, y: 0.0),
    point!(x: 1.0, y: 1.0),
    point!(x: 1.0, y: 0.0),
    point!(x: 0.0, y: 1.0),
];

let labels = points.dbscan(1.5, 3);

// All points form a single cluster
assert!(labels.iter().all(|&label| label == Some(0)));

Required Methods§

Source

fn dbscan(&self, epsilson: T, min_samples: usize) -> Vec<Option<usize>>

Perform DBSCAN clustering on the points.

See the module-level documentation for details on the algorithm and parameters.

Implementations on Foreign Types§

Source§

impl<T> Dbscan<T> for [Point<T>]
where T: GeoFloat,

Source§

fn dbscan(&self, epsilon: T, min_samples: usize) -> Vec<Option<usize>>

Implementors§

Source§

impl<T> Dbscan<T> for &MultiPoint<T>
where T: GeoFloat,

Source§

impl<T> Dbscan<T> for MultiPoint<T>
where T: GeoFloat,