template<class Distance_ = EuclideanDistance, class Matrix_ = SimpleMatrix<int, int, double>, typename Float_ = double>
class knncolle::KmknnBuilder< Distance_, Matrix_, Float_ >
Perform a nearest neighbor search based on k-means clustering.
In the k-means with k-nearest neighbors (KMKNN) algorithm (Wang, 2012), k-means clustering is first applied to the data points, with the number of cluster centers defined as the square root of the number of points. The cluster assignment and distance to the assigned cluster center for each point represent the KMKNN indexing information, allowing us to speed up the nearest neighbor search by exploiting the triangle inequality between cluster centers, the query point and each point in the cluster to narrow the search space. The advantage of the KMKNN approach is its simplicity and minimal overhead, resulting in performance improvements over conventional tree-based methods for high-dimensional data where most points need to be searched anyway.
- Template Parameters
-
Distance_ | Class to compute the distance between vectors, see distance::Euclidean for an example. |
Matrix_ | Matrix-like object satisfying the MockMatrix contract. |
Float_ | Floating point type for the query data and output distances. |
- See also
- Wang X (2012). A fast exact k-nearest neighbors algorithm for high dimensional search using k-means clustering and triangle inequality. Proc Int Jt Conf Neural Netw, 43, 6:2351-2358.