knncolle
Collection of KNN methods in C++
|
Collection of KNN algorithms. More...
Classes | |
class | BruteforceBuilder |
Perform a brute-force nearest neighbor search. More... | |
class | BruteforcePrebuilt |
Index for a brute-force nearest neighbor search. More... | |
class | BruteforceSearcher |
Brute-force nearest neighbor searcher. More... | |
class | Builder |
Interface to build nearest-neighbor search indices. More... | |
struct | EuclideanDistance |
Compute Euclidean distances between two input vectors. More... | |
class | KmknnBuilder |
Perform a nearest neighbor search based on k-means clustering. More... | |
struct | KmknnOptions |
Options for KmknnBuilder and KmknnPrebuilt construction. More... | |
class | KmknnPrebuilt |
Index for a KMKNN search. More... | |
class | KmknnSearcher |
KMKNN searcher. More... | |
class | L2NormalizedBuilder |
Wrapper around a builder with L2 normalization. More... | |
class | L2NormalizedMatrix |
Wrapper around a matrix with L2 normalization. More... | |
class | L2NormalizedPrebuilt |
Wrapper around a prebuilt index with L2 normalization. More... | |
class | L2NormalizedSearcher |
Wrapper around a search interface with L2 normalization. More... | |
struct | ManhattanDistance |
Compute Manhattan distances between two input vectors. More... | |
struct | MockDistance |
Expectations for a distance calculation class. More... | |
class | MockMatrix |
Compile-time interface for matrix data. More... | |
class | Prebuilt |
Interface for prebuilt nearest-neighbor search indices. More... | |
class | Searcher |
Interface for searching nearest-neighbor search indices. More... | |
class | SimpleMatrix |
Simple wrapper for an in-memory matrix. More... | |
class | VptreeBuilder |
Perform a nearest neighbor search based on a vantage point (VP) tree. More... | |
class | VptreePrebuilt |
Index for a VP-tree search. More... | |
class | VptreeSearcher |
VP-tree searcher. More... | |
Typedefs | |
template<typename Index_ = int, typename Float_ = double> | |
using | NeighborList = std::vector< std::vector< std::pair< Index_, Float_ > > > |
Functions | |
template<typename Task_ , class Run_ > | |
void | parallelize (int num_workers, Task_ num_tasks, Run_ run_task_range) |
template<typename Index_ > | |
int | cap_k (int k, Index_ num_observations) |
template<typename Dim_ , typename Index_ , typename Float_ > | |
NeighborList< Index_, Float_ > | find_nearest_neighbors (const Prebuilt< Dim_, Index_, Float_ > &index, int k, int num_threads=1) |
template<typename Dim_ , typename Index_ , typename Float_ > | |
std::vector< std::vector< Index_ > > | find_nearest_neighbors_index_only (const Prebuilt< Dim_, Index_, Float_ > &index, int k, int num_threads=1) |
Collection of KNN algorithms.
using knncolle::NeighborList = typedef std::vector<std::vector<std::pair<Index_, Float_> > > |
List of nearest neighbors for multiple observations. Each entry corresponds to an observation and contains a nested list (i.e., vector) of its neighbors. Each entry of the nested vector is a pair that contains the identity of the neighbor as an observation index (first) and the distance from the observation to the neighbor (second), sorted by increasing distance.
Index_ | Integer type for the indices. |
Float_ | Floating point type for the distances. |
int knncolle::cap_k | ( | int | k, |
Index_ | num_observations | ||
) |
Cap the number of neighbors to use in Searcher::search()
with an index i
.
Index_ | Integer type for the number of observations. |
k | Number of nearest neighbors, should be non-negative. |
num_observations | Number of observations in the dataset. |
k
if it is less than num_observations
; otherwise it is equal to num_observations - 1
if num_observations > 0
; otherwise it is equal to zero. NeighborList< Index_, Float_ > knncolle::find_nearest_neighbors | ( | const Prebuilt< Dim_, Index_, Float_ > & | index, |
int | k, | ||
int | num_threads = 1 |
||
) |
Find the nearest neighbors within a pre-built index. This is a convenient wrapper around Searcher::search
that saves the caller the trouble of writing a loop.
Dim_ | Integer type for the number of dimensions. |
Index_ | Integer type for the indices. |
Float_ | Floating point type for the query data and output distances. |
index | A Prebuilt index. |
k | Number of nearest neighbors. This should be non-negative. Explicitly calling cap_k() is not necessary as this is done automatically inside this function. |
num_threads | Number of threads to use. The parallelization scheme is defined by parallelize() . |
NeighborList
of length equal to the number of observations in index
. Each entry contains (up to) the k
nearest neighbors for each observation, sorted by increasing distance. The i
-th entry is guaranteed to not contain i
itself. std::vector< std::vector< Index_ > > knncolle::find_nearest_neighbors_index_only | ( | const Prebuilt< Dim_, Index_, Float_ > & | index, |
int | k, | ||
int | num_threads = 1 |
||
) |
Find the nearest neighbors within a pre-built search index. Here, only the neighbor indices are returned, not the distances.
Dim_ | Integer type for the number of dimensions. |
Index_ | Integer type for the indices. |
Float_ | Floating point type for the query data and output distances. |
index | A Prebuilt index. |
k | Number of nearest neighbors. This should be non-negative. Explicitly calling cap_k() is not necessary as this is done automatically inside this function. |
num_threads | Number of threads to use. The parallelization scheme is defined by parallelize() . |
index
. Each vector contains the indices of (up to) the k
nearest neighbors for each observation, sorted by increasing distance. The i
-th entry is guaranteed to not contain i
itself. void knncolle::parallelize | ( | int | num_workers, |
Task_ | num_tasks, | ||
Run_ | run_task_range | ||
) |
Task_ | Integer type for the number of tasks. |
Run_ | Function to execute a range of tasks. |
num_workers | Number of workers. |
num_tasks | Number of tasks. |
run_task_range | Function to iterate over a range of tasks within a worker. |
By default, this is an alias to subpar::parallelize_range()
. However, if the KNNCOLLE_CUSTOM_PARALLEL
function-like macro is defined, it is called instead. Any user-defined macro should accept the same arguments as subpar::parallelize_range()
.