knncolle
Collection of KNN methods in C++
|
Collection of KNN algorithms. More...
Classes | |
class | BruteforceBuilder |
Perform a brute-force nearest neighbor search. More... | |
class | BruteforcePrebuilt |
Index for a brute-force nearest neighbor search. More... | |
class | BruteforceSearcher |
Brute-force nearest neighbor searcher. More... | |
class | Builder |
Interface to build nearest-neighbor search indices. More... | |
class | DistanceMetric |
Interface for a distance metric. More... | |
class | EuclideanDistance |
Compute Euclidean distances between two input vectors. More... | |
class | L2NormalizedBuilder |
Wrapper around a builder with L2 normalization. More... | |
class | L2NormalizedMatrix |
Wrapper around a matrix with L2 normalization. More... | |
class | L2NormalizedMatrixExtractor |
Extractor for the L2NormalizedMatrix . More... | |
class | L2NormalizedPrebuilt |
Wrapper around a prebuilt index with L2 normalization. More... | |
class | L2NormalizedSearcher |
Wrapper around a search interface with L2 normalization. More... | |
class | ManhattanDistance |
Compute Manhattan distances between two input vectors. More... | |
class | Matrix |
Interface for matrix data. More... | |
class | MatrixExtractor |
Extractor interface for matrix data. More... | |
class | NeighborQueue |
Helper class to track nearest neighbors. More... | |
class | Prebuilt |
Interface for prebuilt nearest-neighbor search indices. More... | |
class | Searcher |
Interface for searching nearest-neighbor search indices. More... | |
class | SimpleMatrix |
Simple wrapper for an in-memory matrix. More... | |
class | SimpleMatrixExtractor |
Extractor for a SimpleMatrix . More... | |
class | VptreeBuilder |
Perform a nearest neighbor search based on a vantage point (VP) tree. More... | |
class | VptreePrebuilt |
Index for a VP-tree search. More... | |
class | VptreeSearcher |
VP-tree searcher. More... | |
Typedefs | |
template<typename Index_ , typename Distance_ > | |
using | NeighborList = std::vector<std::vector<std::pair<Index_, Distance_> > > |
Functions | |
template<typename Task_ , class Run_ > | |
void | parallelize (int num_workers, Task_ num_tasks, Run_ run_task_range) |
template<typename Index_ > | |
int | cap_k (int k, Index_ num_observations) |
template<typename Index_ , typename Data_ , typename Distance_ > | |
NeighborList< Index_, Distance_ > | find_nearest_neighbors (const Prebuilt< Index_, Data_, Distance_ > &index, int k, int num_threads=1) |
template<typename Index_ , typename Data_ , typename Distance_ > | |
std::vector< std::vector< Index_ > > | find_nearest_neighbors_index_only (const Prebuilt< Index_, Data_, Distance_ > &index, int k, int num_threads=1) |
template<typename Index_ > | |
Index_ | count_all_neighbors_without_self (Index_ count) |
template<typename Distance_ , typename Index_ > | |
void | report_all_neighbors (std::vector< std::pair< Distance_, Index_ > > &all_neighbors, std::vector< Index_ > *output_indices, std::vector< Distance_ > *output_distances, Index_ self) |
template<typename Distance_ , typename Index_ > | |
void | report_all_neighbors (std::vector< std::pair< Distance_, Index_ > > &all_neighbors, std::vector< Index_ > *output_indices, std::vector< Distance_ > *output_distances) |
Collection of KNN algorithms.
using knncolle::NeighborList = std::vector<std::vector<std::pair<Index_, Distance_> > > |
List of nearest neighbors for multiple observations. Each entry corresponds to an observation and contains a nested list (i.e., vector) of its neighbors. Each entry of the nested vector is a pair that contains the identity of the neighbor as an observation index (first) and the distance from the observation to the neighbor (second), sorted by increasing distance.
Index_ | Integer type for the indices. |
Distance_ | Floating point type for the distances. |
int knncolle::cap_k | ( | int | k, |
Index_ | num_observations ) |
Cap the number of neighbors to use in Searcher::search()
with an index i
.
Index_ | Integer type for the number of observations. |
k | Number of nearest neighbors, should be non-negative. |
num_observations | Number of observations in the dataset. |
k
if it is less than num_observations
; otherwise it is equal to num_observations - 1
if num_observations > 0
; otherwise it is equal to zero. Index_ knncolle::count_all_neighbors_without_self | ( | Index_ | count | ) |
Count the number of neighbors from a range-based search, after removing the observation being searched. This is intended for developer use in implementations of Searcher::search_all()
, and protects against pathological situations where an observation fails to be detected as its own neighbor.
count | Number of neighbors within range of the specified observation, including the observation itself. |
count - 1
in most cases, otherwise zero. NeighborList< Index_, Distance_ > knncolle::find_nearest_neighbors | ( | const Prebuilt< Index_, Data_, Distance_ > & | index, |
int | k, | ||
int | num_threads = 1 ) |
Find the nearest neighbors within a pre-built index. This is a convenient wrapper around Searcher::search
that saves the caller the trouble of writing a loop.
Index_ | Integer type for the observation indices. |
Data_ | Numeric type for the input data. |
Distance_ | Floating point type for the distances. |
index | A Prebuilt index. |
k | Number of nearest neighbors. This should be non-negative. Explicitly calling cap_k() is not necessary as this is done automatically inside this function. |
num_threads | Number of threads to use. The parallelization scheme is defined by parallelize() . |
NeighborList
of length equal to the number of observations in index
. Each entry contains (up to) the k
nearest neighbors for each observation, sorted by increasing distance. The i
-th entry is guaranteed to not contain i
itself. std::vector< std::vector< Index_ > > knncolle::find_nearest_neighbors_index_only | ( | const Prebuilt< Index_, Data_, Distance_ > & | index, |
int | k, | ||
int | num_threads = 1 ) |
Find the nearest neighbors within a pre-built search index. Here, only the neighbor indices are returned, not the distances.
Index_ | Integer type for the indices. |
Data_ | Numeric type for the input and query data. |
Distance_ | Floating point type for the distances. |
index | A Prebuilt index. |
k | Number of nearest neighbors. This should be non-negative. Explicitly calling cap_k() is not necessary as this is done automatically inside this function. |
num_threads | Number of threads to use. The parallelization scheme is defined by parallelize() . |
index
. Each vector contains the indices of (up to) the k
nearest neighbors for each observation, sorted by increasing distance. The i
-th entry is guaranteed to not contain i
itself. void knncolle::parallelize | ( | int | num_workers, |
Task_ | num_tasks, | ||
Run_ | run_task_range ) |
Task_ | Integer type for the number of tasks. |
Run_ | Function to execute a range of tasks. |
num_workers | Number of workers. |
num_tasks | Number of tasks. |
run_task_range | Function to iterate over a range of tasks within a worker. |
By default, this is an alias to subpar::parallelize_range()
. However, if the KNNCOLLE_CUSTOM_PARALLEL
function-like macro is defined, it is called instead. Any user-defined macro should accept the same arguments as subpar::parallelize_range()
.
void knncolle::report_all_neighbors | ( | std::vector< std::pair< Distance_, Index_ > > & | all_neighbors, |
std::vector< Index_ > * | output_indices, | ||
std::vector< Distance_ > * | output_distances ) |
Report the indices and distances of all neighbors in range of an observation of interest. It is assumed that the observation of interest is not detected as its own neighbor, presumably as it does not exist in the original input dataset.
Distance_ | Floating point type for the distances. |
Index_ | Integer type for the observation indices. |
[in] | all_neighbors | Vector of (distance, index) pairs for all neighbors within range of the observation of interest. This may include the observation itself. Note that this will be sorted on output. |
[out] | output_indices | Pointer to a vector in which to store the indices of all neighbors in range, sorted by distance. If NULL , the indices will not be reported. |
[out] | output_distances | Pointer to a vector in which to store the (sorted) distances to all neighbors in range. If NULL , the distances will not be reported. Otherwise, on output, this will have the same length as *output_indices and contain distances to each of those neighbors. |
void knncolle::report_all_neighbors | ( | std::vector< std::pair< Distance_, Index_ > > & | all_neighbors, |
std::vector< Index_ > * | output_indices, | ||
std::vector< Distance_ > * | output_distances, | ||
Index_ | self ) |
Report the indices and distances of all neighbors in range of an observation of interest. If the observation of interest is detected as its own neighbor, it will be removed from the output vectors.
Distance_ | Floating point type for the distances. |
Index_ | Integer type for the observation indices. |
[in] | all_neighbors | Vector of (distance, index) pairs for all neighbors within range of the observation of interest. This may include the observation itself. Note that this will be sorted on output. |
[out] | output_indices | Pointer to a vector in which to store the indices of all neighbors in range, sorted by distance. If NULL , the indices will not be reported. |
[out] | output_distances | Pointer to a vector in which to store the (sorted) distances to all neighbors in range. If NULL , the distances will not be reported. Otherwise, on output, this will have the same length as *output_indices and contain distances to each of those neighbors. |
self | Index of the observation of interest, i.e., for which neighbors are to be identified. If present in all_neighbors , this will be removed from output_indices and output_distances . |