knncolle
Collection of KNN methods in C++
Loading...
Searching...
No Matches
Classes | Typedefs | Functions
knncolle Namespace Reference

Collection of KNN algorithms. More...

Classes

class  BruteforceBuilder
 Perform a brute-force nearest neighbor search. More...
 
class  BruteforcePrebuilt
 Index for a brute-force nearest neighbor search. More...
 
class  BruteforceSearcher
 Brute-force nearest neighbor searcher. More...
 
class  Builder
 Interface to build nearest-neighbor search indices. More...
 
struct  EuclideanDistance
 Compute Euclidean distances between two input vectors. More...
 
class  KmknnBuilder
 Perform a nearest neighbor search based on k-means clustering. More...
 
struct  KmknnOptions
 Options for KmknnBuilder and KmknnPrebuilt construction. More...
 
class  KmknnPrebuilt
 Index for a KMKNN search. More...
 
class  KmknnSearcher
 KMKNN searcher. More...
 
class  L2NormalizedBuilder
 Wrapper around a builder with L2 normalization. More...
 
class  L2NormalizedMatrix
 Wrapper around a matrix with L2 normalization. More...
 
class  L2NormalizedPrebuilt
 Wrapper around a prebuilt index with L2 normalization. More...
 
class  L2NormalizedSearcher
 Wrapper around a search interface with L2 normalization. More...
 
struct  ManhattanDistance
 Compute Manhattan distances between two input vectors. More...
 
struct  MockDistance
 Expectations for a distance calculation class. More...
 
class  MockMatrix
 Compile-time interface for matrix data. More...
 
class  Prebuilt
 Interface for prebuilt nearest-neighbor search indices. More...
 
class  Searcher
 Interface for searching nearest-neighbor search indices. More...
 
class  SimpleMatrix
 Simple wrapper for an in-memory matrix. More...
 
class  VptreeBuilder
 Perform a nearest neighbor search based on a vantage point (VP) tree. More...
 
class  VptreePrebuilt
 Index for a VP-tree search. More...
 
class  VptreeSearcher
 VP-tree searcher. More...
 

Typedefs

template<typename Index_ = int, typename Float_ = double>
using NeighborList = std::vector< std::vector< std::pair< Index_, Float_ > > >
 

Functions

template<typename Task_ , class Run_ >
void parallelize (int num_workers, Task_ num_tasks, Run_ run_task_range)
 
template<typename Index_ >
int cap_k (int k, Index_ num_observations)
 
template<typename Dim_ , typename Index_ , typename Float_ >
NeighborList< Index_, Float_ > find_nearest_neighbors (const Prebuilt< Dim_, Index_, Float_ > &index, int k, int num_threads=1)
 
template<typename Dim_ , typename Index_ , typename Float_ >
std::vector< std::vector< Index_ > > find_nearest_neighbors_index_only (const Prebuilt< Dim_, Index_, Float_ > &index, int k, int num_threads=1)
 

Detailed Description

Collection of KNN algorithms.

Typedef Documentation

◆ NeighborList

template<typename Index_ = int, typename Float_ = double>
using knncolle::NeighborList = typedef std::vector<std::vector<std::pair<Index_, Float_> > >

List of nearest neighbors for multiple observations. Each entry corresponds to an observation and contains a nested list (i.e., vector) of its neighbors. Each entry of the nested vector is a pair that contains the identity of the neighbor as an observation index (first) and the distance from the observation to the neighbor (second), sorted by increasing distance.

Template Parameters
Index_Integer type for the indices.
Float_Floating point type for the distances.

Function Documentation

◆ cap_k()

template<typename Index_ >
int knncolle::cap_k ( int  k,
Index_  num_observations 
)

Cap the number of neighbors to use in Searcher::search() with an index i.

Template Parameters
Index_Integer type for the number of observations.
Parameters
kNumber of nearest neighbors, should be non-negative.
num_observationsNumber of observations in the dataset.
Returns
Capped number of neighbors to search for. This is equal to k if it is less than num_observations; otherwise it is equal to num_observations - 1 if num_observations > 0; otherwise it is equal to zero.

◆ find_nearest_neighbors()

template<typename Dim_ , typename Index_ , typename Float_ >
NeighborList< Index_, Float_ > knncolle::find_nearest_neighbors ( const Prebuilt< Dim_, Index_, Float_ > &  index,
int  k,
int  num_threads = 1 
)

Find the nearest neighbors within a pre-built index. This is a convenient wrapper around Searcher::search that saves the caller the trouble of writing a loop.

Template Parameters
Dim_Integer type for the number of dimensions.
Index_Integer type for the indices.
Float_Floating point type for the query data and output distances.
Parameters
indexA Prebuilt index.
kNumber of nearest neighbors. This should be non-negative. Explicitly calling cap_k() is not necessary as this is done automatically inside this function.
num_threadsNumber of threads to use. The parallelization scheme is defined by parallelize().
Returns
A NeighborList of length equal to the number of observations in index. Each entry contains (up to) the k nearest neighbors for each observation, sorted by increasing distance. The i-th entry is guaranteed to not contain i itself.

◆ find_nearest_neighbors_index_only()

template<typename Dim_ , typename Index_ , typename Float_ >
std::vector< std::vector< Index_ > > knncolle::find_nearest_neighbors_index_only ( const Prebuilt< Dim_, Index_, Float_ > &  index,
int  k,
int  num_threads = 1 
)

Find the nearest neighbors within a pre-built search index. Here, only the neighbor indices are returned, not the distances.

Template Parameters
Dim_Integer type for the number of dimensions.
Index_Integer type for the indices.
Float_Floating point type for the query data and output distances.
Parameters
indexA Prebuilt index.
kNumber of nearest neighbors. This should be non-negative. Explicitly calling cap_k() is not necessary as this is done automatically inside this function.
num_threadsNumber of threads to use. The parallelization scheme is defined by parallelize().
Returns
A vector of vectors of length equal to the number of observations in index. Each vector contains the indices of (up to) the k nearest neighbors for each observation, sorted by increasing distance. The i-th entry is guaranteed to not contain i itself.

◆ parallelize()

template<typename Task_ , class Run_ >
void knncolle::parallelize ( int  num_workers,
Task_  num_tasks,
Run_  run_task_range 
)
Template Parameters
Task_Integer type for the number of tasks.
Run_Function to execute a range of tasks.
Parameters
num_workersNumber of workers.
num_tasksNumber of tasks.
run_task_rangeFunction to iterate over a range of tasks within a worker.

By default, this is an alias to subpar::parallelize_range(). However, if the KNNCOLLE_CUSTOM_PARALLEL function-like macro is defined, it is called instead. Any user-defined macro should accept the same arguments as subpar::parallelize_range().