|
knncolle
Collection of KNN methods in C++
|
Collection of KNN algorithms. More...
Classes | |
| class | BruteforceBuilder |
| Perform a brute-force nearest neighbor search. More... | |
| class | Builder |
| Interface to build nearest-neighbor search indices. More... | |
| class | DistanceMetric |
| Interface for a distance metric. More... | |
| class | EuclideanDistance |
| Compute Euclidean distances between two input vectors. More... | |
| class | L2NormalizedBuilder |
| Wrapper around a builder with L2 normalization. More... | |
| class | L2NormalizedMatrix |
| Wrapper around a matrix with L2 normalization. More... | |
| class | ManhattanDistance |
| Compute Manhattan distances between two input vectors. More... | |
| class | Matrix |
| Interface for matrix data. More... | |
| class | MatrixExtractor |
| Extractor interface for matrix data. More... | |
| class | NeighborQueue |
| Helper class to track nearest neighbors. More... | |
| class | Prebuilt |
| Interface for prebuilt nearest-neighbor search indices. More... | |
| class | Searcher |
| Interface for searching nearest-neighbor search indices. More... | |
| class | SimpleMatrix |
| Simple wrapper for an in-memory matrix. More... | |
| class | VptreeBuilder |
| Perform a nearest neighbor search based on a vantage point (VP) tree. More... | |
Typedefs | |
| template<typename Data_ , typename Distance_ > | |
| using | LoadDistanceMetricFunction = std::function<DistanceMetric<Data_, Distance_>* (const std::string&)> |
| template<typename Index_ , typename Distance_ > | |
| using | NeighborList = std::vector<std::vector<std::pair<Index_, Distance_> > > |
| template<typename Index_ , typename Data_ , typename Distance_ > | |
| using | LoadPrebuiltFunction = std::function<Prebuilt<Index_, Data_, Distance_>* (const std::string&)> |
Functions | |
| template<typename Data_ , typename Distance_ > | |
| std::unordered_map< std::string, LoadDistanceMetricFunction< Data_, Distance_ > > & | load_distance_metric_registry () |
| template<typename Data_ , typename Distance_ > | |
| DistanceMetric< Data_, Distance_ > * | load_distance_metric_raw (const std::string &prefix) |
| template<typename Task_ , class Run_ > | |
| void | parallelize (int num_workers, Task_ num_tasks, Run_ run_task_range) |
| template<typename Index_ > | |
| int | cap_k (int k, Index_ num_observations) |
| template<typename Index_ > | |
| int | cap_k_query (int k, Index_ num_observations) |
| template<typename Index_ , typename Data_ , typename Distance_ > | |
| NeighborList< Index_, Distance_ > | find_nearest_neighbors (const Prebuilt< Index_, Data_, Distance_ > &index, int k, int num_threads=1) |
| template<typename Index_ , typename Data_ , typename Distance_ > | |
| std::vector< std::vector< Index_ > > | find_nearest_neighbors_index_only (const Prebuilt< Index_, Data_, Distance_ > &index, int k, int num_threads=1) |
| template<typename Index_ , typename Data_ , typename Distance_ > | |
| std::unordered_map< std::string, LoadPrebuiltFunction< Index_, Data_, Distance_ > > & | load_prebuilt_registry () |
| template<typename Index_ , typename Data_ , typename Distance_ > | |
| Prebuilt< Index_, Data_, Distance_ > * | load_prebuilt_raw (const std::string &prefix) |
| template<typename Index_ , typename Data_ , typename Distance_ > | |
| std::unique_ptr< Prebuilt< Index_, Data_, Distance_ > > | load_prebuilt_unique (const std::string &prefix) |
| template<typename Index_ , typename Data_ , typename Distance_ > | |
| std::shared_ptr< Prebuilt< Index_, Data_, Distance_ > > | load_prebuilt_shared (const std::string &prefix) |
| template<typename Index_ > | |
| Index_ | count_all_neighbors_without_self (Index_ count) |
| template<typename Distance_ , typename Index_ > | |
| void | report_all_neighbors (std::vector< std::pair< Distance_, Index_ > > &all_neighbors, std::vector< Index_ > *output_indices, std::vector< Distance_ > *output_distances, Index_ self) |
| template<typename Distance_ , typename Index_ > | |
| void | report_all_neighbors (std::vector< std::pair< Distance_, Index_ > > &all_neighbors, std::vector< Index_ > *output_indices, std::vector< Distance_ > *output_distances) |
| template<typename Input_ , typename Length_ > | |
| void | quick_save (const std::string &path, const Input_ *const contents, const Length_ length) |
| template<typename Input_ , typename Length_ > | |
| void | quick_load (const std::string &path, Input_ *const contents, const Length_ length) |
Collection of KNN algorithms.
| using knncolle::LoadDistanceMetricFunction = std::function<DistanceMetric<Data_, Distance_>* (const std::string&)> |
Distance loading function. This accepts a file path prefix (see DistanceMetric::save()) and returns a pointer to a Distance instance.
| Data_ | Numeric type for the input data. |
| Distance_ | Floating-point type for the output distance. |
| using knncolle::LoadPrebuiltFunction = std::function<Prebuilt<Index_, Data_, Distance_>* (const std::string&)> |
Prebuilt loading function. This accepts a file path prefix (see Prebuilt::save()) and returns a pointer to a Prebuilt instance.
| Index_ | Integer type for the observation indices. |
| Data_ | Numeric type for the query data. |
| Distance_ | Floating point type for the distances. |
| using knncolle::NeighborList = std::vector<std::vector<std::pair<Index_, Distance_> > > |
List of nearest neighbors for multiple observations. Each entry corresponds to an observation and contains a nested list (i.e., vector) of its neighbors. Each entry of the nested vector is a pair that contains the identity of the neighbor as an observation index (first) and the distance from the observation to the neighbor (second), sorted by increasing distance.
| Index_ | Integer type for the indices. |
| Distance_ | Floating point type for the distances. |
| int knncolle::cap_k | ( | int | k, |
| Index_ | num_observations ) |
Cap the number of neighbors to use in Searcher::search() with an index i.
| Index_ | Integer type for the number of observations. |
| k | Number of nearest neighbors, should be non-negative. |
| num_observations | Number of observations in the dataset. |
k if it is less than num_observations; otherwise it is equal to num_observations - 1 if num_observations > 0; otherwise it is equal to zero. | int knncolle::cap_k_query | ( | int | k, |
| Index_ | num_observations ) |
Cap the number of neighbors to use in Searcher::search() with a pointer query.
| Index_ | Integer type for the number of observations. |
| k | Number of nearest neighbors, should be non-negative. |
| num_observations | Number of observations in the dataset. |
k and num_observations. | Index_ knncolle::count_all_neighbors_without_self | ( | Index_ | count | ) |
Count the number of neighbors from a range-based search, after removing the observation being searched. This is intended for developer use in implementations of Searcher::search_all(), and protects against pathological situations where an observation fails to be detected as its own neighbor.
| count | Number of neighbors within range of the specified observation, including the observation itself. |
count - 1 in most cases, otherwise zero. | NeighborList< Index_, Distance_ > knncolle::find_nearest_neighbors | ( | const Prebuilt< Index_, Data_, Distance_ > & | index, |
| int | k, | ||
| int | num_threads = 1 ) |
Find the nearest neighbors within a pre-built index. This is a convenient wrapper around Searcher::search that saves the caller the trouble of writing a loop.
| Index_ | Integer type for the observation indices. |
| Data_ | Numeric type for the input data. |
| Distance_ | Floating point type for the distances. |
| index | A Prebuilt index. |
| k | Number of nearest neighbors. This should be non-negative. Explicitly calling cap_k() is not necessary as this is done automatically inside this function. |
| num_threads | Number of threads to use. The parallelization scheme is defined by parallelize(). |
NeighborList of length equal to the number of observations in index. Each entry contains (up to) the k nearest neighbors for each observation, sorted by increasing distance. The i-th entry is guaranteed to not contain i itself. | std::vector< std::vector< Index_ > > knncolle::find_nearest_neighbors_index_only | ( | const Prebuilt< Index_, Data_, Distance_ > & | index, |
| int | k, | ||
| int | num_threads = 1 ) |
Find the nearest neighbors within a pre-built search index. Here, only the neighbor indices are returned, not the distances.
| Index_ | Integer type for the indices. |
| Data_ | Numeric type for the input and query data. |
| Distance_ | Floating point type for the distances. |
| index | A Prebuilt index. |
| k | Number of nearest neighbors. This should be non-negative. Explicitly calling cap_k() is not necessary as this is done automatically inside this function. |
| num_threads | Number of threads to use. The parallelization scheme is defined by parallelize(). |
index. Each vector contains the indices of (up to) the k nearest neighbors for each observation, sorted by increasing distance. The i-th entry is guaranteed to not contain i itself. | DistanceMetric< Data_, Distance_ > * knncolle::load_distance_metric_raw | ( | const std::string & | prefix | ) |
Load a distance metric from disk into a Distance object.
| Data_ | Numeric type for the input data. |
| Distance_ | Floating-point type for the output distance. |
| prefix | File path prefix for a distance index that was saved to disk by DistanceMetric::save(). |
Distance instance, created from the files at prefix.
|
inline |
| Data_ | Numeric type for the input data. |
| Distance_ | Floating-point type for the output distance. |
DistanceMetric::save()) to loading functions. | Prebuilt< Index_, Data_, Distance_ > * knncolle::load_prebuilt_raw | ( | const std::string & | prefix | ) |
Load a neighbor search index from disk into a Prebuilt object.
| Index_ | Integer type for the observation indices. |
| Data_ | Numeric type for the query data. |
| Distance_ | Floating point type for the distances. |
| prefix | File path prefix for a prebuilt index that was saved to disk by Prebuilt::save(). |
Prebuilt instance, created from the files at prefix.
|
inline |
| Index_ | Integer type for the observation indices. |
| Data_ | Numeric type for the query data. |
| Distance_ | Floating point type for the distances. |
Prebuilt::save()) to loading functions. | std::shared_ptr< Prebuilt< Index_, Data_, Distance_ > > knncolle::load_prebuilt_shared | ( | const std::string & | prefix | ) |
Load a neighbor search index from disk into a Prebuilt object. This calls load_prebuilt_raw() internally.
| Index_ | Integer type for the observation indices. |
| Data_ | Numeric type for the query data. |
| Distance_ | Floating point type for the distances. |
| prefix | File path prefix for a prebuilt index that was saved to disk by Prebuilt::save(). |
Prebuilt instance, created from the files at prefix. | std::unique_ptr< Prebuilt< Index_, Data_, Distance_ > > knncolle::load_prebuilt_unique | ( | const std::string & | prefix | ) |
Load a neighbor search index from disk into a Prebuilt object. This calls load_prebuilt_raw() internally.
| Index_ | Integer type for the observation indices. |
| Data_ | Numeric type for the query data. |
| Distance_ | Floating point type for the distances. |
| prefix | File path prefix for a prebuilt index that was saved to disk by Prebuilt::save(). |
Prebuilt instance, created from the files at prefix. | void knncolle::parallelize | ( | int | num_workers, |
| Task_ | num_tasks, | ||
| Run_ | run_task_range ) |
| Task_ | Integer type for the number of tasks. |
| Run_ | Function to execute a range of tasks. |
| num_workers | Number of workers. |
| num_tasks | Number of tasks. |
| run_task_range | Function to iterate over a range of tasks within a worker. |
By default, this is an alias to subpar::parallelize_range(). However, if the KNNCOLLE_CUSTOM_PARALLEL function-like macro is defined, it is called instead. Any user-defined macro should accept the same arguments as subpar::parallelize_range().
| void knncolle::quick_load | ( | const std::string & | path, |
| Input_ *const | contents, | ||
| const Length_ | length ) |
Read an array from a binary file at path. This is intended for developer use in load_prebuilt() functions.
| Input_ | Type of values to be read. |
| Length_ | Integer type of the length of the array. |
| path | File path to read from. |
| contents | Pointer to an array in which to store the contents of path. This should have at least length addressable elements. |
| length | Number of elements (not bytes) to be read from path. This should be non-negative. |
| void knncolle::quick_save | ( | const std::string & | path, |
| const Input_ *const | contents, | ||
| const Length_ | length ) |
Saves an array to a binary file at path. This is intended for developer use in implementations of Prebuilt::save().
| Input_ | Type of values to be saved. |
| Length_ | Integer type of the length of the array. |
| path | File path to save to. Any directories in the path should already exist. |
| contents | Pointer to an array of contents to be saved. |
| length | Length of the array, in terms of the number of elements (not the number of bytes). This should be non-negative. |
| void knncolle::report_all_neighbors | ( | std::vector< std::pair< Distance_, Index_ > > & | all_neighbors, |
| std::vector< Index_ > * | output_indices, | ||
| std::vector< Distance_ > * | output_distances ) |
Report the indices and distances of all neighbors in range of an observation of interest. It is assumed that the observation of interest is not detected as its own neighbor, presumably as it does not exist in the original input dataset.
| Distance_ | Floating point type for the distances. |
| Index_ | Integer type for the observation indices. |
| [in] | all_neighbors | Vector of (distance, index) pairs for all neighbors within range of the observation of interest. This may include the observation itself. Note that this will be sorted on output. |
| [out] | output_indices | Pointer to a vector in which to store the indices of all neighbors in range, sorted by distance. If NULL, the indices will not be reported. |
| [out] | output_distances | Pointer to a vector in which to store the (sorted) distances to all neighbors in range. If NULL, the distances will not be reported. Otherwise, on output, this will have the same length as *output_indices and contain distances to each of those neighbors. |
| void knncolle::report_all_neighbors | ( | std::vector< std::pair< Distance_, Index_ > > & | all_neighbors, |
| std::vector< Index_ > * | output_indices, | ||
| std::vector< Distance_ > * | output_distances, | ||
| Index_ | self ) |
Report the indices and distances of all neighbors in range of an observation of interest. If the observation of interest is detected as its own neighbor, it will be removed from the output vectors.
| Distance_ | Floating point type for the distances. |
| Index_ | Integer type for the observation indices. |
| [in] | all_neighbors | Vector of (distance, index) pairs for all neighbors within range of the observation of interest. This may include the observation itself. Note that this will be sorted on output. |
| [out] | output_indices | Pointer to a vector in which to store the indices of all neighbors in range, sorted by distance. If NULL, the indices will not be reported. |
| [out] | output_distances | Pointer to a vector in which to store the (sorted) distances to all neighbors in range. If NULL, the distances will not be reported. Otherwise, on output, this will have the same length as *output_indices and contain distances to each of those neighbors. |
| self | Index of the observation of interest, i.e., for which neighbors are to be identified. If present in all_neighbors, this will be removed from output_indices and output_distances. |