knncolle_annoy
Annoy nearest neighbors in knncolle
knncolle bindings for Annoy

Overview

The knncolle_annoy library provides knncolle bindings to the Annoy library for approximate nearest neighbors search. This allows users to use Annoy searches in any framework that accepts knncolle interfaces, sacrificing neighbor search accuracy for greater speed. For most applications involving large datasets, this is an acceptable trade-off.
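
One way to judge whether the trade-off is acceptable for a particular dataset is to compare approximate results against exact ones on a subsample. The sketch below is not part of knncolle_annoy: `neighbor_recall` is a hypothetical helper that takes the neighbor indices for a single query, however they were obtained, and reports the fraction of exact neighbors that the approximate search also recovered.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical helper (not a knncolle_annoy function).
// Returns the fraction of 'exact' neighbor indices that also appear in 'approx'.
// Both vectors describe the same query; the order of neighbors is ignored.
inline double neighbor_recall(std::vector<int> exact, std::vector<int> approx) {
    if (exact.empty()) {
        return 1.0; // nothing to recover, so nothing was missed.
    }
    // set_intersection requires sorted input; we sort our own copies.
    std::sort(exact.begin(), exact.end());
    std::sort(approx.begin(), approx.end());
    std::vector<int> shared;
    std::set_intersection(
        exact.begin(), exact.end(),
        approx.begin(), approx.end(),
        std::back_inserter(shared)
    );
    return static_cast<double>(shared.size()) / static_cast<double>(exact.size());
}
```

Averaging this value over a few hundred randomly chosen observations is usually enough to see how the recall responds to different index settings.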

Quick start

Instances of the various knncolle_annoy::Annoy* classes can be used anywhere that accepts the corresponding knncolle interface. For example:

#include "knncolle/knncolle_annoy.hpp"

// Wrap our data in a light SimpleMatrix.
knncolle::SimpleMatrix<int, int, double> mat(ndim, nobs, matrix.data());
// Build an Annoy index with the default options.
knncolle_annoy::AnnoyBuilder<> an_builder;
auto an_index = an_builder.build_unique(mat);
// Find 10 (approximate) nearest neighbors of every observation.
auto results = knncolle::find_nearest_neighbors(*an_index, 10);
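
The `matrix.data()` pointer in the snippet above has to refer to one contiguous buffer. The following sketch assumes, as the `(ndim, nobs, pointer)` constructor arguments suggest, that the buffer is column-major with one observation per column, so that all coordinates of an observation are adjacent; check the knncolle reference documentation before relying on this. `pack_column_major` is a hypothetical helper, not a knncolle function.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not a knncolle function).
// Flattens per-observation coordinate vectors into a single buffer in which
// observation i occupies elements [i * ndim, (i + 1) * ndim).
// This is the assumed column-major layout: dimensions in rows, observations in columns.
inline std::vector<double> pack_column_major(
    const std::vector<std::vector<double> >& observations,
    std::size_t ndim)
{
    std::vector<double> buffer;
    buffer.reserve(observations.size() * ndim);
    for (const auto& obs : observations) {
        if (obs.size() != ndim) {
            throw std::invalid_argument("every observation must have 'ndim' coordinates");
        }
        buffer.insert(buffer.end(), obs.begin(), obs.end());
    }
    return buffer;
}
```

The returned vector must outlive any matrix view that is constructed from its `data()` pointer.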

We could alternate between exact and approximate searches at run-time:

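std::unique_ptr<knncolle::Prebuilt<int, int, double> > ptr;
if (use_exact) {
    // kbuilder: any exact knncolle builder (e.g., a knncolle::KmknnBuilder) constructed beforehand.
    ptr = kbuilder.build_unique(mat);
} else {
    // abuilder: a knncolle_annoy::AnnoyBuilder constructed beforehand.
    ptr = abuilder.build_unique(mat);
}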

We can also customize the construction of the AnnoyBuilder by passing in options:

knncolle_annoy::AnnoyOptions an_opts;
an_opts.num_trees = 100;
an_opts.search_mult = 200; // used to compute search_k.
knncolle_annoy::AnnoyBuilder<> an_builder2(an_opts);
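
The text above only states that `search_mult` is used to compute Annoy's `search_k`. As a rough illustration, the sketch below assumes that the search budget is `search_mult` multiplied by the number of requested neighbors and rounded to the nearest integer, and that a negative multiplier defers to Annoy's own default (conventionally signalled by a `search_k` of -1). Both rules are assumptions, and `assumed_search_k` is a hypothetical helper; the reference documentation has the authoritative definition.

```cpp
#include <cmath>

// Hypothetical helper (not a knncolle_annoy function).
// Assumed relationship: search_k = search_mult * k, rounded to the nearest integer.
// A negative multiplier is treated as "let Annoy choose", reported here as -1.
inline int assumed_search_k(double search_mult, int k) {
    if (search_mult < 0) {
        return -1;
    }
    return static_cast<int>(std::lround(search_mult * k));
}
```

Under these assumptions, `search_mult = 200` with 10 requested neighbors gives a budget of 2000 nodes per query. Larger budgets generally improve accuracy at the cost of search time.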

Check out the reference documentation for more details.

Building projects

CMake with FetchContent

If you're using CMake, you just need to add something like this to your CMakeLists.txt:

include(FetchContent)

FetchContent_Declare(
  knncolle_annoy
  GIT_REPOSITORY https://github.com/knncolle/knncolle_annoy
  GIT_TAG master # or any version of interest
)

FetchContent_MakeAvailable(knncolle_annoy)

Then you can link to knncolle_annoy to make the headers available during compilation:

# For executables:
target_link_libraries(myexe knncolle::knncolle_annoy)
# For libraries:
target_link_libraries(mylib INTERFACE knncolle::knncolle_annoy)

CMake with find_package()

find_package(knncolle_knncolle_annoy CONFIG REQUIRED)
target_link_libraries(mylib INTERFACE knncolle::knncolle_annoy)

To install the library, use:

mkdir build && cd build
cmake .. -DKNNCOLLE_ANNOY_TESTS=OFF
cmake --build . --target install

By default, this will use FetchContent to fetch all external dependencies. If you want to install them manually, use -DKNNCOLLE_ANNOY_FETCH_EXTERN=OFF. See extern/CMakeLists.txt to find compatible versions of each dependency.

Manual

If you're not using CMake, the simple approach is to just copy the files in include/ - either directly or with Git submodules - and include their path during compilation with, e.g., GCC's -I. This requires the external dependencies listed in extern/CMakeLists.txt, which also need to be made available during compilation.

Note on vectorization

Annoy will attempt to perform manual vectorization based on SSE and/or AVX instructions. Results may therefore differ across machines, as numeric precision varies between architectures with different levels of support for SSE/AVX intrinsics. For the most part, such differences can be avoided by consistently compiling for the "near-lowest common denominator" (such as the typical x86-64 default for GCC and Clang) and ensuring that more specific instruction subsets like SSE3 and AVX, which are typically off by default anyway, are not enabled. Nonetheless, if reproducibility across architectures is important, it can be achieved at the cost of some speed by defining the NO_MANUAL_VECTORIZATION macro, which instructs Annoy to disable its vectorized optimizations.
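
To see why vectorization can change results at all, recall that floating-point addition is not associative, and a vectorized loop typically keeps several partial sums that are combined at the end. The sketch below is a generic illustration and does not reproduce Annoy's code: `sum_sequential` mimics a scalar loop, while `sum_in_lanes` mimics a SIMD-style accumulation with a configurable number of lanes. Both function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Scalar-style accumulation: a single running total, strictly left to right.
inline float sum_sequential(const std::vector<float>& x) {
    float total = 0.0f;
    for (float v : x) {
        total += v;
    }
    return total;
}

// SIMD-style accumulation: element i is added to partial sum (i % lanes),
// and the partial sums are only combined once all elements are consumed.
inline float sum_in_lanes(const std::vector<float>& x, std::size_t lanes) {
    if (lanes == 0) {
        lanes = 1; // fall back to a single lane rather than dividing by zero.
    }
    std::vector<float> partial(lanes, 0.0f);
    for (std::size_t i = 0; i < x.size(); ++i) {
        partial[i % lanes] += x[i];
    }
    float total = 0.0f;
    for (float p : partial) {
        total += p;
    }
    return total;
}
```

For the input `{1e8f, 1.0f, -1e8f, 1.0f}` on a platform with IEEE 754 single-precision arithmetic and no fast-math flags, the sequential sum is 1 because the first `1.0f` is absorbed by `1e8f`, whereas two lanes give 2 because the large values cancel before the small ones are added. Distance calculations that accumulate over many dimensions are subject to the same effect. The NO_MANUAL_VECTORIZATION macro can be defined on the compiler command line, e.g., with `-DNO_MANUAL_VECTORIZATION` for GCC and Clang.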