knncolle_hnsw
knncolle bindings for HNSW
knncolle_hnsw Namespace Reference

knncolle bindings for HNSW search.

Classes

struct  DistanceConfig
 Distance configuration for the HNSW index.
 
class  HnswBuilder
 Perform an approximate nearest neighbor search with HNSW.
 
struct  HnswOptions
 Options for HnswBuilder and HnswPrebuilt.
 
struct  HnswPrebuiltTypes
 Template types of a saved HNSW index.
 
class  ManhattanDistance
 Manhattan distance.
 
class  SquaredEuclideanDistance
 Squared Euclidean distance.
 

Enumerations

enum class  DistanceNormalizeMethod : char { SQRT , CUSTOM , NONE }
 

Functions

template<typename Distance_ , typename HnswData_ >
DistanceConfig< Distance_, HnswData_ > configure_euclidean_distance ()
 
template<typename Distance_ , typename HnswData_ >
DistanceConfig< Distance_, HnswData_ > configure_manhattan_distance ()
 
template<typename HnswData_ >
const char * get_distance_name (const hnswlib::SpaceInterface< HnswData_ > *distance)
 
HnswPrebuiltTypes load_hnsw_prebuilt_types (const std::string &prefix)
 
template<typename Index_ , typename Data_ , typename Distance_ , typename HnswData_ >
auto load_hnsw_prebuilt (const std::string &prefix)
 
template<class HnswData_ >
std::function< void(const std::string &)> & custom_save_for_hnsw_data ()
 
template<typename HnswData_ >
std::function< void(const std::string &, const hnswlib::SpaceInterface< HnswData_ > *)> & custom_save_for_hnsw_distance ()
 
template<typename Distance_ >
std::function< void(const std::string &, const std::function< Distance_(Distance_)> &)> & custom_save_for_hnsw_normalize ()
 
template<typename HnswData_ >
std::function< hnswlib::SpaceInterface< HnswData_ > *(const std::string &, std::size_t)> & custom_load_for_hnsw_distance ()
 
template<typename Distance_ >
std::function< std::function< Distance_(Distance_)>(const std::string &)> & custom_load_for_hnsw_normalize ()
 

Detailed Description

knncolle bindings for HNSW search.

Enumeration Type Documentation

◆ DistanceNormalizeMethod

enum class knncolle_hnsw::DistanceNormalizeMethod : char
strong

Methods for distance normalization.

Function Documentation

◆ configure_euclidean_distance()

template<typename Distance_ , typename HnswData_ >
DistanceConfig< Distance_, HnswData_ > knncolle_hnsw::configure_euclidean_distance ( )
Template Parameters
Distance_Floating-point type for the distances.
HnswData_Type of data in the HNSW index, usually floating-point.
Returns
Configuration for using Euclidean distances in the HNSW index. DistanceConfig::create is set to hnswlib::L2Space if HnswData_ = float, otherwise it is set to SquaredEuclideanDistance.
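Both metrics named above compute squared distances, so a normalization step is needed to report true Euclidean distances. The following is a minimal sketch of that idea, assuming the normalization is a square root (as DistanceNormalizeMethod::SQRT suggests). The helper names are hypothetical stand-ins and are not part of knncolle_hnsw or hnswlib.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a squared Euclidean metric:
// the sum of squared differences between two vectors of equal length.
template<typename HnswData_>
HnswData_ squared_euclidean_sketch(const std::vector<HnswData_>& a, const std::vector<HnswData_>& b) {
    HnswData_ total = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        HnswData_ delta = a[i] - b[i];
        total += delta * delta;
    }
    return total;
}

// Hypothetical normalization step: convert the squared distance
// to the Distance_ type and take its square root.
template<typename Distance_, typename HnswData_>
Distance_ normalize_sqrt_sketch(HnswData_ squared) {
    return std::sqrt(static_cast<Distance_>(squared));
}
```

Skipping the square root during the search itself is safe for ranking neighbors because the square root is monotonic; it only has to be applied to the distances that are finally reported.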

◆ configure_manhattan_distance()

template<typename Distance_ , typename HnswData_ >
DistanceConfig< Distance_, HnswData_ > knncolle_hnsw::configure_manhattan_distance ( )
Template Parameters
Distance_Floating-point type for the distances.
HnswData_Type of data in the HNSW index, usually floating-point.
Returns
Configuration for using Manhattan distances in the HNSW index.
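For reference, the Manhattan distance is the sum of absolute coordinate differences. A minimal sketch with a hypothetical helper name (not the ManhattanDistance class itself); it is assumed here that this value can be reported without a square-root step:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a Manhattan metric:
// the sum of absolute differences between two vectors of equal length.
template<typename HnswData_>
HnswData_ manhattan_sketch(const std::vector<HnswData_>& a, const std::vector<HnswData_>& b) {
    HnswData_ total = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        total += std::abs(a[i] - b[i]);
    }
    return total;
}
```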

◆ custom_load_for_hnsw_distance()

template<typename HnswData_ >
std::function< hnswlib::SpaceInterface< HnswData_ > *(const std::string &, std::size_t)> & knncolle_hnsw::custom_load_for_hnsw_distance ( )

Define a global function to create a distance metric that is not known to get_distance_name() when loading an HNSW index from disk. Users are expected to provide their own function to regenerate any distance metric that was saved by custom_save_for_hnsw_distance(). Any modifications to this function are not thread-safe and should be done in a serial section.

The first argument to the global function is a file path prefix, same as that used in knncolle::Prebuilt::save(). The second argument should be the number of dimensions in the dataset. The global function should return a pointer to a hnswlib::SpaceInterface instance, similar to the behavior of DistanceConfig::create().

The global function will only be called if the distance metric saved to prefix is unknown to get_distance_name(), otherwise it will be ignored. An error is raised if no global function is set when attempting to load an index where get_distance_name() returns "unknown".

Template Parameters
HnswData_Type of data in the HNSW index, usually floating-point.
Returns
Reference to a global function for creating a custom distance metric. By default, no function is provided. If set, the function will be called when loading an HNSW index whose saved distance metric is unknown to get_distance_name().
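The accessor-returning-a-reference pattern described here can be illustrated without hnswlib. The sketch below assumes a function-local static std::function and a loader that consults it only for unknown metrics; SpaceSketch, custom_load_sketch() and load_distance_sketch() are hypothetical stand-ins, not the library's implementation.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for hnswlib::SpaceInterface<HnswData_>.
struct SpaceSketch {
    std::size_t num_dim;
    bool custom;
};

// Returns a reference to a single function-local static, so every caller
// reads and modifies the same std::function. Assigning to it is not
// synchronized, hence the advice to do so in a serial section.
inline std::function<SpaceSketch*(const std::string&, std::size_t)>& custom_load_sketch() {
    static std::function<SpaceSketch*(const std::string&, std::size_t)> fun;
    return fun;
}

// Hypothetical loader: known metrics are created directly, unknown ones
// are delegated to the global function, and an error is raised if it is unset.
inline SpaceSketch* load_distance_sketch(const std::string& name, const std::string& prefix, std::size_t num_dim) {
    if (name != "unknown") {
        return new SpaceSketch{num_dim, false};
    }
    auto& fun = custom_load_sketch();
    if (!fun) {
        throw std::runtime_error("no custom loader registered for an unknown distance metric");
    }
    return fun(prefix, num_dim);
}
```

A caller would assign a lambda to custom_load_sketch() once at start-up, before any threads are spawned, and take ownership of the returned pointer.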

◆ custom_load_for_hnsw_normalize()

template<typename Distance_ >
std::function< std::function< Distance_(Distance_)>(const std::string &)> & knncolle_hnsw::custom_load_for_hnsw_normalize ( )

Define a global function to create a custom distance normalization function when loading an HNSW index from disk. Users are expected to provide their own function to regenerate any normalization function that was saved by custom_save_for_hnsw_normalize(). Any modifications to this function are not thread-safe and should be done in a serial section.

The sole argument to the global function is a file path prefix, same as that used in knncolle::Prebuilt::save(). The global function should return a normalization function equivalent to the DistanceConfig::custom_normalize().

The global function is only used when the DistanceConfig::normalize_method saved to prefix is equal to DistanceNormalizeMethod::CUSTOM, otherwise it is ignored. An error is raised if no global function is set when attempting to load an index where DistanceConfig::normalize_method == DistanceNormalizeMethod::CUSTOM.

Template Parameters
Distance_Floating-point type for the distances.
Returns
Reference to a global function for creating a custom distance normalization function. By default, no function is provided. If set, the function will be called when loading an HNSW index that was saved with DistanceNormalizeMethod::CUSTOM.

◆ custom_save_for_hnsw_data()

template<class HnswData_ >
std::function< void(const std::string &)> & knncolle_hnsw::custom_save_for_hnsw_data ( )

Define a global function to preserve HnswData_ type information when saving a prebuilt HNSW index in knncolle::Prebuilt::save(). Users should define their own function here to handle an HnswData_ type that is unknown to knncolle::get_numeric_type(). The action of setting/unsetting the global function is not thread-safe and should be done in a serial section.

The sole argument of the global function is the same prefix provided to knncolle::Prebuilt::save(). The global function is generally expected to write to files at paths starting with prefix. It is recommended that an additional _, - or . is added to prefix to avoid conflicts with other files generated by save().

Template Parameters
HnswData_Type of data in the HNSW index, usually floating-point.
Returns
Reference to a global function for saving information about HnswData_. By default, no global function is defined. If set, the global function will be called by the knncolle::Prebuilt::save() method for the HNSW Prebuilt subclass.

◆ custom_save_for_hnsw_distance()

template<typename HnswData_ >
std::function< void(const std::string &, const hnswlib::SpaceInterface< HnswData_ > *)> & knncolle_hnsw::custom_save_for_hnsw_distance ( )

Define a global function to save extra information about the distance metric of a prebuilt HNSW index in knncolle::Prebuilt::save(). Users should define their own function here to handle subclasses of hnswlib::SpaceInterface that are unknown to get_distance_name(). The action of setting/unsetting the global function is not thread-safe and should be done in a serial section.

The first argument of the global function is the same prefix provided to knncolle::Prebuilt::save(). The second argument of the global function is a pointer to the distance metric used by the HNSW index. The global function is generally expected to write to files at paths starting with prefix. It is recommended that an additional _, - or . is added to prefix to avoid conflicts with other files generated by save().

The global function is only used when the index to be saved uses an unknown distance metric, otherwise it is ignored.

Template Parameters
HnswData_Type of data in the HNSW index, usually floating-point.
Returns
Reference to a global function for saving information about the distance metric. By default, no global function is defined. If set, the global function will be called by the knncolle::Prebuilt::save() method for the HNSW Prebuilt subclass.

◆ custom_save_for_hnsw_normalize()

template<typename Distance_ >
std::function< void(const std::string &, const std::function< Distance_(Distance_)> &)> & knncolle_hnsw::custom_save_for_hnsw_normalize ( )

Define a global function to save a custom distance normalization method for a prebuilt HNSW index. Users should define their own function here to handle a DistanceConfig::custom_normalize function. The action of setting/unsetting the global function is not thread-safe and should be done in a serial section.

The first argument of the global function is the same prefix provided to knncolle::Prebuilt::save(). The second argument of the global function is the DistanceConfig::custom_normalize method used to construct the HNSW index. The global function is generally expected to write to files at paths starting with prefix. It is recommended that an additional _, - or . is added to prefix to avoid conflicts with other files generated by save().

The global function is only used when the index to be saved uses a custom distance normalization method, otherwise it is ignored.

Template Parameters
Distance_Floating-point type for the distances.
Returns
Reference to a global function for saving information about the custom distance normalization. By default, no global function is defined. If set, the global function will be called by the knncolle::Prebuilt::save() method for the HNSW Prebuilt subclass.
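Because a std::function cannot be inspected, the saver has to record whatever parameters are needed for custom_load_for_hnsw_normalize() to rebuild an equivalent function. The following round-trip sketch assumes the normalization is a pure scaling, so that its single parameter can be recovered by evaluating it at 1. The function names and the "_normalize_scale" suffix are hypothetical choices for illustration.

```cpp
#include <fstream>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical saver with the same argument shape as the global function:
// the prefix and the normalization function. Assumes norm(x) == scale * x.
inline void save_scale_sketch(const std::string& prefix, const std::function<double(double)>& norm) {
    // An extra "_" separates this file from others written under the same prefix.
    std::ofstream out(prefix + "_normalize_scale");
    if (!out) {
        throw std::runtime_error("failed to open the normalization file for writing");
    }
    out.precision(17); // enough digits to round-trip a double
    out << norm(1.0);
}

// Hypothetical loader: reads the scale back and rebuilds the normalization.
inline std::function<double(double)> load_scale_sketch(const std::string& prefix) {
    std::ifstream in(prefix + "_normalize_scale");
    double scale = 0;
    if (!(in >> scale)) {
        throw std::runtime_error("failed to read the normalization file");
    }
    return [scale](double raw) { return raw * scale; };
}
```

Normalizations with more parameters would need a richer file format, and the saver and loader must agree on both the file name and its layout.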

◆ get_distance_name()

template<typename HnswData_ >
const char * knncolle_hnsw::get_distance_name ( const hnswlib::SpaceInterface< HnswData_ > * distance)
Template Parameters
HnswData_Type of data in the HNSW index, usually floating-point.
Parameters
distancePointer to a distance metric.
Returns
String containing the name of the metric if it is known, e.g., "squared_euclidean" for SquaredEuclideanDistance, or "unknown" otherwise.

◆ load_hnsw_prebuilt()

template<typename Index_ , typename Data_ , typename Distance_ , typename HnswData_ >
auto knncolle_hnsw::load_hnsw_prebuilt ( const std::string & prefix)

Helper function to define a knncolle::LoadPrebuiltFunction for HNSW in knncolle::load_prebuilt().

In an HNSW-specific knncolle::LoadPrebuiltFunction, users should first call scan_prebuilt_save_config() to figure out the saved index's HnswData_. Then, they can call load_hnsw_prebuilt() with the specified types to return a pointer to a knncolle::Prebuilt object. This can be registered in load_prebuilt_registry() with the key in knncolle_hnsw::save_name.

We do not define a default function for loading HNSW indices as there are too many possible combinations of types. Instead, the user is responsible for deciding which combinations of types should be handled. This avoids binary bloat from repeated instantiations of the HNSW template classes, if an application only deals with a certain subset of combinations. For types or distances that are unknown to knncolle::get_numeric_type() or get_distance_name(), respectively, users can store additional information on disk via customize_save_for_hnsw_types() for use in loading.

Template Parameters
Index_Integer type for the observation indices.
Data_Numeric type for the input and query data.
Distance_Floating-point type for the distances.
HnswData_Floating-point type for data in the HNSW index. This should be the same as the type reported by HnswPrebuiltTypes::data.
Parameters
prefixPrefix of the file paths in which a prebuilt HNSW index was saved. An HNSW index is typically saved by calling the knncolle::Prebuilt::save() method of the HNSW subclass instance.
Returns
Pointer to a knncolle::Prebuilt HNSW index.
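The design choice of leaving the type dispatch to the user can be sketched as follows. Only the branches listed in the switch are instantiated, which is how an application limits binary size to the combinations it supports. All names here (DataTagSketch, PrebuiltSketch, load_sketch(), dispatch_sketch()) are hypothetical stand-ins; real code would inspect the saved types and call load_hnsw_prebuilt() in each branch, and the use of std::unique_ptr is an assumption of this sketch.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical tag for the saved data type.
enum class DataTagSketch { FLOAT, DOUBLE, OTHER };

// Hypothetical stand-in for knncolle::Prebuilt.
struct PrebuiltSketch {
    virtual ~PrebuiltSketch() = default;
    virtual std::size_t data_size() const = 0;
};

// Hypothetical stand-in for the HNSW subclass, templated on the data type.
template<typename HnswData_>
struct HnswPrebuiltSketch final : public PrebuiltSketch {
    std::size_t data_size() const override { return sizeof(HnswData_); }
};

// Hypothetical stand-in for the templated loader; the prefix is unused here.
template<typename HnswData_>
std::unique_ptr<PrebuiltSketch> load_sketch(const std::string& /* prefix */) {
    return std::make_unique<HnswPrebuiltSketch<HnswData_> >();
}

// Runtime dispatch to the compile-time types that this application supports.
inline std::unique_ptr<PrebuiltSketch> dispatch_sketch(DataTagSketch tag, const std::string& prefix) {
    switch (tag) {
        case DataTagSketch::FLOAT:
            return load_sketch<float>(prefix);
        case DataTagSketch::DOUBLE:
            return load_sketch<double>(prefix);
        default:
            throw std::runtime_error("unsupported HNSW data type");
    }
}
```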

◆ load_hnsw_prebuilt_types()

HnswPrebuiltTypes knncolle_hnsw::load_hnsw_prebuilt_types ( const std::string & prefix)
inline
Parameters
prefixPrefix of the file paths in which a prebuilt HNSW index was saved. An HNSW index is typically saved by calling the knncolle::Prebuilt::save() method of the HNSW subclass instance.
Returns
Template types of the saved instance of a knncolle::Prebuilt HNSW subclass. This is typically used to choose template parameters for load_hnsw_prebuilt().