knncolle package¶
- knncolle.includes()[source]¶
Provides access to
knncolle_py.h
C++ header.- Return type:
- Returns:
Path to a directory containing the header.
Submodules¶
knncolle.annoy module¶
- class knncolle.annoy.AnnoyIndex(ptr)[source]¶
Bases:
GenericIndex
A prebuilt index for the Approximate Nearest Neighbors Oh Yeah (Annoy) algorithm, created by
define_builder()
with aAnnoyParameters
object.
- class knncolle.annoy.AnnoyParameters(num_trees=50, search_mult=None, distance='Euclidean')[source]¶
Bases:
Parameters
Parameters for the Approximate Nearest Neighbors Oh Yeah (Annoy) algorithm, see here for details.
- __init__(num_trees=50, search_mult=None, distance='Euclidean')[source]¶
- Parameters:
num_trees (
int
) – Number of trees to use to generate the search index. More trees increase accuracy at the cost of more computational work, in terms of both the indexing time and search speed.search_mult (
Optional
[float
]) – Multiplier for the number of observations to search. Specifically, the product ofk
andsearch_mult
is used to define the number of points to search exhaustively and dictates the balance between search speed and accuracy. If None, this defaults to the value ofnum_trees
.distance (
Literal
['Euclidean'
,'Manhattan'
,'Cosine'
]) – Distance metric for index construction and search. This should be one ofEuclidean
,Manhattan
orCosine
.
- property distance: str¶
Distance metric, see
__init__()
.
- property num_trees: int¶
Number of trees, see
__init__()
.
- property search_mult: int¶
Search multiplier, see
__init__()
.
knncolle.build_index module¶
- knncolle.build_index.build_index(param, x, **kwargs)[source]¶
Build a search index for a given nearest neighbor search algorithm. The default method calls
define_builder()
to obtain an algorithm-specific factory that builds the index fromx
.- Parameters:
param (
Parameters
) – Parameters for a particular search algorithm.x (
ndarray
) – Matrix of coordinates for the observations to be searched. This should be a two-dimensional double-precision NumPy array in Fortran order where the rows are dimensions and columns are observations.kwargs – Additional arguments to be passed to individual methods.
- Return type:
- Returns:
Instance of a
Index
subclass.
knncolle.classes module¶
- class knncolle.classes.Builder(ptr)[source]¶
Bases:
object
Pointer to a search index builder, i.e.,
knncolle_py::WrappedBuilder
, for use in C++ to build new neighbor search indices. The associated memory is automatically freed upon garbage collection.- property ptr¶
Address of a
knncolle_py::WrappedBuilder
, to be passed into C++ as auintptr_t
; seeknncolle_py.h
for details.
- class knncolle.classes.GenericIndex(ptr)[source]¶
Bases:
Index
Abstract base class for a prebuilt nearest neighbor-search index that holds an address to a
knncolle_py::WrappedPrebuilt
instance in C++. The associated memory is automatically freed upon garbage collection.
knncolle.define_builder module¶
- knncolle.define_builder.define_builder(param)[source]¶
Create a builder instance for a given nearest neighbor search algorithm. The builder can be used in
build_index()
to create a search index from a matrix of observations.- Parameters:
param (
Parameters
) – Parameters for a particular search algorithm.- Return type:
- Returns:
Tuple where the first element is a
Builder
and the second element is aGenericIndex
.
knncolle.exhaustive module¶
- class knncolle.exhaustive.ExhaustiveIndex(ptr)[source]¶
Bases:
GenericIndex
A prebuilt index for an exhaustive search, created by
define_builder()
with aExhaustiveParameters
instance.
- class knncolle.exhaustive.ExhaustiveParameters(distance='Euclidean')[source]¶
Bases:
Parameters
Parameters for an exhaustive search.
- __init__(distance='Euclidean')[source]¶
- Parameters:
distance (
Literal
['Euclidean'
,'Manhattan'
,'Cosine'
]) – Distance metric for index construction and search. This should be one ofEuclidean
,Manhattan
orCosine
.
- property distance: str¶
Distance metric, see
__init__()
.
knncolle.find_distance module¶
- knncolle.find_distance.find_distance(X, num_neighbors, num_threads=1, subset=None, **kwargs)[source]¶
Find the distance to the k-th closest point for each observation.
- Parameters:
X (
Index
) – A prebuilt search index.num_neighbors (
Union
[int
,Sequence
]) –Number of nearest neighbors at which to compute the distance from each observation in
X
, i.e., k. This is automatically capped at the number of observations minus 1.Alternatively, this may be a sequence of non-negative integers of length equal to the number of observations in
X
, specifying the neighbor at which to compute the distance for each observation.If
subset
is supplied andnum_neighbors
is a sequence, it should have length equal tosubset
instead, and should specify the neighbor for each observation in the subset.num_threads (
int
) – Number of threads to use for the search.subset (
Optional
[Sequence
]) – Sequence of integers containing the indices of the observations for which to compute the distances. All indices should be non-negative and less than the total number of observations.kwargs – Additional arguments to pass to specific methods.
- Return type:
- Returns:
A NumPy array of length equal to the number of observations in
X
(orsubset
, if provided) containing the distance to thenum_neighbors
-th point for each observation.
knncolle.find_knn module¶
- class knncolle.find_knn.FindKnnResults(index, distance)[source]¶
Bases:
object
Results of
find_knn()
.If
num_neighbors
is an integer,index
anddistance
are both matrices. Each row corresponds to an observation inX
and each column corresponds to one of its neighbors.index
contains the indices of the nearest neighbors whiledistance
contains the distance to those neighbors. In each row, neighbors are guaranteed to be sorted in order of increasing distance. Each row ofindex
is guaranteed to not contain the index of the corresponding observation.If
num_neighbors
is a sequence,index
anddistance
are lists instead. Each list element corresponds to an observation inX
and is a NumPy array containing the indices (forindex
) or distances (fordistance
) to the requested number of neighbors for that observation. For each observation, the neighbors are guaranteed to be sorted in order of increasing distance. Each element ofindex
is guaranteed to not contain the index of the corresponding observation.If
get_index = False
,index
is set to None.If
get_distance = False
,distance
is set to None.If
subset
is provided, the number of rows inindex
anddistance
(ifnum_neighbors
is an integer) or their length (otherwise) is instead equal to the length of the subset. Each row or list entry corresponds to one of the observations in the subset.- __init__(index, distance)¶
- knncolle.find_knn.find_knn(X, num_neighbors, num_threads=1, subset=None, get_index=True, get_distance=True, **kwargs)[source]¶
Find the k-nearest neighbors for each observation.
- Parameters:
X (
Index
) – A prebuilt search index.num_neighbors (
Union
[int
,Sequence
]) –Number of nearest neighbors to identify for each observation in
X
. This is automatically capped at the number of observations minus 1.Alternatively, this may be a sequence of non-negative integers of length equal to the number of observations in
X
, specifying the number of neighbors to find for each observation.If
subset
is supplied andnum_neighbors
is a sequence, it should have length equal tosubset
instead, and should specify the number of neighbors for each observation in the subset.num_threads (
int
) – Number of threads to use for the search.subset (
Optional
[Sequence
]) – Sequence of integers containing the indices of the observations for which to identify neighbors. All indices should be non-negative and less than the total number of observations.get_index (
bool
) – Whether to report the indices of each nearest neighbor.get_distance (
bool
) – Whether to report the distances to each nearest neighbor.kwargs – Additional arguments to pass to specific methods.
- Return type:
- Returns:
Results of the nearest-neighbor search.
knncolle.find_neighbors module¶
- class knncolle.find_neighbors.FindNeighborsResults(index, distance)[source]¶
Bases:
object
Results of
find_neighbors()
.index
anddistance
are lists where each element corresponds to an observation inX
. Each element is a NumPy array containing the indices of (forindex
) or distances to (fordistance
) the neighbors of the corresponding observation within the specified threshold distance. For each observation, neighbors are guaranteed to be sorted in order of increasing distance. Each element ofindex
is guaranteed to not contain the index of the corresponding observation.If
get_index = False
,index
is set to None.If
get_distance = False
,distance
is set to None.If
subset
is provided, the length ofindex
anddistance
is instead equal to the length of the subset. Each row or list entry corresponds to one of the observations in the subset.- __init__(index, distance)¶
- knncolle.find_neighbors.find_neighbors(X, threshold, num_threads=1, subset=None, get_index=True, get_distance=True, **kwargs)[source]¶
Find all neighbors within a certain distance for each observation.
- Parameters:
X (
Index
) – A prebuilt search index.threshold (
Union
[float
,Sequence
]) –Distance threshold at which to identify neighbors for each observation in
X
.Alternatively, this may be a sequence of non-negative floats of length equal to the number of observations in
X
, specifying the distance threshold to search for each observation.If
subset
is supplied andthreshold
is a sequence, it should have length equal tosubset
instead, and should specify the distance threshold for each observation in the subset.num_threads (
int
) – Number of threads to use for the search.subset (
Optional
[Sequence
]) – Sequence of integers containing the indices of the observations for which to identify neighbors. All indices should be non-negative and less than the total number of observations.get_index (
bool
) – Whether to report the indices of each nearest neighbor.get_distance (
bool
) – Whether to report the distances to each nearest neighbor.kwargs – Additional arguments to pass to specific methods.
- Return type:
- Returns:
Results of the neighbor search.
knncolle.hnsw module¶
- class knncolle.hnsw.HnswIndex(ptr)[source]¶
Bases:
GenericIndex
A prebuilt index for the hierarchical navigable small worlds (HNSW) algorithm, created by
define_builder()
with aHnswParameters
object.
- class knncolle.hnsw.HnswParameters(num_links=16, ef_construction=200, ef_search=10, distance='Euclidean')[source]¶
Bases:
Parameters
Parameters for the hierarchical navigable small worlds (HNSW) algorithm, see here for details.
- __init__(num_links=16, ef_construction=200, ef_search=10, distance='Euclidean')[source]¶
- Parameters:
num_links (
int
) – Number of bi-directional links to create per observation during index construction. Larger values improve accuracy at the expense of speed and memory usage.ef_construction (
int
) – Size of the dynamic list for index generation. Larger values improve the quality of the index at the expense of time.ef_search (
int
) – Size of the dynamic list for neighbor searching. Larger values improve accuracy at the expense of a slower search.distance (
Literal
['Euclidean'
,'Manhattan'
,'Cosine'
]) – Distance metric for index construction and search. This should be one ofEuclidean
,Manhattan
orCosine
.
- property distance: str¶
Distance metric, see
__init__()
.
- property ef_construction: int¶
Size of the dynamic list during index construction, see
__init__()
.
- property ef_search: int¶
Size of the dynamic list during search, see
__init__()
.
- property num_links: int¶
Number of links, see
__init__()
.
knncolle.kmknn module¶
- class knncolle.kmknn.KmknnIndex(ptr)[source]¶
Bases:
GenericIndex
A prebuilt index for the k-means k-nearest neighbors algorithm, created by
define_builder()
with aKmknnParameters
instance.
- class knncolle.kmknn.KmknnParameters(distance='Euclidean')[source]¶
Bases:
Parameters
Parameters for the k-means k-nearest neighbors (KMKNN) algorithm.
- __init__(distance='Euclidean')[source]¶
- Parameters:
distance (
Literal
['Euclidean'
,'Manhattan'
,'Cosine'
]) – Distance metric for index construction and search. This should be one ofEuclidean
,Manhattan
orCosine
.
- property distance: str¶
Distance metric, see
__init__()
.
knncolle.query_distance module¶
- knncolle.query_distance.query_distance(X, query, num_neighbors, num_threads=1, **kwargs)[source]¶
Find the distance to the k-th nearest neighbor in the search index for each observation in the query dataset.
- Parameters:
X (
Index
) – A prebuilt search index.query (
ndarray
) – Matrix of coordinates for the query observations. This should be a two-dimensional double-precision NumPy array in Fortran order where the rows are dimensions and columns are observations. The number of dimensions should be consistent with that inX
.num_neighbors (
Union
[int
,Sequence
]) –Number of nearest neighbors in
X
at which to compute the distance from each observation inquery
, i.e., k. This is automatically capped at the total number of observations inX
.Alternatively, this may be a sequence of integers of length equal to the number of observations in
query
, specifying the neighbor at which to compute the distance for each observation.num_threads (
int
) – Number of threads to use for the search.kwargs – Additional arguments to pass to specific methods.
- Return type:
- Returns:
A NumPy array of length equal to the number of observations in
query
containing the distance to thenum_neighbors
-th point inX
for each observation.
knncolle.query_knn module¶
- class knncolle.query_knn.QueryKnnResults(index, distance)[source]¶
Bases:
object
Results of
query_knn()
.If
num_neighbors
is an integer,index
anddistance
are both matrices. Each row corresponds to an observation inquery
and each column corresponds to one of its neighbors inX
.index
contains the indices of the nearest neighbors whiledistance
contains the distance to those neighbors. In each row, neighbors are guaranteed to be sorted in order of increasing distance.If
num_neighbors
is a sequence,index
anddistance
are lists instead. Each list element corresponds to an observation inX
and is a NumPy array containing the indices (forindex
) or distances (fordistance
) to the requested number of neighbors for that observation. For each observation, the neighbors are guaranteed to be sorted in order of increasing distance.If
get_index = False
,index
is set to None.If
get_distance = False
,distance
is set to None.- __init__(index, distance)¶
- knncolle.query_knn.query_knn(X, query, num_neighbors, num_threads=1, get_index=True, get_distance=True, **kwargs)[source]¶
Find the k-nearest neighbors in the search index for each observation in the query matrix.
- Parameters:
X (
Index
) – A prebuilt search index.query (
ndarray
) – Matrix of coordinates for the query observations. This should be a two-dimensional double-precision NumPy array in Fortran order where the rows are dimensions and columns are observations. The number of dimensions should be consistent with that inX
.num_neighbors (
Union
[int
,Sequence
]) –Number of nearest neighbors in
X
to identify for each observation inquery
, i.e., k. This is automatically capped at the total number of observations inX
.Alternatively, this may be a sequence of non-negative integers of length equal to the number of observations in
query
, specifying the number of neighbors to find for each observation.num_threads (
int
) – Number of threads to use for the search.get_index (
bool
) – Whether to report the indices of each nearest neighbor.get_distance (
bool
) – Whether to report the distances to each nearest neighbor.kwargs – Additional arguments to pass to specific methods.
- Return type:
- Returns:
Results of the nearest-neighbor search.
knncolle.query_neighbors module¶
- class knncolle.query_neighbors.QueryNeighborsResults(index, distance)[source]¶
Bases:
object
Results of
query_neighbors()
.index
anddistance
are lists where each element is a NumPy array that corresponds to an observation inquery
. Each array contains the indices of (forindex
) or distances to (fordistance
) the observations ofX
that neighbor the corresponding observation within the specified threshold distance. For each query observation, neighbors are guaranteed to be sorted in order of increasing distance.If
get_index = False
,index
is set to None.If
get_distance = False
,distance
is set to None.- __init__(index, distance)¶
- knncolle.query_neighbors.query_neighbors(X, threshold, num_threads=1, subset=None, get_index=True, get_distance=True, **kwargs)[source]¶
Find all observations in the search index that lie within a threshold distance of each observation in the query dataset.
- Parameters:
X (
Index
) – A prebuilt search index.query – Matrix of coordinates for the query observations. This should be a two-dimensional double-precision NumPy array in Fortran order where the rows are dimensions and columns are observations. The number of dimensions should be consistent with that in
X
.threshold (
Union
[float
,Sequence
]) –Distance threshold at which to identify neighbors for each observation in
X
.Alternatively, this may be a sequence of non-negative floats of length equal to the number of observations in
X
, specifying the distance threshold to search for each observation.num_threads (
int
) – Number of threads to use for the search.get_index (
bool
) – Whether to report the indices of each nearest neighbor.get_distance (
bool
) – Whether to report the distances to each nearest neighbor.kwargs – Additional arguments to pass to specific methods.
- Return type:
- Returns:
Results of the neighbor search.
knncolle.vptree module¶
- class knncolle.vptree.VptreeIndex(ptr)[source]¶
Bases:
GenericIndex
A prebuilt index for the vantage point tree algorithm, created by
define_builder()
with aVptreeParameters
instance.
- class knncolle.vptree.VptreeParameters(distance='Euclidean')[source]¶
Bases:
Parameters
Parameters for the vantage point (VP) tree algorithm.
- __init__(distance='Euclidean')[source]¶
- Parameters:
distance (
Literal
['Euclidean'
,'Manhattan'
,'Cosine'
]) – Distance metric for index construction and search. This should be one ofEuclidean
,Manhattan
orCosine
.
- property distance: str¶
Distance metric, see
__init__()
.