Approximate Nearest Neighbor Search with Filters

dc.contributor.advisorDhulipala, Laxmanen_US
dc.contributor.authorLandrum, Benjamin Thomasen_US
dc.contributor.departmentComputer Scienceen_US
dc.contributor.publisherDigital Repository at the University of Marylanden_US
dc.contributor.publisherUniversity of Maryland (College Park, Md.)en_US
dc.date.accessioned2024-07-02T05:48:41Z
dc.date.available2024-07-02T05:48:41Z
dc.date.issued2024en_US
dc.description.abstractApproximate nearest neighbor search (ANNS) on high-dimensional vectors is a fundamental primitive used widely for search over neural embeddings of unstructured data. Prior work on ANNS has produced indices which provide fast and accurate search on datasets up to billions of points, but are not well suited to queries restricted to some subset of the original dataset. Filtered ANNS is a formulation of the problem which adds metadata to points in the dataset which can be used to filter points at query time. This setting requires indexing a dataset in a metadata-aware way to support filtered queries. Filtered ANNS is important for applications such as product and image search, and necessary to give recently popular `vector databases' functionality similar to more traditional tabular databases. This work concerns two versions of the filtered ANNS problem. The most popular formulation in prior work associates points with boolean metadata in the form of labels and filters queries using a boolean predicate on these labels. In this setting, we present a novel index with state-of-the-art performance for queries with filters requiring either one label or both of a pair of labels which won a large benchmarking competition's track focused on the problem. We also introduce a novel formulation of filtered ANNS called `window filtered' ANNS, in which points are associated with a continuous metadata value (in practical use, this corresponds to a timestamp, measure of popularity, etc.), and queries are filtered to a range of metadata values. In addition to describing the problem, we present a practical and theoretically motivated index which handily outperforms baselines.en_US
dc.identifierhttps://doi.org/10.13016/j9tp-e9pg
dc.identifier.urihttp://hdl.handle.net/1903/33067
dc.language.isoenen_US
dc.subject.pqcontrolledComputer scienceen_US
dc.subject.pquncontrolledannen_US
dc.subject.pquncontrolledannsen_US
dc.subject.pquncontrolledapproximate nearest neighborsen_US
dc.subject.pquncontrolledfiltered annsen_US
dc.subject.pquncontrolledfiltered vector searchen_US
dc.subject.pquncontrolledvector searchen_US
dc.titleApproximate Nearest Neighbor Search with Filtersen_US
dc.typeThesisen_US

Files

Original bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
Landrum_umd_0117N_24349.pdf
Size:
2.38 MB
Format:
Adobe Portable Document Format