What’s New in Nearest Neighbor Search?
Nearest neighbor search is the problem of computing the most similar objects to a given query object. Typically objects are represented as vectors in a high-dimensional space. You want to be able to compute similar objects fast even when the database contains millions of objects.
Varieties of the problem
- Sparse Data vs. Dense Data
- Systems (standalone systems vs. keyword search systems)
- Online queries vs. batch
- Queries vs. all nearest neighbors problem
- Types of similarity
What’s new
- 2019, Application. In NLP with deep neural networks, one can predict a vector, then search for the corresponding word: “At test time, the model generates a vector and then searches for its nearest neighbor in the target embedding space to generate the corresponding word.” Von Mises-Fisher Loss for Training Sequence to Sequence Models with Continuous Outputs by Kumar and Tsvetkov
- 2019, Library, Microsoft open sources a new NN library SPTAG: A library for fast approximate nearest neighbor search
- 2017, Library, Faiss by Facebook is a library for efficient similarity search and clustering of dense vectors
- 2014, Library, Non-Metric Space Library