What’s New in Nearest Neighbor Search?
Nearest neighbor search is the problem of computing the most similar objects to a given query object. Typically objects are represented as vectors in a high-dimensional space. You want to be able to compute similar objects fast even when the database contains millions of objects.
Varieties of the problem
- Sparse Data vs. Dense Data
- Systems (standalone systems vs. keyword search systems)
- Online queries vs. batch
- Queries vs. all nearest neighbors problem
- Types of similarity
Applications
- Monitoring computer clusters (http://www.msr-waypoint.com/en-us/people/sumann/correlation-sigmod10.pdf)
- Anomaly detection in computer cluster: create a correlation matrix between different computers, over time
- Anomaly detection in graphs: find near-bipartite cores (http://alexbeutel.com/papers/www2013_copycatch.pdf)
watch for unexpected changes in the structure of the matrix - Finding correlations in multiple data streams and with time lags: http://www.cs.cmu.edu/~christos/PUBLICATIONS/sigmod05-braid.pdf
References
Algorithms:
Dynamic Time Warping: https://github.com/lemire/lbimproved