Nearest neighbor search is the problem of computing the most similar objects to a given query object. Typically objects are represented as vectors in a high-dimensional space. You want to be able to compute similar objects fast even when the database contains millions of objects.

Varieties of the problem

  • Sparse Data vs. Dense Data
  • Systems (standalone systems vs. keyword search systems)
  • Online queries vs. batch
  • Queries vs. all nearest neighbors problem
  • Types of similarity


  • Monitoring computer clusters (
  • Anomaly detection in computer cluster: create a correlation matrix between different computers, over time
  • Anomaly detection in graphs: find near-bipartite cores (

watch for unexpected changes in the structure of the matrix - Finding correlations in multiple data streams and with time lags:



Dynamic Time Warping: