LSI/LSA/SVD - a Bit of History - Algorithms for Big Data

Latent Semantic Indexing/Analysis (LSI/LSA) is an application of Singular Value Decomposition Technique (SVD) on the so called word-document matrix used in Information Retrieval.

Latent Semantic Indexing was proposed in 1990 by a team of researchers of which Susan Dumais is the most prominent. She is currently a Distinguished Scientist at Micorosft Research. LSI was proposed as a text search technique.

Latent Semantic Analysis was proposed in 2005 by Jerome Bellegarda for natural language processing tasks. Bellegarda showed massive improvements in speech recognition tasks due to the ability of the LSA to capture long-term (or semantic) context of text.

SVD, the underlying technique behind LSI/LSA, is a statistical technique known for a very long time in Statistics and Linear Algebra. In Statistics, SVD is particularly useful in the context of linear regression.

Susain Domain (the original author of LSI) was elected an ACM fellow in 2006 partly because of her LSI contribution, while Jerome Bellegarda was elected an IEEE fellow in 2004 partly due to LSA. These are very high awards speaking to the importance of this method.

You can calculate that currently LSI is 25 years old while LSA is 10 years old. SVD is known for a much longer time.

LSA was highly hyped at the time of its introduction, but it took a bit more time to researchers to properly understand the strengths and weaknesses of the method. No one, as far as I know, could achieve similarly high improvement in perplexity reduction as Bellegarda. However, (Google) word vectors, a similar approach (at least in spirit) to LSA reported high improvement in perplexity reduction.

In the field of Information Retrieval, some textbooks state that no improvement in precision was achieved in search tasks after application of the LSI. However, application of LDA (Latent Dirichlet Allocation) reported improvement in search tasks. LDA was motivated as an improvment of Probabilistic Latent Semantic Analysis (PLSA), which itself was motivated as an improvement of LSA.

Here is a table of ancronyms used in machine learning/data mining literature:

SVD: Singular Value Decomposition. A standard technique from Linear Algebra and Statistics.
LSI: Latent Semantic Indexing. An application of SVD in Information Retrieval.
LSA: Latent Semantic Analysis. LSI applied to natural language processing.
PLSA: Probabilistic Latent Semantic Analysis. An improvement to LSA.
LDA: Latent Dirichlet Allocation. An improvement to PLSA.
PCA: Principle Component Analysis. A statistical technique which is an application of SVD.

In 2009, SVD became popluar in recommendation systems due to the Netflix Prize competition. It was one the few stand-alone methods that achieved good results. One difference from prior work is the so called the smoothing trick of adding a small diagonal matrix before taking the SVD. This trick was not mentioned neither in the LSI or LSA work (however it is well-known in statistics in the context of ridge regression and penalized PCA). So tricks are important and part of this blog post series is about the tricks that I have learned along the way.

I plan to deconstruct the LSA (LSI, SVD) in order to separate fact from fiction and show some practical tricks. I also plan to show that SVD does not work well for some data points (words or documents) when applied to text.

One should keep in mind that even though LSI/LSA/SVD is an old method, the principles behind the method are still valid. Follow up work on Latent Dirichlet Allocation has improved search, and (Google) Word Vectors have improved the NLP perplexity reduction tasks.

Meanwhile, big data requirements motivated the development of fast algorithms for computing cheaply the SVD of a huge matrix (those aglorithms are randomzied). At the same time, SVD has been accepted in the industry. Some companies are applying the SVD (or related techniques such as LDA) in their recommendation systems.