In this article I will introduce LSI to readers who are not familiar with it: the basic notions of what it is and why it matters. As Google primarily, and Yahoo and MSN as well, increase their use of this evaluation method in their quest for more relevant, higher-quality results, LSI or related applications of it will become active components of the ranking process.

An introduction to LSI.
Translated into simpler terms, “latent” means hidden and “semantic” relates to meaning, so “latent semantic indexing” means hidden-meaning indexing. This application of information retrieval technology, which is based on the vector space model of document classification, evaluates the content of pages across an entire site and determines the common theme of that site. It is slowly taking importance away from on-page factors like keyword density and off-page factors like PageRank in the search engines' evaluation and ranking process.

Quick facts about LSI.
LSI has been reported to be about 30% more effective than popular word-matching methods, especially in cross-language retrieval. LSI can retrieve relevant information that does not contain the query words, using a fully automatic statistical method called singular value decomposition. LSI also considers documents that have many words in common to be semantically close, and ones that have few words in common to be distant.
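To make the "many words in common" idea concrete, here is a minimal sketch of the plain vector space comparison that LSI builds on. It is not LSI itself: it only counts words and compares the counts with cosine similarity. The three sample documents and the helper names `term_vector` and `cosine_similarity` are invented for illustration.

```python
import math
from collections import Counter


def term_vector(text):
    """Turn a text into a bag-of-words vector (word -> count)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample documents, made up for this sketch.
docs = {
    "cars_1": "used car prices and car dealers",
    "cars_2": "car dealers list used car prices",
    "cooking": "easy pasta recipes for dinner",
}
vectors = {name: term_vector(text) for name, text in docs.items()}

# Documents sharing many words score high; documents sharing none score zero.
sim_close = cosine_similarity(vectors["cars_1"], vectors["cars_2"])
sim_far = cosine_similarity(vectors["cars_1"], vectors["cooking"])
print(f"cars_1 vs cars_2:  {sim_close:.3f}")
print(f"cars_1 vs cooking: {sim_far:.3f}")
```

Plain word counting like this cannot match a page that uses different words for the same topic. That gap is what the singular value decomposition step is meant to close.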

The LSI vector process.
LSI assumes that there is some underlying or “hidden” structure in word usage that is partially obscured by variability in word choice. A truncated singular value decomposition is used to estimate that structure across documents. Retrieval is then performed using the database of singular values and vectors obtained from the truncated decomposition. Data shows that these statistically derived vectors are more robust indicators of meaning than individual terms.
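The process described above can be sketched in a few lines with NumPy. This is a toy illustration under stated assumptions, not how any search engine actually implements it: the five terms, the four documents, the choice of k = 2 retained dimensions, and the common "fold-in" convention for projecting the query (the query vector times U_k, divided by the singular values) are all choices made for this example.

```python
import numpy as np

# Toy term-document matrix (rows = terms, columns = documents).
# Terms and documents are invented for this sketch.
terms = ["car", "automobile", "engine", "recipe", "pasta"]
A = np.array([
    [1, 0, 1, 0],  # car
    [0, 1, 1, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 0, 1],  # recipe
    [0, 0, 0, 1],  # pasta
], dtype=float)


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


# Full SVD, then keep only the k largest singular values (truncation).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2  # assumed number of latent dimensions for this toy example
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# Each document becomes a point in the k-dimensional latent space.
doc_coords = Vt_k.T

# Query containing only the word "car", folded into the same space.
query = np.zeros(len(terms))
query[terms.index("car")] = 1.0
query_coords = (query @ U_k) / s_k

n_docs = A.shape[1]
raw_scores = [cosine(query, A[:, j]) for j in range(n_docs)]
lsi_scores = [cosine(query_coords, doc_coords[j]) for j in range(n_docs)]

for j in range(n_docs):
    print(f"doc {j}: word match = {raw_scores[j]:.2f}, LSI = {lsi_scores[j]:.2f}")
```

Document 1 contains "automobile" and "engine" but never "car", so plain word matching gives it a score of zero. In the truncated space it lands next to the other two car documents and scores high for the query, while the recipe document stays distant.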

Why LSI matters.
Since Google applies a “Sandbox” or “Trustbox” filter to new sites that have not yet earned trust in the form of quality links from their neighbors, the application of LSI becomes a good resource. Search engines can use LSI in their databases to associate certain terms with concepts when ranking pages. If it is applied, sites with excellent content are not penalized for being new or for not having enough trusted links.

How to benefit from LSI.
If you write your content naturally, with your theme in mind and a focus on your visitors, you will have a much greater chance of ranking higher in LSI-driven SERPs. Develop your site around a theme, using relevant, related terms and synonyms. By doing so, you can begin to rank well for terms that are not even on your page. Stemming is a clear example of how Google applies this methodology on a daily basis.
