Creating tools to identify the latent knowledge found in text

Semantic indexing is our name for a family of techniques for searching and organizing large data collections. Its goal is to find patterns in unstructured data (documents that lack descriptors such as keywords or special tags) and to use those patterns to provide more effective search and categorization services.

Over the past five years, a team of linguists and computer scientists at NITLE and Middlebury College has developed a prototype Semantic Engine, designed to address the universal problem of accessing and organizing large amounts of unstructured digital text. The engine uses mathematical algorithms to index the latent semantic content of documents. The prototype has been shown to drastically reduce, if not eliminate, the need for expensive and time-consuming metadata tagging, and, in limited test domains, to produce better results than keyword searches.
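
The article does not say which mathematical algorithms the Semantic Engine uses. As an illustration only, here is a minimal sketch of latent semantic indexing (LSI), one well-known way to index latent semantic content: build a term-document count matrix A, keep its top k singular dimensions (A ≈ U_k Σ_k V_k^T), and compare a query with documents in that reduced space rather than by shared keywords. All names (`LatentSemanticIndex`, `top_eigenpairs`, and so on) are hypothetical. Eigen-decomposing A^T A by power iteration is a dependency-free stand-in for a proper SVD routine, and raw counts are used with no term weighting.

```python
# Hypothetical LSI illustration; not the Semantic Engine's actual implementation.
import math
import random
from collections import Counter


def term_document_matrix(docs):
    """Rows are vocabulary terms, columns are documents; entries are raw counts.
    Tokenization is a naive lowercase whitespace split."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    counts = [Counter(tokens) for tokens in tokenized]
    return vocab, [[c[term] for c in counts] for term in vocab]


def top_eigenpairs(sym, k, iterations=500, seed=0):
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration with deflation."""
    n = len(sym)
    mat = [row[:] for row in sym]
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        vec = [rng.random() + 0.1 for _ in range(n)]
        for _ in range(iterations):
            nxt = [sum(mat[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm == 0.0:
                break
            vec = [x / norm for x in nxt]
        # Rayleigh quotient gives the eigenvalue for the converged unit vector.
        lam = sum(vec[i] * sum(mat[i][j] * vec[j] for j in range(n)) for i in range(n))
        if lam <= 1e-12:
            break  # remaining spectrum is (numerically) zero
        pairs.append((lam, vec))
        # Deflate: subtract the found component so the next pass finds the next one.
        for i in range(n):
            for j in range(n):
                mat[i][j] -= lam * vec[i] * vec[j]
    return pairs


def cosine(x, y):
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0.0 or ny == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(x, y)) / (nx * ny)


class LatentSemanticIndex:
    """Projects documents and queries onto the top-k latent dimensions (truncated SVD)."""

    def __init__(self, docs, k=2):
        self.vocab, self.a = term_document_matrix(docs)
        n = len(docs)
        # A^T A (documents x documents); its eigenvectors are the right singular vectors V.
        ata = [[sum(row[i] * row[j] for row in self.a) for j in range(n)] for i in range(n)]
        # Singular values are square roots of the eigenvalues of A^T A.
        self.pairs = [(math.sqrt(lam), v) for lam, v in top_eigenpairs(ata, k)]
        # Document j in latent space: U_k^T a_j, i.e. sigma_r * v_r[j] per dimension r.
        self.doc_vectors = [[s * v[j] for s, v in self.pairs] for j in range(n)]

    def project_query(self, text):
        """Fold a query in: U_k^T q = Sigma_k^{-1} V_k^T A^T q."""
        q = Counter(text.lower().split())
        qvec = [q[term] for term in self.vocab]  # terms outside the vocabulary are dropped
        atq = [sum(self.a[t][j] * qvec[t] for t in range(len(self.vocab)))
               for j in range(len(self.a[0]))]
        return [sum(vj * x for vj, x in zip(v, atq)) / s for s, v in self.pairs]

    def search(self, text):
        """Cosine similarity between the query and each document in latent space."""
        qv = self.project_query(text)
        return [cosine(qv, dv) for dv in self.doc_vectors]


docs = [
    "car engine repair",
    "automobile engine repair manual",
    "pasta recipe",
    "pasta sauce recipe",
]
index = LatentSemanticIndex(docs, k=2)
scores = index.search("car")
```

The query "car" never appears in the second document, yet that document should score about as high as the first because both share "engine" and "repair", while the two pasta documents score near zero. A plain keyword match would return only the first document.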

The Semantic Engine is at the center of our research and development mission. It is designed to help scholars and educators manage the already overwhelming and ever-increasing volume of data that we encounter in every field of inquiry. We are developing a set of tools that will enable researchers to quickly search large datasets that may be distributed across multiple databases, to interact with the engine to refine their searches, and to contribute their knowledge to the collection. We are also creating visualization and archiving tools to help researchers organize and disseminate their search results.