The Horseless Library
Digital Library Discussions

Tuesday December 12, 2006

Nifty search tool for Iraq report

if:book has an interesting post on inventive ways to present public-domain texts. http://www.futureofthebook.org/blog/archives/2006/12/how_would_you_design_the_iraq.html

One example comes from Vivísimo, a search engine company, which did the following:

"The search engine crawled the PDF file of the final report issued by
the Iraq Study Group and indexed it by paragraph rather than as a
single document. By breaking the 142 page file into paragraphs, readers
of the report can now search for specific aspects that are of interest
to them instead of having to read through the entire document or
perform a tedious keyword search within the document using the Acrobat
application. When a search query is entered, the search engine returns
the relevant paragraphs in the search results. Additionally, the Velocity Clustering Engine
is used to cluster the search results into related topic areas. The
clusters allow the reader to easily browse related information and
uncover relationships between topics within the report. This demo was
placed online by Vivísimo within minutes of the publication of the
final report, showing the speed with which the Vivísimo Velocity Search Platform can be deployed."
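To make the paragraph-level idea concrete, here is a minimal sketch in Python. It is not Vivísimo's implementation, which the announcement does not describe; it assumes the report text has already been extracted from the PDF and that paragraphs are separated by blank lines. The function names and the sample text are invented for illustration.

```python
import re
from collections import defaultdict


def split_paragraphs(text):
    """Split text on blank lines and drop empty chunks (assumed paragraph format)."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def tokenize(text):
    """Lowercase the text and return its alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(paragraphs):
    """Map each word to the set of paragraph numbers that contain it."""
    index = defaultdict(set)
    for number, paragraph in enumerate(paragraphs):
        for word in tokenize(paragraph):
            index[word].add(number)
    return index


def search(query, paragraphs, index):
    """Return the paragraphs containing every query word, in document order."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return [paragraphs[number] for number in sorted(hits)]


# Invented sample text standing in for the extracted report.
sample = (
    "Iraq's neighbors should be engaged.\n\n"
    "The training mission needs more advisers.\n\n"
    "Regional diplomacy should include Iraq's neighbors and Syria."
)

paragraphs = split_paragraphs(sample)
index = build_index(paragraphs)

for hit in search("neighbors", paragraphs, index):
    print(hit)
```

Because each paragraph is its own searchable unit, a query returns only the passages that match rather than a single hit for the whole 142-page file.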

I don't intuitively see how they've selected the topics that "cluster" under my search terms, but it's an interesting set of views of this text.
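One naive way a tool could pick such topics is to look for words that several of the matching paragraphs share, and group the paragraphs under those words. The sketch below shows that idea only; it is an assumption for illustration, not a description of the Velocity Clustering Engine, and the stopword list, function names, and sample results are all made up.

```python
import re
from collections import Counter

# A tiny, made-up stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "with"}


def tokenize(text):
    """Lowercase the text and return its alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cluster_by_shared_terms(results, query, max_clusters=3):
    """Group result paragraphs under the terms that the most of them share."""
    query_words = set(tokenize(query))
    # Distinct terms per paragraph, ignoring stopwords and the query itself.
    term_sets = [set(tokenize(p)) - STOPWORDS - query_words for p in results]
    # Count how many paragraphs each term appears in.
    counts = Counter(term for terms in term_sets for term in terms)
    # Most widely shared terms first; ties broken alphabetically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    labels = [term for term, n in ranked if n >= 2][:max_clusters]
    return {
        label: [p for p, terms in zip(results, term_sets) if label in terms]
        for label in labels
    }


# Invented search results for the query "Iraq".
results = [
    "Iraq needs regional diplomacy with Syria.",
    "Diplomacy in Iraq must involve Iran.",
    "Iraq training of security forces continues.",
    "Security forces in Iraq need training.",
]

clusters = cluster_by_shared_terms(results, "Iraq")
for label, members in clusters.items():
    print(label, len(members))
```

A production clustering engine presumably does something far more sophisticated (phrases, stemming, weighting), which may be why the groupings are hard to predict from the search terms alone.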

I wonder what we could learn from this. Could we offer clustered or faceted views of search results for our own digital content using Endeca? I'd like to consider useful ways to offer access to the content we create, and would be glad to hear from those with a grasp of the technical challenges.

Posted by Monica McCormick | Dec 12 2006, 06:08:01 PM EST



Horseless Library image by Herman Berkhoff