
Monday July 10, 2006
Snippets
Text snippets differ from abstracts and summaries in that they are algorithmically extracted from the source text rather than editorially written to serve as a summary or teaser. For example, compare the headline treatments of Google News and The New York Times online. In its headline blurbs, Google News uses the beginning of the source article, up to a prescribed number of words or characters, as the snippet. The New York Times blurb is hand-authored and functions as a traditional abstract. The Google News approach is arguably the most common snippet heuristic; it also appears in RSS feeds, blog comments, product reviews, etc. The presumption here, I think, is that the beginning of a text is its most useful part for a snippet.
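As a rough illustration of the leading-text heuristic described above, here is a minimal Python sketch. The function name `leading_snippet` and the 30-word default are assumptions made for this example; this is not how Google News or any particular feed reader actually implements it.

```python
def leading_snippet(text: str, max_words: int = 30) -> str:
    """Return the first max_words words of text, adding "..." if truncated."""
    words = text.split()  # split on any whitespace
    if len(words) <= max_words:
        return " ".join(words)
    return " ".join(words[:max_words]) + "..."


if __name__ == "__main__":
    article = "The city council voted on Tuesday to extend library hours through the summer."
    print(leading_snippet(article, max_words=8))
```

A character limit would work the same way, with the added wrinkle of deciding whether to cut in the middle of a word.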
Search engines often use a different method for generating snippets in search results. Google results typically contain auto-generated snippets built by extracting and combining sentence fragments from the indexed webpage that contain the keyword(s) the user searched for. This turns out to be a useful method for generating teaser text because it literally puts the keyword in context. A similar method of generating snippets is used in Google Book Search.
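The keyword-in-context idea can be sketched along the same lines. This is only a guess at the general shape of the technique, not a description of what Google does: the function name `keyword_snippet`, the fixed character window, and the " ... " separator are all assumptions for illustration.

```python
import re
from typing import Sequence


def keyword_snippet(text: str, keywords: Sequence[str],
                    width: int = 40, max_fragments: int = 3) -> str:
    """Join short windows of text surrounding case-insensitive keyword matches."""
    if not keywords:
        return ""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    fragments = []
    last_end = 0  # end of the previous fragment, used to avoid overlap
    for match in pattern.finditer(text):
        if match.start() < last_end:
            continue  # this match is already inside the previous fragment
        start = max(last_end, match.start() - width)
        end = min(len(text), match.end() + width)
        fragments.append(text[start:end].strip())
        last_end = end
        if len(fragments) == max_fragments:
            break
    return " ... ".join(fragments)


if __name__ == "__main__":
    page = ("The library opened a new wing last year. Visitors praised the design. "
            "Next spring the library will also extend its opening hours.")
    print(keyword_snippet(page, ["library"], width=20))
```

The window here is measured in characters, so fragments can begin or end mid-word; a more careful version would snap to word or sentence boundaries.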
Recently I learned of The Final Word, a self-described media experiment that pairs each New York Times headline with the last paragraph of the corresponding article. In other words, the "punchline" is used as a teaser for the article. In some cases the last paragraph functions as a true summary. In other cases it consists only of a pithy quote. It's unclear to me how useful this is for scanning headlines, but it does make me think that snippet generation is more of an art than a science.
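The last-paragraph approach is easy to approximate. The sketch below assumes that paragraphs in the article text are separated by blank lines; the function name `final_word_snippet` is made up for this example and has no connection to how The Final Word itself is built.

```python
def final_word_snippet(headline: str, article: str) -> str:
    """Pair a headline with the last non-empty paragraph of the article."""
    # Assumption: paragraphs are separated by a blank line.
    paragraphs = [p.strip() for p in article.split("\n\n") if p.strip()]
    if not paragraphs:
        return headline
    return f"{headline}: {paragraphs[-1]}"


if __name__ == "__main__":
    body = "The vote was close.\n\nDebate ran late.\n\n\"We will be back,\" she said."
    print(final_word_snippet("Council Rejects Plan", body))
```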
I can imagine a variety of algorithms and heuristics for generating snippets that are more or less useful for specific audiences or specific types of content. A simple example is competitive intelligence. Corporations have an interest in what their competitors are up to, and are especially interested in news that mentions their own company, even when it is not the focus of the article. In this context it would be useful to summarize the article by conjoining the sentences that contain the company names (self and competitors), perhaps highlighting article headlines that contain both. For reviews, I wonder if adjectives could play a useful role in snippet generation.
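The competitive-intelligence idea above could look something like the following sketch. The company names "Acme" and "Globex" are hypothetical, the function names are invented for this example, and both sentence splitting and name matching are deliberately naive (a regex on end punctuation and a case-insensitive substring test).

```python
import re
from typing import Sequence


def company_snippet(text: str, companies: Sequence[str]) -> str:
    """Join the sentences that mention any of the given company names."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    names = [c.lower() for c in companies]
    hits = [s for s in sentences if any(n in s.lower() for n in names)]
    return " ".join(hits)


def headline_mentions_both(headline: str, own: str, competitors: Sequence[str]) -> bool:
    """True if the headline names our own company and at least one competitor."""
    h = headline.lower()
    return own.lower() in h and any(c.lower() in h for c in competitors)


if __name__ == "__main__":
    story = ("Acme launched a new product. The weather was fine. "
             "Globex responded quickly.")
    print(company_snippet(story, ["Acme", "Globex"]))
    print(headline_mentions_both("Acme and Globex Trade Barbs", "Acme", ["Globex"]))
```

Substring matching will produce false positives for names that are also ordinary words or parts of longer words, so a real system would need something closer to entity recognition.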
It also seems to me that there is a big difference between creating summaries and creating teaser text.
Can you think of other methods for generating snippets? Are snippets evil?
Posted by Tito Sierra
| Jul 10 2006, 01:28:25 PM EDT
Future books
I've been reading a lot recently about books, what they are,
and what they can and perhaps might become.
There's a huge amount of editorial content, ranging from the
possibilities that digitizing content à la Google might create to the more arcane
experiments in the form and definition of the book itself, such as McKenzie Wark's GAM3R
7H30RY, http://www.futureofthebook.org/gamertheory/
(There's a nice summary at http://www.laweekly.com/art+books/books/writing-in-public/13910/).
Two recent editorials in The New York Times garnered a lot of
attention. Kevin Kelly's essay, "Scan this Book", presented what seemed to me to
be a highly utopian view of digitized books where everything that was possible
(and hence, in his view, desirable) rested upon social networking. It seemed that
there was time for everything except perhaps actually reading content straight
through and reflecting on it. John
Updike's speech at the Book Expo, http://bookexpocast.com/?p=12,
was reprinted as "The End of Authorship" and offered a stinging and somewhat overly
vitriolic rebuke that focused mostly on the high literature end of books.
The best summary of the two I've seen comes from Ben Vershbow, who says:
I say it again, it's a shame that Kelly, the
uncritical commercialist, and Updike, the nostalgic elitist, have been the
ones framing the public debate. For most of us, Google is neither the eclipse
nor dawn of authorship, but just a single feature of a shifting landscape. Search
is merely a tool, a means: the books themselves are the end. Yet, neither
Google Book Search, which is simply an apparatus for extracting new profits off
of the transmission and search of books, nor the present-day publishing
industry, dominated as it is by mega-conglomerates with their penchant for
blockbusters (our culture haunted by vast legions of the out-of-print), serves
those ends very well. And yet these are the competing futures of the book:
lonely forts and sparkling clouds. Or so we're told.
Posted by ben vershbow on June 27,
2006 01:47 AM at http://www.futureofthebook.org/blog/archives/2006/06/the_least_interesting_conversa.html
If:Book is a good place to start reading if you are
interested in this sort of thing. It has some thoughtful takes on defining and
thinking about books. For example, http://www.futureofthebook.org/blog/archives/2006/06/what_is_a_book.html
See also, http://www.libraryjournal.com/article/CA6332156.html.
Posted by WARREN, SCOTT
| Jul 10 2006, 09:46:38 AM EDT