The Horseless Library
Digital Library Discussions

20060824 Thursday August 24, 2006

Some recent interesting blog posts elsewhere about digital library issues

Three recent posts on other blogs dealt with digital librarianship and issues that have been written about by several of us here over the summer including networked books, Google's digitization program, and digital presses in the academy. All three are better written than the average blog post, are not rants (imho) and make for decent reading.

At If: Book a rare post was put up that actually deals with physical books. It challenges academic libraries to stop handing over the keys to the store, so to speak, in making deals with Google's book digitization program. http://www.futureofthebook.org/blog/archives/2006/08/librarians_hold_google_accountable.html

Here's an excerpt:


That's because no sane librarian would outsource their profession to an
unaccountable private entity that refuses to disclose the workings of
its system - in other words, how does Google's book algorithm work, how
are the search results ranked? And yet so many librarians are behind
this plan. Am I to conclude that they've all gone insane? Or are they
just so anxious about the pace of technological change, driven to
distraction by fears of obsolescence and diminishing reach, that they
are willing to throw their support uncritically behind the company,
who, like a frontier huckster, promises miracle cures and grand visions
of universal knowledge?

Strong stuff, especially that last sentence.

Meanwhile, over at Inside Higher Education, Scott Palmer posted a trenchant critique of If: Book itself and its recent high profile activities.
http://insidehighered.com/views/2006/08/15/palmer

I can't help but smile when Palmer says:

"Still, when one filters out the soul-deadening jargon about 'authentic
learning opportunities,' 'self-reflexivity,' 'mediated environments,'
etc. that permeates their posts, it's clear that the blog's authors and
readers are thinking creatively and earnestly (although rather
pretentiously) about the prospects of the digital age in transforming
academic writing."

He also caught my attention when he argued:

"the emphasis that contributors to if:book seem to place on
the 'transparency' of scholarship and 'immediacy' of publication made
possible by digital delivery misses a very important point... One can build a convincing case that, in the current age of instant
analysis, self-absorbed 'experts,' and ubiquitous 24/7 live blog feeds,
the last thing that the academy needs is to embrace transparency and
immediacy."

Finally, IHE also had an article a bit ago that dealt with digital publishing and blogs.
http://insidehighered.com/views/2006/07/12/mclemee
The article revolves around the question:

"But will urging university presses to think more seriously about blogs
(and other new media forms) really offer a solution? Or does it just
compound the problem? Hearing from readers over the past week, I've
started to wonder."

Posted by WARREN, SCOTT | Aug 24 2006, 03:08:26 PM EDT | Permalink |

20060807 Monday August 07, 2006

WorldCat.org

OCLC's WorldCat.org search service is now live. The search response is super fast, the interface is very clean, and they even have faceted search refinement functionality across five dimensions (Author, Content, Format, Language, Year). For an initial release this is impressive.

The ability to search across the collections of 18,000 libraries is impressive. The Find in a Library feature allows you to check to see if your local library has a copy of the item. I am interested in the evolution of WorldCat localization services. For users with a known library affiliation (e.g. undergrads), can WorldCat do more than Find in a Library?

Posted by Tito Sierra | Aug 07 2006, 03:25:03 PM EDT | Permalink |

20060801 Tuesday August 01, 2006

Recent Long Tail discussions in newspapers

Two interesting articles about Chris Anderson's Long Tail phenomenon showed up in the last week. In the Wall Street Journal, Lee Gomes critiqued Anderson's methodologies:


"By Mr. Anderson's calculation, 25% of
Amazon's sales are from its tail, as they involve books you can't find at a traditional
retailer. But using another analysis of those numbers -- an analysis that Mr.
Anderson argues isn't meaningful -- you can show that 2.7% of Amazon's titles
produce a whopping 75% of its revenues. Not quite as impressive."




The article mentions that real economists are beginning to look at some hard data on Anderson's theory and trying to see if it actually pans out.


Sunday's NYT Book Review had an article by Rachel Donadio on backlists, http://www.nytimes.com/2006/07/30/books/review/30donadio.html. She argues that

Indeed, so far, the winners in the long tail scenario aren't
publishers but the online booksellers and the databases that aggregate their
titles, making books stranded on the dusty shelves of local used-book stores
readily available to buyers around the world. Online used-book sales rose 33
percent between 2003 and 2004, to $609 million, in a $2.2 billion used-book
market, according to the Book Industry Study Group. But publishers don't profit
from used books. Even Anderson acknowledges this. Online retailers may have unlocked the fuller potential of
the used-book market, but "that doesn't benefit the authors or the publishers,
because the revenues don't go to them," he said in a recent telephone
interview. "But it does benefit us as consumers."


Meanwhile, in a great example of irony, Mr. Anderson's book, The Long Tail, sits at #10 on the NYT's nonfiction best seller list, enjoying all the perks that come from blockbuster publishing, bookstore placements, and extended media coverage. It makes me wonder if the best thing to do to help Chris Anderson out might be to not buy his book, but rather find and buy something way down the long tail list, say #100,000 or lower on Amazon. Surely he'd rather be correct in his economic analysis than rich off a blockbuster, right? Right?

But a real question, more pertinent to me, is where are libraries in all this?




Posted by WARREN, SCOTT | Aug 01 2006, 10:07:30 AM EDT | Permalink | Comments [1]

20060725 Tuesday July 25, 2006

Books, German



From French-language Wikimedia comes this picture of a sculpture titled "Modern Book Printing" on Berlin's "Walk of Ideas." Evidently the sculpture is in honor of Gutenberg. Oh, and the 2006 World Cup.

Posted by Amanda French | Jul 25 2006, 07:05:18 AM EDT | Permalink | Comments [2]

20060720 Thursday July 20, 2006

NY Public Library + Amazon mashup: E41ST - Library Way

Just saw this mashup mentioned as a Macromedia Flex Developer contest winner last month. The site was created by Amit Gupta and is called E41ST, http://www.amitgupta.info/E41ST/. It integrates book content from Amazon.com with the catalog holdings of the NYPL. A really neat, enhanced way to browse this info.


Viewing requires the new Flash 9 Player, and I had to restart my browser after that installation in order to access the site. Visitors can view without registering.

Posted by Joe Williams | Jul 20 2006, 01:51:07 PM EDT | Permalink | Comments [2]

20060714 Friday July 14, 2006

All-digital university press at Rice

Okay, I'm completely excited about this. A piece in today's Inside Higher Ed reports that Houston's Rice University has revived its scholarly press.

But this time, it's digital.

I've wanted to see an all-digital university press for a long time. Rice's press is going to maintain high standards of peer review, and it'll easily be able to publish art history books with high-quality images and musicology books with built-in sound files.

As for the economics of it, I bet it'll be cheaper in the long run for universities and more profitable for scholars, too, both in terms of royalties and reputation (which is the real currency in academe). The articles I've published are posted online on my website, and I get a lot more "action" from having them there in terms of queries from scholars than I do from having them in the print journals. No one ever emails me (much less snail-mails me) and says, Hey, I read your article in the Yeats Eliot Review, but I get emails from people every once in awhile who say, Hey, I read your article online. And we then have interesting conversations, productive of learning. What's not to love?

I'm trying to get my dissertation published now, and I've gotten rejection letters that give my work unqualified praise, but cite the "difficult economic considerations that face university presses" as their sole reason for rejecting my proposal. I'm a little hampered by the fact that I really do sympathize. Why should they publish my dissertation as a book when any half-serious scholar can read it through DAI and any yahoo can find it on the internet? Granted, I'm making some major revisions to it, but still. If the economic burden to presses and libraries is decreased enough, it will mean that scholars like me can get the higher level of credentialization that publication (really, peer review) affords based more on the merit of the work than on the financial difficulties of the scholarly publishing biz.

bepress, which is the major (first?) digital publisher of academic journals, makes for an instructive comparison. There's an interesting table there that shows that the per-page subscription costs for bepress journals have decreased from 83 cents per page in 2001 to 36 cents per page in 2005. That's roughly a 57% decrease in four years. I'd bet that the new Rice press will see something similar; at first their operating costs and the concomitant cost of their books will be pretty high, but they'll decrease dramatically very quickly.

I will say that I don't think that a digital revolution in scholarly book publishing will do much to make it easier for junior scholars to get tenure. Digital university press publishing will make it easier to get a book published, yes, and that's a good thing even though there's more and more being published every year, and less and less of it being read, probably. But I think it's a good thing for that scholarship to enter the permanent record, credentialized by publication if it deserves it, because the "long tail" will ensure that if someone needs it sometime, it'll be there. But there are other gi-normous economic considerations driving the overpopulation of graduate schools and the adjunctification of the academic profession, and it's those factors, really, that make it tough to get a tenure-track job and subsequently to get tenure.

One final comment. Note that it's University Librarian and Vice Provost Charles Henry who will be in charge of the new press at Rice. Lots of librarians think that it's not appropriate for libraries to get into the publishing biz, and lots of university publishers think that it's not appropriate for their work to be done by the librarians. I can see that point, but on the whole I think it's a good idea for university libraries and presses to merge, especially when you're talking about digital publishing. Somehow the gap between the task of producing the book and the task of preserving the book narrows in the digital realm. Maybe it's that you need about the same resources (e.g., servers, programmers) to deal with the digital book no matter which one you're doing?

More importantly, I think that university presses and university libraries have more in common with each other (or should) than do university presses and commercial presses. Call me an idealist, but I think that it is part of the research university's overall mission to provide knowledge to the world for its own good, not for a profit. University libraries and presses can cooperate on that mission.

Posted by Amanda French | Jul 14 2006, 01:19:43 PM EDT | Permalink |

20060713 Thursday July 13, 2006

Future - and Old Books, part 4

Here is a case where networked digital books likely would have been of great assistance to an individual. My friend and colleague Keith Morgan sent me an article a few days ago called The Poet of Dialectics that analyzed Marx's Das Kapital as a work of literature rather than as a piece of economic theory, as is usually the case. See http://books.guardian.co.uk/review/story/0,,1814909,00.html

The article claims that


As a student Marx was infatuated by Tristram Shandy, and 30 years later
he found a subject which allowed him to mimic the loose and disjointed
style pioneered by Sterne. Like Tristram Shandy, Das Kapital is full of
paradoxes and hypotheses, abstruse explanations and whimsical
tomfoolery, fractured narratives and curious oddities. How else could
he do justice to the mysterious and often topsy-turvy logic of
capitalism?

As I've never read Das Kapital, I was very surprised to find that Marx had read and was quoting from and incorporating huge varieties of classical and other literary sources. Here's a sampling (the paragraphs are a bit out of order from the original article to make this flow better):


At university, Marx "adopted the habit of making extracts from all the
books I read" - a habit he never lost. A reading list from this period
shows the precocious scope of his intellectual explorations. While
writing a paper on the philosophy of law he made a detailed study of
Winckelmann's History of Art, started to teach himself English and
Italian, translated Tacitus's Germania and Aristotle's Rhetoric, read
Francis Bacon and "spent a good deal of time on Reimarus, to whose book
on the artistic instincts of animals I applied my mind with delight".
This is the same eclectic, omnivorous and often tangential style of
research which gave Das Kapital its extraordinary breadth of reference.

..."They are my slaves," he [Marx] would sometimes say, gesturing at the books on
his shelves, "and they must serve me as I will." The task of this
unpaid workforce was to provide raw materials which could be shaped for
his own purposes. "His conversation does not run in one groove, but is
as varied as are the volumes upon his library shelves," wrote an
interviewer from the Chicago Tribune who visited Marx in 1878. In 1976
SS Prawer wrote a 450-page book devoted to Marx's literary references.
The first volume of Das Kapital yielded quotations from the Bible,
Shakespeare, Goethe, Milton, Voltaire, Homer, Balzac, Dante, Schiller,
Sophocles, Plato, Thucydides, Xenophon, Defoe, Cervantes, Dryden,
Heine, Virgil, Juvenal, Horace, Thomas More, Samuel Butler - as well as
allusions to horror tales, English romantic novels, popular ballads,
songs and jingles, melodrama and farce, myths and proverbs.

...Like Frenhofer, Marx was a modernist avant la lettre. His famous
account of dislocation in the Communist Manifesto - "all that is solid
melts into air" - prefigures the hollow men and the unreal city
depicted by TS Eliot, or Yeats's "Things fall apart; the centre cannot
hold". By the time he wrote Das Kapital, he was pushing out beyond
conventional prose into radical literary collage - juxtaposing voices
and quotations from mythology and literature, from factory inspectors'
reports and fairy tales, in the manner of Ezra Pound's Cantos or
Eliot's The Waste Land. Das Kapital is as discordant as Schoenberg, as
nightmarish as Kafka.

...To prove that money is a radical leveller, Marx quotes a speech from
Timon of Athens on money as the "common whore of mankind", followed by
another from Sophocles's Antigone ("Money! Money's the curse of man,
none greater! / That's what wrecks cities, banishes men from home, /
Tempts and deludes the most well-meaning soul, / Pointing out the way
to infamy and shame . . ."). Economists with anachronistic models and
categories are likened to Don Quixote, who "paid the penalty for
wrongly imagining that knight-errantry was equally compatible with all
economic forms of society".

No wonder it took him 10 years or more to write Das Kapital. Imagine the copying and the prodigious memory needed to pull all of those varied sources together. Work like this could be made easier with full text searching, digital content, and indexing. While it takes a rare mind to be able to do anything meaningful with all that content, by providing exposure to as many sources as possible to as many people as possible and giving them at least the chance to read and think and make something new out of any or all of it, someone will do something that changes the world, in big ways or small. That seems to be a part of the golden dream of networked books. That's the part I fully believe in and hope to see happen.

Thanks Keith for passing that article on.

Posted by WARREN, SCOTT | Jul 13 2006, 02:06:42 PM EDT | Permalink |

20060712 Wednesday July 12, 2006

Future books, part 3

Indulge me while I flog the
networked books horse some more. I'll warn you up front that this is a
relatively long post (were you expecting something short?). Jeff Jarvis of Buzzmachine a few months back had a lot to
say about the future social possibilities of the book, but was unnecessarily
critical of books as they now exist: http://www.buzzmachine.com/index.php/2006/05/19/the-book-is-dead-long-live-the-book/


Here's what he said:


The problems with books are many: They are frozen in time without the
means of being updated and corrected. They have no link to related
knowledge, debates, and sources. They create, at best, a one-way
relationship with a reader. They try to teach readers but don't teach
authors. They tend to be too damned long because they have to be long
enough to be books. As David Weinberger taught me, they limit how
knowledge can be found because they have to sit on a shelf under one
address; there's only one way to get to it. They are expensive to
produce. They depend on scarce shelf space. They depend on blockbuster
economics. They can't afford to serve the real mass of niches. They are
subject to gatekeepers' whims. They aren't searchable. They aren't
linkable. They have no metadata. They carry no conversation. They are
thrown out when there's no space for them anymore. Print is where words go to die.




Wow! No metadata huh? Aside from indices I guess. They carry no conversations? They have no link to related knowledge, debates, and sources? Aside from footnotes and credits I guess.




A good response to Jarvis came
from K. G. Schneider, whom I admire quite a lot:




"Print is where words go to
die": that depends on the genre. A textbook you might be pressured into writing
for your fall class? That could be short-lived, or even (like the first
technology title I penned) DOA. But "Pride and Prejudice" isn't dead, and it
fully participates in a long conversation, continuing all the way to "The Jane
Austen Book Club" and no doubt beyond.




It may well be that novels and
creative nonfiction move from dead trees to living bytes. Like many librarians,
I don't have a container fetish, so that's fine - maybe even better, what with
shelf space and old-growth forests and whatnot (though I do like writing in my
own books, and would expect an electronic book to be as easy to annotate).
Also, we in LibraryLand let David Weinberger think he invented this idea because
he's such a nice guy, but we *already know* how frustrating it is - and how
limiting - that a book can only be in one physical location at a time. (I manage
a digital library where infinite points of access are part of the satisfying
experience.)




I also anticipate that new media
will birth new genres, some more participatory and interactive than others. I
adore recipes on Epicurious because food preparation is a great example of a
running conversation, and I find my own cookbooks far too silent as a result.




But sometimes - as with the
storyteller around the fire, or the children's librarian with the hand puppets,
or a writer such as Jane Austen - we want the author to tell the tale. (Consider
how grimly awful most fanfic is.) Let each genre find its natural homes, as
future formats allow, and let new genres spring forth from the fertile fields
of human creativity.
[emphasis mine].




I must say that I now find myself
mostly dismayed by Jarvis and Kelly. I want to keep an open mind to new
developments, but frankly their attitudes towards present-day books, and the
dismissal they seem to imply of those of us who read books,
turn me off. When Jarvis says things
like "They create, at best, a one-way relationship with a reader," "They try
to teach readers but don't teach authors," and "They tend to be too damned long
because they have to be long enough to be books," I just wonder what he's thinking. Here's what I thought about those three particular statements.



So what exactly is wrong with a
one-way relationship with a reader? Last night I finished Virginia Woolf's The Years. What relationships exactly should I be expected to form here? This
wonderful novel is a nuanced and beautifully written exploration of what
constitutes memory and family life set in a particular time and milieu. I am
forming an ongoing relationship with it based on my own thoughts, past experiences,
and readings of other Woolf works. What is it that I am missing? In the
networked book future, I suppose I would be chopping out segments of the novel
and perhaps "doing" something with them.



But I already am, just not
digitally. And there's the crux for Kelly and Jarvis and company. If it isn't
done digitally, they seem to become really, really upset to the point of being
petulant. I'm thinking and having emotional reactions to what I'm reading, and
the online remixing, sharing, and annotating that are so highly touted as the
added value that will become normative parts of books of this envisioned future
seem to me to offer a paltry return compared to whatever thoughts and feelings
plain print books can already engender within me. I'd rather actually read or
view than annotate and compare lists. Maybe these bonus activities will be more
applicable to professional reading and reading for work where segments may
matter more than a whole continuous and contiguous work, but there's a lot of
reading going on for just pleasure too where a self-contained narrative is just
fine, thank you.



As for the second point about
teaching the author, Virginia Woolf is dead. What am I supposed to teach her?
If she were alive, would I need to interact with her to have a relationship
with her work? I've heard several authors talk and frankly often find my
interaction with them via the page to be far richer than whatever supposed
interactions people like Kevin Kelly insist that I must soon have.



Were I a fiction author, I'm not
at all sure I would want to be in constant dialog with readers who supposedly have
something to teach me ('Hey Dan Brown, listen up, you are a really lousy writer.'). Not
because I'm necessarily better or smarter than they are, but I simply don't
have the time or desire to be in constant revision, discussion, or explanation. Sometimes
works should just stand as they are. Updike says this pretty well in his
editorial and I think he's right on the mark there.



Experiments like McKenzie Wark's
GAM3R 7H30RY which are all about participation are fine (though frankly that
effort appears to me to be more like a pimped-out wiki with an open-post blog
attached to it than a book), but I doubt most authors want that much ongoing
revision. If they do, then networked books will be their medium. However,
participation and feedback already occur. Most books are edited, many books
have acknowledgments to friends and colleagues who parsed some part of the
draft. Just how wide and open does the circle of participation have to be? Maybe there's room for different standards?




As for his comment that books are
too long, that says more to me about Jeff Jarvis than it does about books. His
brave new world seems to reward those who do not want to (or can't) consume
longer works, but want to work with short segments and remix or tag or do
things to them other than just reading and thinking about them. That's ok, go
at it, but it seems bizarre to say that books are just too plain long, period.
You'll get me to take you somewhat seriously by not issuing blanket edicts like
that. I usually associate that attitude with reviews of classics in Amazon written
by self-righteously outraged high school students frustrated at having to read
anything longer and more sophisticated than the latest text message they've
received.



It's neat that there are people
who want to and like to remix and annotate and create new types of expression via
social interactions with text. Some incredible things will likely be done
someday and I applaud those who will bring their creativity to bear. It's just
that many of us, I'm guessing, are still content to read as we presently do. It works - really well.

Posted by WARREN, SCOTT | Jul 12 2006, 02:16:05 PM EDT | Permalink | Comments [3]

20060711 Tuesday July 11, 2006

Future books, part 2.

I think what is happening with the networked books debate is
that Kelly and Jeff Jarvis and the other luminaries of the future book crowd
have heretofore mostly been preaching to the choir. Their exhortations and
descriptions of what could happen - too often, unfortunately, naively phrased
as what will happen - have fallen on receptive ears and screens via
Wired and If: Book and venues like that. So when the conversation went beyond
those safe and already converted crowds to the population at large, things all
of a sudden got messy. Not everyone by a long shot agreed with the script being
written. And then to be very publicly rebuked and soundly dressed down by a figure
like John Updike, who carries far more cultural and intellectual capital among
a wider swath than Kelly or Jarvis can lay claim to, well, it probably stung like the
dickens.

Here are some quotes from discussions taking place on If: Book.
Sperberg's comments are what led me to believe that they had never really
had any real dissension before.

For Updike and all those unable to cross into the
new Canaan of electronicity, the apotheosis of the artist fits into the
tradition of history as a history of heroes...

But it doesn't seem fruitful to talk about Updike's
writing or rank in the Top 100 Writers list. Instead, let me repeat that his
remarks clearly demonstrate a complete lack of shared values, language and
experience with those who are interested
in moving to the book we will all read in the future.
[Emphasis mine - the future
is already worked out and decided upon by Roger Sperberg. 'We will all read...' Nice to know.]

To paraphrase something I wrote elsewhere the books in the Library of the Future will be [there's that will be again] more like Paul
Ford's Ftrain than like anything in Updike's
oeuvre. Everything he writes, however brilliant it is in comparison to
contemporary work, will appear to the future as flat and two-dimensional as all
the art before Giotto and Duccio. Updike doesn't know how to access those other
dimensions (me neither - but at least I'm aware of them) and he will always be
on the one side of a very clear demarcation in the history of writing.

Posted by: Roger Sperberg at June 7, 2006
06:17 PM

That's pretty strong determinist thinking there. It could be
true. But it isn't guaranteed by a long shot. If there's one thing I try to
avoid doing, it's predicting the future. We were all supposed to be taking
trips to the moon in our private rockets for vacations by now too and living in
those dreadful Modernist concrete monstrosities designed by Le Corbusier.

Now compare that to the following thoughtful piece, also
posted on If: Book. Eddie Tejeda, whoever he is, is clearly thinking about books. Sperberg by
comparison has an agenda that has been disrupted.

 I really enjoyed Updike's essay. I don't think he
is either denying what is happening to the book (the "book" as we
know it) and I do not think he is on a crusade to try and save the book. I
think he is simply acknowledging the changes to the book and I think he has a
honest concern of what might lost in the transition of moving ideas to the web,
especially from someone who's life has been about books.

I don't think he is trying to hold back what
appears to be progress the way we share ideas. The benefits of the web are
enormous! and it's hard to imagine ever trying to revert it...

But, like Updike, who doesn't acknowledge what is
gained, I think it's important to also acknowledge what might be lost. I often
say that I read the news, facts and interesting ideas on the web all day and I
am rarely satisfied! Thats my life. That is what I do. I read stuff on the web.
Usually interesting stuff. But when I pick up one book, my life changes. Almost
every time! When I finish a (good) book it almost always has a profound effect
of me. I think about the ideas in the book a lot! And the thoughts never fade. Books change the way I think. The internet
fills me up with facts.
[emphasis mine]

In the web I can read about the Ottoman Empires, I
find out who acted in what movie, and I can find out details on the collapse of
the Argentinean economy in seconds, and now I often say I have a hard time
imagining not having the internet to answer many of my questions. I joke: Before
the internet, what did people do when someone said an ambiguous or incorrect
statement? Unless you bothered going to the library every time someone said a
strange "fact", how would you know if it's true? Did you just accept
it? Who bothered doing "research"? That world now seems distant to
me.

But I wonder, as it appears Updike does, wether
that profound moment you have after reading book is lost. Will it be replaced
with technology? maybe... until then..I think it's fair to lament what might be
lost.

Posted by: Eddie A. Tejeda at June 27, 2006
08:49 PM

Thanks Eddie for helping me think a bit too.  I think we may gain lots of good new things with networked books and lots of them we probably haven't yet anticipated, but it doesn't mean we have to, or want to, throw away or give up what's already good about books now.  It doesn't have to be an absolutist one or the other kind of situation. That just doesn't make sense.

Posted by WARREN, SCOTT | Jul 11 2006, 10:10:51 AM EDT | Permalink | Comments [2]

20060710 Monday July 10, 2006

Snippets

Text snippets are different from abstracts and summaries because they are algorithmically extracted from the source text, rather than editorially created to function as a summary or teaser.  For example, compare the news headline treatments of Google News with The New York Times online.  In its headline blurbs, Google News uses the beginning of the source news article, up to a prescribed number of words or characters, as the snippet.  The New York Times blurb is hand authored, and functions as a traditional abstract.  The Google News approach arguably employs the most common snippet heuristic, also found in RSS feeds, blog comments, product reviews, etc.  The presumption here, I think, is that the beginning of the text is the most useful part of the text to use in the snippet.
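To make the "beginning of the text" heuristic concrete, here is a minimal Python sketch. The function name leading_snippet and the 30-word default are my own illustrative assumptions, not anything Google News has documented.

```python
def leading_snippet(text: str, max_words: int = 30) -> str:
    """Return the first max_words words of text.

    Hypothetical helper: the cutoff is an arbitrary illustration,
    not a documented value from any real news service.
    """
    words = text.split()
    if len(words) <= max_words:
        # Short texts are returned whole, with whitespace normalized.
        return " ".join(words)
    # Longer texts are cut at the word limit and marked as truncated.
    return " ".join(words[:max_words]) + " ..."
```

A character limit would work the same way, with the extra wrinkle of backing up to the last whole word before the cut.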

Search engines often employ a different method for generating snippets in search results.  Google results typically contain auto-generated snippets derived by extracting and combining sentence fragments from the indexed webpage that contain the keyword(s) searched for by the user.  This turns out to be a useful method for generating teaser text because it literally puts the keyword in context.  A similar method of generating snippets is used in Google Book Search.
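A rough sketch of the keyword-in-context idea follows. This is only a guess at the general technique, not Google's actual algorithm; the name keyword_snippet, the character window, and the " ... " joiner are assumptions made for illustration.

```python
import re


def keyword_snippet(text, keywords, window=40, joiner=" ... "):
    """Join the fragments of text that surround each keyword hit.

    Hypothetical helper: a fixed character window stands in for
    whatever fragment selection a real search engine uses.
    """
    if not keywords:
        return ""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    spans = []
    for match in pattern.finditer(text):
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        if spans and start <= spans[-1][1]:
            # Overlapping windows are merged into one fragment.
            spans[-1] = (spans[-1][0], max(spans[-1][1], end))
        else:
            spans.append((start, end))
    return joiner.join(text[s:e].strip() for s, e in spans)
```

The window cuts mid-word here; a more polished version would snap to word or sentence boundaries.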

Recently I learned of The Final Word, a self-described media experiment, that presents New York Times headlines by conjoining the headline with the last paragraph of the Times article.  In other words, the "punchline" is used as a teaser for the article.  In some cases the last paragraph functions as a true summary.  In other cases the last paragraph consists only of a pithy quote.  It's unclear to me how useful this is for scanning headlines, but it does make me think that snippet generation is more of an art than a science. 
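The "punchline" approach is about the simplest snippet rule there is. A sketch, assuming paragraphs are separated by blank lines (the function name final_word_teaser is mine, and I have no idea how The Final Word is actually implemented):

```python
def final_word_teaser(headline: str, article: str) -> str:
    """Pair a headline with the article's last non-empty paragraph.

    Assumes paragraphs are separated by blank lines; this is an
    illustration, not the actual code behind The Final Word.
    """
    paragraphs = [p.strip() for p in article.split("\n\n") if p.strip()]
    if not paragraphs:
        return headline
    return f"{headline}: {paragraphs[-1]}"
```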

I can imagine a variety of algorithms and heuristics for generating snippets that are more or less useful for specific audiences, or specific types of content.  A simple example is competitive intelligence.  Corporations have an interest in what their competitors are up to, and are especially interested in news where their own corporation is mentioned, even in cases when they are not the focus of the article.  In this context it would be useful to summarize the article by conjoining sentences containing the company names (self and competitors), perhaps highlighting article headlines that contain both.  For reviews I wonder if adjectives could play a useful role in snippet generation.
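Here is a rough sketch of the competitive intelligence idea: keep only the sentences that mention a company of interest, and flag headlines that name both your own company and a competitor. Everything in it is an assumption made for illustration, including the function names, the naive sentence splitting on end punctuation, and the plain substring matching (which would happily match "Acme" inside "Acmes"). The company names are made up.

```python
import re


def mention_snippet(article, companies, max_sentences=3):
    """Join the first few sentences that mention any of `companies`."""
    flat = " ".join(article.split())
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", flat)
    hits = [s for s in sentences
            if any(c.lower() in s.lower() for c in companies)]
    return " ".join(hits[:max_sentences])


def headline_mentions_both(headline, own, competitors):
    """True when the headline names our company and at least one competitor."""
    h = headline.lower()
    return own.lower() in h and any(c.lower() in h for c in competitors)


article = "Acme opened a new plant. Weather was mild. Analysts say Globex may respond."
print(mention_snippet(article, ["Acme", "Globex"]))
print(headline_mentions_both("Acme and Globex race to market", "Acme", ["Globex"]))
```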

It also seems to me that there is a big difference between creating summaries and creating teaser text. 

Can you think of other methods for generating snippets?  Are snippets evil?

Posted by Tito Sierra | Jul 10 2006, 01:28:25 PM EDT | Permalink | Comments [3]

Future books

I've been reading a lot recently about books, what they are,
and what they can and perhaps might become.
There's a huge amount of editorial content, ranging from the
possibilities that digitizing content a la Google might open up to the more arcane
experiments in the form and definition of the book itself, such as McKenzie Wark's GAM3R
7H30RY, http://www.futureofthebook.org/gamertheory/
(There's a nice summary at http://www.laweekly.com/art+books/books/writing-in-public/13910/).

Two recent editorials in the NYT garnered a lot of
attention. Kevin Kelly's essay, "Scan this Book," presented what seemed to me to
be a highly utopian view of digitized books where everything that was possible
(and hence, in his view, desirable) rested upon social networking. It seemed that
there was time for everything except perhaps actually reading content straight
through and reflecting on it. John
Updike's speech at the Book Expo, http://bookexpocast.com/?p=12,
was reprinted as "The End of Authorship" and offered a stinging and somewhat overly
vitriolic rebuke that focused mostly on the high literature end of books.
The best summary of the two I've seen comes from Ben Vershbow where he says,

I say it again, it's a shame that Kelly, the
uncritical commercialist, and Updike, the nostaligic elitist, have been the
ones framing the public debate. For most of us, Google is neither the eclipse
nor dawn of authorship, but just a single feature of a shifting landscape. Search
is merely a tool, a means: the books themselves are the end. Yet, neither
Google Book Search, which is simply an apparatus for extracting new profits off
of the transmission and search of books, nor the present-day publishing
industry, dominated as it is by mega-conglomerates with their penchant for
blockbusters (our culture haunted by vast legions of the out-of-print), serves
those ends very well. And yet these are the competing futures of the book:
lonely forts and sparkling clouds. Or so we're told.

Posted by ben vershbow on June 27,
2006 01:47 AM at http://www.futureofthebook.org/blog/archives/2006/06/the_least_interesting_conversa.html

If:Book is a good place to start reading if you are
interested in this sort of thing. It has some thoughtful takes on defining and
thinking about books. For example, http://www.futureofthebook.org/blog/archives/2006/06/what_is_a_book.html

See also, http://www.libraryjournal.com/article/CA6332156.html.

 

Posted by WARREN, SCOTT | Jul 10 2006, 09:46:38 AM EDT | Permalink | Comments [2]

20060622 Thursday June 22, 2006

The Approachable Tablet PC

Last month, I started using an IBM Lenovo Tablet PC at work. Our Head of I.T., another Tablet owner, warned me that Tablets are like magnets. He said that people - strangers - would now stop me everywhere I went, and he was right. I am now stopped consistently in the airport, approached on the street, and interrupted in coffee shops by people interested in the Tablet or, it sometimes seems, just interested in chatting and using the Tablet as an ice-breaker. When I attended the Educause Southeast Regional Conference in Atlanta this week, I had a particularly interesting Tablet-related experience.


I took all my notes for the Conference on the Lenovo. I love the flexibility of stylus-based, handwriting-recognition text entry. Now I can draw diagrams in my notes, write in the margins of electronic documents, and basically do anything I can do with pen and paper, with a digital file as the end product.


So, I attended a large-group discussion on Tuesday and the session was generating lots of input from the participants. Suddenly, the facilitator stopped the conversation mid-stream and asked if someone would volunteer to take notes and capture all the ideas bouncing around.


I am not much of a note-taker, and I confess to looking down at my feet when this request was made. The room was silent for a long time and when I looked up, the entire room was looking at me and smiling. We all started laughing, and the facilitator said 'Would you mind?'


This situation was strange to me because so many other people in the room were taking notes on pen and paper or on their laptops. I didn't know a soul in the room, while many of the other attendees seemed to know each other well. I was participating vocally like everyone else. I could only think that the Tablet made me the unanimous note-taker choice.


I thought maybe people saw me scribbling with my stylus, so it was obvious that I was taking notes already. But, like I said, there were plenty of people writing on pads of paper. Then, I thought, maybe I was singled out because my notes were digital. But what about all those laptop users in the room? Aren't typewritten notes going to be more legible anyway? Maybe participants assumed that the laptop users were really just websurfing, checking email, or doing other work? Do people just identify the tablet as a specialized 'note-taking' tool, more so than a legal pad and ballpoint pen?


This phenomenon is interesting to me because librarians often deal with 'approachability' issues at the reference or information desk. The nature of library work now requires some sort of computer at these service desks. However, patrons feel uncomfortable approaching librarians who are working on or sitting at a computer. Maybe librarians need to carry Tablets at these service points instead?

Posted by Joe Williams | Jun 22 2006, 10:32:50 AM EDT | Permalink | Comments [2]

20060616 Friday June 16, 2006

Digital, flexible paper

Take a look at http://www.plasticlogic.com/lifeisflexible.php

Pretty neat stuff. What caught my attention were the designs. Good technology alone isn't the solution to e-books. You've got to have usable designs, and the bright winners of this contest clearly have good ideas: they have thought about the human being actually using the device and the situations in which flexible digital paper could be useful.

Posted by WARREN, SCOTT | Jun 16 2006, 02:41:58 PM EDT | Permalink |

She frequently had recourse to digital aid

Happy Bloomsday, everyone. In honor of the occasion, I'd like to refer you to a catechistic passage about Leopold Bloom and his wife Molly from Episode 17 ("Ithaca") of Joyce's Ulysses, available in full online at Project Gutenberg. Kudos to PG, as usual, for the "plain vanilla" text that serves so many purposes so well. There's also a version broken down by episode that was copied from PG, I think, by a resourceful and helpful professor at U Penn.


Which domestic problem as much as, if not more than, any other frequently engaged his mind?

What to do with our wives.

What had been his hypothetical singular solutions?

Parlour games (dominos, halma, tiddledywinks, spilikins, cup and ball, nap, spoil five, bezique, twentyfive, beggar my neighbour, draughts, chess or backgammon): embroidery, darning or knitting for the policeaided clothing society: musical duets, mandoline and guitar, piano and flute, guitar and piano: legal scrivenery or envelope addressing: biweekly visits to variety entertainments: commercial activity as pleasantly commanding and pleasingly obeyed mistress proprietress in a cool dairy shop or warm cigar divan: the clandestine satisfaction of erotic irritation in masculine brothels, state inspected and medically controlled: social visits, at regular infrequent prevented intervals and with regular frequent preventive superintendence, to and from female acquaintances of recognised respectability in the vicinity: courses of evening instruction specially designed to render liberal instruction agreeable.

What instances of deficient mental development in his wife inclined him in favour of the lastmentioned (ninth) solution?

In disoccupied moments she had more than once covered a sheet of paper with signs and hieroglyphics which she stated were Greek and Irish and Hebrew characters. She had interrogated constantly at varying intervals as to the correct method of writing the capital initial of the name of a city in Canada, Quebec. She understood little of political complications, internal, or balance of power, external. In calculating the addenda of bills she frequently had recourse to digital aid. After completion of laconic epistolary compositions she abandoned the implement of calligraphy in the encaustic pigment, exposed to the corrosive action of copperas, green vitriol and nutgall. Unusual polysyllables of foreign origin she interpreted phonetically or by false analogy or by both: metempsychosis (met him pike hoses), ALIAS (a mendacious person mentioned in sacred scripture).

What compensated in the false balance of her intelligence for these and such deficiencies of judgment regarding persons, places and things?

The false apparent parallelism of all perpendicular arms of all balances, proved true by construction. The counterbalance of her proficiency of judgment regarding one person, proved true by experiment.



Posted by Amanda French | Jun 16 2006, 12:46:50 PM EDT | Permalink | Comments [3]

More on the nora project

Yesterday, I was fortunate enough to attend a live demonstration of the nora project at JCDL 2006.  As you may recall from Amanda's previous post, the nora project aims to develop tools for detecting patterns in humanities collections.  The demo I attended was part of a presentation provocatively titled "Exploring Erotics in Emily Dickinson's Correspondence with Text Mining and Visual Interfaces".  The data source for this particular demo was a collection of about 300 letters written by Emily Dickinson to her sister-in-law.  The tool is being used to help scholars in the interpretation of literary work.  You can actually launch the demo application (click on "Nora Visualization") from the nora project website.  Below is a screenshot I took this morning from the downloadable demo tool (with bogus ratings inserted by me for illustration purposes).

(Screenshot of the nora visualization tool.)

In a nutshell, the tool allows you to browse a collection of Emily Dickinson poems, and rate each poem on a 1-5 scale according to some predefined criterion.  The criterion in this case was the erotic nature of the poem (red is "hot", black is "not hot").  The user ratings provide a baseline for the text mining algorithm to do its work of classifying the remaining poems as "hot" or "not hot" using a Naive Bayes algorithm.  The predicted "hot" poems are marked in purple.  The tool highlights words within the collection that were algorithmically associated with "hotness" and "non hotness", and provides scatterplots for detecting patterns over time.
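For readers curious about what a Naive Bayes classifier does with those ratings, here is a toy sketch in Python. It is emphatically not the nora project's code: it is a bare-bones multinomial Naive Bayes with add-one smoothing, it assumes the 1-5 ratings have already been collapsed into two labels, and the training lines are invented for the example rather than taken from Dickinson.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(labeled_docs):
    """Count documents and words per label from (text, label) pairs."""
    doc_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            vocab.add(word)
    return doc_counts, word_counts, vocab


def classify(model, text):
    """Return the label with the highest log-probability for `text`."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)   # log prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue                                    # ignore unseen words
            # Add-one (Laplace) smoothing so a zero count is not fatal.
            score += math.log((word_counts[label][word] + 1)
                              / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy data standing in for user-rated texts.
rated = [
    ("burning kiss and longing heart", "hot"),
    ("sweet longing for your kiss", "hot"),
    ("the garden needs rain today", "not hot"),
    ("rain fell on the garden wall", "not hot"),
]
model = train(rated)
print(classify(model, "a kiss of longing"))
print(classify(model, "rain in the garden"))
```

The per-word counts in `word_counts` are also the raw material for the kind of word highlighting the tool does, since words that are much more frequent under one label than the other are the ones the classifier leans on.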

The great thing about this tool is that it supports open-ended interpretive analysis, not bound to a specific collection or topic dimension.  In the future I expect to see text mining tools like this embedded as services in a variety of digital libraries and repositories.

Posted by Tito Sierra | Jun 16 2006, 12:00:38 PM EDT | Permalink | Comments [3]



Horseless Library image by Herman Berkhoff