The Horseless Library
Digital Library Discussions

20060711 Tuesday July 11, 2006

Future books, part 2.

I think what is happening with the networked books debate is
that Kelly and Jeff Jarvis and the other luminaries of the future book crowd
have heretofore mostly been preaching to the choir. Their exhortations and
descriptions of what could happen - too often unfortunately and naively phrased
as what will happen - have fallen on receptive ears and screens via
Wired and If: Book and venues like that. So when the conversation went beyond
those safe and already converted crowds to the population at large, things all
of a sudden got messy. Not everyone by a long shot agreed with the script being
written. And then to be very publicly rebuked and soundly dressed down by a figure
like John Updike, who carries far more cultural and intellectual capital among
a wider swath than Kelly or Jarvis can lay claim to, well, it probably stung like the
dickens.

Here are some quotes from discussions taking place on If:
Book. Sperberg's comments are what led me to believe that they had never really
had any real dissension before.

For Updike and all those unable to cross into the
new Canaan of electronicity, the apotheosis of the artist fits into the
tradition of history as a history of heroes...

But it doesn't seem fruitful to talk about Updike's
writing or rank in the Top 100 Writers list. Instead, let me repeat that his
remarks clearly demonstrate a complete lack of shared values, language and
experience with those who are interested
in moving to the book we will all read in the future.
[Emphasis mine - the future
is already worked out and decided upon by Roger Sperberg. 'We will all read...' Nice to know.]

To paraphrase something I wrote elsewhere the books in the Library of the Future will be [there's that will be again] more like Paul
Ford's Ftrain than like anything in Updike's
oeuvre. Everything he writes, however brilliant it is in comparison to
contemporary work, will appear to the future as flat and two-dimensional as all
the art before Giotto and Duccio. Updike doesn't know how to access those other
dimensions (me neither - but at least I'm aware of them) and he will always be
on the one side of a very clear demarcation in the history of writing.

Posted by: Roger Sperberg at June 7, 2006
06:17 PM

That's pretty strong determinist thinking there. It could be
true. But it isn't guaranteed by a long shot. If there's one thing I try to
avoid doing, it's predicting the future. We were all supposed to be taking
trips to the moon in our private rockets for vacations by now too and living in
those dreadful Modernist concrete monstrosities designed by Le Corbusier.

Now compare that to the following thoughtful piece, also
posted on If: Book. Eddie Tejeda, whoever he is, is clearly thinking about books. Sperberg by
comparison has an agenda that has been disrupted.

 I really enjoyed Updike's essay. I don't think he
is either denying what is happening to the book (the "book" as we
know it) and I do not think he is on a crusade to try and save the book. I
think he is simply acknowledging the changes to the book and I think he has a
honest concern of what might lost in the transition of moving ideas to the web,
especially from someone who's life has been about books.

I don't think he is trying to hold back what
appears to be progress the way we share ideas. The benefits of the web are
enormous! and it's hard to imagine ever trying to revert it...

But, like Updike, who doesn't acknowledge what is
gained, I think it's important to also acknowledge what might be lost. I often
say that I read the news, facts and interesting ideas on the web all day and I
am rarely satisfied! Thats my life. That is what I do. I read stuff on the web.
Usually interesting stuff. But when I pick up one book, my life changes. Almost
every time! When I finish a (good) book it almost always has a profound effect
of me. I think about the ideas in the book a lot! And the thoughts never fade. Books change the way I think. The internet
fills me up with facts.
[emphasis mine]

In the web I can read about the Ottoman Empires, I
find out who acted in what movie, and I can find out details on the collapse of
the Argentinean economy in seconds, and now I often say I have a hard time
imagining not having the internet to answer many of my questions. I joke: Before
the internet, what did people do when someone said an ambiguous or incorrect
statement? Unless you bothered going to the library every time someone said a
strange "fact", how would you know if it's true? Did you just accept
it? Who bothered doing "research"? That world now seems distant to
me.

But I wonder, as it appears Updike does, wether
that profound moment you have after reading book is lost. Will it be replaced
with technology? maybe... until then..I think it's fair to lament what might be
lost.

Posted by: Eddie A. Tejeda at June 27, 2006
08:49 PM

Thanks Eddie for helping me think a bit too.  I think we may gain lots of good new things with networked books and lots of them we probably haven't yet anticipated, but it doesn't mean we have to, or want to, throw away or give up what's already good about books now.  It doesn't have to be an absolutist one or the other kind of situation. That just doesn't make sense.

Posted by WARREN, SCOTT | Jul 11 2006, 10:10:51 AM EDT | Permalink | Comments [2]

20060710 Monday July 10, 2006

Snippets

Text snippets are different from abstracts and summaries because they are algorithmically extracted from the source text, rather than editorially created to function as a summary or teaser.  For example, compare the news headline treatments of Google News with The New York Times online.  In its headline blurbs, Google News uses the beginning of the source news article, up to a prescribed number of words or characters, as the snippet.  The New York Times blurb is hand authored, and functions as a traditional abstract.  The Google News approach arguably employs the most common snippet heuristic, also used in RSS feeds, blog comments, product reviews, etc.  The presumption here, I think, is that the beginning of the text is the most useful part of the text to use in the snippet.
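A minimal sketch of that "beginning of the text" heuristic, in Python. The function name, the 30-word default and the trailing " ..." marker are assumptions made for illustration; nothing here reflects how Google News actually builds its blurbs.

```python
def leading_snippet(text: str, max_words: int = 30) -> str:
    """Return the first max_words words of text, marking any truncation."""
    words = text.split()
    if len(words) <= max_words:
        # Short texts are returned whole (with whitespace normalized).
        return " ".join(words)
    return " ".join(words[:max_words]) + " ..."


if __name__ == "__main__":
    print(leading_snippet("The library announced new opening hours today.", 4))
```

Cutting on word boundaries rather than characters avoids chopping a word in half, at the cost of snippets whose length in characters varies.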

Search engines often employ a different method for generating snippets in search results.  Google results typically contain auto-generated snippets derived by extracting and combining sentence fragments from the indexed webpage that contain the keyword(s) searched for by the user.  This turns out to be a useful method for generating teaser text because it literally puts the keyword in context.  A similar method of generating snippets is used in Google Book Search.
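The keyword-in-context idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name, the character window and the fragment limit are hypothetical, and a real search engine works on sentence boundaries and ranking signals that are not modeled here.

```python
import re


def keyword_snippet(text, keywords, window=40, max_fragments=3):
    """Join short fragments of text that surround keyword matches."""
    if not keywords:
        return ""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    fragments = []
    last_end = -1
    for match in pattern.finditer(text):
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        if start < last_end:
            # Skip matches whose window overlaps the previous fragment.
            continue
        fragments.append(text[start:end].strip())
        last_end = end
        if len(fragments) == max_fragments:
            break
    return " ... ".join(fragments)


if __name__ == "__main__":
    page = "Libraries lend books. " + "filler " * 20 + "Digital books are searchable."
    print(keyword_snippet(page, ["books"], window=10))
```

The window is measured in characters, so fragments can start or end mid-word; snapping to word or sentence boundaries would be a natural refinement.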

Recently I learned of The Final Word, a self-described media experiment, that presents New York Times headlines by conjoining the headline with the last paragraph of the Times article.  In other words, the "punchline" is used as a teaser for the article.  In some cases the last paragraph functions as a true summary.  In other cases the last paragraph consists only of a pithy quote.  It's unclear to me how useful this is for scanning headlines, but it does make me think that snippet generation is more of an art than a science. 
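For comparison, the "punchline" approach is almost trivial to sketch. The code below assumes paragraphs are separated by blank lines and joins headline and last paragraph with a colon; both are assumptions for illustration, not a description of how The Final Word is built.

```python
def final_word_teaser(headline: str, article: str) -> str:
    """Conjoin a headline with the last non-empty paragraph of the article."""
    paragraphs = [p.strip() for p in article.split("\n\n") if p.strip()]
    if not paragraphs:
        return headline
    return f"{headline}: {paragraphs[-1]}"


if __name__ == "__main__":
    story = "The council met on Tuesday.\n\n\"We will see,\" the mayor said."
    print(final_word_teaser("Council Delays Vote", story))
```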

I can imagine a variety of algorithms and heuristics for generating snippets that are more or less useful for specific audiences, or specific types of content.  A simple example is competitive intelligence.  Corporations have an interest in what their competitors are up to, and are especially interested in news where their own corporation is mentioned, even in cases when they are not the focus of the article.  In this context it would be useful to summarize the article by conjoining sentences containing the company names (self and competitors), perhaps highlighting article headlines that contain both.  For reviews I wonder if adjectives could play a useful role in snippet generation.
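A rough sketch of the competitive intelligence heuristic, assuming plain substring matching on company names and a naive sentence splitter. The function names and the company names in the usage example are made up for illustration.

```python
import re


def mention_snippet(article, companies, max_sentences=3):
    """Keep the first few sentences that mention any of the given companies."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", article.strip())
    names = [c.lower() for c in companies]
    hits = [s for s in sentences if any(n in s.lower() for n in names)]
    return " ".join(hits[:max_sentences])


def headline_mentions_all(headline, companies):
    """True when the headline names every company (self and competitors)."""
    lowered = headline.lower()
    return all(c.lower() in lowered for c in companies)


if __name__ == "__main__":
    article = "Acme opened a plant. The weather was mild. Rival Globex cut prices."
    print(mention_snippet(article, ["Acme", "Globex"]))
    print(headline_mentions_all("Acme sues Globex", ["Acme", "Globex"]))
```

Substring matching will produce false positives for short or common names, so a production version would need proper entity recognition.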

It also seems to me that there is a big difference between creating summaries and creating teaser text. 

Can you think of other methods for generating snippets?  Are snippets evil?

Posted by Tito Sierra | Jul 10 2006, 01:28:25 PM EDT | Permalink | Comments [3]

Future books

I've been reading a lot recently about books, what they are,
and what they can and perhaps might become.
There's a huge amount of editorial content ranging from the
possibilities that digitizing content a la Google might open up to the more arcane
experiments in form and definition of the book itself such as McKenzie Wark's GAM3R
7H30RY, http://www.futureofthebook.org/gamertheory/
(There's a nice summary at http://www.laweekly.com/art+books/books/writing-in-public/13910/).

Two recent editorials in the New York Times garnered a lot of
attention. Kevin Kelly's essay, "Scan this Book", presented what seemed to me to
be a highly utopian view of digitized books where everything that was possible -
and hence in his view desirable - rested upon social networking. It seemed that
there was time for everything except perhaps actually reading content straight
through and reflecting on it. John
Updike's speech at the Book Expo, http://bookexpocast.com/?p=12,
was reprinted as "The End of Authorship" and offered a stinging and somewhat overly
vitriolic rebuke that focused mostly on the high literature end of books.
The best summary of the two I've seen comes from Ben Vershbow where he says,

I say it again, it's a shame that Kelly, the
uncritical commercialist, and Updike, the nostaligic elitist, have been the
ones framing the public debate. For most of us, Google is neither the eclipse
nor dawn of authorship, but just a single feature of a shifting landscape. Search
is merely a tool, a means: the books themselves are the end. Yet, neither
Google Book Search, which is simply an apparatus for extracting new profits off
of the transmission and search of books, nor the present-day publishing
industry, dominated as it is by mega-conglomerates with their penchant for
blockbusters (our culture haunted by vast legions of the out-of-print), serves
those ends very well. And yet these are the competing futures of the book:
lonely forts and sparkling clouds. Or so we're told.

Posted by ben vershbow on June 27,
2006 01:47 AM at http://www.futureofthebook.org/blog/archives/2006/06/the_least_interesting_conversa.html

If:Book is a good place to start reading if you are
interested in this sort of thing. It has some thoughtful takes on defining and
thinking about books. For example, http://www.futureofthebook.org/blog/archives/2006/06/what_is_a_book.html

See also, http://www.libraryjournal.com/article/CA6332156.html.

 

Posted by WARREN, SCOTT | Jul 10 2006, 09:46:38 AM EDT | Permalink | Comments [2]

20060622 Thursday June 22, 2006

The Approachable Tablet PC

Last month, I started using an IBM Lenovo Tablet PC at work. Our Head of I.T., another Tablet owner, warned me that Tablets are like magnets. He said that people - strangers - would now stop me everywhere I went, and he was right. I am now stopped consistently in the airport, approached on the street, and interrupted in coffee shops by people interested in the Tablet or, it sometimes seems, just interested in chatting and using the Tablet as an ice-breaker. When I attended the Educause Southeast Regional Conference in Atlanta this week, I had a particularly interesting Tablet-related experience.


I took all my notes for the Conference on the Lenovo. I love the flexibility of stylus-based, handwriting recognition text entry. Now I can draw diagrams in my notes, write in the margins of electronic documents, basically doing anything I can do with pen and paper, and have a digital file as the end product.


So, I attended a large-group discussion on Tuesday and the session was generating lots of input from the participants. Suddenly, the facilitator stopped the conversation mid-stream and asked if someone would volunteer to take notes and capture all the ideas bouncing around.


I am not much of a note-taker, and I confess to looking down at my feet when this request was made. The room was silent for a long time and when I looked up, the entire room was looking at me and smiling. We all started laughing, and the facilitator said 'Would you mind?'


This situation was strange to me because so many other people in the room were taking notes on pen and paper or on their laptops. I didn't know a soul in the room, while many of the other attendees seemed to know each other well. I was participating vocally like everyone else. I could only think that the Tablet made me the unanimous note-taker choice.


I thought maybe people saw me scribbling with my stylus, so it was obvious that I was taking notes already. But, like I said, there were plenty of people writing on pads of paper. Then, I thought, maybe I was singled out because my notes were digital. But, what about all those laptop users in the room? Aren't typewritten notes going to be more legible anyway? Maybe participants assumed that the laptop users were really just websurfing, checking email, or doing other work? Do people just identify the tablet as a specialized 'note-taking' tool - moreso than a legal pad and ballpoint pen?


This phenomenon is interesting to me because librarians often deal with 'approachability' issues at the reference or information desk. The nature of library work now requires some sort of computer at these service desks. However, patrons feel uncomfortable approaching librarians who are working on or sitting at a computer. Maybe librarians need to carry Tablets at these service points instead?

Posted by Joe Williams | Jun 22 2006, 10:32:50 AM EDT | Permalink | Comments [2]

20060616 Friday June 16, 2006

Digital, flexible paper

Take a look at http://www.plasticlogic.com/lifeisflexible.php

Pretty neat stuff. What caught my attention are the designs. Good technology alone isn't the solution to e-books. You've got to have usable designs and the bright winners of this contest clearly have good ideas and have thought about the human being actually using the device and situations in which flexible digital paper could be useful.

Posted by WARREN, SCOTT | Jun 16 2006, 02:41:58 PM EDT | Permalink |

She frequently had recourse to digital aid

Happy Bloomsday, everyone. In honor of the occasion, I'd like to refer you to a catechistic passage about Leopold Bloom and his wife Molly from Episode 17 ("Ithaca") of Joyce's Ulysses, available in full online at Project Gutenberg. Kudos to PG, as usual, for the "plain vanilla" text that serves so many purposes so well. There's also a version broken down by episode that was copied from PG, I think, by a resourceful and helpful professor at U Penn.


Which domestic problem as much as, if not more than, any other frequently engaged his mind?

What to do with our wives.

What had been his hypothetical singular solutions?

Parlour games (dominos, halma, tiddledywinks, spilikins, cup and ball, nap, spoil five, bezique, twentyfive, beggar my neighbour, draughts, chess or backgammon): embroidery, darning or knitting for the policeaided clothing society: musical duets, mandoline and guitar, piano and flute, guitar and piano: legal scrivenery or envelope addressing: biweekly visits to variety entertainments: commercial activity as pleasantly commanding and pleasingly obeyed mistress proprietress in a cool dairy shop or warm cigar divan: the clandestine satisfaction of erotic irritation in masculine brothels, state inspected and medically controlled: social visits, at regular infrequent prevented intervals and with regular frequent preventive superintendence, to and from female acquaintances of recognised respectability in the vicinity: courses of evening instruction specially designed to render liberal instruction agreeable.

What instances of deficient mental development in his wife inclined him in favour of the lastmentioned (ninth) solution?

In disoccupied moments she had more than once covered a sheet of paper with signs and hieroglyphics which she stated were Greek and Irish and Hebrew characters. She had interrogated constantly at varying intervals as to the correct method of writing the capital initial of the name of a city in Canada, Quebec. She understood little of political complications, internal, or balance of power, external. In calculating the addenda of bills she frequently had recourse to digital aid. After completion of laconic epistolary compositions she abandoned the implement of calligraphy in the encaustic pigment, exposed to the corrosive action of copperas, green vitriol and nutgall. Unusual polysyllables of foreign origin she interpreted phonetically or by false analogy or by both: metempsychosis (met him pike hoses), ALIAS (a mendacious person mentioned in sacred scripture).

What compensated in the false balance of her intelligence for these and such deficiencies of judgment regarding persons, places and things?

The false apparent parallelism of all perpendicular arms of all balances, proved true by construction. The counterbalance of her proficiency of judgment regarding one person, proved true by experiment.



Posted by Amanda French | Jun 16 2006, 12:46:50 PM EDT | Permalink | Comments [3]

More on the nora project

Yesterday, I was fortunate enough to attend a live demonstration of the nora project at JCDL 2006.  As you may recall from Amanda's previous post, the nora project aims to develop tools for detecting patterns in humanities collections.  The demo I attended was part of a presentation provocatively titled "Exploring Erotics in Emily Dickinson's Correspondence with Text Mining and Visual Interfaces".  The data source for this particular demo was a collection of about 300 letters written by Emily Dickinson to her sister-in-law.  The tool is being used to help scholars in the interpretation of literary work.  You can actually launch the demo application (click on "Nora Visualization") from the nora project website.  Below is a screenshot I took this morning from the downloadable demo tool (with bogus ratings inserted by me for illustration purposes).

[Screenshot of the nora visualization tool]

In a nutshell, the tool allows you to browse a collection of Emily Dickinson poems, and rate each poem on a 1-5 scale according to some predefined criterion.  The criterion in this case was the erotic nature of the poem (red is "hot", black is "not hot").  The user ratings provide a baseline for the text mining algorithm to do its work of classifying the remaining poems as "hot" or "not hot" using a Naive Bayes algorithm.  The predicted "hot" poems are marked in purple.  The tool highlights words within the collection that were algorithmically associated with "hotness" and "non hotness", and provides scatterplots for detecting patterns over time.
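To make the classification step concrete, here is a toy Naive Bayes text classifier written from scratch. It is a sketch under several assumptions: the 1-5 ratings are taken to have already been collapsed into two labels, words are split on whitespace, and add-one (Laplace) smoothing is used. The training strings are invented toy examples, not text from the nora collection, and nothing here describes how the nora tool itself is implemented.

```python
import math
from collections import Counter


def train(labeled_docs):
    """Count words per label from a list of (text, label) pairs."""
    word_counts = {}
    doc_counts = Counter()
    vocab = set()
    for text, label in labeled_docs:
        tokens = text.lower().split()
        word_counts.setdefault(label, Counter()).update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest log-probability for text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Prior: the share of rated documents carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # words never seen in training carry no evidence
            # Add-one smoothing keeps unseen word/label pairs above zero.
            score += math.log((word_counts[label][token] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    rated = [
        ("rose petal ember", "hot"),
        ("ember kiss midnight", "hot"),
        ("please send the recipe", "not hot"),
        ("the garden needs rain", "not hot"),
    ]
    model = train(rated)
    print(classify("midnight ember", model))
```

The per-word probabilities computed inside `classify` are also what would let a tool highlight the words most associated with each label.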

The great thing about this tool is that it supports open-ended interpretive analysis, not bound to a specific collection or topic dimension.  In the future I expect to see text mining tools like this embedded as services in a variety of digital libraries and repositories.

Posted by Tito Sierra | Jun 16 2006, 12:00:38 PM EDT | Permalink | Comments [3]

20060615 Thursday June 15, 2006

Librarians and Search Industry

Librarians are sometimes cast as being separate or somehow removed from the IT and Search & Retrieval fields. So, when I was reading this month's Search Engine Report, I was happy to see that Search Engine Watch hires - and really seems to value - librarians ("you know, those human search engines that have helped people for thousands of years").


The Report references one librarian's search-related blog. For a large, international list of other library-related blogs, check out http://www.libdex.com/weblogs.html.  

Posted by Joe Williams | Jun 15 2006, 10:35:30 AM EDT | Permalink |

Big

Here is a search engine that takes simple search to a whole new level.

Posted by Tito Sierra | Jun 15 2006, 09:28:07 AM EDT | Permalink | Comments [1]

20060614 Wednesday June 14, 2006

Digital time capsules: Zittrain at JCDL2006

Yesterday I attended a fascinating presentation by Jonathan Zittrain at the JCDL 2006 conference.  His topic was "Open Information: Redaction, Restriction, and Removal".  One problem he posed is how we should deal with retracted or edited information published in the open information environment.  Sometimes information is published that is controversial (e.g. Danish newspaper cartoons controversy) or incredibly sensitive (e.g. scholarly articles on how to contaminate the milk supply).  Sometimes there is a compelling public interest to redact or retract this information because of its sensitive nature at the present time.  For content published in digital form, edits can occur silently, and digital archives can be purged from databases and filesystems.  But there can also be a compelling long term interest in preserving controversial content for historical and cultural research.

How do we deal with these competing interests to censor and archive sensitive materials?  One idea Zittrain raised is that of an archive encryption key.  Rather than destroy censored materials from the public information space, one could encrypt them with a key that can only be decrypted at some point in the future.  This would function as a digital time capsule, allowing scholarly access to sensitive materials at a later, presumably less sensitive, date.

The idea is not without its problems (how to encrypt on a time basis? how long to encrypt?), but is interesting nonetheless.
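One known answer to the "how to encrypt on a time basis" question is the time-lock puzzle of Rivest, Shamir and Wagner, where recovering the key requires a fixed number of modular squarings that must be done one after another. The sketch below illustrates that technique; it is not something Zittrain's talk is reported to have proposed. The function names, the tiny squaring count and the hash-based XOR cipher are assumptions chosen for readability, and none of it is secure as written.

```python
import hashlib


def _xor_with_keystream(data: bytes, key_int: int) -> bytes:
    """Toy cipher: XOR data with a SHA-256 keystream derived from key_int."""
    key = key_int.to_bytes((key_int.bit_length() + 7) // 8 or 1, "big")
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(d ^ k for d, k in zip(data, stream))


def seal(message: bytes, p: int, q: int, squarings: int, base: int = 2):
    """Encrypt message so that opening it takes `squarings` sequential steps."""
    n = p * q
    phi = (p - 1) * (q - 1)
    # Whoever knows p and q can reduce the huge exponent 2**squarings mod phi
    # and compute the key quickly; p and q are then thrown away.
    key_int = pow(base, pow(2, squarings, phi), n)
    return n, base, squarings, _xor_with_keystream(message, key_int)


def open_capsule(n: int, base: int, squarings: int, ciphertext: bytes) -> bytes:
    """Recover the message the slow way, without knowing p and q."""
    value = base % n
    for _ in range(squarings):
        value = (value * value) % n  # each squaring needs the previous result
    return _xor_with_keystream(ciphertext, value)


if __name__ == "__main__":
    # Two well-known Mersenne primes; real use would need large secret primes.
    capsule = seal(b"withheld article", 2**31 - 1, 2**61 - 1, squarings=1000)
    print(open_capsule(*capsule))
```

The "how long to encrypt" question turns into choosing the number of squarings, which ties the delay to guesses about future hardware speed rather than to a calendar date.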

Posted by Tito Sierra | Jun 14 2006, 11:03:30 AM EDT | Permalink |

20060612 Monday June 12, 2006

More on Authority & Wikipedia

The topic is not new, but I appreciated this Jaron Lanier article and this editorial by Robert McHenry, which both criticize Wikipedia in terms of editorial authority and voice. Lanier lashes out at the "hive mentality" that he says drives Wikipedia and meta-aggregator websites. Ironically, I found Lanier's article through the Arts & Letters Daily aggregator site - a publication of the Chronicle of Higher Education.


I agree with much of what Lanier says in terms of Wikipedia authority and bias concerns, but I don't agree that multitudes of people are consciously flocking to Wikipedia because they trust and seek out hive-generated information sources. I think Wikipedia's traffic is mostly just an acknowledgement that there is too much information out there and people are trying to simplify their search process. The same with Google. It is much easier to have one place to search for things, one place to look up quick-answer questions. And as I surfed around Wikipedia, I also had to wonder how many of the entries were created entirely through Google searches...


The 'simplification of searching' is one service issue that reference and instruction librarians see daily. From a patron's perspective, why should they have to search in X database for articles on religion and Y database for articles on engineering or psychology books? The trade-off many patrons make, of course, is to use a much simpler search tool like Google and accept (often fewer) results with questionable authority. That is very different from choosing Wikipedia because of its community-authored nature.


[Disclaimer: I do have a library bias, but NCSU's new online catalog by Endeca and Google Scholar searching services take giant steps toward simplifying the research process for our patrons.] 

Posted by Joe Williams | Jun 12 2006, 06:34:03 PM EDT | Permalink | Comments [2]

20060608 Thursday June 08, 2006

ngc4lib and the localness of catalogs

A few days ago Eric Morgan spun off a new list, ngc4lib, from web4lib. This new list focuses on what a catalog is; the abbreviation
stands for Next Generation Catalogs for Libraries. Subscribe at
LISTSERV-AT-LISTSERV.ND-DOT-EDU. Already some robust discussion has ensued about
defining a catalog and whether such a category as a primary user exists. Some
of the members of the Horseless Library (Tito and myself) have subscribed. I
should add that Eric Morgan once upon a time worked here at NCSU though before my
time so I've never met him.


Some of the questions being debated are whether a primary user exists for a
given catalog and what makes a catalog unique from other search tools. What is
getting lost in the discussion a bit is the word local. Much is being made of
comparisons to Amazon and other completely public and open tools with some
commentators stating that there is no such thing as a primary user. I disagree.



The easy thing is to state that there are primary local user communities
attached to libraries, be they faculty, staff, and students for an academic
setting like here or residents who reside in a given town for a public library.
If user groups outside of those primary groups benefit from a catalog, that is
nice, but it is certainly not essential. What I think commentators who say
there is no primary user are really arguing for is a type of interface design
that promotes ease of widespread usage - no one needs to be taught how to use
Amazon or shop Wal-Mart.


However, our situation in academic settings is a bit more complex. I argued
yesterday that a catalog can be defined not just by its searching ability, but
by an economic dimension as well. Catalogs delineate what is locally owned or
rented or paid for and hence what some sort of privileged user group actually
has access to once the discovery part of a catalog is done. It's important to
realize - and I forgot to say this on the list - that having access does not
fully equate to immediate availability. Nonetheless, one way I tend to think of
a catalog is proof of ownership which equals some measure of access rights
(without payment) and provides services too as compared to a tool like Amazon
which simply provides proof of publication while also providing some services.


Things may get messier still as catalogs begin incorporating functionality that
Amazon and other Web 2.0 enterprises already embody. In earlier posts, we
discussed reviews and Tito said

"Relating this to an earlier discussion on the ordering of reviews, if our local library catalog included both NCSU submitted reviews, and reviews from a shared pool of user contributed
content from other universities, would it make sense to bias the display of the NCSU submitted reviews over the shared reviews?  One can imagine a system that would gracefully degrade from local to global display.  Such a bias would be easy to build in, but would it be desirable?

This idea has potential relevance for other types of user contributed content such as book lists, tags, annotations, etc.  How important is the local institution in these contexts?"

So I have two questions: Just how local should a catalog be?
And just how important is the localness of a catalog as opposed to the
universality of an Amazon or WorldCat?

Posted by WARREN, SCOTT | Jun 08 2006, 05:30:57 PM EDT | Permalink | Comments [1]

20060602 Friday June 02, 2006

Microsoft research in search awards announced

You may have previously heard about the "Accelerating Search in Academic Research Awards" offered by Microsoft to further research in the search field.  Well, here are the first 12 winners of these awards.

Posted by Tito Sierra | Jun 02 2006, 04:45:10 PM EDT | Permalink |

More on reviews, specifically for libraries

John Blyberg, of the Ann Arbor District Library, recently posted some interesting comments on the idea of a shared repository of user contributed content (including reviews) for all libraries to use.

I'm all for it - but with some caution. Isn't that what Amazon is now? If
you take away the e-commerce, Amazon is a collection of reviews, tags,
and ratings on an insanely large amount of material. Interesting?
Indeed. Useful? Of course. But I feel the need to point out that
libraries are community-based institutions. They are supported by local
taxpayers and are run, mostly, by members of the communities they
serve. As such, wouldn't we want any social element that is
incorporated into our OPAC to reflect the tastes and opinions and
personality of our community?


I take his point to be that access to globally available content is nice, but local contributed content is more valuable.  This makes sense to me in the public library context.

Does it make sense in the academic library context?  Relating this to an earlier discussion on the ordering of reviews, if our local library catalog included both NCSU submitted reviews, and reviews from a shared pool of user contributed content from other universities, would it make sense to bias the display of the NCSU submitted reviews over the shared reviews?  One can imagine a system that would gracefully degrade from local to global display.  Such a bias would be easy to build in, but would it be desirable?
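To show how little code such a bias would take, here is a minimal sketch. The `Review` fields, the `helpful_votes` tie-breaker and the display limit are hypothetical; no existing catalog schema is implied.

```python
from dataclasses import dataclass


@dataclass
class Review:
    text: str
    institution: str
    helpful_votes: int = 0  # hypothetical quality signal


def order_reviews(reviews, local_institution, limit=5):
    """List local reviews first, then fill remaining slots from the shared pool."""
    ranked = sorted(
        reviews,
        # False sorts before True, so local reviews lead; votes break ties.
        key=lambda r: (r.institution != local_institution, -r.helpful_votes),
    )
    return ranked[:limit]


if __name__ == "__main__":
    pool = [
        Review("Shared pool review", "Other University", 9),
        Review("Short local note", "NCSU", 1),
        Review("Detailed local review", "NCSU", 4),
    ]
    for review in order_reviews(pool, "NCSU"):
        print(review.institution, "-", review.text)
```

When there are no local reviews at all, the same call simply returns the best shared ones, which is one reading of "gracefully degrade from local to global display". Whether a strict local-first ordering is desirable, rather than a softer weighting, is exactly the open question.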

This idea has potential relevance for other types of user contributed content such as book lists, tags, annotations, etc.  How important is the local institution in these contexts?

Of related interest is Clay Shirky's idea of situated software.


Posted by Tito Sierra | Jun 02 2006, 04:27:41 PM EDT | Permalink | Comments [2]

20060601 Thursday June 01, 2006

What to do with a million books


From the Issues in Scholarly Communication blog at UIUC comes an announcement of a colloquium titled
What to Do With a Million Books. Proof, by the way, that a catchy title is worth a million . . . extremely valuable objects. From the call for papers:

Digitizing 'a million books' is not only a problem for computer scientists. Tomorrow, a million scholars will have to re-evaluate their notions of archive, textuality and materiality in the wake of these developments. Our familiar modes of scholarly edition, analysis, interpretation and publication are being challenged and transformed in a world where blogs and wikis are busy creating new knowledge and folksonomies are shaping our access to online archives.
Actually I thought a million scholars had already had to re-evaluate their notions of textuality. It always freaks me out a little that phenomenological theories of textuality (its fluidity, its collaborativeness, its lack of stable referentiality) pre-dated the digital revolution. Heidegger and Derrida and de Man and Barthes and Foucault and those guys were writing about the text's undecidability and using the web metaphor of knowledge in the fifties, sixties, seventies, and early eighties -- way before anything went up online. It just seems so prescient.

Posted by Amanda French | Jun 01 2006, 06:52:09 PM EDT | Permalink | Comments [1]



Horseless Library image by Herman Berkhoff