The Horseless Library
Digital Library Discussions

20060608 Thursday June 08, 2006

ngc4lib and the localness of catalogs

A few days ago Eric Morgan spun off a new list, ngc4lib, from web4lib. The name
stands for Next Generation Catalogs for Libraries, and the list focuses on what a catalog is. Subscribe at
LISTSERV-AT-LISTSERV.ND-DOT-EDU. Already some robust discussion has ensued about
defining a catalog and whether such a category as a primary user exists. Some
of the members of the Horseless Library (Tito and I) have subscribed. I
should add that Eric Morgan once worked here at NCSU, though before my
time, so I've never met him.


Some of the questions being debated are whether a primary user exists for a
given catalog and what makes a catalog distinct from other search tools. What is
getting a bit lost in the discussion is the word local. Much is being made of
comparisons to Amazon and other completely public and open tools, with some
commentators stating that there is no such thing as a primary user. I disagree.



The easy thing is to state that there are primary local user communities
attached to libraries, be they faculty, staff, and students for an academic
setting like here or residents of a given town for a public library.
If user groups outside of those primary groups benefit from a catalog, that is
nice, but it is certainly not essential. What I think commentators who say
there is no primary user are really arguing for is a type of interface design
that promotes ease of widespread usage - no one needs to be taught how to use
Amazon or shop Wal-Mart.


However, our situation in academic settings is a bit more complex. I argued
yesterday that a catalog can be defined not just by its searching ability, but
by an economic dimension as well. Catalogs delineate what is locally owned or
rented or paid for and hence what some sort of privileged user group actually
has access to once the discovery part of a catalog is done. It's important to
realize - and I forgot to say this on the list - that having access does not
fully equate to immediate availability. Nonetheless, one way I tend to think of
a catalog is as proof of ownership, which equals some measure of access rights
(without payment) and provides services too, as compared to a tool like Amazon,
which simply provides proof of publication while also providing some services.


Things may get messier still as catalogs begin incorporating functionality that
Amazon and other Web 2.0 enterprises already embody. In earlier posts, we
discussed reviews and Tito said

"Relating this to an earlier discussion on the ordering of reviews, if our local library catalog included both NCSU submitted reviews, and reviews from a shared pool of user contributed
content from other universities, would it make sense to bias the display of the NCSU submitted reviews over the shared reviews?  One can imagine a system that would gracefully degrade from local to global display.  Such a bias would be easy to build in, but would it be desirable?

This idea has potential relevance for other types of user contributed content such as book lists, tags, annotations, etc.  How important is the local institution in these contexts?"

So I have two questions: Just how local should a catalog be?
And just how important is the localness of a catalog as opposed to the
universality of an Amazon or WorldCat?

Posted by WARREN, SCOTT | Jun 08 2006, 05:30:57 PM EDT | Permalink | Comments [1]

20060602 Friday June 02, 2006

Microsoft research in search awards announced

You may have previously heard about the "Accelerating Search in Academic Research Awards" offered by Microsoft to further research in the search field.  Well, here are the first 12 winners of these awards.

Posted by Tito Sierra | Jun 02 2006, 04:45:10 PM EDT | Permalink |

More on reviews, specifically for libraries

John Blyberg, of the Ann Arbor District Library, recently posted some interesting comments on the idea of a shared repository of user contributed content (including reviews) for all libraries to use.

I'm all for it--but with some caution. Isn't that what Amazon is now? If
you take away the e-commerce, Amazon is a collection of reviews, tags,
and ratings on an insanely large amount of material. Interesting?
Indeed. Useful? Of course. But I feel the need to point out that
libraries are community-based institutions. They are supported by local
taxpayers and are run, mostly, by members of the communities they
serve. As such, wouldn't we want any social element that is
incorporated into our OPAC to reflect the tastes and opinions and
personality of our community?


I take his point to be that access to globally available content is nice, but local contributed content is more valuable.  This makes sense to me in the public library context.

Does it make sense in the academic library context?  Relating this to an earlier discussion on the ordering of reviews, if our local library catalog included both NCSU submitted reviews, and reviews from a shared pool of user contributed content from other universities, would it make sense to bias the display of the NCSU submitted reviews over the shared reviews?  One can imagine a system that would gracefully degrade from local to global display.  Such a bias would be easy to build in, but would it be desirable?

This idea has potential relevance for other types of user contributed content such as book lists, tags, annotations, etc.  How important is the local institution in these contexts?
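To make the "easy to build in" claim concrete, here is a minimal sketch of a local-first ordering that degrades gracefully to the shared pool. Everything in it is an assumption for illustration: the Review record, its institution and helpful_votes fields, and the order_reviews function are hypothetical names, not part of any existing catalog system.

```python
from dataclasses import dataclass


@dataclass
class Review:
    text: str
    institution: str        # hypothetical field: the reviewer's home institution
    helpful_votes: int = 0  # hypothetical field: usefulness ratings received


def order_reviews(reviews, local_institution, limit=5):
    """Return up to `limit` reviews: local ones first, then the shared pool.

    Within each group, reviews are sorted by helpful votes (highest first).
    If there are few or no local reviews, shared reviews fill the remaining
    slots, so the display degrades from local to global.
    """
    local = [r for r in reviews if r.institution == local_institution]
    shared = [r for r in reviews if r.institution != local_institution]
    local.sort(key=lambda r: r.helpful_votes, reverse=True)
    shared.sort(key=lambda r: r.helpful_votes, reverse=True)
    return (local + shared)[:limit]
```

Note that this sketch always puts a local review ahead of a shared one, no matter how many votes the shared review has. Whether that hard bias is desirable, as opposed to a softer weighting, is exactly the open question.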

Of related interest is Clay Shirky's idea of situated software.


Posted by Tito Sierra | Jun 02 2006, 04:27:41 PM EDT | Permalink | Comments [2]

20060601 Thursday June 01, 2006

What to do with a million books


From the Issues in Scholarly Communication blog at UIUC comes an announcement of a colloquium titled
What to Do With a Million Books. Proof, by the way, that a catchy title is worth a million . . . extremely valuable objects. From the call for papers:

Digitizing 'a million books' is not only a problem for computer scientists. Tomorrow, a million scholars will have to re-evaluate their notions of archive, textuality and materiality in the wake of these developments. Our familiar modes of scholarly edition, analysis, interpretation and publication are being challenged and transformed in a world where blogs and wikis are busy creating new knowledge and folksonomies are shaping our access to online archives.

Actually I thought a million scholars had already had to re-evaluate their notions of textuality. It always freaks me out a little that phenomenological theories of textuality (its fluidity, its collaborativeness, its lack of stable referentiality) pre-dated the digital revolution. Heidegger and Derrida and DeMan and Barthes and Foucault and those guys were writing about the text's undecidability and using the web metaphor of knowledge in the fifties, sixties, seventies, and early eighties -- way before anything went up online. It just seems so prescient.

Posted by Amanda French | Jun 01 2006, 06:52:09 PM EDT | Permalink | Comments [1]

Market segmentation for digital library services?

Please excuse the marketing term in the title. Lately, I've been wondering whether there is value in applying the concept of "market segmentation" to universities in the context of digital library services. In other words, developing library services configured to the needs or wants of specific segments of the local university community. Higher education institutions offer plenty of available attributes for segmentation. For example:

Level:
* Freshman/Sophomore
* Junior/Senior
* Graduate

Departments:
* Nuclear Engineering
* History
* Zoology
* ...

Then there is the Biglan classification that can be used to subdivide academic disciplines along three dimensions:
* Hard vs. Soft
* Pure vs. Applied
* Life vs. Nonlife

Here is a chart that I created that applies the Biglan classification to NCSU colleges:

Are these useful distinctions from a library services point of view?  Could these attributes serve as a basis for developing targeted library portals?  See "The recombinant library: portals and people" for more information on libraries and portals.

Posted by Tito Sierra | Jun 01 2006, 11:39:44 AM EDT | Permalink | Comments [3]

20060526 Friday May 26, 2006

Amazon Reviews - Do user evaluations behave the opposite of the Long Tail?

Amazon does display user reviews of books in order of other
users' usefulness ratings and that mechanism is interesting for several reasons
(reviews move higher up the list based on the number of positive evaluations by
others). First, it's the same sort of technique used more or less by Google to
sort pages and by Web of Science to measure impact for journal articles.


It also distinctly creates an 'anti-Long Tail' situation. It appears that
rather quickly a few reviews are tagged as being useful by people and gain the
favored first page. The odds of these reviews ever losing this hallowed ground
become smaller and smaller as more people evaluate them -- especially if the
evaluations are positive. Who goes past the first page or two of reviews to look at, say,
#100, when the top 2 or 3 or 5 are judged useful by dozens of people? There's
nothing inherently wrong with this phenomenon (in fact it is quite useful in
many ways), but it has me wondering about modeling this situation. Just how low is
the tipping point, in the number of viewers who tag a review as useful, before it permanently
gains the favored viewing position? Reviews that are garbage (immature,
threats, profanity, etc.) get pushed out most likely, but perhaps so do
unorthodox, but legitimate and thoughtful positions.

 

Moreover, reviews that are late-comers to a list have a
disproportionate chance of never getting seen, no matter what their quality may
be. Once the leaders on a reviews list reach some number of evaluations, it seems like
it would be very difficult to budge them off. Even if there is a separate
display for the most recent reviews, that display area is likely to offer very
ephemeral real estate, and the question then becomes how quickly a
new review must attract enough favorable evaluations to
move itself into a more favorable permanent position in the list of all reviews
for a given book. Given that the leading 'incumbent' reviews are also occupying
permanently visible ground, even a new reviews area seems to offer scant hope
for new reviews to become popular if the already popular reviews have enough of
an edge in the number of evaluations (but what that edge might be I don't know).
And if a new review does not garner enough positive evaluations during its
short life-span as a new review, most likely this pattern damns it to permanent
obscurity in the longer list of reviews.
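For anyone who wants to play with modeling this, here is a toy simulation of the lock-in effect. It is a sketch under loud assumptions, not a description of how Amazon actually ranks reviews: every visitor marks exactly one review as useful, most visitors only look at the first page, and the function name simulate_votes and all of its parameters are made up for illustration.

```python
import random


def simulate_votes(num_reviews=20, page_size=3, num_visitors=1000,
                   deep_browse_prob=0.0, seed=0):
    """Toy model of 'useful' votes piling up on first-page reviews.

    Assumptions (all hypothetical): reviews are ranked by vote count, ties
    go to the older review (lower index), each visitor gives one vote, and
    a visitor looks beyond the first page only with probability
    `deep_browse_prob`.
    """
    rng = random.Random(seed)
    votes = [0] * num_reviews
    for _ in range(num_visitors):
        # Current display order: most votes first, older reviews win ties.
        ranked = sorted(range(num_reviews), key=lambda i: (-votes[i], i))
        if rng.random() < deep_browse_prob:
            chosen = rng.choice(ranked)              # browses the whole list
        else:
            chosen = rng.choice(ranked[:page_size])  # first page only
        votes[chosen] += 1
    return votes
```

With deep_browse_prob left at 0.0 the incumbents can never be displaced: the first page_size reviews collect every vote and everything else stays at zero. Raising deep_browse_prob a little and adding late-arriving reviews would be one way to go looking for the tipping point.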


Now what does this all mean? Even as Amazon and other entities begin exploring
and exploiting the Long Tail, feedback mechanisms for items both in and out of
the Long Tail seem to some extent to rely on the central clustering of the
'Short Tail' or 'Short Hump' or 'Big Hump' to work - and I find that intriguing.

Scott




Posted by WARREN, SCOTT | May 26 2006, 04:48:04 PM EDT | Permalink | Comments [5]

20060525 Thursday May 25, 2006

AmazonOnlineReader

Looks like Amazon.com is providing a new interface (very similar to Google Book Search) for viewing books online called AmazonOnlineReader (example).  This is a dedicated page turner application, and does provide an easier to use interface for browsing their large page scans.  Additional features include the ability to search for text in the book, and the ability to add highlights/notes/tags to sections of text.

Perhaps of greater significance is that Amazon is offering full text online access to already purchased books with a feature called "Amazon Upgrade".  Apparently I only have two books in my account that are eligible for this service (despite having purchased dozens of books on Amazon).  The reason is that they are getting publisher approval to put these books in the upgrade program.  I think the two books in my account are out of print.

In one case the online access price is 1/3 Amazon's print copy retail price.  In the other case Amazon does not sell the hardcover new directly so charging for the scanned copy of the book is a new non-cannibalized revenue source.

For those interested in the gory details, check out the Amazon Upgrade FAQ.

Hmmm, I wonder if/when Google will charge for online access to scanned library books.  I suppose scanning the books now gives them the option to work out a deal with the publishers later.


Posted by Tito Sierra | May 25 2006, 12:18:05 PM EDT | Permalink |

20060523 Tuesday May 23, 2006

My, how times have changed

Encyclopedia Britannica commercial from 1991.

Posted by Tito Sierra | May 23 2006, 10:12:51 PM EDT | Permalink | Comments [1]

20060517 Wednesday May 17, 2006

Dual-paned search results

Not sure if it is an up-and-coming trend in search, but I've recently noticed that some web search tools are now using two panes to display search results--the left pane for the traditional search results list and the right pane for selected item preview.  For example, check out the Windows Live Academic search for avian bird flu.  In the Windows Live implementation the user can mouseover a search result item to see a summary metadata display for the item.  Also check out the Snap.com search for avian bird flu.  In the Snap.com implementation the user can click on the search result item to see a cached screen capture of the webpage result.


Posted by Tito Sierra | May 17 2006, 04:50:06 PM EDT | Permalink | Comments [1]

20060516 Tuesday May 16, 2006

WeatherMole-- one more rockin' mashup

Web mapping mashups are everywhere...

The most recent, and potentially most useful, one I've come across is the WeatherMole.
Combining data from NOAA and GoogleMaps, this tool allows you to click anywhere on the US map and get a 5-day forecast.

For more mashups check out GoogleMapsMania, a blog devoted entirely to these cool tools.

Posted by James Jackson Sanborn | May 16 2006, 11:17:04 AM EDT | Permalink |

20060510 Wednesday May 10, 2006

Exposing user search patterns (Google Trends)

Google Labs have launched another innovative search-related product, Google Trends.  Here they are providing a search interface to generate graphs comparing the search volume for different search keywords.   This is similar to the information Google already provides in the Google Zeitgeist, with the crucial distinction that Trends is interactive and allows you to explore (and compare) usage patterns for specific search terms of interest.  In other words, you can compare the relative popularity of search terms in the long tail.

For example, compare search trends for these popular scripting languages in the US.  This example is necessarily imprecise because "python" refers to much more than the programming language (clearly illustrated by the spike caused by the python eating alligator incident of Fall 2005).  A couple of things we see in this example are a noticeably higher volume of searches for "php" compared to these other scripting languages (this is more pronounced when you switch the view to worldwide), and a discernible decline in searches for "perl" in the last two years.  Unfortunately there is no scale provided, so it is unclear how big these differences are.

What is interesting about this for me is that they are providing an interface for exploring usage of their search tool.  Showing the top searches is nice, but Trends allows for deeper drill down.  Of course Google can do this because they have a massive collection of user data to draw upon.  But in the academic community I think we should think about collecting and exposing aggregated usage data (anonymously of course) of our digital library tools to let scholars explore what other scholars are interested in.  Apart from citation analysis, I wonder what other tools exist for this type of metascholarship exploration.

Posted by Tito Sierra | May 10 2006, 05:37:18 PM EDT | Permalink |

20060505 Friday May 05, 2006

More text mining: The nora project

I learned about the nora project the other day; it's a Mellon-funded initiative "to produce software for discovering, visualizing, and exploring significant patterns across large collections of full-text humanities resources in existing digital libraries. . . . Over the last decade, many millions of dollars have been invested in creating digital library collections: at this point, terabytes of full-text humanities resources are publicly available on the web. Those collections, dispersed across many different institutions, are large enough and rich enough to provide an excellent opportunity for text-mining, and we believe that web-based text-mining tools will make those collections significantly more useful, more informative, and more rewarding for research and teaching."

This signals an interesting new phase both for digital libraries and humanities computing--and indeed, the humanities more generally. We're at the point now where enough primary sources are available online that some people are thinking, Okay, what can we do with all this online text--besides search it? Should be interesting to see what the brilliant John Unsworth, Steve Ramsay, and Matt Kirschenbaum (all former UVA-ites, like me) and their partners come up with. Still, there remains a lot of text, even public-domain text, that needs to go up online--19th-century periodicals, anyone?  unpublished materials in Special Collections?--and that work is at least as important now as it's ever been. And, of course, there are huge caches of audio and video that need to be digitized; I wouldn't want anyone to abandon the digitization projects just yet. But I'm sure they won't, and in fact, it might wind up being a non-vicious circle: if nora produces a really terrific new tool, it might spur yet more digitization initiatives.

The nora project also seems to me to be part of a slight trend in the discipline of English that I've dubbed "the New Empiricism," which is probably largely related to the technological Zeitgeist. Perhaps the best recent example of this trend is Franco Moretti's 2005 book Graphs, Maps, Trees: Abstract Models for A Literary History. Moretti takes an unheard-of step in literary studies by collecting data and presenting it in visual form . . . I thought the graphs were the most compelling; the maps and the trees I could have taken or left. For one thing, they were less data-driven. I suppose there's also a slight trend developing in English toward interpretation-through-visuals: call it Graphicism, maybe. There's bibliographic software that will analyze your bibliographies visually, too.

See literary blog The Valve for a discussion of Graphs, Maps, Trees.

Posted by Amanda French | May 05 2006, 11:48:31 AM EDT | Permalink |

Open Text Mining Interface (OTMI)

Recently posted on Nature's blog is a proposal and demonstration of an Open Text Mining Interface. This is an early proposal for making journal content available specifically for text mining purposes.  The demonstration file contains summary information for a recently published Nature journal article using the Atom XML format with some OTMI-specific extensions.  Right now these extensions include word frequency counts and text snippets from the source article.
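To give a feel for the two kinds of derived data mentioned here, below is a rough sketch of how word frequency counts and text snippets might be computed from an article's plain text. It does not reproduce the OTMI format or Nature's actual processing; the function names, the tokenizing rule, and the fixed snippet size are all assumptions for illustration.

```python
import re
from collections import Counter


def word_counts(text):
    """Count lowercase word occurrences (a simple tokenizer, assumed here)."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


def snippets(text, size=8):
    """Split text into consecutive chunks of `size` words each.

    The chunk size is an arbitrary choice for this sketch, not something
    taken from the OTMI proposal.
    """
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]
```

Data like this supports some kinds of analysis (term frequency, rough topic detection) without handing over the article in a readable form, which is the compromise at issue.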

Text mining researchers will find this to be of limited value in its current form.  Value-added information such as word frequency counts and text snippets are nice, but even better would be the full-text.  But exposing the full-text is not economically viable for subscription-based content, hence the compromise.

The motivation behind this initiative is excellent.  Kudos to Nature for putting this out there.  Subscription-based scholarship is not going away anytime soon.  The use of a widely adopted format for exposing information from journal articles for textual analysis and research could open the door to new ways of discovering and interpreting scholarship in different fields.  It will be interesting to see how (if?) OTMI matures in the next year or two. 

Posted by Tito Sierra | May 05 2006, 10:37:11 AM EDT | Permalink |

Search terms as tag clouds

A colleague recently tipped me off to Swicki.  Rather than explain how this "build your own community search engine" service works, I will refer to their FAQ. What I do want to discuss is what they call a "buzzcloud". This is a Swicki search engine feature that presents popular user submitted search terms in a tag cloud display under the search box. The idea is to provide users with search topic suggestions that may be of interest.

Flickr popularized tag clouds based on people tagging their own photos.  The Swicki buzzcloud is a little different because it is based on community submitted search terms around a specific topic.  For example, check out the Digital Preservation Search Engine.  The Swicki system requires search engine administrators to manually add submitted search terms to their buzzcloud, rather than automatically display the top search terms.  I suppose this is to prevent a search spammer from entering irrelevant search terms to direct traffic to their commercial website.  Nonetheless, this seems like a clever way to build up topical vocabularies on just about any subject (assuming people are interested and use your Swicki search engine).

I wonder how useful search term tag clouds would be for digital library applications. A basic requirement seems to be topical focus. If the search tool is topically generic or agnostic, such as an OPAC, then the tag cloud would be too generic to be useful. But for topically specific digital libraries these tag clouds could be interesting.  Even if the actual tags themselves do not describe what a user is looking for right now, the tag cloud itself would provide a reflection of current community interest, and perhaps provide ideas for topics the user should search for.
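For a digital library that wanted to experiment with this, a search term cloud is not much code. The sketch below counts submitted queries, optionally keeps only terms an administrator has approved (mirroring the manual step described above), and scales the counts to font sizes. The function name build_cloud, its parameters, and the linear scaling are my own assumptions, not Swicki's implementation.

```python
from collections import Counter


def build_cloud(queries, approved=None, min_size=10, max_size=32, top_n=20):
    """Map the most frequent search terms to font sizes.

    `approved` is an optional set of administrator-approved terms; when it
    is given, anything not in the set is dropped (a guard against spam).
    Sizes are scaled linearly between `min_size` and `max_size`.
    """
    counts = Counter(q.strip().lower() for q in queries if q.strip())
    if approved is not None:
        counts = Counter({t: c for t, c in counts.items() if t in approved})
    top = counts.most_common(top_n)
    if not top:
        return {}
    highest, lowest = top[0][1], top[-1][1]
    span = highest - lowest
    cloud = {}
    for term, count in top:
        # If every term has the same count, show them all at full size.
        scale = (count - lowest) / span if span else 1.0
        cloud[term] = round(min_size + scale * (max_size - min_size))
    return cloud
```

For a topically generic tool like an OPAC, the same code would run fine, but the top terms would likely be too broad to be interesting, which is the point made above.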

Posted by Tito Sierra | May 05 2006, 08:45:26 AM EDT | Permalink | Comments [1]

20060502 Tuesday May 02, 2006

SIPs for fun (and profit?)

I have been enjoying Amazon.com's SIPs (Statistically Improbable Phrases) feature. You will see this on Amazon.com book pages for books they have scanned as part of their "Search Inside" program. For example, take a look at the SIPs for Balzac's classic novel Lost Illusions.  In this case we may deduce that money plays an important role in this novel, which is correct.  Each of these SIPs is a hyperlink, enabling me to see other books that discuss provincial poets and nankeen trousers.

Amazon.com builds on SIPs with a newer feature called Books on Related Topics.  The idea is to link books based on shared SIPs.  As full text book scanning becomes commonplace we should see more features like this that exploit the textual contents of the books to expose non-obvious relationships between different books and authors.
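Out of curiosity, here is a toy sketch of how phrases like these might be found. This is not Amazon's algorithm, which I have not seen described in detail; it is just one plausible approach, scoring each two-word phrase by how much more often it appears in a book than in a background collection. The function names and the smoothing are assumptions for illustration.

```python
from collections import Counter


def bigrams(text):
    """Count adjacent word pairs in a text (whitespace tokenizing, lowercased)."""
    words = text.lower().split()
    return Counter(zip(words, words[1:]))


def improbable_phrases(book_text, corpus_texts, top_n=3):
    """Rank a book's two-word phrases by book frequency vs. corpus frequency."""
    book = bigrams(book_text)
    background = Counter()
    for text in corpus_texts:
        background.update(bigrams(text))
    book_total = sum(book.values()) or 1
    background_total = sum(background.values()) or 1
    scores = {}
    for phrase, count in book.items():
        p_book = count / book_total
        # Simple additive smoothing so unseen phrases do not divide by zero.
        p_background = (background[phrase] + 1) / (background_total + 1)
        scores[phrase] = p_book / p_background
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return [" ".join(p) for p in ranked[:top_n]]
```

Linking books on related topics could then be as simple as looking for books whose top phrases overlap, though a real system would need a much larger background collection and better tokenizing than this.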

It is odd to first see this type of feature on an e-commerce website.  I don't expect this is contributing to greater profits on Amazon.com's part, but who knows.  I would love to see a dedicated SIPs interface.  If they ever add SIPs to AWS (Amazon Web Services) maybe I will build one myself.

Posted by Tito Sierra | May 02 2006, 11:06:55 PM EDT | Permalink |



Horseless Library image by Herman Berkhoff