
Thursday June 01, 2006
What to do with a million books
From the Issues in Scholarly Communication blog at UIUC comes an announcement of a colloquium titled What to Do With a Million Books. Proof, by the way, that a catchy title is worth a million . . . extremely valuable objects. From the call for papers:
Digitizing 'a million books' is not only a problem for computer scientists. Tomorrow, a million scholars will have to re-evaluate their notions of archive, textuality and materiality in the wake of these developments. Our familiar modes of scholarly edition, analysis, interpretation and publication are being challenged and transformed in a world where blogs and wikis are busy creating new knowledge and folksonomies are shaping our access to online archives.
Actually I thought a million scholars had already had to re-evaluate their notions of textuality. It always freaks me out a little that phenomenological theories of textuality (its fluidity, its collaborativeness, its lack of stable referentiality) pre-dated the digital revolution. Heidegger and Derrida and de Man and Barthes and Foucault and those guys were writing about the text's undecidability and using the web metaphor of knowledge in the fifties, sixties, seventies, and early eighties -- way before anything went up online. It just seems so prescient.
Posted by Amanda French
| Jun 01 2006, 06:52:09 PM EDT
Market segmentation for digital library services?
Please excuse the marketing term in the title. Lately, I've been wondering whether there is value in applying the concept of "market segmentation" to universities in the context of digital library services. In other words, developing library services configured to the needs or wants of specific segments of the local university community. Higher education institutions offer plenty of available attributes for segmentation. For example:
Level:
* Freshman/Sophomore
* Junior/Senior
* Graduate
Departments:
* Nuclear Engineering
* History
* Zoology
* ...
Then there is the Biglan classification that can be used to subdivide academic disciplines along three dimensions:
* Hard vs. Soft
* Pure vs. Applied
* Life vs. Nonlife
Here is a chart that I created that applies the Biglan classification to NCSU colleges:
Are these useful distinctions from a library services point of view? Could these attributes serve as a basis for developing targeted library portals? See "The recombinant library: portals and people" for more information on libraries and portals.
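To make the segmentation idea concrete, here is a minimal sketch of how a portal might group patrons by Biglan dimensions and level. Everything in it is an assumption for illustration: the `Biglan` type, the `DEPARTMENTS` table, the `group_patrons` helper, and the patron records are hypothetical, and the department assignments are examples rather than an authoritative classification.

```python
from collections import defaultdict
from typing import NamedTuple


class Biglan(NamedTuple):
    hardness: str  # "hard" or "soft"
    purity: str    # "pure" or "applied"
    subject: str   # "life" or "nonlife"


# Illustrative assignments only; a real table would come from the literature
# or from local judgment about each department.
DEPARTMENTS = {
    "Nuclear Engineering": Biglan("hard", "applied", "nonlife"),
    "History": Biglan("soft", "pure", "nonlife"),
    "Zoology": Biglan("hard", "pure", "life"),
}


def segment(department, level):
    """Return a (hardness, purity, subject, level) tuple for one patron."""
    return tuple(DEPARTMENTS[department]) + (level,)


def group_patrons(patrons):
    """Group (name, department, level) records by segment."""
    groups = defaultdict(list)
    for name, department, level in patrons:
        groups[segment(department, level)].append(name)
    return dict(groups)


# Hypothetical patrons, used only to show the grouping.
patrons = [
    ("A", "Zoology", "Graduate"),
    ("B", "History", "Graduate"),
    ("C", "Zoology", "Graduate"),
    ("D", "Nuclear Engineering", "Freshman/Sophomore"),
]
groups = group_patrons(patrons)
for key, names in groups.items():
    print(key, names)
```

Each resulting segment could then be mapped to its own portal configuration, though whether the segments are fine-grained enough to be useful is exactly the open question.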
Posted by Tito Sierra
| Jun 01 2006, 11:39:44 AM EDT

Friday May 26, 2006
Amazon Reviews - Do user evaluations behave the opposite of the Long Tail?
Amazon does display user reviews of books in order of other
users' usefulness ratings and that mechanism is interesting for several reasons
(reviews move higher up the list based on the number of positive evaluations by
others). First, it's the same sort of technique used more or less by Google to
sort pages and by Web of Science to measure impact for journal articles.
It also distinctly creates an 'anti-Long Tail' situation. It appears that
rather quickly a few reviews are tagged as being useful by people and gain the
favored first page. The odds of these reviews ever losing this hallowed ground
become smaller and smaller as more people evaluate them, especially if they
are positive. Who goes past the first page or two of reviews to look at, say
#100, when the top 2 or 3 or 5 are judged useful by dozens of people? There's
nothing inherently wrong with this phenomenon (in fact it is quite useful in
many ways), but it has me wondering about modeling this situation. Just how low is
the tipping point for viewers to tag a review as useful before it permanently
gains the favored viewing position? Reviews that are garbage (immature,
threats, profanity, etc.) get pushed out most likely, but perhaps so do
unorthodox, but legitimate and thoughtful positions.
Moreover, reviews that are late-comers to a list have a
disproportionate chance of never getting seen, no matter what their quality may
be. Once the leaders on a reviews list reach some number of evaluations, it seems like
it would be very difficult to budge them off. Even if there is a separate
display for the most recent reviews, that display area is likely to offer very
ephemeral real estate and the question then becomes one of how quickly must a
new review attract enough attention to garner enough favorable evaluations to
move itself into a more favorable permanent position in the list of all reviews
for a given book? Given that the leading 'incumbent' reviews are also occupying
permanently visible ground, even a new reviews area seems to offer scant hope
for new reviews to become popular if the already popular reviews have enough of
an edge in the number of evaluations (but what that edge might be I don't know).
And if a new review does not garner enough positive evaluations during its
short life-span as a new review, most likely this pattern damns it to permanent
obscurity in the longer list of reviews.
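One way to start modeling this is a toy simulation. The sketch below assumes all reviews are of equal quality, that each visitor casts exactly one "useful" vote, and that a visitor almost always votes for something on the first page. None of these assumptions comes from Amazon; the function name, parameters, and probabilities are hypothetical.

```python
import random


def simulate_review_votes(n_reviews=100, n_visitors=5000, page_size=5,
                          p_first_page=0.95, seed=0):
    """Simulate 'useful' votes when display order follows vote count."""
    rng = random.Random(seed)
    votes = [0] * n_reviews
    for _ in range(n_visitors):
        # Rank by votes; ties go to the older review (lower index).
        ranking = sorted(range(n_reviews), key=lambda i: (-votes[i], i))
        if rng.random() < p_first_page:
            chosen = rng.choice(ranking[:page_size])   # reads page one only
        else:
            chosen = rng.choice(ranking[page_size:])   # digs deeper
        votes[chosen] += 1
    return votes


votes = simulate_review_votes()
top_share = sum(sorted(votes, reverse=True)[:5]) / sum(votes)
print(f"Share of all votes held by the top 5 reviews: {top_share:.0%}")
```

Even with identical quality, the early occupants of the first page collect the bulk of the votes in this toy model. Varying `p_first_page` or adding a "most recent" slot would be one way to probe where the tipping point lies.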
Now what does this all mean? Even as Amazon and other entities begin exploring
and exploiting the Long Tail, feedback mechanisms for items both in and out of
the Long Tail seem to some extent to rely on the central clustering of the
'Short Tail' or 'Short Hump' or 'Big Hump' to work - and I find that intriguing. Scott
Posted by WARREN, SCOTT
| May 26 2006, 04:48:04 PM EDT

Thursday May 25, 2006
AmazonOnlineReader
Looks like Amazon.com is providing a new interface (very similar to Google Book Search) for viewing books online called AmazonOnlineReader (example). This is a dedicated page turner application, and does provide an easier to use interface for browsing their large page scans. Additional features include the ability to search for text in the book, and the ability to add highlights/notes/tags to sections of text.
Perhaps of greater significance is that Amazon is offering full text online access to already purchased books with a feature called "Amazon Upgrade". Apparently I only have two books in my account that are eligible for this service (despite having purchased dozens of books on Amazon). The reason is that they are getting publisher approval to put these books in the upgrade program. I think the two books in my account are out of print.
In one case the online access price is 1/3 of Amazon's print copy retail price. In the other case Amazon does not sell the hardcover new directly, so charging for the scanned copy of the book is a new non-cannibalized revenue source.
For those interested in the gory details, check out the Amazon Upgrade FAQ.
Hmmm, I wonder if/when Google will charge for online access to scanned library books. I suppose scanning the books now gives them the option to work out a deal with the publishers later.
Posted by Tito Sierra
| May 25 2006, 12:18:05 PM EDT

Tuesday May 23, 2006
My, how times have changed
Encyclopedia Britannica commercial from 1991.
Posted by Tito Sierra
| May 23 2006, 10:12:51 PM EDT

Wednesday May 17, 2006
Dual-paned search results
Not sure if it is an up-and-coming trend in search, but I've recently noticed that some web search tools are now using two panes to display search results--the left pane for the traditional search results list and the right pane for selected item preview. For example, check out the Windows Live Academic search for avian bird flu. In the Windows Live implementation the user can mouseover a search result item to see a summary metadata display for the item. Also check out the Snap.com search for avian bird flu. In the Snap.com implementation the user can click on the search result item to see a cached screen capture of the webpage result.
Posted by Tito Sierra
| May 17 2006, 04:50:06 PM EDT

Tuesday May 16, 2006
WeatherMole-- one more rockin' mashup
Web mapping mashups are everywhere...
The most recent, and potentially most useful, one I've come across is the WeatherMole. Combining data from NOAA and GoogleMaps, this tool allows you to click anywhere on the US map and get a 5-day forecast.
For more mashups check out GoogleMapsMania, a blog devoted entirely to these cool tools.
Posted by James Jackson Sanborn
| May 16 2006, 11:17:04 AM EDT

Wednesday May 10, 2006
Exposing user search patterns (Google Trends)
Google Labs have launched another innovative search-related product, Google Trends. Here they are providing a search interface to generate graphs comparing the search volume for different search keywords. This is similar to the information Google already provides in the Google Zeitgeist, with the crucial distinction that Trends is interactive and allows you to explore (and compare) usage patterns for specific search terms of interest. In other words, you can compare the relative popularity of search terms in the long tail.
For example, compare search trends for these popular scripting languages in the US. This example is necessarily imprecise because "python" refers to much more than the programming language (clearly illustrated by the spike caused by the python-eating-alligator incident of Fall 2005). A couple of things we see in this example are a noticeably higher volume of searches for "php" compared to these other scripting languages (this is more pronounced when you switch the view to worldwide), and a discernible decline in searches for "perl" in the last two years. Unfortunately there is no scale provided, so it is unclear how big these differences are.
What is interesting about this for me is that they are providing an interface for exploring usage of their search tool. Showing the top searches is nice, but Trends allows for deeper drill down. Of course Google can do this because they have a massive collection of user data to draw upon. But in the academic community I think we should think about collecting and exposing aggregated usage data (anonymously of course) of our digital library tools to let scholars explore what other scholars are interested in. Apart from citation analysis, I wonder what other tools exist for this type of metascholarship exploration.
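As a rough illustration of what "aggregated and anonymous" could mean for a digital library search log, here is a minimal sketch. The log format (user id, ISO date, query), the function name, and the `min_users` threshold are all assumptions made for the example, not a description of any existing system. Counts are kept per month, user ids never appear in the output, and terms searched by too few distinct users are suppressed so that rare queries cannot point back to one person.

```python
from collections import Counter, defaultdict


def aggregate_search_log(log_rows, min_users=2):
    """Return {month: {term: search_count}} with rare terms suppressed."""
    counts = defaultdict(Counter)
    users = defaultdict(set)
    for user_id, date, query in log_rows:
        month = date[:7]                         # "YYYY-MM"
        term = " ".join(query.lower().split())   # normalize case and spacing
        counts[month][term] += 1
        users[(month, term)].add(user_id)
    return {
        month: {term: n for term, n in terms.items()
                if len(users[(month, term)]) >= min_users}
        for month, terms in counts.items()
    }


# Hypothetical log rows, used only to show the shape of the output.
sample_log = [
    ("u1", "2006-04-03", "Avian Flu"),
    ("u2", "2006-04-09", "avian  flu"),
    ("u1", "2006-04-10", "nankeen trousers"),
    ("u3", "2006-05-01", "avian flu"),
    ("u2", "2006-05-02", "avian flu"),
    ("u2", "2006-05-02", "perl"),
]
trends = aggregate_search_log(sample_log)
print(trends)
```

A threshold like this is only a first step toward anonymity; it does not by itself protect against every way of re-identifying users.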
Posted by Tito Sierra
| May 10 2006, 05:37:18 PM EDT

Friday May 05, 2006
More text mining: The nora project
I learned about the nora project the other day; it's a Mellon-funded initiative "to produce software for discovering, visualizing, and exploring significant patterns across large collections of full-text humanities resources in existing digital libraries. . . . Over the last decade, many millions of dollars have been invested in creating digital library collections: at this point, terabytes of full-text humanities resources are publicly available on the web. Those collections, dispersed across many different institutions, are large enough and rich enough to provide an excellent opportunity for text-mining, and we believe that web-based text-mining tools will make those collections significantly more useful, more informative, and more rewarding for research and teaching."
This signals an interesting new phase both for digital libraries and humanities computing--and indeed, the humanities more generally. We're at the point now where enough primary sources are available online that some people are thinking, Okay, what can we do with all this online text--besides search it? Should be interesting to see what the brilliant John Unsworth, Steve Ramsay, and Matt Kirschenbaum (all former UVA-ites, like me) and their partners come up with. Still, there remains a lot of text, even public-domain text, that needs to go up online--19th-century periodicals, anyone? unpublished materials in Special Collections?--and that work is at least as important now as it's ever been. And, of course, there are huge caches of audio and video that need to be digitized; I wouldn't want anyone to abandon the digitization projects just yet. But I'm sure they won't, and in fact, it might wind up being a non-vicious circle: if nora produces a really terrific new tool, it might spur yet more digitization initiatives.
The nora project also seems to me to be part of a slight trend in the discipline of English that I've dubbed "the New Empiricism," which is probably largely related to the technological Zeitgeist. Perhaps the best recent example of this trend is Franco Moretti's 2005 book Graphs, Maps, Trees: Abstract Models for a Literary History. Moretti takes an unheard-of step in literary studies by collecting data and presenting it in visual form . . . I thought the graphs were the most compelling; the maps and the trees I could have taken or left. For one thing, they were less data-driven. I suppose there's also a slight trend developing in English toward interpretation-through-visuals: call it Graphicism, maybe. There's bibliographic software that will analyze your bibliographies visually, too.
See literary blog The Valve for a discussion of Graphs, Maps, Trees.
Posted by Amanda French
| May 05 2006, 11:48:31 AM EDT
Open Text Mining Interface (OTMI)
Recently posted on Nature's blog is a proposal and demonstration of an Open Text Mining Interface. This is an early proposal for making journal content available specifically for text mining purposes. The demonstration file contains summary information for a recently published Nature journal article using the Atom XML format with some OTMI-specific extensions. Right now these extensions include word frequency counts and text snippets from the source article.
Text mining researchers will find this to be of limited value in its current form. Value-added information such as word frequency counts and text snippets is nice, but even better would be the full text. But exposing the full text is not economically viable for subscription-based content, hence the compromise.
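To give a feel for the two kinds of derived data mentioned here, the following is a minimal sketch of computing word frequency counts and short snippets from an article's text. It does not reproduce the OTMI format or its Atom extensions; the function names, the tokenization, and the snippet rule (the first few words of each sentence) are assumptions chosen for illustration.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count lowercase word occurrences in the text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def snippets(text, max_words=8):
    """Return the first max_words words of each sentence."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [" ".join(s.split()[:max_words]) for s in sentences if s]


# Hypothetical article text, used only to show the output shape.
article = "Text mining is useful. Mining text at scale needs open formats."
freqs = word_frequencies(article)
print(freqs.most_common(3))
print(snippets(article, max_words=3))
```

Data like this lets a researcher compare vocabulary across articles without ever seeing the running text, which is the trade-off the proposal is making.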
The motivation behind this initiative is excellent. Kudos to Nature for putting this out there. Subscription-based scholarship is not going away anytime soon. The use of a widely adopted format for exposing information from journal articles for textual analysis and research could open the door to new ways of discovering and interpreting scholarship in different fields. It will be interesting to see how (if?) OTMI matures in the next year or two.
Posted by Tito Sierra
| May 05 2006, 10:37:11 AM EDT
Search terms as tag clouds
A colleague recently tipped me off to Swicki. Rather than explain how this "build your own community search engine" service works, I will refer to their FAQ. What I do want to discuss is what they call a "buzzcloud". This is a Swicki search engine feature that presents popular user-submitted search terms in a tag cloud display under the search box. The idea is to provide users with search topic suggestions that may be of interest.
Flickr popularized tag clouds based on people tagging their own photos. The Swicki buzzcloud is a little different because it is based on community-submitted search terms around a specific topic. For example, check out the Digital Preservation Search Engine. The Swicki system requires search engine administrators to manually add submitted search terms to their buzzcloud, rather than automatically display the top search terms. I suppose this is to prevent a search spammer from entering irrelevant search terms to direct traffic to their commercial website. Nonetheless, this seems like a clever way to build up topical vocabularies on just about any subject (assuming people are interested and use your Swicki search engine).
I wonder how useful search term tag clouds would be for digital library applications. A basic requirement seems to be topical focus. If the search tool is topically generic or agnostic, such as an OPAC, then the tag cloud would be too generic to be useful. But for topically specific digital libraries these tag clouds could be interesting. Even if the actual tags themselves do not describe what a user is looking for right now, the tag cloud itself would provide a reflection of current community interest, and perhaps provide ideas for topics the user should search for.
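For a digital library that wanted to experiment with this, a search-term tag cloud can be sketched in a few lines. The version below is an assumption-laden illustration, not how Swicki works internally: the `buzzcloud` function, its parameters, and the log-scaled font sizes are hypothetical. The optional `approved` set stands in for the manual administrator approval step described above.

```python
import math
from collections import Counter


def buzzcloud(search_terms, approved=None, min_size=10, max_size=32):
    """Map each search term to a font size based on how often it was searched."""
    counts = Counter(t.strip().lower() for t in search_terms if t.strip())
    if approved is not None:
        # Keep only terms an administrator has approved (spam filter).
        counts = Counter({t: c for t, c in counts.items() if t in approved})
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    sizes = {}
    for term, c in counts.items():
        if hi == lo:
            scale = 1.0
        else:
            # Log scaling keeps one very popular term from dwarfing the rest.
            scale = (math.log(c) - math.log(lo)) / (math.log(hi) - math.log(lo))
        sizes[term] = round(min_size + scale * (max_size - min_size))
    return sizes


# Hypothetical submitted searches for a digital preservation search engine.
submitted = ["OAIS", "oais", "oais", "oais", "PREMIS", "spam pills", "spam pills"]
cloud = buzzcloud(submitted, approved={"oais", "premis"})
print(cloud)
```

For a topically generic tool such as an OPAC, the same code would run, but the resulting cloud would likely be too broad to suggest anything useful.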
Posted by Tito Sierra
| May 05 2006, 08:45:26 AM EDT

Tuesday May 02, 2006
SIPs for fun (and profit?)
I have been enjoying Amazon.com's SIPs (Statistically Improbable Phrases) feature. You will see this on Amazon.com book pages for books they have scanned as part of their "Search Inside" program. For example, take a look at the SIPs for Balzac's classic novel Lost Illusions. In this case we may deduce that money plays an important role in this novel, which is correct. Each of these SIPs is a hyperlink, enabling me to see other books that discuss provincial poets and nankeen trousers.
Amazon.com builds on SIPs with a newer feature called Books on Related Topics. The idea is to link books based on shared SIPs. As full text book scanning becomes commonplace we should see more features like this that exploit the textual contents of the books to expose non-obvious relationships between different books and authors.
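Amazon has not published how SIPs are computed, so the following is only a guess at the general idea: find phrases that occur much more often in one book than in a background collection. The function names, the restriction to two-word phrases, the add-one smoothing, and the sample texts are all assumptions made for this sketch.

```python
import re
from collections import Counter


def bigrams(text):
    """Count adjacent word pairs in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(zip(words, words[1:]))


def improbable_phrases(book_text, background_texts, top_n=3, min_count=2):
    """Rank repeated two-word phrases by book frequency over background frequency."""
    book = bigrams(book_text)
    background = Counter()
    for text in background_texts:
        background.update(bigrams(text))
    book_total = sum(book.values())
    bg_total = sum(background.values())
    vocab = len(set(book) | set(background))
    scores = {}
    for phrase, count in book.items():
        if count < min_count:
            continue  # ignore phrases that appear only once in the book
        p_book = count / book_total
        # Add-one smoothing so unseen background phrases get a small probability.
        p_background = (background[phrase] + 1) / (bg_total + vocab)
        scores[phrase] = p_book / p_background
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return [" ".join(p) for p in ranked[:top_n]]


# Hypothetical texts, used only to exercise the function.
book = ("the provincial poet wore nankeen trousers and "
        "the provincial poet sold nankeen trousers in the town")
others = ["the town was quiet and the poet sold books in the town",
          "in the town the people wore hats"]
sips = improbable_phrases(book, others, top_n=2)
print(sips)
```

Linking books on related topics could then be approximated by intersecting the phrase lists of two books, again as an assumption about the approach rather than a description of Amazon's feature.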
It is odd to first see this type of feature on an e-commerce website. I don't expect this is contributing to greater profits on Amazon.com's part, but who knows. I would love to see a dedicated SIPs interface. If they ever add SIPs to AWS (Amazon Web Services) maybe I will build one myself.
Posted by Tito Sierra
| May 02 2006, 11:06:55 PM EDT
(Inter)Network Neutrality...
There's an interesting article about network neutrality over at Slate.com. The debate boils down to this: currently an ISP can't prefer packets from, say, Yahoo! to those from AOL, even if they are partners of some sort with the former. The proposed changes would weaken current rules, and might allow an ISP to accept payment for allowing one company's content to get priority, and therefore stream faster.
Bringing this closer to home, it will be interesting to see how colleges and universities deal with these changes if they go into effect. Would they maintain their principles of open access to information, or would they enter into agreements like they currently do with sneaker companies and soda vendors? I expect that most would maintain neutrality in the networks that they control-- but would this mean that only those with access to university or government networks truly have unfettered access to information? Interesting to ponder...
Posted by James Jackson Sanborn
| May 02 2006, 10:19:48 AM EDT

Monday May 01, 2006
Yiddish Books online
I read a fascinating book last year called Outwitting History: The Amazing Adventures of a Man Who Rescued a Million Yiddish Books. The book chronicles the origins and early operations of what is now The National Yiddish Book Center.
The book's subtitle doesn't exaggerate: Aaron Lansky, recipient of a MacArthur "genius" award, really did have some amazing adventures in the course of finding and collecting Yiddish books. One of the least interesting things from a dramatic perspective that might nevertheless interest people involved in building digital libraries is that the National Yiddish Book Center has what looks like a very sleek print-on-demand operation.
For instance, if you'd like to see a Yiddish translation of Shakespeare's sonnets published in 1944, you might try ordering it through bookfinder.com or a rare books dealer, but you're going to have a tough time finding it, if you ever do, and it'll be expensive and fragile. If you have academic library privileges, you can order the microfilm from the single library that owns it (New York Public) or borrow it via ILL from one of the 13 libraries in the world that own a copy. Or, which is much quicker and simpler, you can pay the Yiddish Book Center $48 plus shipping for your own brand-new edition.
Posted by Amanda French
| May 01 2006, 06:07:11 PM EDT
Lists
Call me crazy, but I love dense lists like this.
Posted by Tito Sierra
| May 01 2006, 05:18:34 PM EDT

Horseless Library image by Herman Berkhoff