The Horseless Library
Digital Library Discussions

Wednesday December 17, 2008

Revealing libraries

I wonder if library patrons would be interested in seeing a live stream of what kinds of research are going on in the library? You've all seen streams of what terms people are putting into search engines, like this. What if somewhere on a library website we showed live streams of:

  • terms patrons are typing into the catalog
  • terms patrons are typing into library site searches
  • books checked out today
  • reference books re-shelved this week
  • Amazon-like "Statistically Improbable Phrases" pulled from chat and email question logs
  • other data we have?
There are enormous privacy problems here. I write this on Dec. 17. Imagine you could see all those things above today. We have very few patrons today. One could just about triangulate to figure out who checked out which books. I don't want to ask a librarian what I think is a dumb question and then see it broadcast on the web. Still... Any libraries doing anything like this?
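One hedge against the triangulation worry might be to hold back any search term until enough different patrons have typed it. The following is only a minimal sketch under stated assumptions: the `(session_id, term)` event format, the function name `releasable_terms`, and the threshold `min_patrons` are all hypothetical and do not come from any real catalog system.

```python
from collections import defaultdict


def releasable_terms(events, min_patrons=5):
    """Return search terms entered by at least `min_patrons` distinct sessions.

    `events` is an iterable of (session_id, term) pairs. Both the event shape
    and the threshold are illustrative assumptions, not a real catalog API.
    """
    sessions_by_term = defaultdict(set)
    for session_id, term in events:
        # Normalize case and whitespace so "Dickens" and "dickens " count together.
        sessions_by_term[term.strip().lower()].add(session_id)
    # Only terms shared by enough different sessions are considered safe to show.
    return sorted(
        term
        for term, sessions in sessions_by_term.items()
        if len(sessions) >= min_patrons
    )
```

On a quiet day like Dec. 17, a filter like this would show almost nothing, which is arguably the right outcome. It does not address the other streams in the list, such as checkouts or question logs.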


Posted by BOYER, JOSH | Dec 17 2008, 03:59:08 PM EST | Permalink |

Tuesday August 12, 2008

Transformations in the Repository Stack

Yesterday Sandy Payette, Executive Director of Fedora Commons, spoke to the Red Island Repository Institute.  I twittered regarding a comment she made to the effect that content transformations should logically be moved farther down the repository stack.  Several people wrote to me to ask about it.  I'm afraid I have little more to add to her statement other than to point out that she offered Sun Honeycomb as evidence of the trend toward pushing those processes lower.  My understanding of her position is that it's logical to transform the data closer to where it lives rather than moving it out of storage and into the access application layer and then making the transformation.

Another interesting follow-on conversation involved the scope of Fedora and repositories in general. Declan Fleming, Director of Information Technology at UC San Diego Libraries, questioned the broadness of scope and resulting complexity of Fedora. This reminded me of a train of thought I'd been following prior to the RIRI: there's a great deal of overlap between the layers in a typical repository stack. iRODS, a storage layer, provides functionality covered by Fedora, the repository layer, which in turn provides functionality covered by Fez, the access layer. This is a simple example that got me thinking about where specific functions should occur. Of course, these layers weren't designed solely to interact with one another and shouldn't be expected to integrate perfectly, but it's a useful exercise to consider the most logical location for processes.

Posted by James Tuttle | Aug 12 2008, 08:50:25 AM EDT | Permalink | Comments [1]

Tuesday May 06, 2008

MySQL Remains Open Source

It's very encouraging to see that Sun does understand Open Source and has announced that MySQL will remain open. This is a follow-on from a couple of weeks ago when Sun broke the news that it would begin releasing features for MySQL Enterprise only available to paying customers. The whole affair makes it even more significant that Postgres has been pretty adamant about not selling. For a moment, it seemed as though Sun forgot how to build a reasonable community and business model around an Open Source product. Let's hope they don't forget again. Giving Oracle a run for its money in the enterprise market can only benefit consumers.

Posted by James Tuttle | May 06 2008, 11:01:15 PM EDT | Permalink |

Saturday May 03, 2008

Cognitive Surplus

A brilliant meme from Clay Shirky at Web 2.0 Expo. He answers the question of where people find the time for knowledge collection and dissemination à la Wikipedia. He also does a pretty convincing job of explaining that the shift from purely consumptive media behavior to bidirectional engagement with media is a one-way trip.

Posted by James Tuttle | May 03 2008, 11:02:58 PM EDT | Permalink | Comments [2]

Thursday May 01, 2008

Capturing Campus Data

I sat in on a session with a local Information Architecture consultant yesterday that rekindled my interest in capturing data generated by the university.  I asked how we might expose our deep web content without overwhelming users not interested in that content.  Really, though, the problem starts much farther back.

We have an excellent GIS collection in addition to numeric data with which I am not as familiar, but we haven't, to my knowledge, engaged campus data producers in any large-scale fashion.  Presently, it's unclear to me that we'd know what to do with the data regarding preservation, access, organization, and intellectual property concerns.  Still farther back, I don't know that we've thought about building business cases to sell the services and facilities the Libraries could offer data producers.  It seems likely that  data producers will need strong business cases to incentivize the submission process.  

In our experience with the National Digital Information Infrastructure and Preservation Program, it's clear that preservation motivates some but is often little motivation for others and is frequently too remote an issue to induce real action. Certainly, the more demanding the process is for data producers, the more motivation they would need. Building business cases to address producer needs is an important step. I think that diversifying the benefits is equally important.

In addition to preservation, access could be a key carrot to lure potential submitters.   Reading Peter Brantley's thoughts on design beyond the interface inspired some dreamy thoughts about hiding preservation motives behind shiny access interfaces.  While this might not work for Mathematics or Chemistry, it could work for Design, Architecture, Art, or others.  

See White Paper: Behind a Law School's Decision to Implement an Institutional Repository for an interesting read on building business cases for institutional repositories.

Posted by James Tuttle | May 01 2008, 11:01:51 PM EDT | Permalink | Comments [0]

Friday April 11, 2008

Webutation, or: How to save your reputation on the WWW

A few days ago, National Public Radio broadcast an interesting clip about a new internet company called reputationdefender.com. One of their main products is named "myreputation" and offers to compile a comprehensive report of all information available through various web-based services: their website mentions social networks (e.g., MySpace, Facebook), professional review websites, blogs, online news sources, photo, audio and video sharing sites and "millions of additional sites on the 'open Internet'".

For an additional fee, the company offers to delete this data through its "proprietary DESTROY process," which, according to the NPR broadcast, is largely achieved through legal means, i.e., the company contacts the content provider and requests the deletion of the information.

What does this have to do with libraries? As librarians, it is not just our duty to provide our patrons with authoritative information on any topic they are interested in. Even more so, we should enable them to discover and critically evaluate information for themselves. Part of this evaluation is determining not only what information they should trust as consumers, but also what they should share as providers: they should be able to decide what information they make publicly available and what impact that information will have on their lives in the future. Instead of trusting services such as reputationdefender to clean up after them and remove images or public comments once those have become embarrassing (e.g., when applying for a job), they should become aware of the ubiquity of this information and judge whether that image from last week's party should really be publicly available.

Posted by Markus Wust | Apr 11 2008, 08:58:08 AM EDT | Permalink |

Thursday August 09, 2007

Interviews touching upon book data

There's an interview today over at Inside Higher Ed that caught my eye. Scott McLemee, my favorite writer there (he mostly covers the 'big idea' intellectual side of academic news), has a short interview with Aaron Swartz of the Open Library project.
http://insidehighered.com/views/2007/08/08/mclemee

What struck me as interesting was hearing two people outside the regular library world discuss so many familiar issues in depth. Also worthy of note was that Swartz mentioned McLemee's interview with Franco Moretti about a year and a half ago. I remember Amanda French mentioning Moretti somewhere; that's how I first found out about him, though I've never gotten around to reading his work, much as I want to. What connects all these ideas is how they deal with manipulating data, albeit for different purposes, that surround, and perhaps inform, books.

Posted by WARREN, SCOTT | Aug 09 2007, 10:03:08 AM EDT | Permalink |

Thursday June 21, 2007

Recent readings tangential to digital libraries

Three articles I've read recently that seemed worth passing along.

1) Michael Jensen on how authority is constructed, conferred, and maintained in various online venues. Read this in the Chronicle of Higher Ed a few days back and then saw that the ACRLlog had picked it up too. No mention of libraries, but undergrads are ripe for discussions of this sort on methodologies when it comes time to mention the dreaded phrase, "authoritative sources." He includes a lengthy list of guesses at what Authority 3.0 will be that strongly reminds me of Vernor Vinge's recent fiction.

2) An article by Kalevi Kilkki in First Monday on quantitatively analyzing Long Tail phenomena. Chris Anderson used mostly qualitative arguments and avoided the nitty gritty math really needed to fully analyze and predict distributions of this nature. Saw this on Lorcan Dempsey's blog a bit ago and finally got around to reading it. If you're interested in Long Tail behavior, this is required reading.

3) The New York Times Magazine just this past Sunday had a feature on the rather darker sweatshop side of online gaming. As usual for me and 2.0 stuff, I'm way more interested in the economics and social influences rippling out from the growth of online games than I am in the actual technologies involved or the games themselves (though a good question is where you draw the line between those categories).

I'd love to hear what others have come across lately.

Scott


Posted by WARREN, SCOTT | Jun 21 2007, 04:11:26 PM EDT | Permalink |

Wednesday June 13, 2007

We all get to digitize books

Check out this story from CNN about the distorted word puzzles we all have to solve to register at Web sites.

"Instead of wasting time typing in random letters and numbers, Carnegie Mellon researchers have come up with a way for people to type in snippets of books to put their time to good use, confirm they're not machines and help speed up the process of getting searchable texts online."

The idea is that when a machine can't decipher words from a book, get all of us non-machines to do that work, bit by bit. How smart is that?
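The CNN story doesn't spell out how the typed-in answers get turned into trusted text, so the following is only a guess at the general shape of the idea, not the Carnegie Mellon researchers' actual method: show the same unreadable word to several people and accept a reading once enough of them agree. The function name `consensus_word` and both thresholds are hypothetical.

```python
from collections import Counter


def consensus_word(guesses, min_votes=3, min_share=0.6):
    """Return the agreed transcription, or None if readers have not converged.

    `guesses` is a list of strings typed by different people for the same
    scanned word. The thresholds are illustrative assumptions only.
    """
    # Lowercasing is a simplification; a real system would need to keep case.
    normalized = [g.strip().lower() for g in guesses if g.strip()]
    if not normalized:
        return None
    word, votes = Counter(normalized).most_common(1)[0]
    # Require both an absolute number of votes and a clear majority share.
    if votes >= min_votes and votes / len(normalized) >= min_share:
        return word
    return None
```

For example, three people typing "morning" and one typing "mourning" would settle on "morning", while two people who disagree would leave the word undecided until more answers arrive.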

Posted by BOYER, JOSH | Jun 13 2007, 11:13:13 AM EDT | Permalink |

Thursday June 07, 2007

Photosynth

First, I want to thank Erik M. for sharing this amazing video with me.

Fast-forward to the 3-minute mark to see a demo of this amazing software called Photosynth.

The basic idea is that the software takes a huge pile of ordinary photos about a subject (e.g. photos of Notre Dame from Flickr) and virtually stitches them together to form an interactive model of the subject.



This technology suggests a mind-boggling number of opportunities for understanding the world around us. 

Posted by Tito Sierra | Jun 07 2007, 05:47:43 PM EDT | Permalink | Comments [1]

Thursday May 31, 2007

Face morphing video

Digital libraries of the future, I hope, will be able to render visualizations like this on the fly using images in their collection.

Posted by Tito Sierra | May 31 2007, 03:17:27 PM EDT | Permalink | Comments [1]

Friday May 18, 2007

Holographic file storage

I remember reading about this probably seven or eight years ago: holographic storage. It seemed too Star Trek to be real. Now, apparently, it's about to go on the market, and libraries might be one of the first customers. See this Slashdot article (http://hardware.slashdot.org/article.pl?sid=07/05/18/0546244) for a little more.

Posted by Amanda French | May 18 2007, 08:39:04 AM EDT | Permalink |

Wednesday May 16, 2007

LibraryThing tags added to a public library catalog - some thoughts

OK, time to try to breathe some new life into this blog.

Yesterday I read about LibraryThing's tagged data being included in the Danbury, CT Public Library's catalog. First, bravo to the Danbury Library for being gutsy enough to go first and experiment with this.

From what I've read of the commentary on NGC4Lib and elsewhere, the results aren't perfect, but they are pretty good. And hey, the results weren't perfect before from just the LC headings, so I believe that mostly this will help. I sometimes wonder, when people point out counterexamples for how searching fails in some particular case, did they actually expect it to work well always?

The emails tend to point out some weirdness in recommendations based on LT's social data, but that's exactly the crux of social data. Odd connections and preferences emerge, especially if the title has a small set of tag data. Whether they are in fact helpful is harder to tell, because helpfulness in this context is bound to be so personal.

Years ago I worked part time at a public library as an assistant, and one of the most challenging questions I was regularly asked was, "Can you recommend a good book?" I could tell you several, but the odds are that unless you read and are interested in much the same things I am, you may not like my recommendations. Or maybe you will, who knows? Being able to do that well was an art, and it took time to really begin to know authors, genres, reading habits, etc., and not focus on my own personal tastes. It wasn't easy, and all this was before any social networking tags, let alone Amazon reviews.

I looked at the Danbury catalog yesterday and put in the title I'm currently reading, Our Mutual Friend, by Charles Dickens. I guess LT's data worked well enough: several other Victorian novels were recommended, though, weirdly enough, only one by Dickens. And the tags themselves made sense. But Dickens has a lot of tags - he's one of the top 20 LT authors by copies, and even though OMF is not one of the most famous of Dickens' works, LT still has data on 687 copies (I salute all of those people for taking the time to tag this good book). When I actually clicked on one of the tags and went into the tag browser, I got a really nice reading list of 19th century fiction, all of which were at that library. Same thing for the tag "Dickens." And for "London." Nice lists, not random. It was easy to use as a browsing tool and something I think I would enjoy as a patron.

A lot of the power of LT and tags comes when there are enough of them for any given title or author to statistically overwhelm the odd tags and entries that are too idiosyncratic to be of much use to anyone except the tagger herself. I'm not sure how helpful this would be for a really obscure title that LT has little data on, so perhaps I'm wondering just how good tagging is at producing reasonable Long Tail recommendations. Tagging is something that, like Amazon reviews for a given title, while perhaps possessing a Long Tail distribution, I believe only really displays its power in the much smaller set (of titles, of reviews that are tagged as helpful) that accounts for most of the data.

My conclusion after that last run-on Dickensian sentence is that adding LT's huge tag dataset could be a very interesting addition for most public libraries, especially given the price which is supposed to be very low. What's to lose?
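The point above about enough tags statistically overwhelming the idiosyncratic ones can be sketched in a few lines. This is a rough illustration under stated assumptions, not how LibraryThing actually processes its data: the `(user, tag)` input format, the function name `prominent_tags`, and the cutoff `min_taggers` are all hypothetical.

```python
from collections import Counter


def prominent_tags(tag_events, min_taggers=3):
    """Keep tags applied by at least `min_taggers` distinct users.

    `tag_events` is an iterable of (user, tag) pairs for a single title.
    Returns (tag, tagger_count) pairs, most widely used first. The input
    shape and the cutoff are illustrative assumptions.
    """
    seen = set()
    counts = Counter()
    for user, tag in tag_events:
        key = (user, tag.strip().lower())
        # Count each user at most once per tag so one keen tagger can't dominate.
        if key not in seen:
            seen.add(key)
            counts[key[1]] += 1
    return [(tag, n) for tag, n in counts.most_common() if n >= min_taggers]
```

For a heavily tagged title like Our Mutual Friend, a cutoff like this would drop one-off personal tags and leave the widely shared ones. For an obscure title with only a handful of taggers, it would leave nothing at all, which is the Long Tail worry in miniature.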


Posted by WARREN, SCOTT | May 16 2007, 11:57:11 AM EDT | Permalink |

Thursday March 29, 2007

Blog rankings

Just now I had the extremely weird experience of looking something up with Google and getting as the very first result, ta-da! my own blog. It was a bit like dialing a random telephone number and getting my own voice mail.

Here's what happened. Someone wrote me an e-mail about a Petrarchan sonnet and used the word "octet" to refer to the first eight lines. I was pretty sure that this was inaccurate, and that the correct term is "octave," which people mix up with the term for the last six lines, "sestet."

But I wasn't quite sure, so I figured I'd check. I thought I remembered that Karen Ciccone told me that Google now allows truncation with asterisks, so I entered this search string:

petrarchan sonnet sestet oct*

I've now remembered that what Karen actually said was that you can use asterisks within quotation marks to search Google, so that, for instance, you can search "the * is too much with us" and find Wordsworth's famous sonnet. But you know how it is when you're Googling, especially if you're a fast typist -- it's easier to Google than it is to think.

So Google didn't recognize my asterisk as a truncation, and simply searched for "oct." Which, of course, is a well-known abbreviation for "October," so you can see why a blog that uses abbreviated months in its datestamps would rank high. And that helps explain why the first result I got for that search was the blog I created last semester for a graduate course in Victorian Poetry. I also wonder whether Google KNOWS IT'S ME (I was signed in at the time) and is giving me a top result from a site it has reason to believe I'd be interested in.
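To make the two asterisk behaviors above concrete, here is a small local sketch. It says nothing about how Google itself works. It simply assumes that a lone `*` stands in for one whole word and that a trailing `*` (as in `oct*`) means truncation, and it turns such a query into a regular expression. The function name `wildcard_to_regex` is made up for this illustration.

```python
import re


def wildcard_to_regex(query):
    """Compile a query with asterisks into a case-insensitive regex.

    Assumed semantics (a simplification, not any search engine's rules):
      - a lone "*" matches exactly one word
      - a word ending in "*" matches any word starting with that prefix
    """
    parts = []
    for word in query.split():
        if word == "*":
            parts.append(r"\w+")  # whole-word wildcard
        elif word.endswith("*"):
            parts.append(re.escape(word[:-1]) + r"\w*")  # truncation
        else:
            parts.append(re.escape(word))  # literal word
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


phrase = wildcard_to_regex("the * is too much with us")
truncated = wildcard_to_regex("oct*")
```

Under these assumptions, `phrase` would match the opening of the Wordsworth sonnet, and `truncated` would match "octave", "octet", and "October" alike. That last one is a reminder that truncation would not have settled the octave/octet question either.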

But it's still weird to me, just how findable the course blogs that I created last semester are, and how high their hit counts continue to be. My course blogs for ENG 560 and ENG 669 continue to make the "hot blogs" list on the front page of WolfBlogs, getting anywhere from 5 to 30 hits per day, usually. The Victorian Poetry blog, in particular, continues to keep its profile high even though no one's visiting the blog anymore. The course is over, so no one's posting or commenting, and no other site has linked to the blog.

So here's my question: Why are those basically defunct blogs still so "hot"? I've looked at the referrer logs, and while they do show lots of hits from Google and other search engines, they also list lots of "direct" (which can mean "unidentified") hits. Sometimes the "direct" hits are double the hits from search engines. Any explanations?
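For anyone who wants to run the same comparison on their own referrer logs, a tally along these lines might be a starting point. It is a sketch under stated assumptions: the list of search-engine host markers is illustrative and incomplete, an empty or `-` referrer is treated as "direct", and the function names are hypothetical. It can't tell a genuinely direct visit from one whose referrer was simply stripped, which is the "unidentified" problem mentioned above.

```python
from collections import Counter
from urllib.parse import urlparse

# Illustrative, incomplete list of host fragments treated as search engines.
SEARCH_HOST_MARKERS = ("google.", "yahoo.", "search.msn.")


def classify_referrer(referrer):
    """Label one referrer string as 'direct', 'search', or 'other'."""
    if not referrer or referrer == "-":
        # Missing referrer: direct visit, or a referrer that was never sent.
        return "direct"
    host = urlparse(referrer).netloc.lower()
    if any(marker in host for marker in SEARCH_HOST_MARKERS):
        return "search"
    return "other"


def tally_referrers(referrers):
    """Count how many hits fall into each referrer category."""
    return Counter(classify_referrer(r) for r in referrers)
```

A result where "direct" is double "search" would match what the logs here show, though it would describe the pattern rather than explain it.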

I did write once to Scott Warren, "I'd bet that the reason my course blogs continue to rank high in WolfBlogs is that both the assignments and especially the comments contain a whole lot of really specific, concrete, familiar terms, such as names of authors, poems, and books. Those kinds of terms are heavily searched, as Jakob Nielsen points out." If that's true, it would confirm the real importance of particular kinds of web writing in good web ranking. Blogs do, of course, rank high in Google results generally, but that doesn't explain why my old, unvisited blogs are often getting more hits than (say) Horseless Library.

Wouldn't it be cool if we could make a library website that appeared in the top results whenever information-illiterate students Googled their research topics? Imagine what would happen if all those dynamically-generated catalog pages were crawled.

By the way: That Google search didn't really do it for me, so I turned around in my chair, reached up on the shelf, pulled down my copy of The New Princeton Encyclopedia of Poetry and Poetics, and discovered that I was right. The correct term is "octave."

Posted by Amanda French | Mar 29 2007, 10:52:41 PM EDT | Permalink | Comments [1]

Thursday March 22, 2007

More provocative thoughts on the future of libraries

Peter Brantley shares some provocative thoughts on the future of libraries.

He quotes a friend:  "discovery has moved to the network layer and libraries should stop allocating their time and money trying to build better end-user UI, and concentrate instead on delivery".

He goes on to say that as "discovery services move to the network there is less reason why libraries should maintain duplicative local data caches."

Please read the full post to put these quotes in context.  

What I find interesting is that the library technology community now seems precisely focused on "trying to build better end-user UI" and trying to develop "local data caches" in response to the limitations of licensed access.  I don't hear much about delivery and fulfillment services at digital library or library technology conferences.  Thoughts?

Posted by Tito Sierra | Mar 22 2007, 03:41:32 PM EDT | Permalink | Comments [1]



Horseless Library image by Herman Berkhoff