Three articles I've read recently that seemed worth passing along.
1) Michael Jensen on how authority is constructed, conferred, and maintained in various online venues. I read this in the Chronicle of Higher Ed a few days back and then saw that ACRLog had picked it up too. There's no mention of libraries, but undergrads are ripe for discussions of this sort on methodologies when it comes time to mention the dreaded phrase, "authoritative sources." He includes a lengthy list of guesses at what Authority 3.0 will be that strongly reminds me of Vernor Vinge's recent fiction.
2) An article by Kalevi Kilkki in First Monday on quantitatively analyzing Long Tail phenomena. Chris Anderson used mostly qualitative arguments and avoided the nitty gritty math really needed to fully analyze and predict distributions of this nature. Saw this on Lorcan Dempsey's blog a bit ago and finally got around to reading it. If you're interested in Long Tail behavior, this is required reading.
3) The New York Times Magazine just this past Sunday had a feature on the rather darker sweatshop side of online gaming. As usual for me and 2.0 stuff, I'm way more interested in the economics and social influences rippling out from the growth of online games than I am in the actual technologies involved or the games themselves (though a good question is where you draw the line between those categories).
I'd love to hear what others have come across lately.
Scott
Posted by WARREN, SCOTT
| Jun 21 2007, 04:11:26 PM EDT | Permalink
|
Wednesday June 13, 2007
We all get to digitize books
Check out this story from CNN about the distorted word puzzles we all have to solve to register at Web sites.
"Instead of wasting time typing in random letters and numbers, Carnegie
Mellon researchers have come up with a way for people to type in
snippets of books to put their time to good use, confirm they're not
machines and help speed up the process of getting searchable texts
online."
The idea is that when a machine can't decipher words from a book, get all of us non-machines to do that work, bit by bit. How smart is that?
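For the curious, here is a minimal sketch of how many people's typed answers for one hard-to-read word might be combined. This is only an illustration, not the Carnegie Mellon researchers' actual method: the function name `consensus` and the thresholds `min_votes` and `min_agreement` are assumptions made up for this example.

```python
from collections import Counter


def consensus(answers, min_votes=3, min_agreement=0.6):
    """Return the agreed transcription, or None if there is no agreement yet.

    answers: strings typed by different people for the same scanned word.
    min_votes and min_agreement are hypothetical thresholds, not real settings.
    """
    # Normalize lightly so "Morning" and "morning " count as the same answer.
    # (Lowercasing discards capitalization; a real system might keep it.)
    cleaned = [a.strip().lower() for a in answers if a.strip()]
    if len(cleaned) < min_votes:
        return None  # not enough people have seen this word yet
    word, votes = Counter(cleaned).most_common(1)[0]
    # Accept the most common answer only if a clear majority typed it.
    return word if votes / len(cleaned) >= min_agreement else None


print(consensus(["morning", "Morning", "moming", "morning"]))
print(consensus(["upon", "up on"]))
```

The first call prints the agreed word because three of the four answers match; the second prints None because only two people have answered so far.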
Posted by BOYER, JOSH
| Jun 13 2007, 11:13:13 AM EDT | Permalink
|
Thursday June 07, 2007
Photosynth
First, I want to thank Erik M. for sharing this amazing video with me.
Fast forward to the 3-minute mark to see a demo of this amazing software called Photosynth.
The basic idea is that the software takes a huge pile of ordinary photos about a subject (e.g. photos of Notre Dame from Flickr) and virtually stitches them together to form an interactive model of the subject.
I remember reading about this probably seven or eight years ago: holographic storage. It seemed too Star Trek to be real. Now, apparently, it's about to go on the market, and libraries might be one of the first customers. See this Slashdot article for a little more.
Posted by Amanda French
| May 18 2007, 08:39:04 AM EDT | Permalink
|
Wednesday May 16, 2007
LibraryThing tags added to a public library catalog - some thoughts
OK, time to try to breathe some new life into this blog.
Yesterday I read about LibraryThing's tagged data being included in the Danbury, CT Public Library's catalog. First, bravo to the Danbury Library for being gutsy enough to go first and experiment with this.
From what I've read of the commentary on NGC4Lib and elsewhere, the results aren't perfect, but they are pretty good. And hey, the results weren't perfect before from just the LC headings, so I believe that mostly this will help. I sometimes wonder, when people point out counterexamples for how searching fails in some particular case, did they actually expect it to work well always?
The emails tend to point out some weirdness in recommendations based on LT's social data, but that's exactly the crux of social data. Odd connections and preferences emerge, especially if the title has a small set of tag data - whether they are in fact helpful is harder to tell because helpfulness in this context is bound to be so personal.
Years ago I worked part time at a public library as an assistant and one of the most challenging questions I was regularly asked was, "can you recommend a good book?" I could tell you several, but the odds are that unless you read and are interested in very much the same things as I am, you may not like my recommendations. Or maybe you will, who knows? Being able to do that well was an art, and it took time to really begin to know authors, genres, reading habits, etc., and not focus on my own personal tastes. It wasn't easy, and all this was before any social networking tags, let alone Amazon reviews.
I looked at the Danbury catalog yesterday and put in the title I'm currently reading, Our Mutual Friend, by Charles Dickens. I guess LT's data worked well enough: several other Victorian novels were recommended, though weirdly enough only one by Dickens. And the tags themselves made sense. But Dickens has a lot of tags - he's one of the top 20 LT authors by copies and even though OMF is not one of the most famous of Dickens' works, LT still has data on 687 copies (I salute all of those people for taking the time to tag this good book). When I actually clicked on one of the tags and went into the tag browser, I got a really nice reading list of 19th century fiction, all of which were at that library. Same thing for the tag, "Dickens." And for "London." Nice lists, not random. It was easy to use as a browsing tool and something I think I would enjoy as a patron.
A lot of the power of LT and tags comes when there are enough of them for any given title or author to statistically overwhelm the odd tags and entries that are too idiosyncratic to be of much use to anyone except the tagger herself. I'm not sure how helpful this would be for a really obscure title that LT has little data on - so perhaps I'm wondering just how good tagging is at reasonably producing Long Tail recommendations. Tagging is something that, like Amazon reviews for a given title, while perhaps possessing a Long Tail distribution, I believe only really displays its power in the much smaller set (of titles, of reviews that are tagged as helpful) that accounts for most of the data.
My conclusion after that last run-on Dickensian sentence is that adding LT's huge tag dataset could be a very interesting addition for most public libraries, especially given the price which is supposed to be very low. What's to lose?
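The point above about enough tags statistically overwhelming the idiosyncratic ones can be sketched in a few lines. This is a rough illustration under stated assumptions, not how LibraryThing actually works: the function `prominent_tags`, the `min_count` cutoff, and the sample data are all invented for the example.

```python
from collections import Counter


def prominent_tags(tag_events, min_count=3):
    """Keep only tags that were applied at least min_count times.

    tag_events: one string per time some user applied a tag to a title.
    min_count is a hypothetical cutoff chosen for illustration.
    """
    counts = Counter(t.strip().lower() for t in tag_events)
    # most_common() sorts by frequency, so popular tags come first.
    return [(tag, n) for tag, n in counts.most_common() if n >= min_count]


# Invented data for a heavily tagged title: personal tags drop out.
popular = (["victorian"] * 40 + ["dickens"] * 35 + ["london"] * 20
           + ["read in 2006", "gift from aunt", "bookshelf 3"])
print(prominent_tags(popular))

# Invented data for an obscure title: nothing clears the cutoff.
obscure = ["signed copy", "dickens"]
print(prominent_tags(obscure))
```

With the heavily tagged title, the three shared tags survive and the three personal ones vanish. With the obscure title, the list comes back empty, which is the Long Tail worry in miniature: there is too little data to tell a useful tag from a private one.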
Posted by WARREN, SCOTT
| May 16 2007, 11:57:11 AM EDT | Permalink
|
Thursday March 29, 2007
Blog rankings
Just now I had the extremely weird experience of looking something up with Google and getting as the very first result, ta-da! my own blog. It was a bit like dialing a random telephone number and getting my own voice mail.
Here's what happened. Someone wrote me an e-mail about a Petrarchan sonnet and used the word "octet" to refer to the first eight lines. I was pretty sure that this was inaccurate, and that the correct term is "octave," which people mix up with the term for the last six lines, "sestet."
But I wasn't quite sure, so I figured I'd check. I thought I remembered that Karen Ciccone told me that Google now allows truncation with asterisks, so I entered this search string:
petrarchan sonnet sestet oct*
I've now remembered that what Karen actually said was that you can use asterisks within quotation marks to search Google, so that, for instance, you can search "the * is too much with us" and find Wordsworth's famous sonnet. But you know how it is when you're Googling, especially if you're a fast typist -- it's easier to Google than it is to think.
So Google didn't recognize my asterisk as a truncation, and simply searched for "oct." Which, of course, is a well-known abbreviation for "October," so you can see why a blog that uses abbreviated months in its datestamps would rank high. And that helps explain why the first result I got for that search was the blog I created last semester for a graduate course in Victorian Poetry. I also wonder whether Google KNOWS IT'S ME (I was signed in at the time) and is giving me a top result from a site it has reason to believe I'd be interested in.
But it's still weird to me, just how findable the course blogs that I created last semester are, and how high their hit counts continue to be. My course blogs for ENG 560 and ENG 669 continue to make the "hot blogs" list on the front page of WolfBlogs, getting anywhere from 5 to 30 hits per day, usually. The Victorian Poetry blog, in particular, continues to keep its profile high even though no one's visiting the blog anymore. The course is over, so no one's posting or commenting, and no other site has linked to the blog.
So here's my question: Why are those basically defunct blogs still so "hot"? I've looked at the referrer logs, and while they do show lots of hits from Google and other search engines, they also list lots of "direct" (which can mean "unidentified") hits. Sometimes the "direct" hits are double the hits from search engines. Any explanations?
I did write once to Scott Warren, "I'd bet that the reason my course blogs continue to rank high in WolfBlogs is that both the assignments and especially the comments contain a whole lot of really specific, concrete, familiar terms, such as names of authors, poems, and books. Those kinds of terms are heavily searched, as Jakob Nielsen points out." If that's true, it would confirm the real importance of particular kinds of web writing in good web ranking. Blogs do, of course, rank high in Google results generally, but that doesn't explain why my old, unvisited blogs are often getting more hits than (say) Horseless Library.
Wouldn't it be cool if we could make a library website that appeared in the top results whenever information-illiterate students Googled their research topics? Imagine what would happen if all those dynamically-generated catalog pages were crawled.
By the way: That Google search didn't really do it for me, so I turned around in my chair, reached up on the shelf, pulled down my copy of The New Princeton Encyclopedia of Poetry and Poetics, and discovered that I was right. The correct term is "octave."
He quotes a friend: "discovery has moved to the network layer and libraries should stop allocating their time and money trying to build better end-user UI, and concentrate instead on delivery".
He goes on to say that as "discovery services move to the network there is less reason why libraries should maintain duplicative local data caches."
Please read the full post to put these quotes in context.
What I find interesting is that the library technology community now seems precisely focused on "trying to build better end-user UI" and trying to develop "local data caches" in response to the limitations of licensed access. I don't hear much about delivery and fulfillment services at digital library or library technology conferences. Thoughts?
Some choice quotes in there such as "[in six months Google] Book Search has accomplished enough to transform the academic profession." Also: "Research in my world is very often a personal matter of haggling for more time with the particular librarian in question. They're used to us, and I figure they need a good struggle to keep them alert. But thanks to Google Book Search, these days of scavenger-hunt and tug-of-war are drawing to an end."
Whoa.
Interesting that I found this post through a search industry analyst blog and not through the usual library channels.
An interesting article about a company called Blinkx appeared in the NYT business section on Sunday: http://www.nytimes.com/2007/02/25/business/yourmoney/25slip.html?_r=1&oref=slogin. Blinkx is trying to make digital video searchable, and the article discusses their methodology and why it appears to be more successful than Google Video. GV searches metadata associated with video, whereas Blinkx is trying to automatically transcribe speech in videos (an extremely nontrivial problem) and then search that speech rendered into text.
I believe this merits digital library attention because when we discuss institutional repositories, there is always a focus on papers or data, but video doesn't seem as prominent. Yet video may be an important part of what IRs or future university archives end up holding. For instance, last Friday night Blanton Godfrey, the Dean of the College of Textiles, appeared on Watch North Carolina People with Bill Friday. His conversation was wide ranging. I'm trying to add a DVD/video of his appearance to the Textiles Library media collection. But if someone wanted to know if he spoke about a particular subject in his roughly half hour interview, there's no easy way to find out, without searching a transcript or watching the entire show.
This type of content may become more common in the future, especially for prominent individuals. A few weeks ago a number of us had an ongoing discussion about a concept called 'lifelogging' where a person tries to keep track of their entire life digitally (or entire digital life; the two are as yet distinguishable, the second being easier and more common). If video becomes something we collect in earnest, both for repositories and for regular collections, then being able to search that content in a meaningful way (i.e. not GV) becomes important.
It is, however, worth noting the limitations of what Blinkx is trying to do. The last paragraph in the article emphasizes that Blinkx is searching the sounds (or rather transcribed text) of a video and cannot do anything about the images in a video. Because so many videos (unlike the example I mentioned above) may not have much speech to go by, there is still no good way to search them (Blinkx for all intents and purposes is just creating much more textual metadata for a video). Imagine, for instance, trying to search Koyaanisqatsi to find out if there are depictions of certain places/jobs/landmarks, natural events, etc.
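To make the transcribe-then-search idea concrete, here is a small sketch of the search half only. It assumes the hard part (speech recognition) has already produced timestamped text, and it is not a description of Blinkx's actual system: the functions `build_index` and `find` and the sample transcript are invented for illustration.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9']+")


def build_index(segments):
    """Map each word to the start times (in seconds) of segments containing it.

    segments: (start_seconds, text) pairs, assumed to come from some
    speech-to-text step that is outside the scope of this sketch.
    """
    index = defaultdict(set)
    for start, text in segments:
        for word in WORD.findall(text.lower()):
            index[word].add(start)
    return {word: sorted(times) for word, times in index.items()}


def find(index, query):
    """Return start times of segments that contain every word in the query."""
    words = WORD.findall(query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)


# Invented transcript of a half-hour interview.
segments = [
    (0, "Welcome to the program"),
    (95, "We talked about quality in textile manufacturing"),
    (610, "Quality management matters in every industry"),
]
index = build_index(segments)
print(find(index, "quality"))
print(find(index, "textile quality"))
print(find(index, "mountains"))
```

A search returns the points in the video where the words were spoken, so a viewer could jump straight to them instead of watching the whole show. A film with little or no speech would produce an empty index, which is exactly the limitation noted above.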
Tagging has been endlessly promoted for 2.0 catalogs, but as Tim concludes in his report, getting anyone to tag stuff that isn't theirs seems to be a real problem (heck, we even have to pay people (catalogers) to do it :) ). And avoiding spurious political or opinion tagging would be quite a challenge in any academic environment. Technology is easy; human behavior much less so.
Posted by WARREN, SCOTT
| Feb 20 2007, 04:41:50 PM EST | Permalink
|
Monday January 29, 2007
Immersive Learning Environments Librarian
An interesting follow-up to Scott's last two questions ("1) what are the ramifications for us as librarians in creating learning
spaces, both virtual and real, that will enable focused attention for
learning, not merely disperse it further; and 2) how does one gain
attention for collections in an increasingly (and wonder of wonders
this is not a contradiction) diasporic and dense information space?").
I read on a 'gaming in libraries' listserv this afternoon that McMaster University is now hiring for an "Immersive Learning Environments Librarian" who will, among other things, "steward the creative development, implementation, and evaluation of immersive learning environments for the campus community." One of the goals of this position is to increase awareness of the library's learning resources. That would be one approach to gaining attention for collections. It's not clear to me whether 'Reference and Instruction Librarians' are some of the other learning resources this new position will try to introduce to faculty and students, though.
Scott, I wonder if immersion in a virtual world makes all these streams of information seem somehow more manageable?
Posted by Joe Williams
| Jan 29 2007, 02:50:53 PM EST | Permalink
|
Tuesday January 02, 2007
Filtering and Attention
Below I've copied a post by Gary Frost in response to an IF: Book blog entry titled Future of the Filter. The metaphor Frost employs in his second paragraph is, to me, wonderful, and I think he raises a very good point about the increasing pressure to filter relentless streams of new content (or what filters the filters?). There is only so much to which anyone can pay mind before a deficit of meaningful and sustained attention, much less thoughtful response, occurs. These discussions are part of a larger school of thought termed the attention economy. Other articles that I've recently perused on the subject are a First Monday article and a cogent interview with Richard Lanham, author of The Economics of Attention: Style and Substance in the Age of Information (thanks to Keith Morgan for that one).
From Gary Frost:
When I was growing up there was only one broadcast TV channel in
Chicago. I believe it was only on three hours each evening. After the
broadcast we would sit in the dark watching the test pattern. Now there
are many more channels but the capacity of our visual attention to a
single screen has not multiplied as well.
One way to visualize the multiplicity of content now presented is to
imagine a circular kaleidoscopic view with content continuously
advancing inward from the perimeter. In this metaphor content
dissipates as it migrates to the center until only a single channel is
apparent at the very center. This metaphor represents the experience
and the attention of the bionic reader. Given such visualization it is
curious that the choices are not limitless but only two; either persist
focus on the resolved channel or supplant it with another.
It's the increasingly short cycle time for the choices at the end that is really at the heart of the matter. Is there such a thing as a "Planck time" for attention, a threshold below which choices, as represented by the ability to understand what we are choosing to view and, more importantly, why, become essentially meaningless? That's an extreme and perhaps reductionist statement, but still I wonder. As I read things like this and the other articles above, some of the questions I'm trying to keep in the back of my head are:
1) what are the ramifications for us as librarians in creating learning spaces, both virtual and real, that will enable focused attention for learning, not merely disperse it further; and 2) how does one gain attention for collections in an increasingly (and wonder of wonders this is not a contradiction) diasporic and dense information space?
Posted by WARREN, SCOTT
| Jan 02 2007, 04:23:42 PM EST | Permalink
|
Friday December 15, 2006
Open library data as a platform for research and development
I'm not sure how big the audience is for this topic, but there's an excellent recorded discussion on, among other things, the challenges and benefits of making library bibliographic data freely available. The conversation included three Talis employees, Ross Singer, and Tim Spalding of LibraryThing. A motivation for this discussion was the announcement that the winner of a recent Mellon Award for innovative library software would use the prize money to purchase bibliographic data from the Library of Congress and make it freely available.
There are many issues raised by the prospect of making large amounts of bibliographic data freely available for unrestricted use. Plenty of legal, economic, technological and social issues to go around. What I am most interested in is what can be done with large piles of freely available bibliographic data. How could freeing this data improve the services libraries provide to their users? If folks (other than library geeks) could envision tangible benefits of open library data, there would be more support for these efforts. Library conference programs of late demonstrate a healthy interest in making library catalogs better so I think there would be a receptive audience.
Perhaps what is needed is some structure to move this agenda forward. I wonder whether the library community would be receptive to a TREC style approach to spur research and development in improving the library catalog. In this arrangement, a major institution (LoC? OCLC? Amazon.com?) would make a large dataset of bibliographic records available in some useful format and challenge folks to solve one or two specific problems of general interest to the library community. Perhaps some prize money could help generate interest, though press releases highlighting winners don't hurt.
Here are four catalog-specific problem areas I can think of:
Applications of the FRBR model in catalog discovery interfaces
Relevance ranking models for catalog search
Recommendation systems
Subject clustering
Or perhaps the bibliographic data could be used to build new services of value outside the catalog.
One last thought... It seems like some of these problems the library community would like to solve might be of interest to developers or researchers outside the library community. It would be great to get broader research interest in our field.
Posted by Tito Sierra
| Dec 15 2006, 12:08:25 PM EST | Permalink
| Comments [1]
Thursday December 14, 2006
Search 2.0
Here's a year-end roundup of the latest trends in search. Is there anything here that could be directly applied to improving digital library search?
Posted by Tito Sierra
| Dec 14 2006, 11:05:14 AM EST | Permalink
|