
Tuesday February 27, 2007
Searching digital video
An interesting article about a company called Blinkx appeared in the NYT business section on Sunday: http://www.nytimes.com/2007/02/25/business/yourmoney/25slip.html?_r=1&oref=slogin. Blinkx is trying to make digital video searchable, and the article discusses its methodology and why it appears to be more successful than Google Video (GV). GV searches the metadata associated with a video, whereas Blinkx tries to automatically transcribe the speech in a video (an extremely nontrivial problem) and then search that speech rendered as text.
I believe this merits digital library attention because when we discuss institutional repositories (IRs), the focus is always on papers or data; video doesn't seem as prominent. Yet video may be an important part of what IRs or future university archives end up holding. For instance, last Friday night Blanton Godfrey, the Dean of the College of Textiles, appeared on Watch North Carolina People with Bill Friday. His conversation was wide-ranging. I'm trying to add a DVD/video of his appearance to the Textiles Library media collection. But if someone wanted to know whether he spoke about a particular subject in his roughly half-hour interview, there's no easy way to find out without searching a transcript or watching the entire show.
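Once the speech has been transcribed (the genuinely hard part, which is skipped here entirely), the question "did he talk about X?" becomes a search over timestamped text. Here is a minimal sketch of that last step, assuming a hypothetical transcript already broken into (start time, text) segments. The segments are invented placeholders, not lines from the actual interview, and this is not a description of how Blinkx itself works.

```python
import re
from typing import List, Tuple

# A hypothetical transcript segment: (start time in seconds, transcribed text).
Segment = Tuple[int, str]


def find_mentions(segments: List[Segment], term: str) -> List[str]:
    """Return mm:ss start times of segments that contain `term` as a whole word."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    hits = []
    for start, text in segments:
        if pattern.search(text):
            minutes, seconds = divmod(start, 60)
            hits.append(f"{minutes:02d}:{seconds:02d}")
    return hits


if __name__ == "__main__":
    # Invented placeholder segments, standing in for speech-to-text output.
    transcript = [
        (0, "Welcome to the program."),
        (95, "Let's talk about textile research on campus."),
        (610, "Quality management has been a lifelong interest."),
        (1200, "Textile manufacturing in this state has changed."),
    ]
    print(find_mentions(transcript, "textile"))  # ['01:35', '20:00']
```

Returning timestamps rather than a simple yes/no lets a viewer jump straight to the relevant part of a recording instead of watching the whole show.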
This type of content may become more common in the future, especially for prominent individuals. A few weeks ago a number of us had an ongoing discussion about a concept called 'lifelogging', in which a person tries to keep track of their entire life digitally (or their entire digital life; the two are as yet distinguishable, the second being easier and more common). If video becomes something we collect in earnest, both for repositories and for regular collections, then being able to search that content in a meaningful way (i.e., beyond the metadata search GV offers) becomes important.
It is, however, worth noting the limitations of what Blinkx is trying to do. The last paragraph of the article emphasizes that Blinkx searches the sounds (or rather, the transcribed text) of a video and can do nothing with its images. Because so many videos (unlike the example I mentioned above) have little speech to go by, there is still no good way to search them; Blinkx, for all intents and purposes, is just creating much more textual metadata for a video. Imagine, for instance, trying to search Koyaanisqatsi to find out whether it depicts certain places, jobs, landmarks, natural events, etc.
Try Blinkx out at http://blinkx.com/
Posted by WARREN, SCOTT
| Feb 27 2007, 12:38:06 PM EST

Tuesday February 20, 2007
Tagging at LibraryThing and Amazon
Tim Spalding has written a very interesting summary of an experiment comparing some parameters of tagging at LibraryThing and Amazon. It can - and should - be read by librarians: http://www.librarything.com/thingology/2007/02/when-tags-works-and-when-they-dont.php
Tagging has been endlessly promoted for 2.0 catalogs, but as Tim concludes in his report, getting anyone to tag stuff that isn't theirs seems to be a real problem (heck, we even have to pay people, i.e., catalogers, to do it :) ). And avoiding spurious political or opinion tagging would be quite a challenge in any academic environment. Technology is easy; human behavior, much less so.
Posted by WARREN, SCOTT
| Feb 20 2007, 04:41:50 PM EST

Monday January 29, 2007
Immersive Learning Environments Librarian
An interesting follow-up to Scott's last two questions ("what are the ramifications for us as librarians in creating learning spaces, both virtual and real, that will enable focused attention for learning, not merely disperse it further; and 2) how does one gain attention for collections in an increasingly (and wonder of wonders this is not a contradiction) diasporic and dense information space?").
I read on a 'gaming in libraries' listserv this afternoon that McMaster University is now hiring an "Immersive Learning Environments Librarian" who will, among other things, "steward the creative development, implementation, and evaluation of immersive learning environments for the campus community." One of the goals of this position is to increase awareness of the library's learning resources. That would be one approach to gaining attention for collections. It's not clear to me, though, whether 'Reference and Instruction Librarians' are among the other learning resources this new position will try to introduce to faculty and students.
Scott, I wonder if immersion in a virtual world makes all these streams of information seem somehow more manageable?
Posted by Joe Williams
| Jan 29 2007, 02:50:53 PM EST

Tuesday January 02, 2007
Filtering and Attention
Below I've copied a post by Gary Frost in response to an if:book blog entry titled "Future of the Filter." The metaphor Frost employs in his second paragraph is, to me, wonderful, and I think he raises a very good point about the increasing pressure to filter relentless streams of new content (or, what filters the filters?). There is only so much anyone can pay mind to before a deficit of meaningful and sustained attention, much less thoughtful response, occurs. These discussions are part of a larger school of thought termed the attention economy. Other pieces on the subject that I've recently perused are a First Monday article and a cogent interview with Richard Lanham, author of The Economics of Attention: Style and Substance in the Age of Information (thanks to Keith Morgan for that one).
From Gary Frost:
When I was growing up there was only one broadcast TV channel in Chicago. I believe it was only on three hours each evening. After the broadcast we would sit in the dark watching the test pattern. Now there are many more channels but the capacity of our visual attention to a single screen has not multiplied as well.
One way to visualize the multiplicity of content now presented is to imagine a circular kaleidoscopic view with content continuously advancing inward from the perimeter. In this metaphor content dissipates as it migrates to the center until only a single channel is apparent at the very center. This metaphor represents the experience and the attention of the bionic reader. Given such visualization it is curious that the choices are not limitless but only two; either persist focus on the resolved channel or supplant it with another.
It's the increasingly short cycle time for the choices at the end that is really at the heart of the matter. Is there such a thing as a "Planck time" for attention, a threshold below which our choices, as represented by our ability to understand what we are choosing to view and, more importantly, why, become essentially meaningless? That's an extreme and perhaps reductionist statement, but still I wonder. As I read things like this and the other articles above, some of the questions I'm trying to keep in the back of my head are:
1) what are the ramifications for us as librarians in creating learning spaces, both virtual and real, that will enable focused attention for learning, not merely disperse it further; and 2) how does one gain attention for collections in an increasingly (and wonder of wonders this is not a contradiction) diasporic and dense information space?
Posted by WARREN, SCOTT
| Jan 02 2007, 04:23:42 PM EST

Friday December 15, 2006
Open library data as a platform for research and development
I'm not sure how big the audience is for this topic, but there's an excellent recorded discussion on, among other things, the challenges and benefits of making library bibliographic data freely available. The participants were three Talis employees, Ross Singer, and Tim Spalding of LibraryThing. One motivation for the discussion was the announcement that the winner of a recent Mellon Award for innovative library software would use the prize money to purchase bibliographic data from the Library of Congress and make it freely available.
There are many issues raised by the prospect of making large amounts of bibliographic data freely available for unrestricted use, with plenty of legal, economic, technological and social issues to go around. What I am most interested in is what can be done with large piles of freely available bibliographic data. How could freeing this data improve the services libraries provide to their users? If folks (other than library geeks) could envision tangible benefits of open library data, there would be more support for these efforts. Library conference programs of late demonstrate a healthy interest in making library catalogs better, so I think there would be a receptive audience.
Perhaps what is needed is some structure to move this agenda forward. I wonder whether the library community would be receptive to a TREC-style approach to spur research and development in improving the library catalog. In this arrangement, a major institution (LoC? OCLC? Amazon.com?) would make a large dataset of bibliographic records available in some useful format and challenge folks to solve one or two specific problems of general interest to the library community. Perhaps some prize money could help generate interest, though press releases highlighting winners don't hurt.
Here are four catalog-specific problem areas I can think of:
- Applications of the FRBR model in catalog discovery interfaces
- Relevance ranking models for catalog search
- Recommendation systems
- Subject clustering
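To give a flavor of the second item, here is about the simplest relevance-ranking model one could try on a pile of open records: TF-IDF scoring, with a title match counting twice as much as a subject match. The three records, the two fields, and the 2.0 title weight are all made-up assumptions for illustration; a real catalog ranking model would need much more (stemming, phrase matching, usage or holdings data, and so on).

```python
import math
from collections import Counter
from typing import Dict, List

# Hypothetical records; real ones would come from an open bibliographic dump.
RECORDS: List[Dict[str, str]] = [
    {"id": "r1", "title": "Textile chemistry", "subjects": "Textile chemistry; Dyes and dyeing"},
    {"id": "r2", "title": "Introduction to weaving", "subjects": "Weaving; Textile fabrics"},
    {"id": "r3", "title": "Organic chemistry", "subjects": "Chemistry, Organic"},
]

# Assumed field weights: a title match counts twice as much as a subject match.
FIELD_WEIGHTS = {"title": 2.0, "subjects": 1.0}


def tokenize(text: str) -> List[str]:
    """Lowercase and split on anything that isn't a letter or digit."""
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


def idf(term: str, records: List[Dict[str, str]]) -> float:
    """Smoothed inverse document frequency; rarer terms weigh more."""
    df = sum(1 for r in records if term in tokenize(r["title"] + " " + r["subjects"]))
    return math.log((1 + len(records)) / (1 + df)) + 1.0


def score(query: str, record: Dict[str, str], records: List[Dict[str, str]]) -> float:
    """Sum of field-boosted term frequency times IDF over all query terms."""
    total = 0.0
    for term in tokenize(query):
        weight = idf(term, records)
        for field, boost in FIELD_WEIGHTS.items():
            total += boost * Counter(tokenize(record[field]))[term] * weight
    return total


def rank(query: str, records: List[Dict[str, str]]) -> List[str]:
    """Return ids of matching records, best match first."""
    scored = [(score(query, r, records), r["id"]) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [rid for s, rid in scored if s > 0]


if __name__ == "__main__":
    print(rank("textile chemistry", RECORDS))  # ['r1', 'r3', 'r2']
```

Here r1 ranks first because both query words appear in its title as well as its subjects; r3 beats r2 because its title match on "chemistry" is boosted.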
Or perhaps the bibliographic data could be used to build new services of value outside the catalog.
One last thought... It seems like some of these problems the library community would like to solve might be of interest to developers or researchers outside the library community. It would be great to get broader research interest in our field.
Posted by Tito Sierra
| Dec 15 2006, 12:08:25 PM EST

Thursday December 14, 2006
Search 2.0
Here's a year-end roundup of the latest trends in search. Is there anything here that could be directly applied to improving digital library search?
Posted by Tito Sierra
| Dec 14 2006, 11:05:14 AM EST

Wednesday December 13, 2006
8 million books described in LibraryThing
Over at LibraryThing, they announced today that they had passed 8 million books. Now, I like and admire LibraryThing and the people behind it quite a lot, but I found the descriptions of those 8 million books a bit disingenuous. LibraryThing says, "LibraryThing has between eight and fifty times as many books as the World's Biggest Bookstore, in Toronto, Ontario; if laid end-to-end, LibraryThing's books would stretch 1,075 miles; LibraryThing holds 58,580 books by J. K. Rowling; LibraryThing has 80 times as many books as you have hairs on your head," etc.
The problem is, LibraryThing itself does not actually have a single book. Where they are correct is when they say, "If LibraryThing were a library, it would be the 14th largest library in the United States." That conditional is important. I may be splitting hairs, but this question of language raises a more interesting one: just what is a library? If you have descriptions of books, but no actual books, are you a library? What if your constituent members have personal libraries? Is the aggregate description of the collected holdings of physically extant libraries, personal or institutional, itself a library?
I think not. Nor are you a bookstore, since in neither case can the institution known as LibraryThing actually provide a book to anyone in and of itself. And that ability to provide resources is the key. Require payment for providing the end product, and you're a store; don't require payment, but seek subsidies for providing resources to some defined group of people, and you're a library. Require payment from nobody, seek no subsidies, and define no users, and you're some sort of barter system, likely based on personal rather than institutional collections.
LibraryThing doesn't fall neatly into any of those categories, though to be fair it nicely enables buying books from several vendors (now including 2 local stores), locating local libraries that hold copies via WorldCat, and looking at popular swap sites like BookMooch. It plays a very interesting middle role that greases the discovery and serendipitous browsing of titles, especially fiction, before any of those other transactions begin, and its ability to play that role well grows more potent with every new entry.
What I think I'm driving at, however, is that having book data itself, while really, really interesting and potentially very useful for stores, libraries, and swaps, doesn't itself constitute a library. Libraries (book libraries, anyway) and bookstores, by my definitions, are not just about the collections of physical books they have, though that is the first principle. They also have an economic dimension that revolves around the transactions involved in both procuring and providing book resources. And I don't think you can get away from that; not easily, anyway.
This isn't to suggest that there is a problem because LibraryThing is not a library or a store. Far from it. Tim Spalding et al. have done something genuinely unusual in creating it. LibraryThing seems to me to be exactly what its name suggests, something for which we don't quite have a word, or at least not a good one. If I were to describe it, I would call it, at its smallest, a personalized version of OCLC (which is based on institutional holdings), but I believe that doesn't convey it well to nonlibrarians, nor does it fully encompass the scope of what is happening. Perhaps in German we could string together a series of words to make a new one that perfectly described LibraryThing, but I'm at a loss in English. Anyone have a good one-word name for the class of objects that LibraryThing (and, I believe, OCLC) belongs to? "The best resource around for studying personal book reading" (not book buying; that's Amazon) doesn't count, though it does make me wonder just when the first thesis based on LibraryThing data will arrive. Neologisms welcome.
Posted by WARREN, SCOTT
| Dec 13 2006, 05:29:47 PM EST

Tuesday December 12, 2006
Nifty search tool for Iraq report
if:book has an interesting post on inventive ways to present public-domain texts. http://www.futureofthebook.org/blog/archives/2006/12/how_would_you_design_the_iraq.html
One example is Vivisimo, a search engine company, which did the following:
"The search engine crawled the PDF file of the final report issued by the Iraq Study Group and indexed it by paragraph rather than as a single document. By breaking the 142 page file into paragraphs, readers of the report can now search for specific aspects that are of interest to them instead of having to read through the entire document or perform a tedious keyword search within the document using the Acrobat application. When a search query is entered, the search engine returns the relevant paragraphs in the search results. Additionally, the Velocity Clustering Engine is used to cluster the search results into related topic areas. The clusters allow the reader to easily browse related information and uncover relationships between topics within the report. This demo was placed online by Vivísimo within minutes of the publication of the final report, showing the speed with which the Vivísimo Velocity Search Platform can be deployed."
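The core idea in that description, treating each paragraph rather than the whole 142-page report as the unit of retrieval, is easy to sketch. This minimal version assumes the PDF's text has already been extracted to plain text with blank lines between paragraphs (PDF extraction is skipped), and it uses a plain boolean AND search; it illustrates paragraph-level indexing in general, not Vivísimo's implementation.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def split_paragraphs(text: str) -> List[str]:
    """Split plain text on blank lines, dropping empty chunks."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def words(text: str) -> List[str]:
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(paragraphs: List[str]) -> Dict[str, Set[int]]:
    """Inverted index: word -> numbers of the paragraphs containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for number, para in enumerate(paragraphs):
        for word in words(para):
            index[word].add(number)
    return index


def search(query: str, paragraphs: List[str], index: Dict[str, Set[int]]) -> List[str]:
    """Return paragraphs containing every query word (boolean AND), in document order."""
    terms = words(query)
    if not terms:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in terms))
    return [paragraphs[n] for n in sorted(matches)]


if __name__ == "__main__":
    # Invented stand-in for text extracted from a long report.
    report = "Regional diplomacy matters.\n\nTraining security forces.\n\nDiplomacy and budgets."
    paragraphs = split_paragraphs(report)
    index = build_index(paragraphs)
    for hit in search("diplomacy", paragraphs, index):
        print(hit)
```

Because each paragraph is its own "document," a hit returns just the relevant passage rather than the whole report.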
I don't intuitively see how they've selected the topics that "cluster" under my search terms, but it's an interesting set of views of this text.
I wonder what we could learn from this. Are clustered or faceted views of search results something we could offer with our own digital content and Endeca? I'd like to consider useful ways to offer access to content we create, and would be glad to hear from those with a grasp of the technical challenges.
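The post doesn't say how the Velocity Clustering Engine picks its topics, so the sketch below is only a naive stand-in, not Vivísimo's algorithm: it groups result paragraphs under the non-query terms that at least two of them share and keeps the largest groups as topic labels. The stopword list, the sample hits, and the thresholds are assumptions. Something like this could be prototyped against our own digital content before deciding whether Endeca's faceting covers the need.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Assumed minimal stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "for", "is", "with"}


def terms(text: str) -> Set[str]:
    """Distinct lowercase words in text, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def cluster_by_shared_terms(hits: List[str], query: str,
                            max_clusters: int = 3, min_size: int = 2) -> Dict[str, List[int]]:
    """Group hit numbers under non-query terms shared by at least min_size hits."""
    query_terms = terms(query)
    groups: Dict[str, List[int]] = defaultdict(list)
    for number, hit in enumerate(hits):
        for term in terms(hit) - query_terms:
            groups[term].append(number)
    candidates = [(t, ids) for t, ids in groups.items() if len(ids) >= min_size]
    # Biggest groups first; ties broken alphabetically so the output is stable.
    candidates.sort(key=lambda item: (-len(item[1]), item[0]))
    return dict(candidates[:max_clusters])


if __name__ == "__main__":
    # Invented result paragraphs for the query "report".
    hits = [
        "The report urges training security forces with more advisers.",
        "The report says security forces need equipment and advisers.",
        "The report calls for regional diplomacy with neighboring states.",
        "The report links diplomacy to engaging neighboring states.",
    ]
    for label, members in cluster_by_shared_terms(hits, "report").items():
        print(label, members)
```

A hit can land in more than one group, and the labels are single words that often overlap ("advisers" and "forces" pick out the same two paragraphs here), which is one reason a production clustering engine would need something smarter.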
Posted by Monica McCormick
| Dec 12 2006, 06:08:01 PM EST

Wednesday December 06, 2006
Contests
The folks at programmableweb are maintaining a list of programming contests. Under previous contests they list the OCLC Research Software Contest and the Talis: Mashing Up the Library Competition. I would love to see more library related contests on the list.
Posted by Tito Sierra
| Dec 06 2006, 09:31:30 AM EST

Tuesday December 05, 2006
Differences in how researchers and librarians see things
The source for the table below is a report from the UK-based Research Information Network called Researchers and discovery services: behaviors, perceptions and needs. They collected their data through phone interviews with 400 researchers and 50 librarians in the UK. The executive summary is worth a review, though I want to call attention to the section comparing the researcher survey results with the librarian survey results.
One of the areas of high divergence that caught my eye is the librarian perception that researchers only undertake simple searches, whereas a significant percentage of researchers self-report as undertaking a variety of search strategies (simple to sophisticated) depending on the task.
Anyone find any of these conclusions surprising?
Posted by Tito Sierra
| Dec 05 2006, 11:19:36 AM EST

Thursday November 16, 2006
Book as Terrain
The neatest mashup I've seen in a long time was highlighted today over on If: Book, http://www.futureofthebook.org/blog/archives/2006/11/book_as_terrain.html
What kinds of stories, narratives, navigation, use, and instruction might result from turning book pages, both image and text, into maps? I don't know, but I like thinking about it. The actual tool is at http://www.maplib.net/index.php
Idle playtoy or something a bit deeper? Only time will tell.
Posted by WARREN, SCOTT
| Nov 16 2006, 03:23:26 PM EST

Wednesday November 15, 2006
The invisible hand of the library in the marketplace
I'm all hopped up on Kim Duckett and Scott Warren's presentation to the NCSU Librarians Association. Kim and Scott rock! They talked about how explaining the economic role of the library is a good way to teach students why they should use the library.
Their talk and an observation by our colleague Josh Wilson got me thinking about how truly weird the library's economic role is in one particular scenario: patrons Googling journal articles from on campus. They go from Google (or Yahoo, or a link on some random web page) straight to full-text articles. I'm not talking about open access journals. I'm talking about articles in big-money journals published by Elsevier, Springer, et al. If patrons are on campus, they get to the text seamlessly: no cost, no hassle, no need to use library databases or Journal List. The weird thing is that not only is the article free, there's no notice that it cost anyone anywhere a dime. This works because Springer et al. treat an NCSU IP address like a "paid in full" receipt. (Off-campus versions of this story are more complicated, and I won't get into them here.)
I'm all for seamless access, but it sure seems like we, the library, should drop a note in the middle of that process that says "Paid for by NCSU Libraries." Or, to borrow an idea from Andrew Pace, a discount sticker that says "This cost $___, you pay $0, you save 100%." The way we do it now is like an auto mechanic sneaking into a customer's garage at night, silently providing a free tune-up, and deliberately not leaving a "Free tune-up by ACME Garage" note. The car owner is left thinking that his car never needs servicing. Credit, in the customer's mind, goes to Honda. In our journal article case, where does credit go in a patron's mind? Google? That Springer outfit that provides all the free stuff?
Note that my prescription, dropping a "Paid for by NCSU Libraries" notice in the Google-to-Springer path, is not something we can do. Our servers are not in that path. We would have to ask Springer to do it. And Elsevier, and Wiley, and Nature, and ...
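The "paid in full" receipt is, in essence, an address-range check on the publisher's server, which is why the notice would have to be added there. Below is a minimal sketch, using Python's standard ipaddress module, of what such a check plus the suggested notice might look like. The networks are reserved documentation ranges standing in for a campus's real address blocks, the institution mapping and price are placeholders, and no publisher necessarily does it this way.

```python
import ipaddress
from typing import Optional

# Hypothetical subscriber table. 192.0.2.0/24 and 198.51.100.0/24 are reserved
# documentation networks, standing in for a campus's real address blocks.
SUBSCRIBER_NETWORKS = {
    "NCSU Libraries": [ipaddress.ip_network("192.0.2.0/24"),
                       ipaddress.ip_network("198.51.100.0/24")],
}


def subscriber_for(client_ip: str) -> Optional[str]:
    """Return the subscribing institution whose range contains client_ip, if any."""
    addr = ipaddress.ip_address(client_ip)
    for institution, networks in SUBSCRIBER_NETWORKS.items():
        # An IPv6 address tested against an IPv4 network is simply "not in".
        if any(addr in net for net in networks):
            return institution
    return None


def access_notice(client_ip: str, list_price: float) -> str:
    """Build the message a publisher could show alongside the full text."""
    institution = subscriber_for(client_ip)
    if institution is None:
        return f"Purchase this article for ${list_price:.2f}."
    return (f"Paid for by {institution}. This cost ${list_price:.2f}, "
            f"you pay $0, you save 100%.")


if __name__ == "__main__":
    print(access_notice("192.0.2.17", 31.50))   # on "campus": gets the sticker
    print(access_notice("203.0.113.5", 31.50))  # off "campus": gets a price
```

The check and the notice sit side by side in the same function, which is exactly the spot our servers never touch in the Google-to-publisher path.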
One of the valuable parts of Kim's and Scott's presentation is that it opens students' eyes to something they don't know about. "Lifting a veil" is Kim's metaphor. What can we do systematically to make the economics visible without interfering with the easy access that we also value?
Posted by BOYER, JOSH
| Nov 15 2006, 01:06:20 PM EST

Monday November 13, 2006
UnSuggestions
Every recommendation tool I've ever seen is trying to put the patron in contact with things that he or she will like based on prior consumption. This morning I saw a post on LibraryThing's blog that did the opposite. http://www.librarything.com/unsuggester
You can read all about it at http://www.librarything.com/blog/2006/11/booksuggester-and-unsuggester.php
Put in a title and see what people who own or have read that work are likely never to read, buy, or consider. Several examples are provided, too (we learn that people who read The Confessions of St. Augustine do not like to read Night Pleasures, though perhaps based on the title they should, in order to better understand just what the young Augustine was in fact confessing to), and vice versa. The algorithm is explained, and it is quite fun for a few minutes to put in titles and quickly conclude that yes, I'd never read those other things in a million years and whoever is reading them just has markedly different interests and tastes. It seems trivial at first glance, but it's worth pointing out that Tim Spalding, the LibraryThing guy, posted earlier on NGG4LIB that
"I plan to use it to calculate "diversity" metrics for users, and later various "levels." Basically you take a list of books, e.g., twenty top "academic" books and use their associations as the touchstones that order all other books. I'm hoping it can produce something like OCLC "Audience level" stats."
Data's fun.
Tim
Now that's interesting. And I wonder what will come of it.
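LibraryThing's blog post explains its actual algorithm, and I won't try to reproduce it here. The general idea of an "unsuggestion," though, can be sketched as a co-occurrence recommender run backwards: among reasonably popular books, find those that share owners with a given title far less often than their popularity alone would predict. This minimal sketch assumes a tiny made-up set of user libraries (books "A" through "E") and uses a simple observed/expected ratio, often called lift, as the score.

```python
from collections import Counter
from itertools import chain
from typing import Dict, List, Set


def unsuggest(libraries: Dict[str, Set[str]], title: str,
              k: int = 3, min_owners: int = 2) -> List[str]:
    """Return up to k titles whose owners overlap least with owners of `title`.

    Score is lift = N * n(both) / (n(title) * n(other)), where N is the number
    of users. A lift below 1 means the two books share owners less often than
    their popularity would predict; 0 means they never share one.
    """
    n_users = len(libraries)
    owners = Counter(chain.from_iterable(libraries.values()))
    if owners[title] == 0:
        return []
    together: Counter = Counter()
    for books in libraries.values():
        if title in books:
            together.update(books - {title})
    scored = []
    for other, n_other in owners.items():
        if other == title or n_other < min_owners:
            continue  # skip the title itself and rarely held books (too noisy)
        lift = n_users * together[other] / (owners[title] * n_other)
        scored.append((lift, other))
    scored.sort()  # lowest lift first; ties broken alphabetically
    return [other for _, other in scored[:k]]


if __name__ == "__main__":
    # Made-up personal libraries.
    libraries = {
        "u1": {"A", "B", "C"},
        "u2": {"A", "B"},
        "u3": {"A", "C"},
        "u4": {"D", "E"},
        "u5": {"C", "D", "E"},
        "u6": {"D", "E"},
    }
    print(unsuggest(libraries, "A"))  # ['D', 'E', 'C']
```

With these libraries, "D" and "E" come back first for "A": they are as popular as "A" but never appear alongside it. The same lift numbers, computed against a fixed list of touchstone books, are one plausible route to the kind of "diversity" metric Tim describes, though that is my guess rather than his stated method.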
On a personal level, LibraryThing seems quite neat, but I still haven't found it at all useful for professional work. The stuff I buy with my various collection funds is never found there. If I were in the public library world or possibly buying for the humanities, then I think the relevance might be higher. For my own reading, I already have so many things to read that honestly, the last thing I need is new recommendations (well, I'll take recommendations, but likely won't act on them). What I believe I enjoy most about LibraryThing is the FRBR-ization: seeing all the different covers and editions collocated in one place.
I'm left with two conclusions after reading the blog and Tim Spalding's post: 1) a nagging sense that OCLC just missed the boat here, and 2) that I agree wholeheartedly with Tim Spalding. Data is fun.
Posted by WARREN, SCOTT
| Nov 13 2006, 03:01:08 PM EST
Time, Space, History - Digital History Projects
At this year's Educause conference, Dr. Edward Ayers (University of Virginia) and William Thomas III (University of Nebraska) gave a great presentation about digital history projects they are working on. You can watch their presentation on the Educause conference website. It's very engaging and informative.
They reported on some very interesting digital projects including visualizations to explore the movement of ex-slaves across different areas following the end of slavery and to trace how the development of the railroad system impacted communities in Nebraska and other western states.
One of the most interesting projects they describe focuses on how they worked with groups of students to use the UVA library's special collections to research social history issues. The project is called The Southern History Database. The students were assigned different geographic areas in the South, researched themes in social history, wrote narrative accounts, and populated the database with content. Students then used the entire collection of narrative histories in the database to look for commonalities in social issues across different geographic areas. The project is designed to grow over time with students being the central researchers. It sounds like an incredible opportunity to engage students with historical documents while deepening their connection to the materials through digital technologies.
Posted by DUCKETT, KIMBERLY
| Nov 13 2006, 02:33:54 PM EST

Tuesday October 31, 2006
Mozilla Firefox extension for citation management
Carol Vreeland, NCSU Librarian for Life Sciences, forwarded me this link about Zotero, a new extension for managing citations through the Mozilla Firefox 2.0 web browser. Zotero is still in beta development and definitely has limited capabilities. Most noticeable and unfortunate is that it does not yet work with NCSU Libraries' Endeca-powered online catalog. But I thought Zotero was still well worth mentioning because it already does some interesting and useful things.
When Zotero is enabled in your browser, it identifies citations on the webpages you visit and signals you with a small icon at the end of the URL in the location bar. Clicking this icon opens Zotero in the bottom half of your screen and automatically adds the citation to your library. This feature seems to work very well in Google Scholar, and also inside some of NCSU's general/multi-purpose databases, such as Ebsco's Academic Search Premier and JSTOR.
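How a browser tool "sees" a citation on an arbitrary page varies by site. Zotero uses site-specific translators, and as far as I know it can also read COinS (ContextObjects in Spans), a convention in which a page embeds an empty span with class "Z3988" whose title attribute carries an OpenURL key/value string describing the item. I haven't checked which mechanism fires in Google Scholar or the databases above, so treat this as a sketch of the COinS technique only, using Python's standard html.parser and urllib.parse; the sample span is made up.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple
from urllib.parse import parse_qs


class CoinsParser(HTMLParser):
    """Collect OpenURL metadata from every <span class="Z3988" title="..."> on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.citations: List[Dict[str, str]] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        title = attributes.get("title")
        if tag == "span" and "Z3988" in classes and title:
            # HTMLParser has already turned &amp; into &; parse_qs decodes
            # %-escapes and '+' and returns a list per key, so keep the first.
            fields = parse_qs(title)
            self.citations.append({key: values[0] for key, values in fields.items()})


def extract_coins(html: str) -> List[Dict[str, str]]:
    """Return one dict of OpenURL fields per COinS span found in html."""
    parser = CoinsParser()
    parser.feed(html)
    parser.close()
    return parser.citations


if __name__ == "__main__":
    page = ('<p>Some article listing</p>'
            '<span class="Z3988" title="ctx_ver=Z39.88-2004'
            '&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal'
            '&amp;rft.atitle=An+Example+Article&amp;rft.jtitle=Journal+of+Examples'
            '&amp;rft.date=2006"></span>')
    for citation in extract_coins(page):
        print(citation["rft.atitle"], "|", citation["rft.jtitle"], "|", citation["rft.date"])
```

Keys such as rft.atitle (article title) and rft.jtitle (journal title) come from the OpenURL journal format; a citation manager would map them onto its own fields.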
As you're surfing the web, you can also manually add citations to your library with a few clicks and some text entry. You can also add webpage snapshots, full-text documents, and personalized notes to your entries. All your library documents are available both online and offline.
Zotero allows you to import records - a workaround for NCSU Endeca users - and it can export in several formats. It can also generate a bibliography from items that you highlight, formatting them in APA, MLA, or Chicago style.
Posted by Joe Williams
| Oct 31 2006, 03:42:55 PM EST

Horseless Library image by Herman Berkhoff