We had to say goodbye to our good friend Bo today. We first met Bo when our neighbors brought him home as a puppy in 2003. We enjoyed watching him grow into a full-size Great Dane. He was such a good friend to our dog Keiko that we built a special gate in our fence so that Bo and Keiko could play together in the backyard. After Keiko died in 2008, our neighbors were generous enough to let Bo spend a lot of time at our house, acting as our surrogate dog until we were ready to adopt another dog of our own. When we adopted Powder in January 2009, she and Bo instantly became buddies and he continued to spend a lot of time with us. He frequently joined us for walks around the neighborhood and stayed with us when his primary family was away. This past August, we noticed that he was limping a lot on walks. His dad took him to the vet and returned with a diagnosis of bone cancer. The only visible sign was a large, inoperable tumor growing in his left front leg. I’m grateful that we had a few months to process this terrible news and give our large gray friend a proper send-off with lots of attention, treats and special privileges. My deepest condolences to his primary family. Bo will always have a special place in my heart. I miss him.
I haven’t subscribed to cable for more than 10 years, and I’ve been thoroughly enjoying Netflix since 2003. More evidence that the rest of America is starting to catch on … Cable subscribers flee, but is Internet to blame?
For live sports, I get much better coverage for the sport I most like to watch, figure skating, via Universal Sports (free over the air in my area) and icenetwork.com than the cable networks ever provided. My annual subscription to Icenetwork costs $39.95, and with that I get unlimited access to live and video-on-demand streaming of complete events from the US and abroad, without annoying ads or commentary inserted into the broadcast. Plus, I have access to an archive of similar content going back to 2007. It’s wonderful! Now, I feel kind of sorry for football and other “major sport” fans who are still subject to the restrictions of exclusive broadcast rights contracts held by major networks and cable companies.
A code4lib post about OpenURL resolution and WorldCat got me thinking about something again …
Matching discrete resources across databases is a huge problem right now. Too frequently, searches for items based on more or less complete bibliographic citations fail to locate the resource in one or more databases because of variations in how bibliographic data is structured and/or varying levels of completeness in the metadata. How could one go about developing an algorithm that automatically generates some sort of identifier for resources that lack unique identifiers (or for which some combination of elements, such as the author’s last name, title, and date of publication, but no identifier is known), one that could be applied dynamically when searching a mega-index like Summon or WorldCat? Something similar to the algorithms that Gracenote uses to match music CDs to metadata in its database. This seems like something that could be tackled with natural language processing techniques. How difficult would it be to develop a basic algorithm? How would one approach testing and optimizing such an algorithm (or algorithms) to improve cross-domain search results? I can’t believe that somebody, somewhere isn’t working on this already. But how do I locate those people and that research? Is it a subset, or a specialized application, of relevancy ranking?
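One simple way to start experimenting with this idea, well short of the NLP approach speculated about above, is a deterministic "match key": normalize a few citation elements and concatenate fragments of them, so that variant citations for the same item collapse to the same string. This is a minimal Python sketch under loudly stated assumptions: the key layout (8 letters of the surname, 4 letters from each of the first 4 significant title words, a 4-digit year) and the leading-article list are made up for illustration, and this is not how Gracenote, Summon, or WorldCat actually match records.

```python
import re
import unicodedata

# Hypothetical stop list of leading articles; real data would need
# language-specific lists.
LEADING_ARTICLES = {"a", "an", "the"}


def normalize(text: str) -> str:
    """Fold case and diacritics, then collapse punctuation and spaces."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", stripped.lower()).strip()


def match_key(author_surname: str, title: str, year: str) -> str:
    """Build a coarse match key from surname, title, and publication date.

    Key layout (an illustrative assumption, not a standard):
    first 8 letters of the surname / first 4 letters of each of the
    first 4 significant title words / first 4-digit run in the date.
    """
    surname = normalize(author_surname).replace(" ", "")[:8]
    words = normalize(title).split()
    if words and words[0] in LEADING_ARTICLES:
        words = words[1:]
    title_part = "".join(word[:4] for word in words[:4])
    found = re.search(r"\d{4}", year)
    year_part = found.group(0) if found else "----"
    return f"{surname}/{title_part}/{year_part}"


if __name__ == "__main__":
    # Two differently formatted citations for the same item.
    print(match_key("Shakespeare, William", "The tempest /", "c1999."))  # shakespe/temp/1999
    print(match_key("SHAKESPEARE", "Tempest", "1999"))                   # shakespe/temp/1999
```

A key this coarse will also collide for different editions, formats, or translations of the same work, which is exactly where testing against real pairs of citations (and perhaps fuzzier, NLP-style comparison) would come in.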
Finally! OCLC truly makes a move toward “moving cataloging to the network level.” Now if only they could introduce some substantial improvements to Connexion Client and/or other cataloging software/interfaces that would help to simplify and streamline cataloging work. But, I guess one (slow) step at a time is all one can expect from an elephant.
I’m basically a news junkie, but I rarely make public comments on what I read on the news wires or hear on NPR. But I find this story on how Netflix saw its profits surge in the fourth quarter of 2008 fascinating because it really jibes with my own experience.
My husband and I had cable TV for a while, but when we moved to Monmouth in 2000, the only cable provider in town sucked big time, so we gave it up and mounted an antenna on the highest peak of the roof of our 2-story house. With the help of a signal booster and a rotating antenna, we can pull in over the air stations from both Portland and Eugene, and this has been sufficient for us, since we aren’t interested in most of the programming on the cable networks anyway. For years, we have also purchased favorite movies for our home video library, first on VHS and now on DVD. We’ve been collecting for about 15 years now and we own around 250 titles.
In the past five years, things have really improved for us. We’ve had HDTV for 2-3 years, ever since we purchased an HD-ready TV and an HD receiver (cost us about $850 total). We’d watch PBS a lot, since they had better quality programming, including a lot more in HD than many other stations. On the recommendation of friends, we joined Netflix in July 2003. After a few months with Netflix, I really started to wonder why anyone would bother with cable or satellite TV when they could get Netflix service for less than $20 per month. While it’s true that there is some info-tainment programming you can’t get on Netflix (e.g. sports, news, latest episodes of series), watching movies and TV series via Netflix is *way* superior to watching on cable, as far as I’m concerned. Biggest plusses: no commercials, and you control the timing.
Then, about 2 years ago, MINET came online, and we added super speedy broadband Internet access for $30 per month. Our broadband combined with the increasing amount of streaming video content that is becoming available online is very powerful. With broadband and video streaming, I am now able to get much better access to sports coverage for the sports I watch (figure skating and NASCAR) through reasonably priced, targeted subscription services than I was ever able to get on cable or broadcast TV. I also love watching broadcast network TV shows on demand through the network websites. And Netflix has really done a good job taking advantage of broadband as a delivery mechanism. We use the “watch instantly” feature fairly regularly, especially for the few cable TV series we like (e.g. Weeds), and we’ve found it to be incredibly reliable, even though we stream via our wireless router to a laptop that is connected to our new 32-inch flat-screen HDTV using a dual-monitor configuration and a DVI-to-HDMI cable. For sound, we use a simple stereo wire to connect the laptop’s headphone jack to our surround sound receiver.
Though the AP article didn’t provide any data on this, I’d be willing to bet that more people in America are discovering what we discovered a long time ago: Netflix provides a whole lot more entertainment value for the money than cable or satellite TV, and in this recession, people are now dropping their cable and/or satellite subscriptions in favor of Netflix. I think this is a great development, though I don’t know how much longer all of the info-tainment delivery options we currently enjoy will remain so affordable once the suits figure out the business model(s). But I sure am enjoying it while it lasts. I’m also kicking myself for not having purchased Netflix stock back in 2003 …
Thinking about the future of the library catalog, the following has been bugging me for a long time. I’ve finally written something about it that is at least semi-articulate, I hope, and perhaps a statement that I can use as a basis for my sabbatical project.
Selection of resources relevant to a particular user community has long been a function of libraries, especially academic libraries. In this day of information overload, the selection function is more valuable and more challenging than ever before. In the past, factors such as acquisitions budgets and limited availability of physical items greatly restricted the amount of content that a single library could collect. Today, with the proliferation of electronic content, these restrictions have been greatly reduced. Libraries can choose to “collect” materials relevant to their user community even if they don’t actually acquire and/or store copies of the information objects. They accomplish this by placing resource surrogates where users can find them. In a sense, the catalog itself becomes the collection.
I’m taking a badly needed sabbatical from July 1, 2009 through June 30, 2010. The problem is, I have many good project ideas, but I haven’t had any success actually nailing down a concrete project yet. It’s getting a bit frustrating because I’ve been thinking about this a lot for more than a year now. I’ve even been in contact with some people who might offer interesting project opportunities, but nothing has materialized yet, partly because I haven’t been able to articulate a well-defined project idea. I know there are lots of interesting metadata and digital repository/library projects out there that could benefit from my labor, but I’m at a loss as to how to approach getting hooked up with such a project. How do I convince the people who matter that I have the knowledge, experience, and motivation to contribute something meaningful to their projects in exchange for nothing more than some useful experience for myself? Everybody just seems inclined to write me off as another useless, obsolete cataloger. Maybe that’s all I really am. Maybe the sabbatical is just a waste of time and I should just get out of the field entirely. That’s an awfully tempting thought.
These days, I often find myself wishing that WorldCat were built on a wiki platform with robust web services built in, so that I could easily correct or add information to the master record when I find errors or omissions, and I could link into WorldCat to pull out basic bibliographic information that I could incorporate into resource discovery tools built and maintained locally. The WorldCat API is certainly a step in the right direction, but we have a long way to go. Is anybody at OCLC thinking along these lines and/or working on this?
I heard a really interesting story on All Things Considered last Thursday. A computer scientist from Carnegie Mellon came up with a brilliant idea. He developed a system that takes words from digitized print sources that OCR programs could not correctly decipher and uses them as the anti-bot verification text that appears on certain websites (like Ticketmaster). So, now, instead of just keying computer-generated text rendered in fuzzy images, people are prompted to key words that are often easy for humans to recognize, but almost impossible for the bots to comprehend. The websites also capture the data entered by end users in order to improve the text capture for digitized books. When enough people agree on what the word is, the data is fed back to the source digitization project and used to improve the OCR-generated full text. The really cool thing is how much work has been accomplished through micro-contributions of time and knowledge made by millions of people.
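The agreement step described here ("when enough people agree on what the word is") can be sketched as a simple vote. This minimal Python version assumes each unreadable word is shown to several users and accepted only when one answer has both a minimum number of votes and a minimum share of all answers; both thresholds are hypothetical, not the values the Carnegie Mellon system actually uses.

```python
from collections import Counter
from typing import List, Optional


def accept_transcription(answers: List[str], min_votes: int = 3,
                         min_share: float = 0.75) -> Optional[str]:
    """Return the agreed reading of an OCR-failed word, or None if no consensus yet."""
    # Light normalization so "Morning" and "morning " count as the same answer.
    cleaned = [a.strip().lower() for a in answers if a.strip()]
    if not cleaned:
        return None
    word, votes = Counter(cleaned).most_common(1)[0]
    # Require both an absolute number of votes and a clear majority.
    if votes >= min_votes and votes / len(cleaned) >= min_share:
        return word
    return None


if __name__ == "__main__":
    print(accept_transcription(["morning", "Morning", "morning ", "moming"]))  # morning
    print(accept_transcription(["morning", "moming"]))                        # None
```

Requiring both conditions means a single careless or malicious answer cannot settle a word on its own, while a word that keeps splitting users simply stays in circulation until more answers arrive.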
This is just totally cool. Wouldn’t it be fabulous if we could find similar methods for capturing data that would help to improve metadata for bibliographic resources? Imagine if OCLC could come up with a similar mechanism for collecting variations on WorldCat master records made by individual libraries and individual users. Master records could be enhanced substantially without painstaking work from OCLC Quality Control staff.
The most valuable and expensive aspect of cataloging is capturing human knowledge effectively. We need systems that will allow end users to make small contributions to enhancing metadata easily and seamlessly, and give professionals the tools they need to quickly and systematically analyze this data, so that it can be incorporated into the infrastructure. That seems like a key part of the Semantic Web: developing ways to capture, organize, and relate little bits of information and knowledge from all over the place into a coherent whole.
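As a rough illustration of the "tools to analyze this data" side, the sketch below groups proposed field values by record and field, counts how many independent contributors (libraries or users) proposed each one, and lists the best-supported suggestions first for professional review. The `Contribution` structure, the field tags in the example, and the two-contributor threshold are all hypothetical; nothing here reflects an actual OCLC service.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Contribution:
    record_id: str    # e.g. an OCLC number (illustrative)
    field: str        # MARC tag, e.g. "650"
    value: str        # proposed field content
    contributor: str  # library symbol or user id (hypothetical)


def review_queue(contributions, min_contributors=2):
    """Rank proposed values by the number of distinct contributors backing them."""
    backers = defaultdict(set)
    for c in contributions:
        # Collapse internal whitespace so trivial variants count together.
        value = " ".join(c.value.split())
        backers[(c.record_id, c.field, value)].add(c.contributor)
    queue = [
        (record_id, field, value, len(who))
        for (record_id, field, value), who in backers.items()
        if len(who) >= min_contributors
    ]
    # Best-supported suggestions first; ties ordered by record, field, value.
    queue.sort(key=lambda row: (-row[3], row[0], row[1], row[2]))
    return queue
```

Counting distinct contributors rather than raw submissions keeps one enthusiastic library from pushing its own suggestion to the top of the queue.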
Arghhh!! When will this insanity end?
I’ve spent a good portion of the past two days struggling to update the records in our online catalog for titles included in Oxford Reference Online. This process is so annoying and frustrating that I’m about ready to give up entirely. Why don’t I? Because I need to add our holdings for these titles to WorldCat.
A couple of years ago, I made the mistake of loading the MARC records provided in the database publisher’s free record set into our local catalog. The main problem now is that I need to get holdings for these titles added to OCLC in support of the new, WorldCat-based Summit catalog. So, no problem, I thought: I’ll just extract the ISBNs from the publisher-supplied records I still have in the database, put those into a text file, do a batch search of WorldCat to download these records to a local file in Connexion, and then export the records to our local catalog, overlaying on ISBN. Sounds pretty straightforward, but it’s actually a huge pain in the butt!
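The first step of that workflow, turning scraped text into a clean list of ISBNs for batch searching, can be partly automated. This minimal Python sketch pulls ISBN-shaped strings out of scraped text, strips hyphens, keeps only those that pass the standard ISBN-10 or ISBN-13 check-digit test, and drops duplicates. The function names are hypothetical, and the one-ISBN-per-line output file is an assumption about the batch-search input; the Connexion documentation is the authority on the exact format it expects.

```python
import re
from typing import List

# A digit, then digits/hyphens, then a final digit or X: loose ISBN-10/13 shape.
CANDIDATE = re.compile(r"\b[0-9][0-9-]{8,15}[0-9Xx]\b")


def is_valid_isbn10(isbn: str) -> bool:
    if len(isbn) != 10 or not isbn[:9].isdigit() or not (isbn[9].isdigit() or isbn[9] == "X"):
        return False
    # Weights 10 down to 1; "X" in the check position stands for 10.
    total = sum((10 - i) * (10 if ch == "X" else int(ch)) for i, ch in enumerate(isbn))
    return total % 11 == 0


def is_valid_isbn13(isbn: str) -> bool:
    if len(isbn) != 13 or not isbn.isdigit():
        return False
    # Alternating weights 1, 3, 1, 3, ...
    total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(isbn))
    return total % 10 == 0


def extract_isbns(scraped_text: str) -> List[str]:
    """Return unique, checksum-valid ISBNs in the order they first appear."""
    found: List[str] = []
    for match in CANDIDATE.finditer(scraped_text):
        isbn = match.group(0).replace("-", "").upper()
        valid = is_valid_isbn10(isbn) if len(isbn) == 10 else is_valid_isbn13(isbn)
        if valid and isbn not in found:
            found.append(isbn)
    return found


def write_batch_file(isbns: List[str], path: str) -> None:
    """Write one ISBN per line (an assumed batch-search input layout)."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(isbns) + "\n")
```

Validating check digits before the batch search catches scraping typos up front, so a mistyped ISBN does not quietly become one more title that "came up with no matches."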
First problem: I used screen-scraping (the only method available) to gather current ISBNs for the 71 titles for which we still have non-OCLC records in our database, and saved them in a plain text file that I then uploaded for batch searching in Connexion Client. All 71 of them came up with multiple matches, even though I used every available limit to restrict my search to records for the online/electronic versions of these books. I’ve slogged through records for about 15 titles so far, and I’ve observed common characteristics of the most acceptable records, but Connexion Client won’t let me filter records within my local file based on those characteristics (e.g. a specific member library symbol in the 040, encoding level I, etc.). So the only way to select the records to use for this project is to look through all of the 3-5 records retrieved for each ISBN. To catalog 71 titles, therefore, I must examine 3-5 times that many records. If OCLC is going to permit so many duplicate records in WorldCat, they really need to give us more options for limiting and filtering the records retrieved in response to a search. In this case, if I could limit to records with a particular encoding level, to English-language records (OCLC only allows you to limit based on the language of the content, not the language of the record itself), or to records contributed by a particular member library other than DLC, it would save me a lot of work. And I would have to do all this just to add our holdings to WorldCat, even if I weren’t exporting the records to our local catalog as well.
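The kind of filtering wished for here is easy to express once the retrieved records are in a structured form. This Python sketch assumes each candidate record has already been reduced to a few fields (encoding level from Leader/17, the 040 $a original cataloging agency, and the 040 $b language of cataloging). The `Candidate` structure and the ranking order (English-language record first, then full or I-level encoding, then a preferred member library symbol) are assumptions for illustration, not a Connexion Client feature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Simplified view of one retrieved WorldCat record (illustrative only)."""
    isbn: str
    ocn: str              # OCLC control number
    encoding_level: str   # Leader/17: " " (full), "I", "K", "M", ...
    source_symbol: str    # 040 $a, original cataloging agency
    cataloging_lang: str  # 040 $b, language of cataloging


def pick_best(candidates, preferred_symbol, preferred_levels=(" ", "I")):
    """Keep, for each ISBN, the candidate that best fits local preferences."""

    def score(c: Candidate) -> tuple:
        # Tuples of booleans compare element by element (False < True),
        # so earlier criteria outrank later ones.
        return (
            c.cataloging_lang == "eng",
            c.encoding_level in preferred_levels,
            c.source_symbol == preferred_symbol,
        )

    best = {}
    for c in candidates:
        # Ties keep the record that was retrieved first.
        if c.isbn not in best or score(c) > score(best[c.isbn]):
            best[c.isbn] = c
    return best
```

Ranking instead of hard filtering means every ISBN still gets a record, even when none of its candidates meets all of the preferences.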
Second problem: I have to review and make some edits to each record, even if the record is of good quality. Notably, I must update the 049 and add a 949 to each record in order to get our III system to process the records correctly when they load. I wrote a macro that does most of this work for me, but writing it took about an hour this morning, including the time I spent updating our III load profiles to optimize overlay based on ISBN. Even after I specified that the overlay comparison be based on the normalized form of the ISBN in the 020 field, the III system doesn’t seem to normalize the 020 correctly in all cases. For example, when the ISBN is followed by (pbk.), the III normalization program retains pbk as part of the ISBN, so the overlay doesn’t work if that qualifier isn’t on the incoming OCLC record. Thus, I have to check and clean up the 020 fields in the existing records in our catalog, or the overlay won’t work in many cases.
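The normalization the load profile was expected to perform can be sketched in a few lines: keep only the leading ISBN token of the 020 $a value, drop any qualifier such as (pbk.), and remove hyphens, so that the local and incoming records compare equal. This is a Python sketch of the idea, assuming the ISBN comes first in the subfield; it is not a description of how the III system normalizes the 020 internally.

```python
import re

# Leading ISBN-like token: a digit, then digits/hyphens, ending in a digit or X.
_ISBN_TOKEN = re.compile(r"^\s*([0-9][0-9-]*[0-9Xx])")


def normalize_isbn(field_020a: str) -> str:
    """Reduce a 020 $a value to bare ISBN characters for overlay matching.

    Drops trailing qualifiers such as "(pbk.)" and removes hyphens.
    Returns "" if the value does not start with something ISBN-like.
    """
    match = _ISBN_TOKEN.match(field_020a)
    if not match:
        return ""
    return match.group(1).replace("-", "").upper()


if __name__ == "__main__":
    # Local record with a qualifier vs. incoming record without one.
    print(normalize_isbn("0306406152 (pbk.)") == normalize_isbn("0306406152"))  # True
```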
Third problem: The titles in this database are based on the latest edition of the same title in print. And the publisher doesn’t provide any kind of notification or list of updated content, so you’re left on your own to find the updated titles. Since it has been a year since I last worked on this database, I need to search for updated content and dead links at this point, too. This leads to lots more manual review and checking, since there is no way other than human review to determine if the bibliographic record matches the content currently online at the database site.
This is too hard! And I, as the cataloger, am too distracted keeping track of all the mechanical aspects of searching, selecting, and downloading records to focus attention on the intellectual aspects of cataloging, like providing subject access to these resources that suits our local context. There has to be a better way to do this kind of stuff!!!