As I think about the future of the library catalog, the following issue has been bugging me for a long time. I’ve finally written something about it that is at least semi-articulate, I hope, and perhaps a statement that I can use as a basis for my sabbatical project.

Selection of resources relevant to a particular user community has long been a function of libraries, especially academic libraries. In this day of information overload, the selection function is more valuable and more challenging than ever before. In the past, factors such as acquisitions budgets and limited availability of physical items greatly restricted the amount of content that a single library could collect. Today, with the proliferation of electronic content, these restrictions have been greatly reduced. Libraries can choose to “collect” materials relevant to their user community even if they don’t actually acquire and/or store copies of the information objects. They accomplish this by placing resource surrogates where users can find them. In a sense, the catalog itself becomes the collection.

November 11th, 2008 | Tags: | Category: Metadata, On my mind, Sabbatical |

I’m taking a badly needed sabbatical from July 1, 2009 through June 30, 2010. The problem is, I have many good project ideas, but I haven’t had any success actually nailing down a concrete project yet. It’s getting a bit frustrating because I’ve been thinking about this a lot for more than a year now. I’ve even been in contact with some people who might offer interesting project opportunities, but nothing has materialized yet, partially because I haven’t been able to articulate a well-defined project idea. I know there are lots of interesting metadata and digital repository/library projects out there that could benefit from my labor, but I’m at a loss as to how to approach getting hooked up with such a project. How do I convince the people who matter that I have the knowledge, experience, and motivation to contribute something meaningful to their projects in exchange for nothing more than some useful experience for myself? Everybody just seems inclined to write me off as another useless, obsolete cataloger. Maybe that’s all I really am. Maybe the sabbatical is just a waste of time and I should just get out of the field entirely. That’s an awfully tempting thought.

October 28th, 2008 | Tags: | Category: On my mind, Sabbatical |

One of the biggest problems faced by academic library subject selectors today is building collections that include materials from a wide variety of sources, and particularly, how to integrate content that is available for free on the Web into existing collections. This is a complex problem with many facets: identifying relevant content, determining what is available for free online and where it is available, constructing and maintaining persistent links to the content in a remote location and/or downloading it and archiving it locally if allowed, and creating or obtaining and enhancing records to represent this content that can be incorporated into local discovery services (such as a library’s online catalog).

I would like to build an online community, perhaps similar to Wikipedia, where subject selectors could find information and tools that would help with the various aspects associated with building collections in the Web environment. Components could include:

  • Identification of “core” resources in various subjects (contributed directly by experts and derived through or supported by statistical analysis)
  • Use of various APIs (such as Google Books, WorldCat, and Internet Archive) to obtain bibliographic information and links for resources
  • Support for customizing metadata and exporting records in various formats to local catalogs or other databases
  • Support for tagging and for generating customized feeds from the central server that could be incorporated into local web pages
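As a rough illustration of the last component, a per-tag feed could be generated from the central server’s tagged resources as RSS 2.0 and dropped into a local web page. The `Resource` class and `tag_feed` function are hypothetical names for this sketch; a real service would also need item descriptions, stable identifiers, and caching.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical entry in the shared collection-building database."""
    title: str
    url: str
    tags: set[str] = field(default_factory=set)


def tag_feed(resources: list[Resource], tag: str, site_url: str) -> str:
    """Build a minimal RSS 2.0 document listing the resources carrying `tag`."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link, and description are the required RSS 2.0 channel elements
    ET.SubElement(channel, "title").text = f"Resources tagged '{tag}'"
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = f"Selector-chosen resources tagged {tag}"
    for r in resources:
        if tag in r.tags:
            item = ET.SubElement(channel, "item")
            ET.SubElement(item, "title").text = r.title
            ET.SubElement(item, "link").text = r.url
    return ET.tostring(rss, encoding="unicode")
```

A library’s subject page would then only need to fetch and render the feed for the tags its selectors care about.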
September 30th, 2008 | Tags: | Category: Sabbatical |
  • Make it much easier to create linking entries between records, e.g., support point-and-click creation of 776 and other linking entry fields between records for the same bibliographic content in print or electronic format
  • Provide access to full-text of cataloging rules and LC cataloging manuals within the OCLC cataloging interface (via an API that accesses the data sources directly?)
  • Provide color coding in the record editing display similar to that provided in a software programming integrated development environment (IDE)
  • Provide much more robust tools for analyzing record contents and updating records in batches, perhaps utilizing data visualization display techniques (as per Hillmann & Dushay)
  • Improve record validation to check for internal inconsistencies in records (e.g., an “a” in the Ills fixed field but no “ill.” in 300|b)
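The validation idea in the last item lends itself to a concrete check. This is a minimal sketch that assumes a deliberately simplified record structure (a dict with an `"008"` string and a hypothetical `"300b"` list of subfield values) rather than any particular MARC library. It compares the illustration codes in 008/18-21 of a book record with the presence of “ill.” in 300|b and reports disagreements in either direction.

```python
# Hypothetical consistency check: Ills fixed field (008/18-21, books) vs. "ill." in 300|b.
# Records are plain dicts, e.g. {"008": "<40 characters>", "300b": ["ill. ;"]};
# this simplified structure is an assumption, not any MARC library's format.


def ills_codes(field_008: str) -> set[str]:
    """Return the illustration codes in 008/18-21, ignoring blanks and fill characters."""
    return {c for c in field_008[18:22] if c not in (" ", "|")}


def check_ills(record: dict) -> list[str]:
    """Report disagreements between the coded Ills field and the 300|b text."""
    problems = []
    codes = ills_codes(record.get("008", ""))
    details = " ".join(record.get("300b", [])).lower()
    mentions_ill = "ill." in details  # also matches "col. ill."
    if "a" in codes and not mentions_ill:
        problems.append("Ills fixed field has 'a' but 300|b has no 'ill.'")
    if mentions_ill and "a" not in codes:
        problems.append("300|b has 'ill.' but Ills fixed field lacks 'a'")
    return problems
```

The same pattern (pull two coded or transcribed values, compare, report) would extend to other fixed-field/variable-field pairs, which is the kind of mechanical checking an editor could do for the cataloger.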
September 29th, 2008 | Tags: | Category: Metadata, Sabbatical |

These days, I often find myself wishing that WorldCat were built on a wiki platform with robust web services built in, so that I could easily correct or add information to the master record when I find errors or omissions, and I could link into WorldCat to pull out basic bibliographic information that I could incorporate into resource discovery tools built and maintained locally. The WorldCat API is certainly a step in the right direction, but we have a long way to go. Is anybody at OCLC thinking along these lines and/or working on this?
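Pulling basic bibliographic information out of a record, once it has been retrieved (from WorldCat or anywhere else) as MARCXML, takes very little code. The sketch below assumes MARCXML in the standard `http://www.loc.gov/MARC21/slim` namespace and leaves out the retrieval step entirely, since the details of any particular WorldCat API request aren’t covered here. The function name `brief_record` and the fields it picks out are illustrative choices.

```python
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def _subfields(record: ET.Element, tag: str, code: str):
    """Yield every subfield `code` value from datafields with the given tag."""
    for df in record.findall(f"marc:datafield[@tag='{tag}']", NS):
        for sf in df.findall(f"marc:subfield[@code='{code}']", NS):
            if sf.text:
                yield sf.text.strip()


def brief_record(marcxml: str) -> dict:
    """Extract control number, title proper, and ISBNs from one MARCXML record."""
    root = ET.fromstring(marcxml)
    # Accept either a bare <record> or a <collection> wrapper (first record only).
    record = root if root.tag.endswith("}record") else root.find("marc:record", NS)
    if record is None:
        raise ValueError("no MARC record found")
    cf001 = record.find("marc:controlfield[@tag='001']", NS)
    titles = list(_subfields(record, "245", "a"))
    return {
        "control_number": cf001.text.strip() if cf001 is not None and cf001.text else None,
        # Drop trailing ISBD punctuation such as " /" or " :"
        "title": titles[0].rstrip(" /:;,=") if titles else None,
        "isbns": list(_subfields(record, "020", "a")),
    }
```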

September 10th, 2008 | Tags: | Category: Metadata, On my mind |

I heard a really interesting story on All Things Considered last Thursday. A computer scientist from Carnegie Mellon came up with a brilliant idea. He developed a system that takes words from digitized print sources that OCR programs could not correctly decipher and uses them as the anti-bot test text that visitors to certain websites (like Ticketmaster) must type in. So now, instead of just keying computer-generated text rendered in fuzzy images, people are prompted to key words that are often easy for humans to recognize but almost impossible for bots to decipher. The websites also capture the text entered by end users in order to improve the text capture for digitized books: when enough people agree on what a word is, the result is fed back to the source digitization project and used to improve the OCR-generated full text. The really cool thing is how much work has been accomplished through micro-contributions of time and knowledge made by millions of people.
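The “when enough people agree” step is essentially a voting rule. Here is a minimal majority-vote sketch; the thresholds `min_votes` and `min_share` are hypothetical, and the actual system described in the story may work quite differently.

```python
from collections import Counter
from typing import Optional


def consensus(transcriptions: list[str], min_votes: int = 3,
              min_share: float = 0.75) -> Optional[str]:
    """Return the agreed reading of one unreadable word, or None if there is no consensus yet.

    min_votes and min_share are illustrative thresholds, not the real system's values.
    """
    normalized = [t.strip().lower() for t in transcriptions if t.strip()]
    if not normalized:
        return None
    word, votes = Counter(normalized).most_common(1)[0]
    if votes >= min_votes and votes / len(normalized) >= min_share:
        return word
    return None
```

Requiring both an absolute count and a share keeps a single early answer from being accepted and keeps a contested word open until more people weigh in.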

This is just totally cool. Wouldn’t it be fabulous if we could find similar methods for capturing data that would help to improve metadata for bibliographic resources? Imagine if OCLC could come up with a similar mechanism for collecting variations on WorldCat master records made by individual libraries and individual users. Master records could be enhanced substantially without painstaking work from OCLC Quality Control staff.
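One way to picture collecting variations on master records: treat each library’s copy of a record as a set of fields, subtract the master record, and count how many libraries independently added the same field. This is a sketch under loose assumptions (fields reduced to hypothetical `(tag, text)` pairs, an arbitrary `min_libraries` threshold); real MARC comparison would need normalization of punctuation, indicators, and subfield order.

```python
from collections import Counter

Field = tuple[str, str]  # (MARC tag, field text): a deliberately simplified representation


def suggested_additions(master: set[Field], local_copies: list[set[Field]],
                        min_libraries: int = 3) -> list[tuple[Field, int]]:
    """Fields missing from the master record but added by at least
    `min_libraries` local copies, most common first."""
    counts: Counter = Counter()
    for copy in local_copies:
        # Set difference counts each library at most once per added field.
        counts.update(copy - master)
    return [(f, n) for f, n in counts.most_common() if n >= min_libraries]
```

The output is a review queue rather than an automatic update: a person still decides whether a widely added field belongs in the master record.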

The most valuable and expensive aspect of cataloging is capturing human knowledge effectively. We need systems that will allow end users to make small contributions to enhancing metadata easily and seamlessly, and give professionals the tools they need to quickly and systematically analyze this data, so that it can be incorporated into the infrastructure. That seems like a key part of the Semantic Web: developing ways to capture, organize, and relate little bits of information and knowledge from all over the place into a coherent whole.

August 19th, 2008 | Tags: | Category: Metadata, On my mind |

Arghhh!! When will this insanity end?

I’ve spent a good portion of the past two days struggling to update the records in our online catalog for titles included in Oxford Reference Online. This process is so annoying and frustrating that I’m about ready to give up entirely. Why don’t I? Because I need to add our holdings for these titles to WorldCat.

A couple of years ago, I made the mistake of loading the MARC records provided in the database publisher’s free record set into our local catalog. The main problem now is that I need to get holdings for these titles added to OCLC in support of the new, WorldCat-based Summit catalog. So, no problem, I thought, I’ll just extract the ISBNs from the publisher-supplied records I still have in the database, put those into a text file, do a batch search of WorldCat to download these records to a local file in Connexion, and then export the records to our local catalog, overlaying on ISBN. Sounds pretty straightforward, but it’s actually a huge pain in the butt!
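The ISBN-gathering step can at least be made less error-prone by validating check digits before anything goes into the batch file. This is a minimal Python sketch, assuming the scraped text is already in a string. The one-ISBN-per-line output is an assumption about what the batch search file should contain, so check the Connexion Client documentation for the exact search-key syntax.

```python
import re

# Runs of digits and hyphens ending in a digit or X; check digits are tested separately.
TOKEN = re.compile(r"\b[0-9][0-9-]{8,15}[0-9Xx]\b")


def is_valid_isbn10(s: str) -> bool:
    """Weights 10..1; the weighted sum must be divisible by 11 (X = 10)."""
    if len(s) != 10 or not s[:9].isdigit() or not (s[9].isdigit() or s[9] == "X"):
        return False
    total = sum((10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(s))
    return total % 11 == 0


def is_valid_isbn13(s: str) -> bool:
    """Alternating weights 1 and 3; the weighted sum must be divisible by 10."""
    if len(s) != 13 or not s.isdigit():
        return False
    total = sum((1 if i % 2 == 0 else 3) * int(c) for i, c in enumerate(s))
    return total % 10 == 0


def extract_isbns(text: str) -> list[str]:
    """Valid ISBN-10s/ISBN-13s in text, hyphens removed, in order, without duplicates."""
    seen: set[str] = set()
    found = []
    for match in TOKEN.finditer(text):
        candidate = match.group().replace("-", "").upper()
        if (is_valid_isbn10(candidate) or is_valid_isbn13(candidate)) and candidate not in seen:
            seen.add(candidate)
            found.append(candidate)
    return found


def to_batch_text(isbns: list[str]) -> str:
    """One search key per line (an assumed batch-file layout)."""
    return "".join(isbn + "\n" for isbn in isbns)
```

Writing `to_batch_text(extract_isbns(scraped_text))` to a file produces the upload list; a mistyped or mis-scraped ISBN is dropped here instead of turning up later as a failed or wrong match.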

First problem: I used screen-scraping (the only method possible) to gather current ISBNs for the 71 titles whose non-OCLC records are still in our database, and saved them in a plain text file that I then uploaded for batch searching in Connexion Client. All 71 of them came up with multiple matches, even though I used all possible limits to try to restrict my searching to just records for the online/electronic resource versions of these books. I’ve slogged through records for about 15 titles so far, and I’ve observed common characteristics that appear on the most acceptable records, but Connexion Client won’t let me filter records within my local file based on those characteristics (e.g., a specific member library symbol in the 040, encoding level I, etc.). So, the only way to select the records to use for this project is to look through all of the 3-5 records retrieved for each ISBN. To catalog 71 titles, therefore, I must examine 3-5 times that many records. If OCLC is going to permit so many duplicate records in WorldCat, they really need to give us more options for limiting and filtering the records retrieved in response to a search. In this case, if I could limit to records with a particular encoding level, English-language records (OCLC only allows you to limit based on the language of the content, not the language of the record itself), or records contributed by a particular member library other than DLC, it would save me a lot of work. And I would have to do all this just to add our holdings to WorldCat, even if I weren’t exporting the records to our local catalog as well.
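The kind of filtering wished for above is easy to express once the records are in hand. This sketch assumes each retrieved record has been reduced to a hypothetical `BriefRecord` holding the leader (encoding level at Leader/17) and the 040 subfields (|a original cataloging agency, |b language of cataloging). Treating a missing 040|b as English is an assumption, flagged in the code, and the symbol `XYZ` below is a placeholder.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BriefRecord:
    """Hypothetical reduced view of a retrieved WorldCat record."""
    oclc_number: str
    leader: str  # 24-character MARC leader
    f040: dict[str, str] = field(default_factory=dict)  # 040 subfield code -> value

    @property
    def encoding_level(self) -> str:
        return self.leader[17]  # Leader/17 = encoding level


def matches(rec: BriefRecord, *, encoding_levels: Optional[set[str]] = None,
            cataloging_lang: Optional[str] = None,
            source_symbol: Optional[str] = None) -> bool:
    """True if the record satisfies every criterion that was supplied."""
    if encoding_levels is not None and rec.encoding_level not in encoding_levels:
        return False
    # Assumption: a record with no 040|b is treated as cataloged in English.
    if cataloging_lang is not None and rec.f040.get("b", "eng") != cataloging_lang:
        return False
    if source_symbol is not None and rec.f040.get("a") != source_symbol:
        return False
    return True


def shortlist(results_by_isbn: dict[str, list[BriefRecord]],
              **criteria) -> dict[str, list[BriefRecord]]:
    """Apply the same filter to the records retrieved for each ISBN."""
    return {isbn: [r for r in recs if matches(r, **criteria)]
            for isbn, recs in results_by_isbn.items()}
```

For example, `shortlist(results, encoding_levels={"I"}, cataloging_lang="eng", source_symbol="XYZ")` keeps only encoding-level-I, English-language records from that one contributor, so the 3-5 records per ISBN shrink to the few worth a human look.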

Second problem: I have to review and make some edits to each record, even if the record is of good quality. Notably, I must update the 049 and add a 949 to each record in order to get our III system to process the records correctly when they load. I wrote a macro that does most of this work for me, but writing it took me about an hour this morning, including the time I had to spend updating our III load profiles to optimize overlay based on ISBN. Even after specifying that overlay comparison be based upon the normalized form of the ISBN in the 020 field, the III system doesn’t seem to normalize the 020 correctly in all cases. For example, when the ISBN is followed by (pbk.), the III normalization program retains pbk as part of the ISBN, so overlay doesn’t work if that isn’t on the incoming OCLC record. Thus, I have to check and clean up the 020 fields in the existing records in our catalog, or the overlay won’t work in many cases.
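Here is roughly what a more forgiving 020 normalization would need to do: keep only the leading ISBN in 020|a, drop qualifiers such as “(pbk.)” and hyphens, and convert ISBN-10s to ISBN-13s so that either form matches. This is a sketch of the idea, not a description of how the III load profiles actually normalize; `match_key` is a hypothetical name.

```python
import re
from typing import Optional

# The ISBN at the start of a 020|a value, before any qualifier such as "(pbk.)".
LEADING_ISBN = re.compile(r"\s*([0-9][0-9-]*[0-9Xx])")


def normalize_isbn(value: str) -> Optional[str]:
    """'0306406152 (pbk.)' -> '0306406152'; None if no plausible ISBN leads the string."""
    m = LEADING_ISBN.match(value)
    if not m:
        return None
    isbn = m.group(1).replace("-", "").upper()
    if len(isbn) == 10 or (len(isbn) == 13 and isbn.isdigit()):
        return isbn
    return None


def isbn10_to_13(isbn10: str) -> str:
    """Prefix 978 and recompute the ISBN-13 check digit."""
    core = "978" + isbn10[:9]
    total = sum((1 if i % 2 == 0 else 3) * int(c) for i, c in enumerate(core))
    return core + str((10 - total % 10) % 10)


def match_key(value: str) -> Optional[str]:
    """Overlay key: always a 13-digit ISBN, so 10- and 13-digit forms compare equal."""
    isbn = normalize_isbn(value)
    if isbn is None:
        return None
    return isbn10_to_13(isbn) if len(isbn) == 10 else isbn
```

Comparing `match_key` values on both the incoming and existing records would make “0306406152 (pbk.)” and “978-0-306-40615-7” overlay each other, with no manual 020 cleanup.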

Third problem: The titles in this database are based on the latest edition of the same title in print. And the publisher doesn’t provide any kind of notification or list of updated content, so you’re left on your own to find the updated titles. Since it has been a year since I last worked on this database, I need to search for updated content and dead links at this point, too. This leads to lots more manual review and checking, since there is no way other than human review to determine if the bibliographic record matches the content currently online at the database site.

This is too hard! And I, as the cataloger, am too distracted keeping track of all the mechanical aspects of searching, selecting, and downloading records to focus attention on the intellectual aspects of cataloging, like providing subject access to these resources that suits our local context. There has to be a better way to do this kind of stuff!!!

August 05th, 2008 | Tags: | Category: Metadata, On my mind, Work |

OCLC’s new xISBN services look really cool. I just wish that I could figure out how to make use of them. Sigh. Maybe this is a sabbatical project?

June 17th, 2008 | Tags: | Category: Metadata, Sabbatical |

Maybe I’m way behind the times, but just yesterday I discovered a really cool online music service called Last.fm.

Based in London, UK, Last.fm markets itself as a music social networking site. You can sign up for a free account (optional to listen to music, but necessary to take advantage of many site features), and then listen to complete music tracks from several major labels for free. You can develop customized playlists and “radio” stations, tag tracks with your own keywords, and find other people to discuss music with. If you sign up for an account and download their special (free) software, the service will learn what types of music you like and recommend similar tracks. It does this based on what you listen to at the Last.fm site and, if you allow it, also what you listen to through other music software (such as iTunes or Windows Media Player) or on your iPod.

Unlike similar services, such as Pandora, you get more control over the music that you hear, and you get links to album and artist information, as well as links to purchase MP3 downloads from retailers such as Amazon.com. This makes the service really nice, like iTunes combined with a customizable online radio station.

Are there other services like this out there?

May 19th, 2008 | Tags: | Category: Music, News |

There has been a lot of buzz around “next generation catalogs” in the past couple of years. Many of the next generation projects out there focus on updating the user interface to the catalog, but don’t do much about updating the guts of the systems that feed that interface. Now, in many cases this is a direct result of the developers not having access to the guts of their legacy ILS systems to change anything, and given that constraint, developers have been able to do some really cool stuff.

Those who say that this interface work is simply putting lipstick on a pig have a point, however. The emergence of the web has radically changed the information environment and it is simply impossible to aggregate metadata for all resources that might meet users’ information needs in a single database cum catalog. I believe that structured metadata remains vitally important in the web environment, but it will be distributed. We need to find ways to uniquely identify resources in the web environment and utilize extant metadata as much as possible, no matter where it lives.

For those who focus on building collections for a discrete group of users, “cataloging” work in the future will focus on manipulating and extending extant metadata in order to define relationships that guide users to resources. In order for this to happen, “catalogers” need a platform upon which to build resource discovery tools that help users find information relevant to their needs, regardless of format or storage location.

If this is going to happen, “cataloging” has got to involve more than painstakingly creating individual metadata records that describe individual resources. Catalogers need systems that support the higher-level intellectual work that facilitates resource discovery. Right now, human catalogers must devote way too much effort to making sure that the correct alphanumeric code is in the correct sequential position in an individual record. How can we get beyond this?

March 10th, 2008 | Tags: | Category: Metadata, On my mind, Sabbatical |