Overall impression
Provides valuable insight into the contemporary library cataloging environment. Concludes that the Library of Congress (LC) currently shoulders a disproportionate share of the cost of producing catalog records, contributing to an overall impression among librarians and vendors that the cost of such production is substantially lower than it actually is. The authors argue that this has contributed to the development of a situation in which neither libraries nor vendors are willing to invest the resources required to do original cataloging of many mainstream materials, leading to the continued accumulation of backlogs and hindering the efficacy of cooperative cataloging programs. Alongside the helpful data and analysis, the report also includes some highly questionable estimates, assumptions and assertions. Excellent bibliography featuring many recent reports and articles analyzing cataloging practice and the future of cataloging.
Opportunities for additional research
- To what extent do the requirements of complying with national/LC standards currently contribute to the cost associated with original cataloging, especially for organizations that are much smaller than LC, where economies of scale limit development of human and technical resources to support cataloging operations?
- The methodology utilized to estimate the current number of catalogers in North America is highly questionable. The admittedly rough estimates presented in this report need to be subjected to additional scrutiny and research. Desperately!
- This report repeatedly asserts that substantial resources are currently wasted by catalogers who insist upon editing records of already acceptable quality for local use without valid reason or sufficient benefit. The report does not present sufficient evidence to support this assertion. More focused study is needed to examine this issue, so that we can understand the extent to which local edits are made and why. The report describes this type of editing as entirely unnecessary and “redundant.” To what extent is this true? What is the true cost-benefit of this type of editing? To what extent is this “redundant” effort the result of current systems limitations and/or architecture, in which copies of the same record are stored in isolation from one another and catalogers have very limited ability to make useful edits to “master” records for a variety of technical and not-so-technical reasons?
Notable quotes
p. 4: “Conclusions are based on careful consideration of survey results; interviews and conversations with practicing librarians and vendors; discussion among members of the Bibliographic Record Production social network; extensive reading; participation in the OCLC/NISO Metadata Symposium (April 2009); and our own direct experience with cataloging production and distribution.”
p. 14: “In response to the query about what else we should know about libraries’ MARC records environments, there was great appreciation for LC’s work as well as pleas that it continue. There is general dismay about the quality of non-LC vendor provided records. There were several negative remarks about LC’s cessation of Series Authority work. Many libraries stated that free records are essential to their operations. And many reported difficulty managing vendor supplied record sets for eBook collections and serials packages.”
p. 25: “This tension — between community values and commercial values, between idealism and pragmatism, between social responsibility and private benefit – has deeply affected some aspects of the library market. Cataloging, regarded by many as the heart of librarianship, is one of those areas.”
p. 26: “In general, libraries understate (or simply don’t recognize) the full costs associated with cataloging. This renders questionable any comparison with stated prices from vendors, who typically do have a good handle on costs, since their continued operation depends upon it.”
p. 27: “[due to the contributions of LC] …an entire industry has developed around free (or at least very cheap) MARC records. …many libraries and vendors benefit from a product for which production costs are not recovered.”
p. 27: “Over the past five years, LC has absorbed significant budget cuts, and faces continuing pressure. It has undertaken major staff reductions, especially in its cataloging operations. CIP, as a program that is not directly related to LC’s mission, and for which the costs of production divert staff resources from other programs, must obviously be considered for adjustment. Given the existing level of dependency, such a change would affect the entire profession and the industry based on that profession.”
p. 27: “The market for cataloging records is in some important respects dysfunctional. In our view, the biggest issue is that the market lacks sufficient incentives to stimulate the production of new cataloging records. Obviously, many books, journals, electronic resources and other items are being cataloged, so some elements of the market are working. But structurally, it seems clear that something is amiss.”
p. 27: “Because of their own staffing constraints, or unwillingness to bear the cost of original record creation, many libraries simply wait for another library to catalog an item they have already received. On average those items are held for three to six months, with periodic searches of OCLC to determine whether another library has blinked. While this makes sense as a way of controlling costs, it does not provide optimal service for users.”
p. 29: “Why do so few libraries join BIBCO or CONSER (which also relies on fewer than 50 members)? And what factors determine which records are contributed? It seems likely that titles most needed and valued locally by the contributing library would take priority. Because most of the big PCC contributors are research libraries, it’s also likely that many titles are specialized, and therefore may not be widely held. Therefore, fewer libraries would benefit from that contribution, as opposed to, say, DVDs, which are cited as a major problem by most libraries in the survey. Regardless, these libraries are committed to contributing to the community. Which titles they catalog would matter less if participation were broader. A cooperative system only works well if everyone participates.”
p. 29: “But somehow the incentives to produce new cataloging records are insufficient, from both the commercial and community viewpoints. Otherwise, there would be greater participation in cooperative programs, and/or more vendors seeking to become cataloging agencies. At bottom, we believe this is because cataloging costs and therefore prices are understated and artificially depressed. An even more sobering possibility is that the profession does not believe that cataloging is worth what it costs to create it; that will be quickly determined once all production costs are factored into the price. In the meantime, this is a market that appears to require adjustment.”
p. 33-34: “While major economies of scale have been realized, troublesome issues remain. First, there are thousands of small libraries that operate outside of the shared cataloging infrastructure. Most of these lack the capacity to produce MARC records—they have no catalogers. Second, catalogers have an almost unstoppable urge to improve, tweak, customize and “localize” national-level records; redundant work is still widespread. Third, despite efficiencies, cataloging backlogs continue to grow, not only for audio-visual materials, rare books, and non-Roman languages, but even for the most commonly-held materials.”
p. 34: “We now operate in a context where questions about the efficacy of the MARC record and the centrality of the OPAC are continually posed. We wrestle with keeping libraries relevant, and assuring their participation at the network level. We have also entered an era where questions about cost and return on investment are routinely asked of non-profit entities such as libraries. In the long run, there may be better and cheaper alternatives than MARC. In the short run, there may be ways to reduce the cost of producing MARC records.”
p. 34: “LC’s own ILS cannot, for instance, accept ONIX records directly – they must first be converted to MARC21. This will undoubtedly change over time, but for now, most libraries will continue to need cataloging records delivered in MARC format—it is the only usable solution.”
p. 34: “There remain strong arguments for use of standard cataloging principles–controlled vocabulary, classification, subject analysis, and authority control—packaged and delivered in a consistent format. While MARC records may need to be extended, embellished (supplemented with full text, flap copy, excerpts, user tags), for now they provide a common standard and a cooperative infrastructure that controls costs.”
p. 35: “Once records have been distributed to vendors, most seek to add value. In general, this means matching records to a group of titles being shipped, and adding fund, location and electronic invoicing data, updating or adding proxy prefixes to URLs. These value adds are important to workflows, but do not necessarily change the bibliographic data (although some vendors do perform CIP upgrades if they are needed). Once libraries receive the records, via OCLC, the vendor, or another source, many also seek to add value in other ways. As noted in the survey results, 80% of libraries perform some degree of local editing on the records, to customize them for their own constituency. Increasingly, libraries are adding or linking to table of contents, Amazon, and other external sources in order to enrich the bibliographic description.”
p. 36: “Another factor affecting shared capacity is the increasing emphasis on making accessible material that is unique locally. This shifts priorities within a given library, and makes it more difficult to share capacity, and to dedicate hours to work that might benefit the community as a whole over one’s own institution.”
p. 36: “While it is true that the records produced by LC need to be better supported, it is difficult to imagine the profession and the industry without them. They provide enormous value, to a degree that is difficult to calculate.”
An article published in today’s Chronicle of Higher Education offers a tantalizing bit of insight into one approach Google is considering for cleaning up and enhancing Google Books data and metadata. They are reportedly offering substantial grants and soliciting proposals from select humanities scholars for projects such as: “Developing systems for crowd-sourced corrections to book data and metadata.” It will be interesting to see how humanities scholars respond to that particular suggestion; seems like it might be of more interest to librarians/catalogers than to the scholars themselves. I know that I’d love to work with Google on developing this type of system. I’m certainly curious to see what they come up with, if anything.
My library recently implemented WorldCat Local as the default catalog/discovery interface for our collections. I learned about a new book today, so I went to our new catalog to see about getting a copy. I typed the title of the book (“bright sided”) into the WorldCat Local search box and clicked search. The first page of search results gave me 4 articles that appear to be reviews of the book. You can see for yourself the remaining 6 results, all articles, none of which appear to have anything to do with either the book or the terms I searched. For example, the title of the 5th item is: “The physical state and plasma biochemical profile of young calves on arrival at a slaughter plant.” When I do an Advanced search and enter my terms as Title, the book shows up as the 5th item in the results list (below the reviews and the “fold”, so I must scroll down to discover this).
When I do a keyword search on the terms “bright sided” in the old catalog interface, I get the response “no entries found” and a prominently placed button that I can click on to pass my search through to our consortial catalog, Summit, where the book comes up as the first item in the results list.
This seems like a pretty common use case: person finds out about a new book and goes to the library to see if they can borrow a copy. Which catalog interface performs better for the user? I’d say the old catalog since it tells me immediately that my library does not own the book in question, and provides an easy way for me to repeat my search in the consortial catalog, where the book is found immediately, and I can request it immediately. Because WorldCat Local automatically promotes hits for items owned by my library to the top of the results display, regardless of any other measure of relevancy to the search, the WorldCat Local interface plunges me into confusion and leaves me there wondering where to go next to find the answer to the relatively simple question: can I borrow this book? Ironically, the free WorldCat.org interface actually does a better job of answering my question than subscription-based WorldCat Local because the book appears as the first hit in the results list, and it shows libraries near me (based on my IP address) that own the book.
Perhaps the “show me first what my library owns” algorithm that is the main product distinction/selling point of WorldCat Local works better with fuzzier, topical keyword searches. I don’t know, I haven’t really researched that. It sure doesn’t seem to offer the best approach for known item searches, however, especially in cases where the local library doesn’t own a given title (or doesn’t have holdings attached properly in WorldCat).
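To make the ranking problem concrete, here is a minimal sketch in Python. It is not OCLC's actual algorithm, which I haven't seen; the `Hit` class, the relevance scores, and both ranking functions are my own hypothetical stand-ins. It only illustrates what happens to a known-item search when local ownership outranks every other measure of relevance.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    title: str
    relevance: float      # hypothetical relevance score, higher is better
    held_locally: bool    # True if my library has holdings attached


def rank_local_first(hits):
    # Locally held items always sort ahead of everything else;
    # relevance only orders items within each group.
    return sorted(hits, key=lambda h: (not h.held_locally, -h.relevance))


def rank_by_relevance(hits):
    # Plain relevance ranking, ignoring who owns what.
    return sorted(hits, key=lambda h: -h.relevance)


# Made-up result set for a known-item search on a book we don't own.
hits = [
    Hit("Bright-sided (book)", 0.95, False),
    Hit("Review of Bright-sided (article)", 0.80, True),
    Hit("Unrelated article on young calves", 0.05, True),
]

local_first_titles = [h.title for h in rank_local_first(hits)]
relevance_titles = [h.title for h in rank_by_relevance(hits)]
```

With these made-up scores, the local-first ranking puts the book last, behind an unrelated article, while the plain relevance ranking puts it first. Ranking purely by relevance and simply flagging the locally held items would avoid the problem in this case, though I can't say how well it would serve topical searches.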
One of the attractive things about the internship I’m doing is getting the opportunity to work with a tool specifically designed to facilitate the collection and, to a certain extent, curation of metadata. It seems like that’s increasingly what catalogers are being asked to do when we’re asked/expected to take metadata in a wide variety of formats with widely varying levels of completeness into catalogs or other local databases for discovery and/or resource management purposes. Unfortunately, most of our systems were designed with a vastly different paradigm in mind. Certainly, ILS systems are designed to ingest bibliographic records in MARC format created elsewhere, but they operate on the principle that a person is going to sit there and manually compare a physical object, such as a book, to each individual bibliographic record and make adjustments manually to suit local needs.
The system I’m working with during my internship is designed specifically to collect data from multiple sources and combine it into a highly sophisticated database where that data can be analyzed, manipulated and output in very flexible ways. It is also specifically intended to allow people with little to no knowledge of databases or programming to do pretty sophisticated stuff without having to learn much about programming. Though the system is still in a fairly early stage of development, I can see that it has real potential. Over the next couple of months, I’m going to experiment with building some library-oriented applications on this platform to test its relevance to solving the challenges that librarians face with regard to data curation.
The internship is going well so far. My main task for the week was completing basic orientation and training. I can’t go into much detail because I’m bound by a non-disclosure agreement, but I think it’s safe to say that the software I’m working with does some pretty impressive stuff and definitely has features that could/should be incorporated into next-generation cataloging systems. Whether any library or library vendor could manage to develop or afford to purchase access to software with this level of sophistication is an open question, unfortunately. I’ve got real tasks to complete on the team schedule in the coming week, so we’ll see how I handle that challenge.
So far, the work environment is good, but quite different. Working in a private corporate setting, there are perks that would be inconceivable in a public sector setting. There is also a high level of focus on specific tasks, and expectations for individual and group performance are high, but not unreasonable. People are friendly and low key, but there is an intensity and focus on quality here that I find really appealing. I am more optimistic than ever that this internship is going to be a really good experience.
A version of this post was published on the ALA TechSource Blog on July 31, 2009.
From Legacy Data to Linked Data: Preparing Libraries for Web 3.0
Monday, July 13, 8:00 to 10:00 a.m.
How can library cataloging data be transformed to function within “Web 3.0” and be understood by non-library web applications? Speakers from both the library and Semantic Web communities will explore the situation in a non-technical manner and describe current work underway to transform legacy library data into linked data.
Moderator: Corey A. Harper, Metadata Services Librarian, New York University
Speakers: Eric Miller, President, Zepheira, Inc.; Diane Hillmann, Director of Metadata Initiatives, Information Institute of Syracuse; Jennifer Bowen, Co-Principal Investigator, eXtensible Catalog Project, University of Rochester; Rebecca Guenther, Senior Networking and Standards Specialist, Network Development & MARC Standards Office, Library of Congress
This session was a highlight of the 2009 ALA Annual Conference. It brought together four recognized leaders to discuss the emergence of linked data on the Web and the role that the library community can play in realizing the Semantic Web. The session drew a standing room only crowd, and offered a glimpse at the future of cataloging.
First up was Eric Miller, President and co-founder of Zepheira, who provided an overview of the current state of linked data development. Miller defined linked data as: “a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs.” He emphasized sharing and connecting data as the key elements. Thousands of organizations and individuals are currently participating in creating linked data, and the availability of linked data has increased tremendously over the past six months.
Linked data principles:
- URIs represent “things”: people, places, concepts, departments
- Using HTTP-compliant URIs makes data more accessible
- When serving URIs, deliver useful, reusable information
- Leverage standards (RDF, SKOS, etc.)
- Add context. It’s all about connecting, creating meaningful relationships between data.
Miller argued that the Web itself is becoming the basic architecture for building applications. Linked data applications don’t run ON the Web; they are applications OF the Web. Users increasingly want their data back, and they want it back their way. With linked data, users are no longer limited to searching based on relationships that have been pre-defined by application developers, database designers, or librarians; users can create and search based on relationships that are meaningful to them. Miller’s company Zepheira is currently working with the Library of Congress to create Recollection, a new platform intended to provide more useful tools and processes for sharing diverse content across the myriad collections covered by the LC Digital Preservation Program. This will empower users to create new views for existing data, combine data sets in customizable ways, and build communities around the data, allowing them to collaborate in curating and connecting collections in customized ways. Zepheira has also launched Freemix, a new social networking application designed to allow users to mix and share data.
In closing, Miller noted that in the linked data environment, credibility is more important than ever before. Libraries are trusted institutions with a wealth of experience in organizing and managing information resources. The library community needs to position itself to leverage this reputation and take a larger role in the development of linked data applications. Linked data has arrived, and the library community cannot afford to be left behind.
Diane Hillmann’s presentation addressed the question: Are Libraries Ready for Linked Data? Her answer: a resounding yes! Linked data is all about relationships. Libraries have been concerned with expressing relationships between information objects for a very long time, and we now understand that we must use machine-based methods if we want to do a really good job. Traditional cataloging provides attribute = value pairs, for example: Title = [value] or Author = [value]. These attributes are embedded within a record that has an identifier. Because they don’t have independent identifiers, attributes cannot be referenced outside the context of a record. Linked data is based upon a model of triples consisting of subject, predicate, and object, which permits the assignment of identifiers at the attribute level. Identifiers can also be assigned to relationships between attributes. Hillmann is currently involved in building a registry that maintains and serves relationship identifiers (http://metadataregistry.org/). A vocabulary based on RDA should be completely registered within a few weeks. It will be freely accessible to support linked data applications implemented by libraries and others. The registry, combined with the availability of applications and tools such as those being developed in conjunction with the eXtensible Catalog project, constitutes essential infrastructure required to enable the library community to become more actively engaged with both using and creating linked data.
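The difference between attribute = value pairs and triples is easier to see in a small sketch. The Python below is only an illustration under my own assumptions: the `http://example.org/` namespace, the record contents, and the `record_to_triples` helper are hypothetical, not anything shown in the session or served by the registry.

```python
# A traditional record: the attributes live inside the record and
# have no identifiers of their own.
record = {
    "id": "rec-001",
    "title": "Example Title",
    "author": "Example, Author",
}

BASE = "http://example.org/"  # hypothetical namespace, for illustration only


def record_to_triples(rec):
    """Turn one record into (subject, predicate, object) triples.

    The record identifier becomes the subject URI and each attribute
    name becomes a predicate URI, so the attribute itself can now be
    referenced from outside the record.
    """
    subject = BASE + "resource/" + rec["id"]
    triples = []
    for attribute, value in rec.items():
        if attribute == "id":
            continue  # the identifier is the subject, not a statement
        predicate = BASE + "terms/" + attribute
        triples.append((subject, predicate, value))
    return triples


triples = record_to_triples(record)
```

Each resulting triple stands on its own. In a real application the predicate URIs would come from a registered vocabulary rather than a made-up namespace, and the author value would likely be a URI for the person rather than a text string.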
Jennifer Bowen provided an overview of the eXtensible Catalog (XC) project, and described how XC supports linked data. One of the primary goals of the project is to build open source software that supports reuse of MARC-encoded library metadata in an extensible environment. Though it has added to the cost of development, XC has been designed specifically to support linked data. XC metadata is based on the FRBR model, and it supports a level of granularity similar to MARC. XC also facilitates metadata harvesting via OAI-PMH and transformation of Dublin Core (DC) metadata. The XC application profile is being developed in accordance with the guidelines for DC application profiles, though it does not mandate the inclusion of DC Metadata Initiative (DCMI) terms. XC requires that terms be defined in RDF, and it is designed to utilize metadataregistry.org. XC incorporates terms from several namespaces and defines 37 custom elements in its own namespace. Some of the custom elements mirror elements defined in other metadata schemes that are not yet registered, such as RDA and MARC. One of XC’s biggest strengths is that it enables experimentation. It provides Web-based tools that support harvesting, troubleshooting, transformation, and enhancement of metadata outside the context of existing legacy systems. Librarians can explore new approaches to managing metadata with no danger of permanently corrupting or destroying data stored in legacy systems.
Next steps for XC:
- Finalize schema and registry of elements
- Publish application profile
- Identify and define metadata elements for user generated metadata
- Enable schema data to be harvested as RDF
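For readers who haven't worked with OAI-PMH, the harvesting mentioned in the XC overview above boils down to HTTP requests with a handful of standard parameters. This Python sketch builds a `ListRecords` request URL. The verb and the `metadataPrefix` and `set` arguments come from the OAI-PMH protocol, and `oai_dc` is the Dublin Core format every repository must support. The base URL and the `list_records_url` helper are hypothetical, and this is not XC's own harvester code.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    base_url is the repository's OAI-PMH endpoint; set_spec optionally
    restricts the harvest to one set defined by that repository.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint, for illustration only.
url = list_records_url("http://example.org/oai")
```

A real harvester would also have to fetch the URL, parse the XML response, and follow any `resumptionToken` the repository returns for large result sets. None of that is shown here.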
Rebecca Guenther described efforts currently underway at the Library of Congress to make controlled vocabularies available as linked data. The Library of Congress Vocabularies Service is intended to facilitate development and maintenance of vocabularies maintained by LC and make them freely available to both libraries and the broader Web community. The service provides comprehensive information about the vocabularies in addition to exposing the vocabularies themselves as linked data. Most vocabularies will be represented using the Simple Knowledge Organization System (SKOS), an RDF application that was recently finalized by the W3C. Currently, LCSH is the only vocabulary available, but others will be offered in the future, including LC Name authorities. The service also offers bulk download of data in RDF format. Now that the service is officially up and running, LC plans to advocate for use and solicit user feedback more actively. Also still to come: a mechanism for updating data as changes are made in the underlying vocabularies and the development of an OWL schema for LCSH to provide greater granularity and a means for expressing facets, since SKOS lacks this capability.
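As a rough picture of what a vocabulary looks like once it is expressed in SKOS, here is a small Python sketch. The `skos:prefLabel` and `skos:broader` property URIs are the real ones from the W3C SKOS namespace. Everything else is assumed for illustration: the concept URIs, the labels, and the helper functions are hypothetical, and none of the data comes from LCSH or the LC service.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
VOCAB = "http://example.org/vocab/"  # hypothetical concept namespace

# Three made-up concepts linked by skos:broader, stored as plain triples.
skos_triples = [
    (VOCAB + "c1", SKOS + "prefLabel", "Figure skating"),
    (VOCAB + "c1", SKOS + "broader", VOCAB + "c2"),
    (VOCAB + "c2", SKOS + "prefLabel", "Skating"),
    (VOCAB + "c2", SKOS + "broader", VOCAB + "c3"),
    (VOCAB + "c3", SKOS + "prefLabel", "Winter sports"),
]


def objects(subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in skos_triples if s == subject and p == predicate]


def broader_labels(concept):
    """Follow skos:broader links upward and collect preferred labels."""
    labels = []
    seen = {concept}  # guards against a cycle in the data
    current = concept
    while True:
        parents = objects(current, SKOS + "broader")
        if not parents or parents[0] in seen:
            break
        current = parents[0]  # this sketch follows only the first parent
        seen.add(current)
        labels.extend(objects(current, SKOS + "prefLabel"))
    return labels


chain = broader_labels(VOCAB + "c1")
```

Because the relationships are explicit and every concept has its own URI, any Web application can walk the hierarchy this way without knowing anything about MARC authority records.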
Here’s an interesting tidbit in response to those out there who argue that there is no longer a need for bibliographic control when you can just search across full-text using Google. Having encountered this situation myself many times, it’s nice to see that others have noticed the issue. What can be done to improve the metadata for these items so that linkages between related works (be they editions, additional volumes, etc.) can be made explicit and browsable to readers?
I’m starting a new series of posts to document some of the nitty gritty challenges faced by catalogers who work with WorldCat. Today’s challenge involves adding content to WorldCat records.
Today, I cataloged a Continuing Resource (aka Serial) with a title change: The United Nations Today. I found a lovely CONSER record for the new title, #273856824, that I was pretty happy with, only there was some valuable information missing from the record that I wanted to add: 1. The ISBN number that was printed on the piece and 2. The URL for the United Nations web site that includes selected full-text from the print edition.
There is no ISSN printed on the piece, nor was there already an ISSN in the CONSER record. Here’s what I saw when I attempted to validate the record after adding the 020 with the ISBN:
WorldCat would not allow me to add a 020 to a Continuing Resource record, even though this title has an ISBN but no ISSN. I added the ISBN to the record in our local catalog because standard numbers are critical data elements for record matching. But I was prevented from doing this in WorldCat by policy.
As for the URL to the companion web site, I was not permitted to add this to the master record because it is a CONSER, Encoding Level = blank record and mine is not a CONSER library. Again, this URL is included in the copy of the WorldCat record stored in our local catalog, so patrons can get access to online full-text through our catalog, but not through WorldCat.
My preference would be to add this information to the WorldCat master record, where it would be available to a much broader audience. But, current WorldCat policies prevented this.
Finally! OCLC truly makes a move toward “moving cataloging to the network level.” Now if only they could introduce some substantial improvements to Connexion Client and/or other cataloging software/interfaces that would help to simplify and streamline cataloging work. But, I guess one (slow) step at a time is all one can expect from an elephant.
Thinking about the future of the library catalog, the following has been bugging me for a long time. I’ve finally written something about it that is at least semi-articulate, I hope, and perhaps a statement that I can use as a basis for my sabbatical project.
Selection of resources relevant to a particular user community has long been a function of libraries, especially academic libraries. In this day of information overload, the selection function is more valuable and more challenging than ever before. In the past, factors such as acquisitions budgets and limited availability of physical items greatly restricted the amount of content that a single library could collect. Today, with the proliferation of electronic content, these restrictions have been greatly reduced. Libraries can choose to “collect” materials relevant to their user community even if they don’t actually acquire and/or store copies of the information objects. They accomplish this by placing resource surrogates where users can find them. In a sense, the catalog itself becomes the collection.
