A version of this post was published on the ALA TechSource Blog on July 31, 2009.
From Legacy Data to Linked Data: Preparing Libraries for Web 3.0
Monday, July 13, 8:00 to 10:00 a.m.
How can library cataloging data be transformed to function within “Web 3.0” and be understood by non-library web applications? Speakers from both the library and Semantic Web communities will explore the situation in a non-technical manner and describe current work underway to transform legacy library data into linked data.
Moderator: Corey A. Harper, Metadata Services Librarian, New York University
Speakers: Eric Miller, President, Zepheira, Inc.; Diane Hillmann, Director of Metadata Initiatives, Information Institute of Syracuse; Jennifer Bowen, Co-Principal Investigator, eXtensible Catalog Project, University of Rochester; Rebecca Guenther, Senior Networking and Standards Specialist, Network Development & MARC Standards Office, Library of Congress
This session was a highlight of the 2009 ALA Annual Conference. It brought together four recognized leaders to discuss the emergence of linked data on the Web and the role that the library community can play in realizing the Semantic Web. The session drew a standing-room-only crowd and offered a glimpse of the future of cataloging.
First up was Eric Miller, President and co-founder of Zepheira, who provided an overview of the current state of linked data development. Miller defined linked data as: “a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs.” He emphasized sharing and connecting data as the key elements. Thousands of organizations and individuals are currently participating in creating linked data, and the availability of linked data has increased tremendously over the past six months.
Linked data principles:
- URIs represent “things”: people, places, concepts, departments
- Using HTTP-compliant URIs makes data more accessible
- When serving URIs, deliver useful, reusable information
- Leverage standards (RDF, SKOS, etc.)
- Add context. It’s all about connecting, creating meaningful relationships between data.
Miller argued that the Web itself is becoming the basic architecture for building applications. Linked data applications don’t run ON the Web; they are applications OF the Web. Users increasingly want their data back, and they want it back their way. With linked data, users are no longer limited to searching based on relationships that have been pre-defined by application developers, database designers, or librarians; users can create and search based on relationships that are meaningful to them. Miller’s company Zepheira is currently working with the Library of Congress to create Recollection, a new platform intended to provide more useful tools and processes for sharing diverse content across the myriad collections covered by the LC Digital Preservation Program. This will empower users to create new views for existing data, combine data sets in customizable ways, and build communities around the data, allowing them to collaborate in curating and connecting collections in customized ways. Zepheira has also launched Freemix, a new social networking application designed to allow users to mix and share data.
In closing, Miller noted that in the linked data environment, credibility is more important than ever before. Libraries are trusted institutions with a wealth of experience in organizing and managing information resources. The library community needs to position itself to leverage this reputation and take a larger role in the development of linked data applications. Linked data has arrived, and the library community cannot afford to be left behind.
Diane Hillmann’s presentation addressed the question: Are Libraries Ready for Linked Data? Her answer: a resounding yes! Linked data is all about relationships. Libraries have been concerned with expressing relationships between information objects for a very long time, and we now understand that we must use machine-based methods if we want to do a really good job. Traditional cataloging provides attribute = value pairs, for example: Title = [value] or Author = [value]. These attributes are embedded within a record that has an identifier. Because they don’t have independent identifiers, attributes cannot be referenced outside the context of a record. Linked data is based upon a model of triples consisting of subject, predicate, and object, which permits the assignment of identifiers at the attribute level. Identifiers can also be assigned to relationships between attributes. Hillmann is currently involved in building a registry (http://metadataregistry.org/) that maintains and serves relationship identifiers. A vocabulary based on RDA should be completely registered within a few weeks. It will be freely accessible to support linked data applications implemented by libraries and others. The registry, combined with the availability of applications and tools such as those being developed in conjunction with the eXtensible Catalog project, constitutes essential infrastructure required to enable the library community to become more actively engaged with both using and creating linked data.
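The contrast between attribute = value pairs and triples can be sketched in a few lines of Python. This is only an illustration and not something shown at the session: the record, the example.org URIs, and the function name record_to_triples are all hypothetical, and a real application would use property URIs from a registry rather than made-up ones.

```python
# Hypothetical record: attribute = value pairs living under one record identifier.
record = {
    "id": "http://example.org/record/123",
    "title": "Example Title",
    "author": "Example, Author",
}

# Hypothetical property URIs (assumption); these give each attribute
# an identifier that can be referenced outside the record.
PROPERTY_URIS = {
    "title": "http://example.org/terms/title",
    "author": "http://example.org/terms/author",
}


def record_to_triples(rec):
    """Turn each attribute = value pair into a (subject, predicate, object) triple."""
    subject = rec["id"]
    return [
        (subject, PROPERTY_URIS[name], value)
        for name, value in rec.items()
        if name != "id"
    ]


triples = record_to_triples(record)
for triple in triples:
    print(triple)
```

Each printed triple carries its own subject and predicate identifiers, so a statement such as "this resource has this title" can be shared or linked to without shipping the whole record.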
Jennifer Bowen provided an overview of the eXtensible Catalog (XC) project, and described how XC supports linked data. One of the primary goals of the project is to build open source software that supports reuse of MARC-encoded library metadata in an extensible environment. Though it has added to the cost of development, XC has been designed specifically to support linked data. XC metadata is based on the FRBR model, and it supports a level of granularity similar to MARC. XC also facilitates metadata harvesting via OAI-PMH and transformation of Dublin Core (DC) metadata. The XC application profile is being developed in accordance with the guidelines for DC application profiles, though it does not mandate the inclusion of DC Metadata Initiative (DCMI) terms. XC requires that terms be defined in RDF, and it is designed to utilize metadataregistry.org. XC incorporates terms from several namespaces and defines 37 custom elements in its own namespace. Some of the custom elements mirror elements defined in other metadata schemes that are not yet registered, such as RDA and MARC. One of XC’s biggest strengths is that it enables experimentation. It provides Web-based tools that support harvesting, troubleshooting, transformation, and enhancement of metadata outside the context of existing legacy systems. Librarians can explore new approaches to managing metadata with no danger of permanently corrupting or destroying data stored in legacy systems.
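For readers unfamiliar with OAI-PMH harvesting, here is a minimal sketch of what a harvest request looks like. It is a generic illustration of the protocol, not XC's own code: the base URL is a made-up example.org address and the helper name list_records_url is hypothetical. The verb ListRecords and the metadataPrefix value oai_dc (unqualified Dublin Core) come from the OAI-PMH protocol itself.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Optional selective harvesting by set.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical repository address (assumption).
url = list_records_url("http://example.org/oai")
print(url)
```

A harvester would fetch this URL, parse the XML response, and repeat the request with any resumption token the repository returns until the full set has been collected.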
Next steps for XC:
- Finalize schema and registry of elements
- Publish application profile
- Identify and define metadata elements for user generated metadata
- Enable schema data to be harvested as RDF
Rebecca Guenther described efforts currently underway at the Library of Congress to make controlled vocabularies available as linked data. The Library of Congress Vocabularies Service is intended to facilitate development and maintenance of vocabularies maintained by LC and make them freely available to both libraries and the broader Web community. The service provides comprehensive information about the vocabularies in addition to exposing the vocabularies themselves as linked data. Most vocabularies will be represented using the Simple Knowledge Organization System (SKOS), an RDF application that was recently finalized by the W3C. Currently, LCSH is the only vocabulary available, but others will be offered in the future, including LC Name authorities. The service also offers bulk download of data in RDF format. Now that the service is officially up and running, LC plans to advocate for use and solicit user feedback more actively. Also still to come: a mechanism for updating data as changes are made in the underlying vocabularies and the development of an OWL schema for LCSH to provide greater granularity and a means for expressing facets, since SKOS lacks this capability.
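To give a sense of what a subject heading looks like once it is expressed in SKOS, here is a small sketch that writes one concept as N-Triples. The SKOS and RDF namespace URIs are the real W3C ones, but the concept URIs, labels, and the helper name ntriple are hypothetical stand-ins; they are not actual LC identifiers or the service's actual output.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def ntriple(subject, predicate, obj, literal=False):
    """Format one statement as an N-Triples line."""
    if literal:
        # Escape backslashes and quotes, then tag the label as English.
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        obj_term = f'"{escaped}"@en'
    else:
        obj_term = f"<{obj}>"
    return f"<{subject}> <{predicate}> {obj_term} ."


# Hypothetical concept URIs (assumption), not real LCSH identifiers.
concept = "http://example.org/subjects/cataloging"
broader = "http://example.org/subjects/library-science"

lines = [
    ntriple(concept, RDF_TYPE, SKOS + "Concept"),
    ntriple(concept, SKOS + "prefLabel", "Cataloging", literal=True),
    ntriple(concept, SKOS + "broader", broader),
]
print("\n".join(lines))
```

The skos:broader statement is the linked data counterpart of a broader term reference in a traditional authority record.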
2009 ALA Annual Conference
- Attended my last LRTS Editorial Board meeting.
- Attended the OCLC Developers Network Luncheon.
- Met with Patrick Hogan and Nannette Naught to discuss ways I might be able to contribute to the development of the RDA Online Toolkit during my sabbatical.
Attended From ONIX to MARC and back again: new metadata service options at OCLC (OCLC sponsored program)
Renee Register provided an update on OCLC’s next generation cataloging project. The main focus of the project over the past couple of years has been partnering with publishers to better utilize and share ONIX and MARC metadata in WorldCat and publisher database records. OCLC has now started enriching WorldCat records with ONIX data; look for the code OCLNG in the 040 field. NISO and OCLC published a white paper, “Streamlining book metadata workflow,” on June 30, 2009, and it is now available on the NISO website. Metadata services for publishers was launched in July 2009, based on the success of the pilot project. http://publishers.oclc.org has links to additional information and research. Next steps: incorporate WorldCat identities, produce additional mapping between terminologies, refine FRBR algorithms, and continue to build collaboration between publishers and libraries. Areas for collaborative work include: best practices, optimization of identifiers, and optimization of subjects.
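Spotting enriched records in a batch is easy to script. The sketch below uses a deliberately simplified, hypothetical record structure (a list of tag and subfield pairs) rather than a real MARC library, and the subfield codes in the sample data are my assumption; the only detail taken from the presentation is that the code OCLNG appears somewhere in the 040 field.

```python
# Hypothetical, simplified MARC record: a list of (tag, subfields) pairs,
# where subfields is a list of (code, value) pairs. Subfield codes are assumed.
record = [
    ("040", [("a", "DLC"), ("c", "DLC"), ("d", "OCLNG")]),
    ("245", [("a", "Example title")]),
]


def has_onix_enrichment(rec):
    """Return True if any subfield of any 040 field holds the code OCLNG."""
    return any(
        tag == "040" and any(value == "OCLNG" for _code, value in subfields)
        for tag, subfields in rec
    )


print(has_onix_enrichment(record))
```

Because the check looks at every subfield of the 040, it does not depend on knowing exactly which subfield OCLC uses for the code.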
Kyle and I had dinner with Cyril Oberlander and Lorcan Dempsey at Kitty O’Sheas in the Chicago Hilton.
2009 ALA Annual Conference
ALCTS Continuing Resources Section College and Research Libraries Interest Group program
Adam Chandler: Towards OpenURL Quality Metrics
Chandler described a study conducted at Cornell University in 2008 to evaluate the quality of OpenURL metadata. He noted that in the 10 years since the launch of the OpenURL standard, he could find no evidence of a study establishing quality benchmarks. In 2008-09, the Cornell OpenURL resolver received more than 400,000 requests, and studies show that users now expect to find online full-text for journal articles. It is therefore very important for OpenURL resolvers to perform with a high degree of accuracy and reliability. Unfortunately, there are many possible points of failure in current OpenURL systems. OpenURL metadata problems are a major source of failure. Following methodology developed by Hughes (2004), the Cornell study identified key elements used by various content providers in their link to targets (e.g. title, author, date) and used regular expression matching and Perl scripts to normalize data and build a database to analyze the variety of formatting found in OpenURL metadata from various providers. Chandler has also created an online tool for running reports on this database. Data analysis continues, and Serials Solutions recently agreed to provide some data. See: http://openurlquality.blogspot.com/ for updates and additional findings.
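The kind of analysis described here can be approximated with a short script. The Cornell study used Perl; what follows is my own Python sketch of the general idea, not the study's code. The resolver address and sample OpenURLs are invented, and the three date patterns are an arbitrary choice for illustration. The keys rft.date and rft.jtitle are standard OpenURL 1.0 journal keys.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Illustrative date formats to look for (assumption, not the study's list).
DATE_PATTERNS = [
    ("YYYY", re.compile(r"\d{4}")),
    ("YYYY-MM", re.compile(r"\d{4}-\d{2}")),
    ("YYYY-MM-DD", re.compile(r"\d{4}-\d{2}-\d{2}")),
]


def classify_date(value):
    """Label a date string by the first pattern it matches in full."""
    for label, pattern in DATE_PATTERNS:
        if pattern.fullmatch(value):
            return label
    return "other"


def date_formats(openurls):
    """Count how rft.date is formatted across a list of OpenURLs."""
    counts = Counter()
    for url in openurls:
        values = parse_qs(urlsplit(url).query).get("rft.date")
        counts[classify_date(values[0]) if values else "missing"] += 1
    return counts


# Invented sample requests to a hypothetical resolver.
sample = [
    "http://resolver.example.org/?rft.jtitle=Example+Journal&rft.date=2008",
    "http://resolver.example.org/?rft.jtitle=Example+Journal&rft.date=2008-05-01",
    "http://resolver.example.org/?rft.jtitle=Example+Journal&rft.date=May+2008",
    "http://resolver.example.org/?rft.jtitle=Example+Journal",
]
counts = date_formats(sample)
print(counts)
```

Running the same tally separately for each content provider would show which sources send dates, titles, or other elements in formats a resolver is likely to mishandle.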
Peter McCracken: KBART Update
McCracken provided an update on the progress of the KBART Working Group. Three main problems with OpenURL currently exist: bad data, bad formatting, and lack of knowledge among vendors/suppliers about the value and importance of OpenURL. The KBART Working Group includes members from link resolver/ERM system vendors, publishers, subscription agents and aggregators, and the main goal of the initiative is to improve the quality of data for everyone and thereby improve the performance of OpenURL resolvers. Improving access for patrons will be the measure of success; if a library has online full-text content from one or more sources, the OpenURL resolver should return accurate results for all sources based on a single search. Right now, the focus is on identifying points of failure and developing solutions. Deliverables include: establishing best practices for delivery, content, and structure of data. They are currently working on identifying 15 distinct fields for data, and EBSCO has supplied a sample file of data for testing. They plan to analyze this file to identify missing data and determine how to fill in any missing elements. They also plan to work on how to routinely collect and take appropriate action on error reports and other feedback from libraries that could help to improve data quality. In the near future, they hope to work on methods for facilitating direct communication between third party resolver providers and database vendors, so that libraries no longer have to take total responsibility for informing vendors about the content that they have purchased. They also plan to address issues associated with consortial packages and non-textual resources.
Regina Reynolds: Best Practices for Presentation of E-Serials
Focused primarily on the problems caused when e-journal publishers and/or aggregators utilize latest entry style treatment for serial title changes in their systems. Basic problem: people cite articles using the title that the serial has at the time they read it. If the title changes later, those old citations persist. If publishers/aggregators eliminate all references to earlier title(s) in their systems, it makes it very difficult for end-users to locate known articles based on older citations. She cited JSTOR as an example of a vendor that provides excellent access and linkages between titles.
EBSCO Academic luncheon
Discussed challenges EBSCO has faced due to the recession. They are cutting costs internally, but not content. Can’t afford to lower prices for customers. Discouraged libraries from canceling journal subscriptions based on full-text availability in EBSCO databases because this drives up the prices that EBSCO must pay to license the full-text content, and will ultimately mean that EBSCO has to raise prices for its full-text database products. Provided information on upcoming database releases, including a new Art & Architecture database intended to compete with Art Abstracts and full-text versions of America: History and Life and Historical Abstracts. Libraries that own some flavor of Academic Search will receive pricing that reflects only the new full-text content they are getting if they subscribe to the full-text versions of these new products. Provided a preview of the EBSCO Discovery Service. New databases and Discovery Service to be released before the end of 2009.
Look Before You Leap: Taking RDA for a Test Drive
Arrived around 2:15 because the EBSCO luncheon didn’t let out until about 1:30, and I had to get myself from the Chicago Hilton all the way down to McCormick West. Even with pretty direct CTA bus service, transit between the downtown hotels and the convention center in Chicago took a long time (and the CTA buses were much faster than the ALA shuttle buses). I arrived in time to see most of Nannette Naught’s presentation, which demonstrated various aspects of the RDA Online toolkit product being developed by ALA Editions. I skipped out part way through the next speaker’s presentation because it didn’t interest me much, and spent some time cruising the conference exhibits. I stopped by the LibLime booth for a brief demo of Koha’s cataloging module. I returned to the RDA session around 5:00 p.m., in the middle of a presentation about the plans for the national libraries RDA testing project. While I understand their concerns, this testing project seems too narrowly focused on the logistical aspects of implementing RDA, primarily in large library environments, with very little focus on actually assessing RDA itself and whether it is going to work any better than AACR2/MARC. I hope that somebody else is going to take a look at that aspect, since that’s the most important thing, as far as I’m concerned. I’d be willing to help out with such a project during my sabbatical if I could get hooked up with some similar-minded people.
Kyle and I treated Anya and Russ Arnold to dinner at Russian Tea Time to welcome Anya as the new Summit Program Manager at the Orbis Cascade Alliance.
2009 ALA Annual Conference
Attended Creating Library Web Services: Mashups and APIs
Generally a useful event, though not as instructive or productive as I had originally hoped. The first half of the day was primarily spent on discussing web services, APIs, etc. in general and demonstrating existing applications, most developed and deployed within the library context. I would have liked fewer cursory demos and more explanation of how the applications work and what tools are used to build and maintain them. The second half of the day was a bit more hands-on, exploring Yahoo Pipes, but the presenter didn’t provide much in the way of a structured exercise to practice on, so it was mostly just playing around. Yahoo Pipes looks interesting, but I wasn’t able to do much with it during the session. I will need to complete online tutorials and spend more time exploring it on my own in order to determine how I might be able to use it.
Kyle and I had dinner at the original Morton’s with Stephen Smith, our boss when we were cataloging Graduate Assistants at the University of Illinois library, and another former cataloging GA. A bit pricey, but delicious with good service. As we were finishing our meal, Dustin Hoffman walked into the restaurant for dinner. It is amazing; he looks exactly the same in person as he does on the screen.