I have not been bored since returning from the ALA Annual Conference in Chicago. My main activity has been preparing for my trip to Cambridge, MA this fall. I purchased my plane ticket, and I’ve spent a lot of time searching for short-term furnished housing. The good news: I think I’ve found a place that will meet my needs and my budget!
I’ve spent a fair amount of time writing up reports on sessions I attended at 2009 ALA Annual Conference for my personal blog, and I was invited to submit a write-up of one session to the ALA TechSource blog. I’ve just finished that up, so hopefully it will appear soon.
I’ve skated at Sherwood 4 times, for a total of about 8 hours. I also got my skates sharpened for the first time in more than a year. Wow! What a difference that makes.
Kyle and I went to the new Harry Potter movie, and I finished re-reading the book for about the 4th time. The movie wasn’t too bad, but the book is so much better! I just don’t understand why they changed and added some of the things they did in the movie, when what is written in the book would have worked just as well. It will be interesting to see how they cope with the fact that they omitted some fairly major details from this film in the final two films.
A version of this post was published on the ALA TechSource Blog on July 31, 2009.
From Legacy Data to Linked Data: Preparing Libraries for Web 3.0
Monday, July 13, 8:00 to 10:00 a.m.
How can library cataloging data be transformed to function within “Web 3.0” and be understood by non-library web applications? Speakers from both the library and Semantic Web communities will explore the situation in a non-technical manner and describe current work underway to transform legacy library data into linked data.
Moderator: Corey A. Harper, Metadata Services Librarian, New York University
Speakers: Eric Miller, President, Zepheira, Inc.; Diane Hillmann, Director of Metadata Initiatives, Information Institute of Syracuse; Jennifer Bowen, Co-Principal Investigator, eXtensible Catalog Project, University of Rochester; Rebecca Guenther, Senior Networking and Standards Specialist, Network Development & MARC Standards Office, Library of Congress
This session was a highlight of the 2009 ALA Annual Conference. It brought together four recognized leaders to discuss the emergence of linked data on the Web and the role that the library community can play in realizing the Semantic Web. The session drew a standing-room-only crowd and offered a glimpse of the future of cataloging.
First up was Eric Miller, President and co-founder of Zepheira, who provided an overview of the current state of linked data development. Miller defined linked data as: “a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs.” He emphasized sharing and connecting data as the key elements. Thousands of organizations and individuals are currently participating in creating linked data, and the availability of linked data has increased tremendously over the past six months.
Linked data principles:
- URIs represent “things”: people, places, concepts, departments
- Using HTTP-compliant URIs makes data more accessible
- When serving URIs, deliver useful, reusable information
- Leverage standards (RDF, SKOS, etc.)
- Add context. It’s all about connecting, creating meaningful relationships between data.
Miller argued that the Web itself is becoming the basic architecture for building applications. Linked data applications don’t run ON the Web; they are applications OF the Web. Users increasingly want their data back, and they want it back their way. With linked data, users are no longer limited to searching based on relationships that have been pre-defined by application developers, database designers, or librarians; users can create and search based on relationships that are meaningful to them. Miller’s company Zepheira is currently working with the Library of Congress to create Recollection, a new platform intended to provide more useful tools and processes for sharing diverse content across the myriad collections covered by the LC Digital Preservation Program. This will empower users to create new views for existing data, combine data sets in customizable ways, and build communities around the data, allowing them to collaborate in curating and connecting collections in customized ways. Zepheira has also launched Freemix, a new social networking application designed to allow users to mix and share data.
In closing, Miller noted that in the linked data environment, credibility is more important than ever before. Libraries are trusted institutions with a wealth of experience in organizing and managing information resources. The library community needs to position itself to leverage this reputation and take a larger role in the development of linked data applications. Linked data has arrived, and the library community cannot afford to be left behind.
Diane Hillmann’s presentation addressed the question: Are Libraries Ready for Linked Data? Her answer: a resounding yes! Linked data is all about relationships. Libraries have been concerned with expressing relationships between information objects for a very long time, and we now understand that we must use machine-based methods if we want to do a really good job. Traditional cataloging provides attribute = value pairs, for example: Title = [value] or Author = [value]. These attributes are embedded within a record that has an identifier. Because they don’t have independent identifiers, attributes cannot be referenced outside the context of a record. Linked data is based upon a model of triples consisting of subject, predicate, and object, which permits the assignment of identifiers at the attribute level. Identifiers can also be assigned to relationships between attributes. Hillmann is currently involved in building a registry (http://metadataregistry.org/) that maintains and serves relationship identifiers. A vocabulary based on RDA should be completely registered within a few weeks. It will be freely accessible to support linked data applications implemented by libraries and others. The registry, combined with the availability of applications and tools such as those being developed in conjunction with the eXtensible Catalog project, constitutes essential infrastructure required to enable the library community to become more actively engaged with both using and creating linked data.
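To make the contrast between record-bound attribute = value pairs and triples concrete, here is a minimal sketch in plain Python. It is not taken from Hillmann’s talk or from the metadata registry: the record contents, the `example.org` URIs, and the `record_to_triples` helper are all made up for illustration.

```python
from typing import Dict, List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical base URI; a real application would use registered identifiers.
BASE = "http://example.org/"

# A traditional record: attribute = value pairs that only have meaning
# inside the record identified by "rec-001".
record = {"id": "rec-001", "Title": "Example Title", "Author": "Example, Author"}


def record_to_triples(rec: Dict[str, str]) -> List[Triple]:
    """Turn each attribute of a record into a subject-predicate-object triple."""
    subject = BASE + "resource/" + rec["id"]
    triples = []
    for attribute, value in rec.items():
        if attribute == "id":
            continue  # the record identifier becomes the subject URI
        # Each attribute gets its own (made-up) identifier, so it can be
        # referenced outside the context of this record.
        predicate = BASE + "terms/" + attribute.lower()
        triples.append(Triple(subject, predicate, value))
    return triples


triples = record_to_triples(record)
for t in triples:
    print(t.subject, t.predicate, t.obj)
```

The point of the sketch is only that the predicate now has an identifier of its own, which is what a registry of relationship identifiers would supply in practice.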
Jennifer Bowen provided an overview of the eXtensible Catalog (XC) project, and described how XC supports linked data. One of the primary goals of the project is to build open source software that supports reuse of MARC-encoded library metadata in an extensible environment. Though it has added to the cost of development, XC has been designed specifically to support linked data. XC metadata is based on the FRBR model, and it supports a level of granularity similar to MARC. XC also facilitates metadata harvesting via OAI-PMH and transformation of Dublin Core (DC) metadata. The XC application profile is being developed in accordance with the guidelines for DC application profiles, though it does not mandate the inclusion of DC Metadata Initiative (DCMI) terms. XC requires that terms be defined in RDF, and it is designed to utilize metadataregistry.org. XC incorporates terms from several namespaces and defines 37 custom elements in its own namespace. Some of the custom elements mirror elements defined in other metadata schemes that are not yet registered, such as RDA and MARC. One of XC’s biggest strengths is that it enables experimentation. It provides Web-based tools that support harvesting, troubleshooting, transformation, and enhancement of metadata outside the context of existing legacy systems. Librarians can explore new approaches to managing metadata with no danger of permanently corrupting or destroying data stored in legacy systems.
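For readers who haven’t seen OAI-PMH, a harvest is just a series of HTTP requests with a handful of query parameters. The sketch below builds a `ListRecords` request URL; it is not XC code. The base URL is a made-up placeholder and `list_records_url` is a hypothetical helper. The `verb`, `metadataPrefix`, and `resumptionToken` parameters and the `oai_dc` prefix come from the OAI-PMH protocol itself.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    resumption_token: Optional[str] = None,
) -> str:
    """Build an OAI-PMH ListRecords request URL (no network access here)."""
    if resumption_token:
        # When continuing a harvest, the token is sent alone with the verb.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


# Hypothetical repository address, for illustration only.
first_request = list_records_url("http://example.org/oai")
next_request = list_records_url("http://example.org/oai", resumption_token="abc123")
print(first_request)
print(next_request)
```

A harvester would fetch the first URL, read the resumption token (if any) from the XML response, and repeat until no token is returned.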
Next steps for XC:
- Finalize schema and registry of elements
- Publish application profile
- Identify and define metadata elements for user generated metadata
- Enable schema data to be harvested as RDF
Rebecca Guenther described efforts currently underway at the Library of Congress to make controlled vocabularies available as linked data. The Library of Congress Vocabularies Service is intended to facilitate development and maintenance of vocabularies maintained by LC and make them freely available to both libraries and the broader Web community. The service provides comprehensive information about the vocabularies in addition to exposing the vocabularies themselves as linked data. Most vocabularies will be represented using the Simple Knowledge Organization System (SKOS), an RDF application that was recently finalized by the W3C. Currently, LCSH is the only vocabulary available, but others will be offered in the future, including LC Name authorities. The service also offers bulk download of data in RDF format. Now that the service is officially up and running, LC plans to advocate for use and solicit user feedback more actively. Also still to come: a mechanism for updating data as changes are made in the underlying vocabularies and the development of an OWL schema for LCSH to provide greater granularity and a means for expressing facets, since SKOS lacks this capability.
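As a rough illustration of what a vocabulary looks like once it is expressed in SKOS, here is a small plain-Python sketch. The SKOS namespace and the `prefLabel` and `broader` properties are real; the concept URIs and labels are invented for the example and are not actual LCSH identifiers or headings, and the helper functions are hypothetical.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/vocab/"  # made-up namespace, not an LC URI

# A tiny vocabulary as (subject, predicate, object) triples.
triples = [
    (EX + "c0", SKOS + "prefLabel", "Winter sports"),
    (EX + "c1", SKOS + "prefLabel", "Ice skating"),
    (EX + "c1", SKOS + "broader", EX + "c0"),
    (EX + "c2", SKOS + "prefLabel", "Figure skating"),
    (EX + "c2", SKOS + "broader", EX + "c1"),
]


def pref_label(concept: str) -> str:
    """Return the preferred label of a concept, or the URI if none is found."""
    for s, p, o in triples:
        if s == concept and p == SKOS + "prefLabel":
            return o
    return concept


def broader_chain(concept: str) -> list:
    """Follow skos:broader links upward and collect the labels along the way."""
    labels = []
    seen = {concept}
    current = concept
    while True:
        parents = [o for s, p, o in triples if s == current and p == SKOS + "broader"]
        if not parents or parents[0] in seen:
            break  # top of the hierarchy reached, or a cycle detected
        current = parents[0]
        seen.add(current)
        labels.append(pref_label(current))
    return labels


chain = broader_chain(EX + "c2")
print(pref_label(EX + "c2"), "->", chain)
```

Because every concept and every relationship has a URI, any web application can walk the hierarchy this way without knowing anything about library systems.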
Kyle and I took some time out to explore Chicago on Monday afternoon and Tuesday morning, before catching our flight back to PDX. After having lunch with a friend, at a nice little cafe very close to the oldest house in Chicago, we hopped a cab from McCormick Place to the Field Museum. I had never visited the Field before. To our surprise, admission was free (thank you Target!) because it was the second Monday of the month, so that was nice. Unfortunately, this meant that the museum was absolutely crawling with people, especially day care/day camp groups. We saw Sue and wandered through most of the free exhibits, but it was just too crowded and noisy to linger very long reading signs or pondering things. I’m more interested in art museums than natural history museums anyway, so I wasn’t too bothered. Now I can say I’ve been there, and it was interesting to tour the building from an architectural point of view.
After escaping the crowds, we wandered around the rest of the “museum campus,” which also includes Shedd Aquarium and a planetarium. We considered visiting the aquarium, but the cheapest adult admission was $24.95 and it was already 3:00 p.m. The price seemed a little steep given the amount of time and energy we had left. Besides, it was an absolutely gorgeous day for walking around outside: sunny, hardly a cloud in the sky, and not hot and humid (as one would expect in the Midwest in July). I took a few pictures of Kyle and the downtown Chicago skyline. Around 5:00 p.m. we were ready to head back toward our hotel on the Magnificent Mile, so we took the water taxi up to Navy Pier, and spent some time wandering around there (since I hadn’t been there before either). After a short nap at the hotel, we headed to dinner at the Rock Bottom brew pub on the corner of State and Grand and had a nice, Oregon-style dinner of microbrew and American pub food. Kyle had chicken fried chicken–can’t get too much more American than that! After four days in the big city, we were definitely ready to head home.
On Tuesday, we didn’t have to be at O’Hare until about 1:30, so we wandered around the Magnificent Mile for a couple of hours. We did some shopping and stopped by the Water Tower, where we viewed historic photos of the aftermath of the Great Chicago Fire of 1871. Then it was back on the train to the airport. Our flight out of O’Hare was delayed by about 45 minutes on departure, but we made our connection in Denver no problem. I haven’t flown through Denver for several years, and it’s nicer now than I remember; speedy underground trains between the terminals and free Wi-Fi. We arrived at PDX right on time, and we arrived home at about 10:00 p.m. Many thanks to Janeanne and Kerry for dropping Powder off earlier in the evening. It was super nice to arrive home to greetings from a happy puppy!
All in all, the Chicago trip was a pretty good one. I attended several interesting sessions and made some valuable connections related to my sabbatical. And we enjoyed a few days of fine dining and sight seeing in the big city.
2009 ALA Annual Conference
- Attended my last LRTS Editorial Board meeting.
- Attended the OCLC Developers Network Luncheon.
- Met with Patrick Hogan and Nannette Naught to discuss ways I might be able to contribute to the development of the RDA Online Toolkit during my sabbatical.
Attended From ONIX to MARC and back again: new metadata service options at OCLC (OCLC sponsored program)
Renee Register provided an update on OCLC’s next generation cataloging project. The main focus of the project over the past couple of years has been partnering with publishers to better utilize and share ONIX and MARC metadata in WorldCat and publisher database records. OCLC has now started enriching WorldCat records with ONIX data; look for the code OCLNG in the 040 field. NISO and OCLC published a white paper, “Streamlining book metadata workflow,” on June 30, 2009, and it is now available on the NISO website. Metadata services for publishers was launched in July 2009, based on the success of the pilot project. http://publishers.oclc.org has links to additional information and research. Next steps: incorporate WorldCat identities, produce additional mapping between terminologies, refine FRBR algorithms, and continue to build collaboration between publishers and libraries. Areas for collaborative work include: best practices, optimization of identifiers, and optimization of subjects.
Kyle and I had dinner with Cyril Oberlander and Lorcan Dempsey at Kitty O’Sheas in the Chicago Hilton.
2009 ALA Annual Conference
ALCTS Continuing Resources Section College and Research Libraries Interest Group program
Adam Chandler: Towards OpenURL Quality Metrics
Chandler described a study conducted at Cornell University in 2008 to evaluate the quality of OpenURL metadata. He noted that in the 10 years since the launch of the OpenURL standard, he could find no evidence of a study establishing quality benchmarks. In 2008-09, the Cornell OpenURL resolver received more than 400,000 requests, and studies show that users now expect to find online full-text for journal articles. It is therefore very important for OpenURL resolvers to perform with a high degree of accuracy and reliability. Unfortunately, there are many possible points of failure in current OpenURL systems. OpenURL metadata problems are a major source of failure. Following methodology developed by Hughes (2004), the Cornell study identified key elements used by various content providers in their link to targets (e.g. title, author, date) and used regular expression matching and Perl scripts to normalize data and build a database to analyze the variety of formatting found in OpenURL metadata from various providers. Chandler has also created an online tool for running reports on this database. Data analysis continues, and Serials Solutions recently agreed to provide some data. See: http://openurlquality.blogspot.com/ for updates and additional findings.
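To give a feel for the kind of checking described above, here is a small sketch that parses the query string of an OpenURL and tests a few elements against regular expressions. The Cornell study used Perl; this uses Python purely for illustration. The patterns, the `check_openurl` helper, and the sample query string are my own assumptions, not the study’s actual rules or data.

```python
import re
from urllib.parse import parse_qs

# Hypothetical format checks; these test only the shape of each value.
PATTERNS = {
    "issn": re.compile(r"^\d{4}-?\d{3}[\dXx]$"),
    "date": re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$"),
    "spage": re.compile(r"^\d+$"),
}


def check_openurl(query: str) -> dict:
    """Classify each checked element as 'ok', 'malformed', or 'missing'."""
    params = parse_qs(query)
    report = {}
    for key, pattern in PATTERNS.items():
        values = params.get(key)
        if not values:
            report[key] = "missing"
        elif pattern.match(values[0].strip()):
            report[key] = "ok"
        else:
            report[key] = "malformed"
    return report


# Made-up query string of the sort a content provider might send.
sample = "genre=article&issn=1234-5679&date=Spring+2008&volume=12"
report = check_openurl(sample)
print(report)
```

Running checks like this over a large log of requests, grouped by provider, is one way to see which sources send dates, ISSNs, or page numbers that a resolver is likely to choke on.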
Peter McCracken: KBART Update
McCracken provided an update on the progress of the KBART Working Group. Three main problems with OpenURL currently exist: bad data, bad formatting, and lack of knowledge among vendors/suppliers about the value and importance of OpenURL. The KBART Working Group includes members from link resolver/ERM system vendors, publishers, subscription agents and aggregators, and the main goal of the initiative is to improve the quality of data for everyone and thereby improve the performance of OpenURL resolvers. Improving access for patrons will be the measure of success; if a library has online full-text content from one or more sources, the OpenURL resolver should return accurate results for all sources based on a single search. Right now, the focus is on identifying points of failure and developing solutions. Deliverables include: establishing best practices for delivery, content, and structure of data. They are currently working on identifying 15 distinct fields for data, and EBSCO has supplied a sample file of data for testing. They plan to analyze this file to identify missing data and determine how to fill in any missing elements. They also plan to work on how to routinely collect and take appropriate action on error reports and other feedback from libraries that could help to improve data quality. In the near future, they hope to work on methods for facilitating direct communication between third party resolver providers and database vendors, so that libraries no longer have to take total responsibility for informing vendors about the content that they have purchased. They also plan to address issues associated with consortial packages and non-textual resources.
Regina Reynolds: Best Practices for Presentation of E-Serials
Focused primarily on the problems caused when e-journal publishers and/or aggregators utilize latest entry style treatment for serial title changes in their systems. Basic problem: people cite articles using the title that the serial has at the time they read it. If the title changes later, those old citations persist. If publishers/aggregators eliminate all references to earlier title(s) in their systems, it makes it very difficult for end-users to locate known articles based on older citations. She cited JSTOR as an example of a vendor that provides excellent access and linkages between titles.
EBSCO Academic luncheon
Discussed challenges EBSCO has faced due to the recession. They are cutting costs internally, but not content. Can’t afford to lower prices for customers. Discouraged libraries from canceling journal subscriptions based on full-text availability in EBSCO databases because this drives up the prices that EBSCO must pay to license the full-text content, and will ultimately mean that EBSCO has to raise prices for its full-text database products. Provided information on upcoming database releases, including a new Art & Architecture database intended to compete with Art Abstracts and full-text versions of America: History and Life and Historical Abstracts. Libraries that own some flavor of Academic Search will receive pricing that reflects only the new full-text content they are getting if they subscribe to the full-text versions of these new products. Provided a preview of the EBSCO Discovery Service. New databases and Discovery Service to be released before the end of 2009.
Look Before You Leap: Taking RDA for a Test Drive
Arrived around 2:15 because the EBSCO luncheon didn’t let out until about 1:30, and I had to get myself from the Chicago Hilton all the way down to McCormick West. Even with pretty direct CTA bus service, transit between the downtown hotels and the convention center in Chicago took a long time (and the CTA buses were much faster than the ALA shuttle buses). I arrived in time to see most of Nannette Naught’s presentation, which demonstrated various aspects of the RDA Online toolkit product being developed by ALA Editions. I skipped out part way through the next speaker’s presentation, because it didn’t interest me much, and spent some time cruising the conference exhibits. I stopped by the LibLime booth for a brief demo of Koha’s cataloging module. I returned to the RDA session around 5:00 p.m., in the middle of a presentation about the plans for the national libraries RDA testing project. While I understand their concerns, this testing project seems too narrowly focused on the logistical aspects of implementing RDA, primarily in large library environments, with very little focus on actually assessing RDA itself and whether it is going to work any better than AACR2/MARC. I hope that somebody else is going to take a look at that aspect, since that’s the most important thing, as far as I’m concerned. I’d be willing to help out with such a project during my sabbatical if I could get hooked up with some similar-minded people.
Kyle and I treated Anya and Russ Arnold to dinner at Russian Tea Time to welcome Anya as the new Summit Program Manager at the Orbis Cascade Alliance.
2009 ALA Annual Conference
Attended Creating Library Web Services: Mashups and APIs
Generally a useful event, though not as instructive or productive as I had originally hoped. The first half of the day was primarily spent on discussing web services, APIs, etc. in general and demonstrating existing applications, most developed and deployed within the library context. I would have liked fewer cursory demos and more explanation of how the applications work and what tools are used to build and maintain them. The second half of the day was a bit more hands-on, exploring Yahoo Pipes, but the presenter didn’t provide much in the way of a structured exercise to practice on, so it was mostly just playing around. Yahoo Pipes looks interesting, but I wasn’t able to do much with it during the session. I will need to complete online tutorials and spend more time exploring it on my own in order to determine how I might be able to use it.
Kyle and I had dinner at the original Morton’s with Stephen Smith, our boss when we were cataloging Graduate Assistants at the University of Illinois library, and another former cataloging GA. A bit pricey, but delicious with good service. As we were finishing our meal, Dustin Hoffman walked into the restaurant for dinner. It is amazing; he looks exactly the same in person as he does on the screen.
Accomplishments:
- Received offer and verbally accepted an internship position with a software company.
- Flew to Chicago to attend the 2009 ALA Annual Conference
- Read Metadata (Zeng & Qin, 2008)
The day started out on a very positive note when I received an offer for an internship position I’ve been pursuing since mid-June. I’ll be living and working in Cambridge, MA for most of fall term 2009. I’m excited! The only downside is that I have to find housing, and I have to leave Kyle and Powder behind in Oregon.
The cheapest route to Chicago from Portland was to fly via DFW. The flight from PDX to DFW went without a hitch, but our luck ran out in Dallas. After all of the passengers boarded the plane and the doors had been secured, the pilot found out that the plane had a maintenance issue in the cargo hold that needed to be resolved before we could depart. We sat on the plane at the gate for about an hour while that was fixed. It wasn’t too bad; the American Airlines flight crew did a good job of keeping the passengers informed about what was happening and comfortable, which I really appreciated. A big change from years past when American would just leave you sitting there and not share any information at all. It did get a bit warm on the plane. It was 100+ degrees outside and the AC on the Super 80 just couldn’t keep up. It was after 10:00 p.m. when we finally got into O’Hare. Due to late-night construction work, it took us about 2 hours on CTA trains to get to our hotel, so it was after midnight by the time we checked-in. Fortunately our room at the Allerton Hotel was waiting for us. It was quite nice and comfortable.
My sabbatical has finally arrived! I feel both energized and relieved. I didn’t get to sleep in, though, because I had a phone interview scheduled at 7:00 a.m. for the Expert Information Organizer internship at ITA Software. The more I learn about this project, the more intrigued I become. We’ll see what happens.
The interview got me thinking about intelligent agents, a topic I have researched in the past, but haven’t followed closely over the past couple of years, so I decided to look around for some books/articles to update myself on the state of the art. I found a really interesting article, which was actually available full-text in ebrary Academic Complete, so I spent several hours reading and taking notes on it.
While the ebrary interface still leaves a lot to be desired, it was nice to actually use one of the books in the collection, rather than just mucking around with getting the metadata loaded into the online catalog and linked up with WorldCat. And, had I not done that work before I left, I probably wouldn’t have discovered that the WOU library has access to this book today, so that was gratifying.
Other things I accomplished today:
- I determined that my new Acer Aspire One netbook can endure about 5.5 hours of continuous use on a fully charged battery. I also continued configuring it: installed Thunderbird, Zotero (2.0 beta), and the Add to Connotea button on my Firefox toolbar menu.
- Experimented a bit with Connotea. It seems kind of clunky. There seems to be quite a time lag required to update MyLibrary, and I don’t yet see/understand how to get it to handle citation data in a structured way. Guess I’ll have to invest some more time in learning to use it.
- Biked to Rickreall and back on this perfect summer day. Felt very good even though I haven’t been on a bike in more than a year. Also did 30 minutes of Yoga (Yogamazing: Yoga for Flexibility) and walked the dogs after Kyle got home from work.
All in all, a pretty relaxing and productive start to my sabbatical.
Biskup, Thomas, Heyer, Nils, and Marx Gómez, Jorge. (2007). Building Sound Semantic Web Frameworks for Scalable and Fault-Tolerant Systems. In: Sugumaran, V. (ed.): Application of Agents and Intelligent Information Technologies. Hershey (PA): Idea Group Publishing, pp. 153-181.
Summary
Describes a theoretical framework for implementing the Semantic Web. Core technologies: agents, ontologies, Web services, and personalization. Provides a useful overview explaining the roles that existing technologies, including: XML, RDF, HTTP and SOAP, and ontology languages like SHOE and DAML, can play. Defines the WASP (Web services, Agents, Semantic Web, Personalization) model and a related system architecture named HIVE. WASP will utilize Web services for communication, Agents for modeling typical tasks and solutions, Semantic Web technologies (e.g. XML, RDF) as “a means to provide data and information in a consistent manner that allows retrieval and reasoning”, and Personalization technologies to configure processes to meet needs expressed by individual users. Agents are the central component in this framework because they implement the business logic. Introduces the concept of hyperservices, which are derived from integrating the core WASP technologies into a new type of service infrastructure. Analyzes and presents functional requirements for Semantic Web architecture.
My impressions
This article helped me get a better understanding of the big picture and how all of the various technological pieces (e.g. XML, RDF, SOAP, etc.) could fit together to realize the Semantic Web. I am intrigued by the central role that agents play in this model, and I could see a role for people (possibly librarians, catalogers in particular) in helping to define and develop agents and metadata schema, ontologies, etc. to provide an increasingly structured environment for agents to operate in.