Big news announced today in library land: SkyRiver Files Antitrust Suit Against OCLC
SkyRiver Technologies vs. OCLC (full-text PDF)
It’s hard to imagine that much good will come from this. A whole bunch of resources (time, energy, money) that could otherwise go into enhancing library systems and services will be wasted on this legal battle. Services provided by Google and others will continue to marginalize and undermine libraries, while the “players” in the library sphere squabble themselves into oblivion. Will they take the entire field down with them?
A code4lib post about OpenURL resolution and WorldCat got me thinking about something again …
Matching discrete resources across databases is a huge problem right now. Too frequently, searches for items based on more or less complete bibliographic citations fail to locate the resource in one or more databases because of variations in how bibliographic data is structured and/or varying levels of completeness in the metadata. How could one go about developing an algorithm that automatically generates some sort of identifier for resources that lack unique identifiers (or for which some combination of elements such as author’s last name, title, and date of publication is known, but not an identifier), and that could be applied dynamically when searching a mega-index like Summon or WorldCat? I’m thinking of something similar to the algorithms that Gracenote uses to match music CDs to metadata in their database. This seems like something that could be tackled with natural language processing techniques. How difficult would it be to develop a basic algorithm? How would one approach testing and optimizing such algorithm(s) to improve cross-domain search results? I can’t believe that somebody, somewhere isn’t working on this already. But how to locate those people/that research? Is it a subset/specialized application of relevancy ranking?
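To make the question a little more concrete, here is a minimal sketch of one naive approach: normalize the author's last name, title, and year, then hash the result into a short "match key." This is only an illustration under my own assumptions, not a description of how Gracenote, Summon, or WorldCat actually work. The function names, the tiny list of leading articles, and the sample citation are all hypothetical.

```python
import hashlib
import re
import unicodedata

# Leading articles to drop from titles; a deliberately tiny, English-only list
# (an assumption for this sketch, real data would need a much longer one).
LEADING_ARTICLES = ("a ", "an ", "the ")


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Drop apostrophes first so "Yee's" becomes "yees" rather than "yee s".
    no_apostrophes = ascii_only.replace("'", "")
    cleaned = re.sub(r"[^a-z0-9]+", " ", no_apostrophes.lower())
    return cleaned.strip()


def match_key(author_last_name: str, title: str, year: str) -> str:
    """Build a short hash from normalized author, title, and 4-digit year."""
    title_norm = normalize(title)
    for article in LEADING_ARTICLES:
        if title_norm.startswith(article):
            title_norm = title_norm[len(article):]
            break
    # Keep only the first four-digit run in the date, e.g. "c2009." -> "2009".
    year_match = re.search(r"\d{4}", year)
    year_norm = year_match.group(0) if year_match else ""
    basis = "|".join([normalize(author_last_name), title_norm, year_norm])
    return hashlib.sha1(basis.encode("utf-8")).hexdigest()[:16]


# Two renderings of the same (made-up) citation, as they might appear
# in two different databases.
key_a = match_key("Müller", "The Catalog of Tomorrow", "2009")
key_b = match_key("MULLER,", "Catalog of tomorrow /", "c2009.")
print(key_a == key_b)
```

Exact-match keys like this break as soon as a title is truncated, a subtitle is missing, or a date is off by a year, which is presumably where fuzzier matching and the testing/optimization questions above come in.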
JEFFREY BEALL: OCLC: A Review p. 85-93
From the opening paragraph: “…I aspire to the high road: objective analysis, keeping in mind that the word radical is in this book’s title.” (p. 85). While I’m not a huge fan of OCLC myself these days, this chapter offers very little by way of objective analysis. It is more of a curmudgeonly rant. Offering little or no compelling evidence, Beall alleges that:
- OCLC is more interested in hiring MBAs than MLSs
- OCLC’s primary mission is to “separate libraries from their money” (p. 87)
- The launch of the Connexion Client software was a complete fiasco that lasted several years
- OCLC does not provide sufficient incentive (or support) for libraries to upgrade records in WorldCat, leading to a serious degradation of quality in the database
- OCLC Research is little more than a propaganda machine.
The only one of his allegations that I tend to agree with is the fourth. The lack of credible evidence and the overall belligerent tone of this essay make any criticisms Beall levels, regardless of their validity, excessively easy to dismiss. How does this help solve the very real problems that catalogers face today?
BETH THORNTON: The Existential Crisis of a Cataloger p. 13-17
Presents a summary of challenges facing cataloging, supported by analysis and quotes from reports by Calhoun (2006) and the UC Libraries Bibliographic Services Task Force (2005), among others. She describes her personal response to the current situation, including conflicted feelings. Some of the criticisms leveled at the cataloging establishment in these reports are valid, so what can librarians do to address them and remain relevant? Thornton cites a number of available resources:
- Libraries continue to advertise cataloging positions, and they are hiring young, intelligent, energetic professionals for these positions.
- At least some people within the library community continue to recognize the value of traditional cataloging standards and vocabularies that have evolved over many years and continue to do so.
- Cooperative cataloging organizations seek to provide training and opportunities for innovation through collaboration. She cites CONSER efforts to simplify serials cataloging, while continuing to create records that serve user needs.
- Catalog advocates like Thomas Mann.
Thornton concludes by saying that librarians should bring valuable cataloging traditions with us as we “chase after the shiny things” that are paving the way to the future.
Overall impression
Provides valuable insight into the contemporary library cataloging environment. Concludes that the Library of Congress (LC) currently shoulders a disproportionate share of the cost of producing catalog records, contributing to an overall impression among librarians and vendors that the cost of such production is substantially lower than it actually is. The authors argue that this has contributed to the development of a situation in which neither libraries nor vendors are willing to invest the resources required to do original cataloging of many mainstream materials, leading to the continued accumulation of backlogs and hindering the efficacy of cooperative cataloging programs. Alongside the helpful data and analysis, the report also includes some highly questionable estimates, assumptions and assertions. Excellent bibliography featuring many recent reports and articles analyzing cataloging practice and the future of cataloging.
Opportunities for additional research
- To what extent do the requirements of complying with national/LC standards currently contribute to the cost associated with original cataloging, especially for organizations that are much smaller than LC, where economies of scale limit development of human and technical resources to support cataloging operations?
- The methodology utilized to estimate the current number of catalogers in North America is highly questionable. The admittedly rough estimates presented in this report need to be subjected to additional scrutiny and research. Desperately!
- This report repeatedly asserts that substantial resources are currently wasted by catalogers who insist upon editing records of already acceptable quality for local use without valid reason or sufficient benefit. The report does not present sufficient evidence to support this assertion. More focused study is needed to examine this issue, so that we can understand the extent to which local edits are made and why. The report describes this type of editing as entirely unnecessary and “redundant.” To what extent is this true? What is the true cost-benefit of this type of editing? To what extent is this “redundant” effort the result of current systems limitations and/or architecture, in which copies of the same record are stored in isolation from one another and catalogers have very limited ability to make useful edits to “master” records for a variety of technical and not-so-technical reasons?
Notable quotes
p. 4: “Conclusions are based on careful consideration of survey results; interviews and conversations with practicing librarians and vendors; discussion among members of the Bibliographic Record Production social network; extensive reading; participation in the OCLC/NISO Metadata Symposium (April 2009); and our own direct experience with cataloging production and distribution.”
p. 14: “In response to the query about what else we should know about libraries’ MARC records environments, there was great appreciation for LC’s work as well as pleas that it continue. There is general dismay about the quality of non-LC vendor provided records. There were several negative remarks about LC’s cessation of Series Authority work. Many libraries stated that free records are essential to their operations. And many reported difficulty managing vendor supplied record sets for eBook collections and serials packages.”
p. 25: “This tension — between community values and commercial values, between idealism and pragmatism, between social responsibility and private benefit – has deeply affected some aspects of the library market. Cataloging, regarded by many as the heart of librarianship, is one of those areas.”
p. 26: “In general, libraries understate (or simply don’t recognize) the full costs associated with cataloging. This renders questionable any comparison with stated prices from vendors, who typically do have a good handle on costs, since their continued operation depends upon it.”
p. 27: “[due to the contributions of LC]…an entire industry has developed around free (or at least very cheap) MARC records. …many libraries and vendors benefit from a product for which production costs are not recovered.”
p. 27: “Over the past five years, LC has absorbed significant budget cuts, and faces continuing pressure. It has undertaken major staff reductions, especially in its cataloging operations. CIP, as a program that is not directly related to LC’s mission, and for which the costs of production divert staff resources from other programs, must obviously be considered for adjustment. Given the existing level of dependency, such a change would affect the entire profession and the industry based on that profession.”
p. 27: “The market for cataloging records is in some important respects dysfunctional. In our view, the biggest issue is that the market lacks sufficient incentives to stimulate the production of new cataloging records. Obviously, many books, journals, electronic resources and other items are being cataloged, so some elements of the market are working. But structurally, it seems clear that something is amiss.”
p. 27: “Because of their own staffing constraints, or unwillingness to bear the cost of original record creation, many libraries simply wait for another library to catalog an item they have already received. On average those items are held for three to six months, with periodic searches of OCLC to determine whether another library has blinked. While this makes sense as a way of controlling costs, it does not provide optimal service for users.”
p. 29: “Why do so few libraries join BIBCO or CONSER (which also relies on fewer than 50 members)? And what factors determine which records are contributed? It seems likely that titles most needed and valued locally by the contributing library would take priority. Because most of the big PCC contributors are research libraries, it’s also likely that many titles are specialized, and therefore may not be widely held. Therefore, fewer libraries would benefit from that contribution, as opposed to, say, DVDs, which are cited as a major problem by most libraries in the survey. Regardless, these libraries are committed to contributing to the community. Which titles they catalog would matter less if participation were broader. A cooperative system only works well if everyone participates.”
p. 29: “But somehow the incentives to produce new cataloging records are insufficient, from both the commercial and community viewpoints. Otherwise, there would be greater participation in cooperative programs, and/or more vendors seeking to become cataloging agencies. At bottom, we believe this is because cataloging costs and therefore prices are understated and artificially depressed. An even more sobering possibility is that the profession does not believe that cataloging is worth what it costs to create it; that will be quickly determined once all production costs are factored into the price. In the meantime, this is a market that appears to require adjustment.”
p. 33-34: “While major economies of scale have been realized, troublesome issues remain. First, there are thousands of small libraries that operate outside of the shared cataloging infrastructure. Most of these lack the capacity to produce MARC records—they have no catalogers. Second, catalogers have an almost unstoppable urge to improve, tweak, customize and “localize” national-level records; redundant work is still widespread. Third, despite efficiencies, cataloging backlogs continue to grow, not only for audio-visual materials, rare books, and non-Roman languages, but even for the most commonly-held materials.”
p. 34: “We now operate in a context where questions about the efficacy of the MARC record and the centrality of the OPAC are continually posed. We wrestle with keeping libraries relevant, and assuring their participation at the network level. We have also entered an era where questions about cost and return on investment are routinely asked of non-profit entities such as libraries. In the long run, there may be better and cheaper alternatives than MARC. In the short run, there may be ways to reduce the cost of producing MARC records.”
p. 34: “LC’s own ILS cannot, for instance, accept ONIX records directly – they must first be converted to MARC21. This will undoubtedly change over time, but for now, most libraries will continue to need cataloging records delivered in MARC format—it is the only usable solution.”
p. 34: “There remain strong arguments for use of standard cataloging principles–controlled vocabulary, classification, subject analysis, and authority control—packaged and delivered in a consistent format. While MARC records may need to be extended, embellished (supplemented with full text, flap copy, excerpts, user tags), for now they provide a common standard and a cooperative infrastructure that controls costs.”
p. 35: “Once records have been distributed to vendors, most seek to add value. In general, this means matching records to a group of titles being shipped, and adding fund, location and electronic invoicing data, updating or adding proxy prefixes to URLs. These value adds are important to workflows, but do not necessarily change the bibliographic data (although some vendors do perform CIP upgrades if they are needed). Once libraries receive the records, via OCLC, the vendor, or another source, many also seek to add value in other ways. As noted in the survey results, 80% of libraries perform some degree of local editing on the records, to customize them for their own constituency. Increasingly, libraries are adding or linking to table of contents, Amazon, and other external sources in order to enrich the bibliographic description.”
p. 36: “Another factor affecting shared capacity is the increasing emphasis on making accessible material that is unique locally. This shifts priorities within a given library, and makes it more difficult to share capacity, and to dedicate hours to work that might benefit the community as a whole over one’s own institution.”
p. 36: “While it is true that the records produced by LC need to be better supported, it is difficult to imagine the profession and the industry without them. They provide enormous value, to a degree that is difficult to calculate.”
Understanding the New Discovery Landscape: Federated Search, Web-scale Discovery, Next-Generation Catalog and the rest Webcast
Date: Thursday, May 6, 2010
Time: 2:00 PM EDT
Duration: 60 minutes
Sponsored by Serials Solutions – http://www.serialssolutions.com/summon
Archived copy of complete presentation
My summary and reaction:
- Webinar was essentially an advertisement for Summon.
- Summon sounds like a promising product, worthy of more investigation.
- Does Summon help libraries expose/integrate the contents of their collections on the broader Web, e.g. into Google/Google Scholar results? Libraries definitely need the metadata aggregation, unified index and holdings/rights management pieces, but how much do we really need the search infrastructure and interface? What libraries really need is a method for putting the information about what they have out there where users will encounter it when they are searching Google, etc. The library web site is not where users are inclined to do a lot of their searching, and not just because most of our current interfaces suck. Information resources under bibliographic control by libraries are an ever shrinking portion of the information environment that users want to search. Does Summon help to address the larger problem of integrating discovery of “library” resources with those of the entire Web? While certainly a step in the right direction, Summon still seems to focus on library resources, segregated from the rapidly expanding collection of information resources available on the Web in general.
- How much does Summon cost? Can Summon be implemented at the consortial level (e.g. by Orbis Cascade Alliance for all or selected members who elect to participate)?
- Can Summon access resource sharing data to help connect users with resources that are available through consortial resource sharing agreements, as opposed to locally owned/licensed resources?
- Is Summon just another “black box” product seeking to substitute for what libraries really need: access to metadata and tools that allow them to create and maintain aggregated indexes of content for themselves, that they can then expose to mainstream web resource discovery platforms (e.g. Google)?
Speakers:
- Marshall Breeding
- Helen Livingston, University of South Australia
- Jane Burke (moderator)
Marshall Breeding:
- Crowded landscape of information providers on the web (Google, Wikipedia, Amazon.com, Ask.com)
- Weaknesses in the interfaces to our current systems drive users toward other sites that have better interfaces, even if we provide better quality resources.
- Catalog is primary tool libraries have provided for years: book, card, OPAC, NextGen, now moving toward Web-scale
- NextGen catalog: modernize interface to incorporate mainstream features now “expected” by end-users, based on their experience on the broader web
- Most libraries now have web sites as well, need to integrate web site and catalog
- Most libraries currently offer a disjointed approach: “menu of silos”, this is an obstacle for users. They don’t care about the complex infrastructure behind the scenes, they want a unified experience on the front-end. Bring all library content together into a single entry point, not dumbed-down.
- Complicated by the fact that many electronic resources to which libraries provide access are siloed in proprietary systems, not easy to provide integrated access.
- Discovery layer applications seek to address this problem.
- http://www.librarytechnology.org/discovery.pl provides a list of current discovery apps
- We need to take advantage of freely available free-text, complemented by high-quality library metadata to provide a better discovery and access system for users. No solution for this yet.
- Federated search is insufficient, problematic.
- Discovery tool based on local pre-populated index (e.g. from ILS, local repositories) is also insufficient, but perhaps required to make the connection between local and web-scale discovery tools.
- Web-scale applications seek to: build/provide a consolidated index that includes all content (local and remote, often article-level content, local social networking content) in a single search.
- See Breeding’s book Next-Gen Library Catalogs, published by Neal-Schuman, for additional information
Helen Livingston:
- UniSA Library
- Started with Voyager catalog
- Added VuFind in Jan. 2009
- Implemented Summon Sept. 2009
- Recently implemented Moodle for online courses
- 75% of funds spent on electronic resources
- Huge journal collection via e-journal aggregators
- Have placed Summon search boxes throughout the university web site, will soon be in Moodle
- Implementation: massive data clean up to implement VuFind was invaluable in quickly implementing Summon. Esp. had to clean up 008 and control fields because Summon uses these fields for searching/filtering.
- Summon implementation polarized library staff: some embrace it fully, some feel like it dumbs-down searching too much, perhaps sufficient for undergraduate research, but insufficient for higher level research needs.
- Survey: overwhelmingly positive response from users.
- Summon currently has some weaknesses, see slides for detailed list of those they’ve found.
- In summary: She loves it
Jane Burke:
- Summon overview
- Students want to be self-sufficient and think they are, they don’t want to ask questions
- Very important to provide “modern” web-based services
- Anthropological research done prior to developing Summon system.
- Main goal: create a clear and compelling starting place for search
- Summon built on the same type of infrastructure as Google, etc. Built to scale.
- Normalized metadata => unified index => holdings/rights => relevance ranked results => API => user interface
- Simple, easy, fast. Give users tools to help them refine results set.
- Supports most major IR systems.
- Includes database recommender service for individual databases.
- Provides mobile app.
- http://serialssolutions.com/summon
Questions:
- Importance of currency of coverage? One measure of the quality of discovery application.
- Currently no “did you mean” in Summon
- UniSA: link resolver needs improvement, now that people can discover the stuff, they get frustrated when they can’t get to the full-text content due to weakness in the link resolver.
- Summon supports explicit boolean searches.
Chapter 11 – Using Tools to Create Mashups
Google Mashup Editor (GME) (now defunct, see Google App Engine)
Creating mashups is difficult due to the need to learn how to use multiple APIs and knit together information from divergent sources. A framework such as RDF promises to make this easier to accomplish, due to standardization. Cites http://simile.mit.edu/ as an example of a project that provides a practical way to look at Semantic Web technology. Tools facilitate reuse of code, “simplify the routine stuff,” allowing the developer to focus on solving problems at a more abstract level.
The example in this chapter involves building a mashup that shows geotagged photos retrieved via the Flickr API on a Google map. It uses GME and Yahoo!Pipes to extend a mashup created in Chapter 10 using PHP and JavaScript. Now that GME is dead, it appears that you can’t actually run the code developed in this chapter, which is unfortunate. Yee provides an example of iterative app building, however, and it is interesting to note how GME tags, HTML and JavaScript all get jumbled together in a single file.
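Since the GME code can no longer be run, here is a rough sketch of the same basic idea (geotagged Flickr photos turned into points that a map can display) in plain Python. This is my own illustration, not Yee's code. It builds a flickr.photos.search request URL without actually calling it, then converts a response into GeoJSON. The API key placeholder, the tag, and the sample response are made up, and the response shape (a "photos" object holding a "photo" list with "latitude" and "longitude" fields) is an assumption about what Flickr returns when geo extras are requested.

```python
from urllib.parse import urlencode

FLICKR_REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def build_search_url(api_key: str, tag: str) -> str:
    """Build (but do not send) a search request for geotagged photos."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "tags": tag,
        "has_geo": 1,         # only photos that carry location data
        "extras": "geo",      # ask for latitude/longitude in the results
        "format": "json",
        "nojsoncallback": 1,  # plain JSON rather than a JSONP wrapper
    }
    return FLICKR_REST_ENDPOINT + "?" + urlencode(params)


def photos_to_geojson(response: dict) -> dict:
    """Convert a parsed search response into a GeoJSON FeatureCollection."""
    features = []
    for photo in response.get("photos", {}).get("photo", []):
        try:
            lat = float(photo["latitude"])
            lon = float(photo["longitude"])
        except (KeyError, TypeError, ValueError):
            continue  # skip photos without usable coordinates
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"id": photo.get("id"), "title": photo.get("title", "")},
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical values for illustration only.
url = build_search_url("YOUR_API_KEY", "library")
sample_response = {
    "photos": {
        "photo": [
            {"id": "1", "title": "Reading room", "latitude": "45.52", "longitude": "-122.68"},
            {"id": "2", "title": "No location"},
        ]
    },
    "stat": "ok",
}
geojson = photos_to_geojson(sample_response)
print(len(geojson["features"]))
```

The resulting GeoJSON could then be handed to whatever mapping library is convenient, which keeps the data-fetching step independent of any one hosted tool.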
GME was a text-based programming environment; Yahoo!Pipes is visual. Yahoo!Pipes can convert XML to RSS 2.0. Chapter 4 includes a tutorial on using Yahoo!Pipes.
Google Code provides hosted Subversion services for software development. Appears to be free.
Advantages of using mashup tools like GME and Yahoo!Pipes:
- Server is hosted, no need to run your own
- Hosting server takes care of server-side (i.e. PHP) functionality, so no knowledge of PHP is required; only have to know some JavaScript to handle client-side processing
- Access to Subversion services
- Access to apps/code snippets/ideas/tips from other developers working with tools
- Makes it easier for developers to quickly implement/test ideas, esp. in situations where they don’t have a long-term stake in a particular application.
Disadvantages of using mashup tools:
- Each tool requires you to learn a new framework
- Sometimes tools don’t give you exactly what you want/need
- Must reveal code to host, and your code is “branded” to a particular host
- Application is dependent upon host; problematic if host discontinues service (as GME already has)
- Allows non-programmers to create mashups, but leaves them vulnerable if they are dependent upon the underlying services for production applications.
Yee’s overall assessment of mashup tools like GME and Yahoo!Pipes: “I think that the GME and Yahoo! Pipes makes it easier for programmers to create certain types of mashups, though it’s not so clear whether they open up mashup development for a nonprogramming audience.”
Useful links from this chapter:
Yee provides a useful table of mashup tools near the end of the chapter. I created an annotated version of this table and published it via Google Docs.
An article published in today’s Chronicle of Higher Education offers a tantalizing bit of insight into one approach Google is considering for cleaning up and enhancing Google Books data and metadata. They are reportedly offering substantial grants and soliciting proposals from select humanities scholars for projects such as: “Developing systems for crowd-sourced corrections to book data and metadata.” It will be interesting to see how humanities scholars respond to that particular suggestion; seems like it might be of more interest to librarians/catalogers than to the scholars themselves. I know that I’d love to work with Google on developing this type of system. I’m certainly curious to see what they come up with, if anything.
My library recently implemented WorldCat Local as the default catalog/discovery interface for our collections. I learned about a new book today, so I went to our new catalog to see about getting a copy. I typed the title of the book (“bright sided”) into the WorldCat Local search box and clicked search. The first page of search results gave me 4 articles that appear to provide reviews of the book. You can see for yourself the remaining 6 results, all articles, none of which appear to have anything to do with either the book or the terms I searched. For example, the title of the 5th item is: “The physical state and plasma biochemical profile of young calves on arrival at a slaughter plant.” When I do an Advanced search and enter my terms as Title, the book shows up as the 5th item in the results list (below the reviews and the “fold”, so I must scroll down to discover this).
When I do a keyword search on the terms “bright sided” in the old catalog interface, I get the response “no entries found” and a prominently placed button that I can click on to pass my search through to our consortial catalog, Summit, where the book comes up as the first item in the results list.
This seems like a pretty common use case: person finds out about a new book and goes to the library to see if they can borrow a copy. Which catalog interface performs better for the user? I’d say the old catalog since it tells me immediately that my library does not own the book in question, and provides an easy way for me to repeat my search in the consortial catalog, where the book is found immediately, and I can request it immediately. Because WorldCat Local automatically promotes hits for items owned by my library to the top of the results display, regardless of any other measure of relevancy to the search, the WorldCat Local interface plunges me into confusion and leaves me there wondering where to go next to find the answer to the relatively simple question: can I borrow this book? Ironically, the free WorldCat.org interface actually does a better job of answering my question than subscription-based WorldCat Local because the book appears as the first hit in the results list, and it shows libraries near me (based on my IP address) that own the book.
Perhaps the “show me first what my library owns” algorithm that is the main product distinction/selling point of WorldCat Local works better with fuzzier, topical keyword searches. I don’t know, I haven’t really researched that. It sure doesn’t seem to offer the best approach for known item searches, however, especially in cases where the local library doesn’t own a given title (or doesn’t have holdings attached properly in WorldCat).
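A toy sketch of why this happens, as far as I can tell from the outside: if local ownership is the primary sort key and relevance only breaks ties, a highly relevant item that my library doesn't own sinks below everything that it does own. I have no knowledge of OCLC's actual ranking code; the Hit class, the scores, and the sample results below are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    title: str
    relevance: float    # hypothetical 0..1 score from the search engine
    held_locally: bool  # does my library own or license this item?


def rank_local_first(hits):
    """Local holdings always win; relevance only orders items within each group."""
    return sorted(hits, key=lambda h: (not h.held_locally, -h.relevance))


def rank_by_relevance(hits):
    """Relevance wins; local ownership only breaks ties."""
    return sorted(hits, key=lambda h: (-h.relevance, not h.held_locally))


# Invented results for a known-item search on a book the library does not own.
hits = [
    Hit("Bright-sided (book)", 0.95, False),
    Hit("Review of Bright-sided", 0.60, True),
    Hit("Plasma biochemical profile of young calves", 0.05, True),
]

# Local-first order: review, calves article, then the book.
print([h.title for h in rank_local_first(hits)])
# Relevance order: book, review, then the calves article.
print([h.title for h in rank_by_relevance(hits)])
```

With the local-first sort, the book I actually searched for lands at the bottom, below an article that has nothing to do with my search.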
I skated for about 3 hours this morning, and it was really discouraging. I wasn’t particularly tired, and I wasn’t sore like I was from doing too many side lunges and squats in my at-home workout last week, but I just felt really sluggish and awkward on the ice today. My left knee isn’t holding up well to skating twice a week (for about 6 hours total). It is sore pretty much all the time now, and it gets very stiff very quickly if I take a short break from skating (e.g. for ice resurfacing). I’m still struggling with my forward outside and inside double 3-turns and with any maneuver that involves my left back outside edge (e.g. clockwise backward crossovers, backward swing rolls, chasses or progressives). I don’t know if it’s related to the pain in my left knee or not, but I am really having difficulty getting comfortable over my left side, especially when going backwards. And I just can’t get the kind of power that I need to fill out the patterns on the Bronze dances.
On the positive side, my backward power 3-turns felt better than usual today, and I’m definitely making progress on keeping my hips pulled up and in check in general. I worked my forward inside 3-turns quite a bit today, as well as the 8-step mohawk sequence from the Juvenile moves, and those were feeling more secure by the end of the session today. Sigh. I hope that I’m feeling less sluggish and clumsy on Friday for my lesson with Leone.