I heard a really interesting story on All Things Considered last Thursday. A computer scientist from Carnegie Mellon came up with a brilliant idea. He developed a system that takes words from digitized print sources that OCR programs could not correctly decipher and uses them as the anti-bot key text that appears on certain websites (like Ticketmaster). So now, instead of just keying computer-generated text rendered in fuzzy images, people are prompted to key words that are often easy for humans to recognize but almost impossible for the bots to comprehend. The websites also capture what the end users type in order to improve the text capture for digitized books. When enough people agree on what a word is, that reading is fed back to the source digitization project and used to improve the OCR-generated full text. The really cool thing is how much work has been accomplished through micro-contributions of time and knowledge made by millions of people.
This is just totally cool. Wouldn’t it be fabulous if we could find similar methods for capturing data that would help to improve metadata for bibliographic resources? Imagine if OCLC could come up with a similar mechanism for collecting variations on WorldCat master records made by individual libraries and individual users. Master records could be enhanced substantially without painstaking work from OCLC Quality Control staff.
The most valuable and expensive aspect of cataloging is capturing human knowledge effectively. We need systems that allow end users to make small contributions to enhancing metadata easily and seamlessly, and that give professionals the tools they need to analyze this data quickly and systematically, so that it can be incorporated into the infrastructure. That seems like a key part of the Semantic Web: developing ways to capture, organize, and relate little bits of information and knowledge from all over the place into a coherent whole.
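To make the "when enough people agree" step a little more concrete, here is a minimal Python sketch of how independent contributions for one word (or one metadata field) might be tallied. This is only an illustration, not how the Carnegie Mellon system or OCLC actually works: the function name `consensus` and the thresholds `min_votes` and `min_share` are assumptions I made up for the example.

```python
from collections import Counter


def consensus(submissions, min_votes=3, min_share=0.75):
    """Return the agreed value, or None if agreement is not yet strong enough.

    min_votes and min_share are hypothetical thresholds chosen for illustration.
    """
    # Normalize case and whitespace so trivial differences do not split the vote.
    normalized = [s.strip().lower() for s in submissions if s.strip()]
    if len(normalized) < min_votes:
        return None  # Too few contributions to trust any answer yet.
    value, count = Counter(normalized).most_common(1)[0]
    if count / len(normalized) >= min_share:
        return value  # Enough contributors agree; accept this reading.
    return None  # Contributors disagree; leave it for a professional to review.


# Three of four contributors agree, so the majority reading is accepted.
print(consensus(["Cataloging", "cataloging ", "cataloging", "catalaging"]))
# Only two contributions so far, so nothing is accepted.
print(consensus(["tide", "ride"]))
```

Anything that comes back as `None` is exactly the kind of case that could be queued for professional review, which keeps staff time focused on the records where contributors genuinely disagree.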