A code4lib post about OpenURL resolution and WorldCat got me thinking about something again …
Matching discrete resources across databases is a huge problem right now. Too frequently, a search for an item based on a more or less complete bibliographic citation fails to locate the resource in one or more databases, either because the bibliographic data is structured differently or because the metadata varies in completeness.

How could one go about developing an algorithm that automatically generates some sort of identifier for resources that lack unique identifiers (or for which some combination of elements such as author’s last name, title, and date of publication is known, but no identifier), and that could be applied dynamically when searching a mega-index like Summon or WorldCat? I am thinking of something similar to the algorithms that Gracenote uses to match music CDs to metadata in their database. This seems like something that could be tackled with natural language processing techniques.

How difficult would it be to develop a basic algorithm? How would one approach testing and optimizing such an algorithm to improve cross-domain search results? I can’t believe that somebody, somewhere isn’t working on this already. But how to locate those people and that research? Is it a subset or specialized application of relevancy ranking?
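To make the idea a little more concrete, here is a minimal sketch in Python of what a generated identifier might look like. Everything in it is an assumption for illustration: the function names (`normalize`, `match_key`, `title_similarity`), the stopword list, the choice of author last name, title, and year as inputs, and the use of a truncated SHA-1 hash are all hypothetical, and none of it reflects how Gracenote, Summon, or WorldCat actually work. It only shows that citations that differ in case, punctuation, accents, or leading articles can be reduced to the same key, with a simple token-overlap score as a fallback when the keys differ.

```python
import hashlib
import re
import unicodedata

# Hypothetical stopword list; a real system would tune this per language.
STOPWORDS = {"a", "an", "the", "of", "and", "in", "on", "for", "to"}


def normalize(text):
    """Strip accents, lowercase, drop punctuation, and return a list of tokens."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", ascii_only.lower())
    return cleaned.split()


def match_key(author_last, title, year):
    """Build a short key from author last name, title, and publication year."""
    author = "".join(normalize(author_last))
    title_words = [w for w in normalize(title) if w not in STOPWORDS]
    raw = "|".join([author, " ".join(title_words), str(year)[:4]])
    # Truncated SHA-1 is an arbitrary choice; any stable hash would do here.
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()[:16]


def title_similarity(title_a, title_b):
    """Jaccard overlap of non-stopword title tokens, between 0.0 and 1.0."""
    a = set(normalize(title_a)) - STOPWORDS
    b = set(normalize(title_b)) - STOPWORDS
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    key_1 = match_key("García", "The Semantic Web: A Primer", 2004)
    key_2 = match_key("garcia", "Semantic web - a primer", "2004")
    print(key_1 == key_2)
    print(title_similarity("The Semantic Web: A Primer", "Semantic Web Handbook"))
```

An exact key like this breaks down as soon as one record has a typo or a missing subtitle, which is where the similarity score (or a more serious record-linkage approach) would have to take over. Testing it would need a set of citations already known to refer to the same items.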