As part of our research on the application of Semantic Web technologies to the digital collections of libraries, archives and museums in Spain, we are working on methods to automatically link entities to knowledge bases (VIAF, DBpedia, etc.).
In 2006, Tim Berners-Lee set out four rules for building the Semantic Web. The fourth rule is “Include links to other URIs so that they can discover more things”. For the digital collections of libraries, archives and museums, this fourth rule is one of the most difficult to implement (the other three rules can be handled automatically from bibliographic records marked up in Dublin Core, MARC21, etc.).
Unlike entity linking in unstructured text, entity linking applied to bibliographic records has the advantage that entities are semantically well defined by properties (e.g. dc:creator, dc:subject, etc.). This means that the first step of identifying, disambiguating and categorizing named entities is already solved. Thus, the main issue is how to search for and match in VIAF, DBpedia, etc. the persons, organizations, subjects, locations, etc. included in bibliographic records. This is a problem because a) sometimes the surface form of the entity in the bibliographic record is not the same as its surface form in the knowledge base, and b) some surface forms in the bibliographic record have more than one match in the knowledge base (ambiguity).
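To make problems a) and b) concrete, here is a minimal Python sketch. It is not our OpenRefine workflow and not one of the algorithms we are defining; it only assumes that surface forms can be normalized (diacritics removed, life dates dropped, "Surname, Forename" inverted) before comparison. The knowledge base is a toy in-memory list, and the identifiers (kb:1, kb:2, ...) and the function names normalize, build_index and link are hypothetical.

```python
import re
import unicodedata

# Toy knowledge base: (hypothetical identifier, preferred label).
# These identifiers are placeholders, not real VIAF or DBpedia URIs.
TOY_KB = [
    ("kb:1", "Miguel de Cervantes Saavedra"),
    ("kb:2", "Joan Maragall"),
    ("kb:3", "Barcelona"),  # e.g. the city
    ("kb:4", "Barcelona"),  # e.g. the province
]


def normalize(surface_form):
    """Reduce a surface form to a comparable key (assumed rules, see lead-in)."""
    # Decompose accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", surface_form)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Drop life dates such as "1547-1616" or an open range like "1950-".
    text = re.sub(r"\d{4}\??\s*-\s*(?:\d{4}\??)?", " ", text)
    # Invert catalogue order "Surname, Forename" to "Forename Surname".
    if "," in text:
        surname, _, forename = text.partition(",")
        text = f"{forename} {surname}"
    # Remove remaining punctuation, lowercase, collapse whitespace.
    text = re.sub(r"[^\w\s]", " ", text).lower()
    return " ".join(text.split())


def build_index(kb):
    """Map each normalized label to the list of identifiers that carry it."""
    index = {}
    for identifier, label in kb:
        index.setdefault(normalize(label), []).append(identifier)
    return index


def link(surface_form, index):
    """Classify a record's surface form as 'exact', 'ambiguous' or 'none'."""
    candidates = index.get(normalize(surface_form), [])
    if len(candidates) == 1:
        return ("exact", candidates)
    if len(candidates) > 1:
        return ("ambiguous", candidates)
    return ("none", [])


INDEX = build_index(TOY_KB)

if __name__ == "__main__":
    for form in ["Cervantes Saavedra, Miguel de, 1547-1616", "Barcelona"]:
        print(form, "->", link(form, INDEX))
```

In this sketch, a catalogue heading such as "Maragall, Joan, 1860-1911" matches a single entry once normalized (problem a), while "Barcelona" returns two candidates and would need more context, such as the property the value came from, to be resolved (problem b).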
We are currently experimenting with OpenRefine to try to solve these problems. We are defining several algorithms that should make it possible to obtain exact matches between bibliographic entities and knowledge base entities.
We will keep you updated on our progress…
Enjoy it!
Andreu Sulé
University of Barcelona