Document Type

Conference Proceeding

Publication Date

4-2010

Abstract

Web 2.0 has changed the way we share and keep up with information. We communicate through social media platforms and make much of the information we exchange publicly available. Linked Open Data (LOD) follows the same paradigm of sharing information but also makes it machine accessible. LOD provides an abundance of structured information, albeit in a less formally rigorous form than would be desirable for Semantic Web applications. Nevertheless, most LOD assertions are community reviewed, and we can rely on their accuracy to a large extent. In this work we follow the Web 2.0 spirit: first by using LOD as a fact corpus and training data to automatically create domain models, and second by expanding LOD with automatically extracted assertions after careful evaluation, thus completing the knowledge lifecycle. The creation of these models is fully automated but open to user interaction. It relies on a combination of strategies: search, link-graph analysis, information extraction, and evaluation. The following steps show how to get from initial data to more data while building models along the way:

1. Linked Open Data: data, information, and knowledge to be freely used by Web applications.

2. Model Creation: build a hierarchy of concepts relevant to a domain based on an initial set of descriptive keywords, then enhance the hierarchy with facts that relate its concepts. These facts are, where available, taken directly from LOD or automatically extracted from text, using LOD as training facts.

3. Evaluation and Use: experts and other users experienced in the field of interest evaluate the models, explicitly by judging the accuracy of facts and implicitly through model-augmented searching, browsing, and classification. Sufficiently vetted facts can then be automatically added to the LOD cloud.

The evaluation thus completes a self-advancing cycle for the sharing of knowledge as well as for the refinement of the fact extractor and of LOD in general. The resulting models can be used to aid search, browsing, and classification of content. A domain model here is a formal representation of a field of interest that does not aim at the representational rigor expected from ontologies based on formal logics, but still provides a concise and closed description of a domain with categories, individuals, and relationships. Since the models are created on demand, high-interest domains will soon also have a stronger representation in the LOD cloud once the extracted facts have been vetted. Very often, evaluating the model, i.e. evaluating the extracted facts in the model, is equivalent to the model's intended use. This makes the evaluation an integral part of the knowledge lifecycle: it uses human/social computation to verify facts and allows us to add new knowledge to LOD, thus advancing the overall state of knowledge on the Web.
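To make step 2's fact lookup concrete, the following is a minimal sketch of how assertions about a seed concept could be retrieved from the LOD cloud. It assumes the public DBpedia SPARQL endpoint and uses a hypothetical seed concept; it illustrates the general technique only and is not the paper's actual implementation.

```python
# Sketch: retrieve candidate facts for a domain concept from Linked Open Data.
# Assumes the public DBpedia SPARQL endpoint; the seed concept and the flat
# (property, value) view of its triples are illustrative choices.
import requests

ENDPOINT = "https://dbpedia.org/sparql"

def fetch_facts(concept_uri, limit=25):
    """Return (property, value) pairs asserted about a concept in DBpedia."""
    query = f"""
    SELECT ?p ?o WHERE {{
        <{concept_uri}> ?p ?o .
    }} LIMIT {limit}
    """
    resp = requests.get(
        ENDPOINT,
        params={"query": query},
        headers={"Accept": "application/sparql-results+json"},
        timeout=30,
    )
    resp.raise_for_status()
    bindings = resp.json()["results"]["bindings"]
    return [(b["p"]["value"], b["o"]["value"]) for b in bindings]

if __name__ == "__main__":
    # Hypothetical seed concept for a "Semantic Web" domain model.
    for prop, val in fetch_facts("http://dbpedia.org/resource/Semantic_Web"):
        print(prop, "->", val)
```

In the approach described above, pairs retrieved this way would serve both as candidate facts for the domain model and as training data for the text-based fact extractor.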

Comments

Presented at WebSci10: Extending the Frontiers of Society On-Line, Raleigh, NC, April 26th-27th, 2010.

