Kno.e.sis Publications

Provenance Context Entity (PaCE): Scalable Provenance Tracking for Scientific RDF Data

Satya S. Sahoo, Wright State University - Main Campus
Olivier Bodenreider
Pascal Hitzler
Amit P. Sheth, Wright State University - Main Campus
Krishnaprasad Thirunarayan, Wright State University - Main Campus

Document Type

Conference Proceeding

Publication Date

2010

Abstract

The Resource Description Framework (RDF) format is being used by a large number of scientific applications to store and disseminate their datasets. The provenance information, describing the source or lineage of the datasets, is playing an increasingly significant role in ensuring data quality, computing trust value of the datasets, and ranking query results. Current provenance tracking approaches using the RDF reification vocabulary suffer from a number of known issues, including lack of formal semantics, use of blank nodes, and application-dependent interpretation of reified RDF triples. In this paper, we introduce a new approach called Provenance Context Entity (PaCE) that uses the notion of provenance context to create provenance-aware RDF triples. We also define the formal semantics of PaCE through a simple extension of the existing RDF(S) semantics that ensures compatibility of PaCE with existing Semantic Web tools and implementations. We have implemented the PaCE approach in the Biomedical Knowledge Repository (BKR) project at the US National Library of Medicine. The evaluations demonstrate a minimum of 49% reduction in total number of provenance-specific RDF triples generated using the PaCE approach as compared to RDF reification. In addition, performance for complex queries improves by three orders of magnitude and remains comparable to the RDF reification approach for simpler provenance queries.

Comments

The featured PDF document is the unpublished, peer-reviewed version of this proceeding.

The featured abstract was published in the final version of this proceeding, which appeared in Lecture Notes in Computer Science, 6187 and may be found at http://link.springer.com/chapter/10.1007%2F978-3-642-13818-8_32 .

This paper is from the proceedings of the 22nd International Conference on Scientific and Statistical Database Management (SSDBM), Heidelberg, Germany, June 30-July 2, 2010.

Repository Citation

Sahoo, S. S., Bodenreider, O., Hitzler, P., Sheth, A. P., & Thirunarayan, K. (2010). Provenance Context Entity (PaCE): Scalable Provenance Tracking for Scientific RDF Data. Lecture Notes in Computer Science, 6187, 461-470.
https://corescholar.libraries.wright.edu/knoesis/17

DOI

10.1007/978-3-642-13818-8_32

Included in

Bioinformatics Commons, Communication Technology and New Media Commons, Databases and Information Systems Commons, OS and Networks Commons, Science and Technology Studies Commons