The eScience paradigm is enabling researchers to collaborate over the Web in virtual laboratories and to conduct experiments on an industrial scale. However, the inherent variability in the quality and trust associated with eScience resources necessitates the use of provenance information describing the origin of an entity. Existing systems often model provenance using ambiguous terminology, have poor domain semantics, and include modeling inconsistencies that hinder interoperability. Further, merely collecting provenance information is of little value without a well-defined and scalable query mechanism.

In this paper, we present 'PrOM', a framework that addresses both the modeling and querying issues in eScience provenance management. The theoretical underpinning for PrOM consists of (a) a novel foundational ontology for provenance representation called 'Provenir', and (b) the first set of query operators to be defined for provenance query and analysis. The PrOM framework also includes a scalable provenance query engine that supports complex queries (high 'expression complexity') over a very large real-world dataset with 308 million RDF triples. The query engine uses a new class of materialized views for query optimization that improves query performance by up to three orders of magnitude.