We develop a data integration approach for the efficient evaluation of queries over autonomous source databases. The approach is based on novel applications and extensions of constraint database techniques. We assume the existence of a global database schema. The contents of each data source are described using a set of constraint tuples over the global schema; each such tuple indicates possible contributions from the source. The “source description catalog” (SDC) of a global relation consists of its associated constraint tuples. This method of description is advantageous because it makes it easy to add new sources and to modify existing ones. In our framework, to evaluate a conjunctive query over the global schema, a plan generator first identifies relevant data sources by “evaluating” the query against the SDCs using techniques of constraint query evaluation; it then formulates an evaluation plan consisting of specialized queries over different paths. The query associated with a path is evaluated by a sequence of partial evaluations at the data sources along the path, similar to sideways information passing in Datalog; the partially evaluated queries travel along their associated paths. Our SDC-based query planning is efficient since it avoids the NP-complete query rewriting process. Further optimization can be achieved using techniques such as emptiness tests.
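The source-selection step described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that every constraint tuple is a conjunction of closed interval bounds on numeric attributes, which is far simpler than the constraint classes a real system might support. The names `ConstraintTuple`, `relevant_sources`, the attributes `price` and `year`, and the sources `src_A`, `src_B`, `src_C` are all hypothetical and do not come from the paper. A source is kept when the conjunction of its description and the query constraints is satisfiable, which here reduces to a non-empty interval intersection on every constrained attribute.

```python
from dataclasses import dataclass
from math import inf
from typing import Dict, List, Optional, Tuple

Interval = Tuple[float, float]  # closed interval [low, high]


@dataclass
class ConstraintTuple:
    """One SDC entry (hypothetical encoding): a source name plus
    interval bounds on some attributes of the global relation."""
    source: str
    bounds: Dict[str, Interval]


def intersect(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the intersection of two closed intervals, or None if empty."""
    low, high = max(a[0], b[0]), min(a[1], b[1])
    return (low, high) if low <= high else None


def relevant_sources(sdc: List[ConstraintTuple],
                     query: Dict[str, Interval]) -> List[str]:
    """Keep each source that has at least one constraint tuple whose
    conjunction with the query constraints is satisfiable."""
    relevant: List[str] = []
    for ct in sdc:
        satisfiable = True
        for attr, q_iv in query.items():
            # An attribute the source does not constrain is unbounded.
            s_iv = ct.bounds.get(attr, (-inf, inf))
            if intersect(s_iv, q_iv) is None:
                satisfiable = False  # emptiness test fails on this attribute
                break
        if satisfiable and ct.source not in relevant:
            relevant.append(ct.source)
    return relevant


# Hypothetical SDC for one global relation with attributes price and year.
SDC = [
    ConstraintTuple("src_A", {"price": (0, 500), "year": (1990, 1995)}),
    ConstraintTuple("src_B", {"price": (400, 2000), "year": (1996, 1999)}),
    ConstraintTuple("src_C", {"year": (1980, 1989)}),
]

# Hypothetical selection conditions taken from a conjunctive query.
QUERY = {"price": (100, 300), "year": (1994, 1998)}

SELECTED = relevant_sources(SDC, QUERY)
```

With these inputs only `src_A` survives: `src_B` is excluded on `price` and `src_C` on `year`. The sketch covers source selection only; plan formulation and the path-wise partial evaluation are not modeled.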
& Su, J. (1999). Data Integration by Describing Sources with Constraint Databases. Proceedings of the 15th International Conference on Data Engineering, 1999, 374-381.