
Discovery of Metadata in Relational Databases for Semantic Query Optimization (PhD Thesis by Siegfried Bell)

Semantic query optimization promises to free users from the need to understand the intricacies of the database design and the set-theoretic relationships required to join tables when writing an efficient query. Its aim is to use knowledge about the data to reformulate a query into one that may require less answering time than the original. Most approaches have the disadvantage of presuming that this knowledge is supplied by an expert or stated in the data dictionary as integrity constraints. This drawback can be overcome by using discovered knowledge, which makes semantic query optimization applicable to any database, even one without a semantic description of its data.
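As a rough illustration of reformulating a query with discovered knowledge, the following Python sketch derives the value range of a column from the current database state and uses it to answer an unsatisfiable selection without scanning the table. The table layout and the names `discover_domain` and `select_greater_than` are hypothetical and chosen only for this example; they are not taken from the thesis.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, int]

def discover_domain(rows: List[Row], column: str) -> Tuple[int, int]:
    """Discover the (min, max) range of a column in the current state."""
    values = [row[column] for row in rows]
    return min(values), max(values)

def select_greater_than(
    rows: List[Row],
    column: str,
    threshold: int,
    domain: Optional[Tuple[int, int]] = None,
) -> List[Row]:
    """Return rows with row[column] > threshold.

    If discovered domain metadata shows that no value can exceed the
    threshold, the query is reformulated to the empty result and the
    scan is skipped.
    """
    if domain is not None and threshold >= domain[1]:
        return []  # unsatisfiable under the discovered range
    return [row for row in rows if row[column] > threshold]

# Hypothetical sample state.
employees = [
    {"emp_id": 1, "salary": 30},
    {"emp_id": 2, "salary": 55},
    {"emp_id": 3, "salary": 90},
]
salary_domain = discover_domain(employees, "salary")
```

Because such a range is discovered from one state rather than declared as an integrity constraint, it has to be kept up to date when the data changes before it can safely be reused.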

Discovering data about data, i.e. metadata, entails a new point of view, because only states of databases are considered. A first consequence of this view is that data dependencies as metadata, and the relationships among them, require an extended axiomatization. A second consequence is, for example, that null values have to be treated as semantic gaps.

In this work, an approach to semantic query optimization is presented which discovers and maintains metadata in relational databases in order to reformulate queries. A partial logic ensures that databases, metadata, and queries can be handled in one framework. The investigated metadata consists of domain, cardinality, unary inclusion, key, and functional dependencies, and new kinds of rules. In addition, it is shown how discovered metadata can be used to transform queries into more efficient queries that may require less answering time. Finally, an empirical evaluation indicates, together with the theoretical results, that the shift from integrity constraints to discovered metadata improves semantic query optimization.
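To make some of these kinds of metadata concrete, here is a minimal Python sketch that tests whether a functional dependency, a key, and a unary inclusion dependency hold in a given database state. It is not the discovery procedure of the thesis. The function names and sample tables are hypothetical, and skipping tuples that contain null values is a simplifying assumption made for this example, not the partial-logic treatment described above.

```python
from typing import Dict, List, Optional, Sequence

Row = Dict[str, Optional[object]]

def holds_fd(rows: List[Row], lhs: Sequence[str], rhs: str) -> bool:
    """Check whether the functional dependency lhs -> rhs holds in this state.

    Assumption: tuples with a null (None) in lhs or rhs are skipped.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        if None in key or row[rhs] is None:
            continue
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False  # same lhs values, different rhs value
    return True

def is_key(rows: List[Row], attrs: Sequence[str]) -> bool:
    """Check whether attrs identify every tuple uniquely, with no nulls."""
    keys = [tuple(row[a] for a in attrs) for row in rows]
    if any(None in k for k in keys):
        return False
    return len(set(keys)) == len(keys)

def holds_unary_inclusion(
    r_rows: List[Row], r_col: str, s_rows: List[Row], s_col: str
) -> bool:
    """Check whether every non-null value of r_col also occurs in s_col."""
    s_values = {row[s_col] for row in s_rows if row[s_col] is not None}
    return all(
        row[r_col] in s_values for row in r_rows if row[r_col] is not None
    )

# Hypothetical sample state.
employees = [
    {"emp_id": 1, "dept": "A", "floor": 1},
    {"emp_id": 2, "dept": "A", "floor": 1},
    {"emp_id": 3, "dept": "B", "floor": 2},
    {"emp_id": 4, "dept": None, "floor": 3},
]
departments = [{"name": "A"}, {"name": "B"}, {"name": "C"}]
```

A dependency confirmed this way is a statement about one state only; an insertion or update can invalidate it, which is why discovered metadata has to be maintained rather than assumed.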

CR Categories: H.2.1, H.2.3, I.2.3, I.2.4, I.2.6