
Peco Network Inductive Logic Programming

Theory restructuring: A Perspective on Design & Maintenance of Knowledge Based Systems (PhD Thesis by Edgar Sommer)

Theory restructuring is an emerging research issue in Artificial Intelligence, at the intersection of Machine Learning, Knowledge Acquisition, Knowledge Engineering and Logic Programming, aimed at supporting the design and maintenance of knowledge bases for knowledge based systems. The task is to transform a given theory (also known as a knowledge base) into a different form without changing its coverage of a goal concept or concepts, that is, to improve the theory's structure in some sense without changing its inferential outcome.

A knowledge base (KB) is a formal model of some problem domain and consists, at the core and very generally, of facts and rules. The facts describe the problem domain, and the task of a knowledge based system (KBS) is to infer new facts on the basis of the given facts and rules, and/or to answer yes-or-no queries about the validity of a new fact.
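The interplay of facts and rules can be sketched as a toy forward-chaining interpreter. This is a minimal illustration in Python; the predicates, constants, and tuple-based rule format are invented for this sketch, not taken from the thesis:

```python
# A knowledge base as facts plus rules, with naive forward chaining.
# All predicate and constant names here are illustrative only.

def is_var(term):
    """Variables are strings starting with an uppercase letter."""
    return isinstance(term, str) and term[:1].isupper()

def match(template, fact, binding):
    """Extend a variable binding so template matches fact, else None."""
    if template[0] != fact[0] or len(template) != len(fact):
        return None
    b = dict(binding)
    for t, f in zip(template[1:], fact[1:]):
        if is_var(t):
            if b.get(t, f) != f:
                return None
            b[t] = f
        elif t != f:
            return None
    return b

def satisfy(body, known, binding):
    """Yield every binding under which all body literals match known facts."""
    if not body:
        yield binding
        return
    for fact in known:
        b = match(body[0], fact, binding)
        if b is not None:
            yield from satisfy(body[1:], known, b)

def infer(facts, rules):
    """Saturate: apply rules until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for b in list(satisfy(body, known, {})):
                new = (head[0],) + tuple(b.get(a, a) for a in head[1:])
                if new not in known:
                    known.add(new)
                    changed = True
    return known

facts = {("parent", "anne", "bert"), ("parent", "bert", "carl")}
rules = [(("grandparent", "X", "Z"),
          [("parent", "X", "Y"), ("parent", "Y", "Z")])]
```

Calling `infer(facts, rules)` adds `("grandparent", "anne", "carl")` to the set of known facts; answering a query then reduces to a membership test on the computed closure.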

Practical experience with designing such KBs [Sommer et al. 94] has shown a need to address the task of maintenance, alongside the tasks addressed in the established fields of Knowledge Acquisition, Representation & Engineering, Machine Learning and Logic Programming. Indeed, reorganizing and correcting knowledge bases is the most time-consuming phase of KBS development [Carbonell 91].

At various points in the life and design cycle of a knowledge base, the rules may be redundant or inconsistent, and may only partially cover the concepts that represent the inferential goal; moreover, it may be very difficult to gain insight into this status quo. Naturally, which rules are desirable and which are not depends on the application, but even on an abstract level, an analysis of a KB's status quo can be variously motivated:

``Semantically'' motivated: How good are the rules at solving their prime objective, covering the goal concept(s), that is, inferring new instances and/or answering queries? If coverage is not complete, can this failure be pinpointed and characterized? Are some rules redundant, in the basic sense that omitting them has no effect on the set of computable answers? Can this redundancy be characterized? Is the rule set inconsistent and can reasons for inconsistency be pinpointed?
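Redundancy in this basic sense can be checked mechanically: a rule is redundant if the set of computable answers is unchanged when it is omitted. A minimal propositional sketch in Python (the atoms and rules are invented for illustration; rules are (head, body-atoms) pairs):

```python
# A rule is redundant if omitting it leaves the derivable facts unchanged.
# Atoms and rules here are illustrative only.

def closure(facts, rules):
    """Forward-chain propositional rules to a fixpoint."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if body <= known and head not in known:
                known.add(head)
                changed = True
    return known

def redundant_rules(facts, rules):
    """Rules whose individual omission leaves the answer set unchanged."""
    full = closure(facts, rules)
    return [r for r in rules
            if closure(facts, [s for s in rules if s is not r]) == full]

facts = {"penguin"}
rules = [
    ("bird", frozenset({"penguin"})),
    ("animal", frozenset({"bird"})),
    ("animal", frozenset({"penguin"})),
]
```

Here both `animal` rules are individually redundant (either can be dropped without changing the answer set), yet dropping both would lose coverage. This is one way redundancy can be characterized rather than merely detected.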

Pragmatically motivated: Is the model of the real problem domain as simple as it could be? Is it coherent and homogeneous not only in the sense of performance, but in the sense that similarities in the ``real world'' are reflected in similar structures in the model? More specifically, are valid and useful relationships in the real world explicit as concepts in the model? Are these concepts put to use consistently and throughout? Is the inferential goal reached by way of a chain of inferences with intermediate concepts (deep theory) or in a single, complex step (flat theory; see figures in the appendix)? The latter may be desirable in view of run-time optimization, while the former may be easier to understand and modify.

Externally motivated: Is the model adequate, i.e. correct with respect to the aspect of the world it is meant to represent?

From this perspective, the maintenance task can be divided into three interrelated topics: validation, revision, and restructuring. Validation is concerned with determining whether the knowledge base, as a formal model of reality, does in fact fulfill its purpose; [Meseguer 92] gives a concise overview of this issue. Revision is concerned with changing the computed answer set of the KBS, for instance to fix a problem found during validation; [Wrobel 94] gives a detailed treatment. Restructuring is concerned with changes to the knowledge base that do not alter the computed answers, but rather improve other criteria, such as understandability or execution time.

KB Maintenance: Analysis, evaluation & reorganization

My main interest lies in the investigation and development of methods for maintaining knowledge bases of logical knowledge based systems, i.e. knowledge bases that can be interpreted as restricted first order logical theories.

Analysis concerns such criteria as redundancy, utility and coverage of the rules in the theory, the presentation of the overall inferential structure, and explanation of why specific inferences do or do not occur in it.

Evaluation involves answering the question: beyond accuracy, what constitutes a good theory? This means developing a set of criteria useful in comparing different, empirically equivalent theories (or different versions of a theory, depending on your point of view).

Reorganization concerns the modification of a given theory's inferential structure without changing the answer set. Some examples of such modification are:

Elimination of unnecessary rules in the theory and of unnecessary conditions in individual rules.

Introduction of new concepts into the theory that allow a more concise re-expression of the rules. The motivation is twofold: finding meaningful and useful new concepts in a theory, and giving the theory a deeper inferential structure, making it more modular and easier to understand and modify.

Flattening of the inferential structure by replacing intermediate concepts with their definitions wherever they occur in rules. This is the complement of the previous operation, and it minimizes the number of inferences the KBS must perform at runtime.
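Flattening can be sketched as unfolding: each occurrence of an intermediate concept in a rule body is replaced by the bodies of the rules defining it, and the definitions are dropped. A propositional Python sketch, assuming the intermediate concept is auxiliary (not itself a goal); all concept names are invented:

```python
# Unfold (flatten) an auxiliary intermediate concept: substitute its
# defining bodies into the rules that use it, then drop its definitions.
# Names are illustrative; rules are (head, frozenset_of_body_atoms) pairs.

def unfold(rules, concept):
    defs = [body for head, body in rules if head == concept]
    out = []
    for head, body in rules:
        if head == concept:
            continue                      # definitions are dropped
        if concept not in body:
            out.append((head, body))
        else:                             # one flat rule per definition
            for d in defs:
                out.append((head, (body - {concept}) | d))
    return out

deep = [
    ("creditworthy", frozenset({"stable_income", "resident"})),
    ("stable_income", frozenset({"employed", "tenured"})),
]
flat = unfold(deep, "stable_income")
```

`flat` contains the single rule `creditworthy <- employed, tenured, resident`. Folding, the inverse operation, would reintroduce `stable_income` as a named concept; the trade-off is exactly the one described above, fewer runtime inferences versus a deeper, more modular structure.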

Embedding Machine Learning & Knowledge Acquisition

Restructuring is embedded in a context that combines Machine Learning, Knowledge Acquisition and Knowledge Engineering. Machine learning is concerned with developing algorithms capable of discovering new relationships --- expressed as rules --- between concepts in a given KB. In the context of KBS, these activities can also be viewed as forms of automatic analysis of existing knowledge, resulting in novel interpretations of existing data and/or the discovery of new relationships and structures in initially unstructured data [Sommer 95]. More simply, ML offers an alternative to manual elicitation and formalization of rules from experts in cases where data rather than human expertise is more readily available.

Early work in ML was not concerned with this embedded aspect of learning algorithms, but rather with ``pure'' algorithms that take input and produce a result (a concept or rule) not necessarily expressed in the same language as the input, and not interpreted in any way by the algorithm itself (one-shot learning). More recent work [Morik et al. 93] is taking ML in the direction of open systems, where learning results may serve as input for subsequent learning passes (closed-loop learning), and where a uniform knowledge representation allows knowledge sharing between components serving different purposes (such as knowledge browsers, inference engines, and explanation, revision and truth maintenance modules).

Work along these lines --- using ML as a provider of rules for a KBS --- has shown a need for more explicit and elaborate forms of evaluation and post-processing than either ML or Knowledge Acquisition has traditionally been concerned with. In ML, the main success criterion has been accuracy: what percentage of the given examples is covered by the induced theory, and in some cases, what percentage of probable future examples will be covered? This criterion does not address the task of combining the results of several algorithms. Other aspects of utility, such as understandability, modularity, and maintainability, have not been the main focus of ML research.

In Knowledge Acquisition, emphasis has been on manual construction of rules via elicitation, often in the absence of significant numbers of concrete examples. Here, the success criterion is based on the expert's judgment: does the expert agree with the rules that the knowledge engineer has formulated on the basis of the expert's utterances? Theory restructuring aims at filling the gap between these two:

by allowing for various forms of inspection and evaluation of a KB; by offering ways of reorganizing a KB to make it more understandable, maintainable and modular; by providing means of characterizing alternative, empirically equivalent forms of a KB; and by providing means of sifting the large number of rules produced by competing ML algorithms (and knowledge engineers) down to a most concise set of necessary rules.
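The last point, sifting many candidate rules for a concise necessary subset, can be sketched as a greedy cover over the examples each rule accounts for. The rule names and coverage sets below are invented, and the thesis's actual selection criteria are richer than pure coverage; this only illustrates the shape of the task:

```python
# Sift competing induced rules for a concise subset: greedy set cover
# over the example sets each rule covers. All data here is illustrative.

def sift(coverage, examples):
    """Greedily pick rules until all examples are covered (or none help)."""
    chosen, uncovered = [], set(examples)
    while uncovered:
        best = max(coverage, key=lambda r: len(coverage[r] & uncovered))
        gain = coverage[best] & uncovered
        if not gain:
            break            # remaining examples are not coverable
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered

coverage = {
    "rule_a": {1, 2, 3},
    "rule_b": {3, 4},
    "rule_c": {4, 5},
    "rule_d": {1, 2},        # subsumed by rule_a, so never chosen
}
chosen, missed = sift(coverage, {1, 2, 3, 4, 5})
```

Here the greedy pass keeps only `rule_a` and `rule_c`: together they cover all five examples, so the subsumed and overlapping rules are discarded.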

References

Carbonell 91 Jaime G. Carbonell. Scaling up KBS via machine learning. In Yves Kodratoff, editor, Proc. Fifth European Working Session on Learning (EWSL-91). Springer, 1991.

Meseguer 92 Pedro Meseguer. Towards a conceptual framework for expert system validation. AI Communications, 5(3):119--135, September 1992.

Morik et al. 93 K. Morik, S. Wrobel, J.-U. Kietz, and W. Emde. Knowledge Acquisition and Machine Learning. Academic Press, London, 1993.

Sommer et al. 94 E. Sommer, K. Morik, J.M. Andre, and M. Uszynski. What On-line Learning Can Do for Knowledge Acquisition. Knowledge Acquisition, 6:435--460, 1994.

Sommer 95 E. Sommer. Induction, evaluation, restructuring: Data analysis as a machine learning loop. In George E. Lasker, editor, Proc. of the Conference on Intelligent Data Analysis (IDA-95), 1995.

Wrobel 94 Stefan Wrobel. Concept Formation and Knowledge Revision. Kluwer Academic Publishers, Dordrecht, Netherlands, 1994.