The alogorithm PROGOL uses inverted entailment to generalize a set of positive
examples with respect to background knowledge B. Examples are definite clauses.
Additionally constraints can be given in the form of headless horn clauses.
The outer loop of PROGOL chooses a yet uncovered example, generalizes it
to a definite clause and adds that clause to the background knowledge.
This is iterated until all examples are covered by the (extended) background
knowledge.
The principle of inverting entailment:
For every generalization h of an example e with respect to background knowledge
B it is required that B ⋀ h ⊧ e.
Applying the deduction theorem leads to B ⋀ ¬ e ⊧
¬ h.
Let ⊥ be the most specific clause, more general (with respect to B)
than e. Then B ⋀ ¬ e ⊧ ¬ ⊥ ⊧
¬ h and h ⊧ ⊥ for all hypotheses h more general than e.
Generalizing one example:
The hypothesis space for the generalization of single examples is a syntactically
restricted subset of the set of definite clauses.
The restriction is specified by the user via so called "mode declarations",
restricting definite clauses in two ways:
- The maximal depth i of definite clauses is bound.
- The allowed predicate symbols for head and body of the clause are given.
Another indirect restriction of the search space is given by restricting the number of
maximal resolution steps when calculating inferences.
There will usually be many possible generalizations of an example. The generalization
process can be regarded as a search through a lattice of hypotheses, quasi-ordered by
generality.
PROGOL uses θ-subsumption for quasi-ordering the search space.
- The most general element of this lattice is the empty clause.
- Let ⊥ be the most specific definite clause, such that
B ⋀ ⊥ ⋀ ¬ e derives the empty clause in
a fixed number of resolution steps.
The most specific element within the user-specified search space is called
⊥i. It is the the most specific clause of depth at most
i, θ-subsuming the clause ⊥.
Within the lattice a top-down search is performed.
The description length of clauses serves as a quality measure.
The underlying assumption of this approach is, that clauses with shorter representations
generalize better than those with longer representations.
For details, please refer to the publication above. |