Characterization (Descriptive Setting)

Name Characterization (Descriptive Setting)

Description
The characterization task in Inductive Logic Programming is about finding interesting regularities in datasets. If you have a complex dataset, it is often impossible to directly see such regularities in the data. Having a description of the same dataset, represented in a readable formalism (e.g. first order logic rules), often makes finding interesting regularities a much easier task.

The requirements for a suited description of a dataset in the characterization task are high, because the idea is to find all rules (except redundant ones) that hold, when applied to the dataset. We are looking for a set of rules with the following properties:

Of course every rule has to be correct according to he data.

Every rule explains at least one observation within the dataset, that is not explained by any of the other rules. This helps to avoid unnecessary rules.

If a rule is correct, according to the data, it is part of our set, or it can logically be derived from it. This is very much like finding all interesting rules, we just focus on those from which all the others can be derived, which will usually make the result much smaller and therefore more readable.

We do not have unnecessary rules in our set, that is our set has no subset with the required properties.

Definition In more technical terms:
Given

a set of unclassified examples E in a language LE,

background knowledge B in a language LB,

a hypotheses language LH,

the task is to find a set of hypotheses H ⊂ LH, so that

M⁺(B ∪ E) ⊂ M(H) (H is valid),

for all hypotheses h ∈ H there is at least one example e ∈ E, so B, E \ {e} ⊭ e and B, E \ {e}, h ⊨ e (H is necessary),

H ⊨ h for all h ∈ LH being valid and necessary (H is complete), and

there is no H' ⊂ H, that is valid and complete (H is minimal),

where

an interpretation I is a model of a theory T, denoted I ∈ M(T) if it fulfills all of the propositions in T.

an interpretation I is a minimal model of a theory T, denoted I ∈ M⁺(T), if I ∈ M(T) and there is no interpretation I' ⊂ I with I' ∈ M(T).

Methods GOLEM
PROGOL

Theories Inductive Logic Programming (ILP)