Our Solution: WOLFIE
The above analysis leads us to believe that the Lexicon Acquisition
Problem is computationally intractable, so we cannot perform an
efficient search for the best lexicon, nor can we use a standard
induction algorithm. We have therefore implemented WOLFIE, outlined in
Figure 6, which finds an approximate solution to the Lexicon
Acquisition Problem.
Our approach is to generate a set of
candidate lexicon entries, from which the final learned lexicon is
derived by greedily choosing the ``best'' lexicon item at each point,
in the hope of finding a final (minimal) covering lexicon.
We do not actually learn interpretation functions, so we do not
guarantee finding a covering lexicon. Even if we were
to search for interpretation functions, a greedy search would still
not guarantee covering the input, nor would it guarantee that a
minimal lexicon is found. However, we will later
present experimental results demonstrating that our greedy approach
performs well.
Figure 6: WOLFIE Algorithm Overview
WOLFIE first derives an initial set of candidate meanings for
each phrase. The candidate-generation algorithm, LICS (Largest
Isomorphic Connected Subgraphs), attempts
to find a ``maximally common'' meaning for each phrase, which biases
toward both finding a small lexicon, by covering many vertices of a
tree at once, and finding a lexicon that actually does cover the
input. Second, WOLFIE chooses final lexicon entries from this
candidate set, one at a time, updating the candidate set as it goes
and taking into account our assumptions of single-use, connectedness,
and exclusivity. The basic scheme for choosing entries from the
candidate set is to maximize the prediction of meanings given phrases,
but also to prefer general meanings. This creates a tension between
LICS, which cover many vertices, and generality, which biases toward
meanings with fewer vertices. However, generality, like LICS, helps
lead to a small lexicon, since a general meaning is more likely to
apply widely across a corpus.
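To make the selection phase concrete, the following toy Python sketch greedily builds a lexicon from a precomputed candidate set. It is a deliberate simplification: a raw co-occurrence count stands in for WOLFIE's actual heuristic of maximizing the prediction of meanings given phrases, meanings are encoded as plain strings, and the single-use bookkeeping is reduced to deleting covered occurrences. The function greedy_lexicon and its input encoding are illustrative assumptions, not the paper's implementation.

```python
def greedy_lexicon(candidates, corpus):
    """candidates: iterable of (phrase, meaning) pairs, as strings.
    corpus: list of (phrases, meanings) training pairs."""
    # Mutable copies: chosen entries "use up" the occurrences they
    # explain, a toy version of the single-use assumption.
    examples = [(set(ps), set(ms)) for ps, ms in corpus]

    def cooccurrence(entry):
        p, m = entry
        return sum(1 for ps, ms in examples if p in ps and m in ms)

    lexicon, remaining = [], set(candidates)
    while remaining:
        # Prefer entries that explain many examples; break ties toward
        # smaller (more general) meaning strings.
        best = max(remaining, key=lambda e: (cooccurrence(e), -len(e[1])))
        if cooccurrence(best) == 0:
            break  # nothing left for the remaining candidates to explain
        lexicon.append(best)
        remaining.discard(best)
        for ps, ms in examples:  # consume the covered occurrences
            if best[0] in ps and best[1] in ms:
                ps.discard(best[0])
                ms.discard(best[1])
    return lexicon
```

Called with candidates such as ("capital", "capital(_,_)") and the corpus below, this loop repeatedly picks whichever pairing still explains the most remaining examples.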
Let us explain the algorithm in further detail by way of an example,
using Spanish instead of English to illustrate the difficulty somewhat
more clearly. Consider the following corpus:
1. ¿Cuál es el capital del estado con la población más grande?
   ("What is the capital of the state with the largest population?")
   answer(C, (capital(S,C), largest(P, (state(S), population(S,P))))).
2. ¿Cuál es la punta más alta del estado con la area más grande?
   ("What is the highest point of the state with the largest area?")
   answer(P, (high_point(S,P), largest(A, (state(S), area(S,A))))).
3. ¿En que estado se encuentra Texarkana?
   ("In what state is Texarkana located?")
   answer(S, (state(S), eq(C,cityid(texarkana,_)), loc(C,S))).
4. ¿Qué capital es la más grande?
   ("Which capital is the largest?")
   answer(A, largest(A, capital(A))).
5. ¿Qué es la area de los estados unitos?
   ("What is the area of the United States?")
   answer(A, (area(C,A), eq(C,countryid(usa)))).
6. ¿Cuál es la población de un estado que bordean a Utah?
   ("What is the population of a state that borders Utah?")
   answer(P, (population(S,P), state(S), next_to(S,M), eq(M,stateid(utah)))).
7. ¿Qué es la punta más alta del estado con la capital Madison?
   ("What is the highest point of the state with the capital Madison?")
   answer(C, (high_point(B,C), loc(C,B), state(B), capital(B,A), eq(A,cityid(madison,_)))).
The sentence representations here are slightly different from the tree
representations given in the problem definition, the main
difference being the addition of existentially quantified variables
shared between some leaves of a representation tree. As mentioned in
Section 2.1, the representations are Prolog queries to a
database. Given such a query, we can create a tree
that conforms to our formalism, but with this addition of quantified
variables.
An example for the representation of the third sentence is shown in Figure 7.

Figure 7: Tree with Variables

Each vertex is a predicate name
and its arity, in the Prolog style, e.g., state/1, with quantified
variables at some of the leaves. Each outgoing
edge (n,m) of a vertex n is labeled with the argument
position filled by the subtree rooted at m; if no
edge is labeled with a given argument position, that argument is a free
variable. Each vertex labeled with a variable (variables can occur only at
leaves) denotes an existentially quantified variable whose scope is the entire
tree (or query). The learned lexicon, however, need not
maintain the identity of variables across distinct lexical entries.
Another representation difference is that we strip
the answer predicate from the input to our learner, thus
allowing a forest of directed trees as input rather than a single tree. The
problem definition easily extends so that the root of each
tree in the forest must be in the domain of some interpretation function.
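As a concrete illustration, here is one plausible Python encoding of such a forest for the third sentence, once the answer predicate has been stripped; the Var and Node classes are illustrative assumptions rather than the paper's actual data structures.

```python
# Vertex labels are name/arity, Prolog style; children map 1-based
# argument positions to subtrees or variables, and an unlabeled
# position (like the second argument of cityid) is a free variable.

class Var:
    """An existentially quantified variable scoped over the whole query."""
    def __init__(self, name):
        self.name = name

class Node:
    """A vertex labeled name/arity with argument-position-labeled edges."""
    def __init__(self, name, arity, children=None):
        self.label = f"{name}/{arity}"
        self.children = children or {}

S, C = Var("S"), Var("C")
# Forest for: state(S), eq(C,cityid(texarkana,_)), loc(C,S).
# S and C are shared objects across the trees, preserving variable
# identity within the query but not across lexical entries.
forest = [
    Node("state", 1, {1: S}),
    Node("eq", 2, {1: C, 2: Node("cityid", 2, {1: Node("texarkana", 0)})}),
    Node("loc", 2, {1: C, 2: S}),
]
```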
Evaluation of our system using this representation is given in
Section 5.1; evaluation using a representation without
variables or forests is presented in Section 5.2. We
previously presented results [Thompson1995] demonstrating the learning
of representations of a different form, namely a case-role
representation [Fillmore1968] augmented with Conceptual
Dependency [Schank1975] information. This last representation
conforms directly to our problem definition.
Now, continuing with the example of solving the Lexicon Acquisition
Problem for this corpus, let us also assume, for simplicity (though
this is not required), that sentences are
stripped of phrases that we know have empty meanings
(e.g., ``qué'', ``es'', ``con'', and ``la'').
We similarly assume it is known
that some phrases refer directly to given database constants (e.g.,
location names), and remove those phrases and their meanings from the
training input.
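A short Python sketch of this preprocessing step appears below; the stop list and constant table are illustrative assumptions (the stop list holds only the phrases named above), not resources from the paper.

```python
# Strip phrases with known-empty meanings, and remove phrases that name
# database constants along with the eq/2 literals introducing them.

EMPTY_PHRASES = {"qué", "es", "con", "la"}               # partial list, per the text
KNOWN_CONSTANTS = {"texarkana": "cityid(texarkana,_)",   # assumed constant table
                   "utah": "stateid(utah)",
                   "madison": "cityid(madison,_)"}

def preprocess(sentence, literals):
    """sentence: a question string; literals: its query body as strings."""
    tokens = sentence.lower().replace("¿", "").replace("?", "").split()
    words, kept = [], list(literals)
    for w in tokens:
        if w in KNOWN_CONSTANTS:
            # Drop both the phrase and the literal mentioning its constant.
            kept = [l for l in kept if KNOWN_CONSTANTS[w] not in l]
        elif w not in EMPTY_PHRASES:
            words.append(w)
    return words, kept

# Sentence 3 reduces to ['en', 'que', 'estado', 'se', 'encuentra']
# paired with ['state(S)', 'loc(C,S)'].
```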
Cindi Thompson
2003-01-02