Analysis
The implementation is an important proof of concept.
However, as discussed in Section 6, the results reflect various kinds of
errors, many of which are not directly related to discourse processing or
temporal reference resolution. Examples include (1) completely null inputs,
which occur when the speech recognizer or semantic parser fails; (2) numbers
mistaken for dates; and (3) failures to recognize that a relation can be
established, owing to a lack of specific domain knowledge.
To evaluate the algorithm itself, this section evaluates the components
of our method for temporal reference resolution separately. Sections 8.1
and 8.2 assess the key contributions of this work: the focus model
(Section 8.1) and the deictic and anaphoric relations (Section 8.2).
These evaluations required extensive additional manual annotation of the
data. To preserve the test dialogs as unseen test data, these annotations
were performed on the training data only. In Section 8.3, we isolate the
architectural components of our algorithm, such as the certainty factor
calculation and the critics, to assess their effects on performance.
Coverage and Ambiguity of the Relations Defined in the Model
Evaluation of the Architectural Components