Next: Comparison Up: Learning Concept Hierarchies from Previous: Evaluation


Results

As mentioned above, we evaluate our approach on two domains: tourism and finance. The ontology for the tourism domain is the reference ontology of the comparison study presented by [40], which was modeled by an experienced ontology engineer. The finance ontology is essentially the one developed within the GETESS project [57]; it was designed for analyzing German texts on the Web, but English labels are also available for many of its concepts. Moreover, we manually added English labels for those concepts whose German label has an English counterpart, with the result that most of the concepts (>95%) ended up with an English label (footnote 8). The tourism domain ontology consists of 293 concepts, while the finance domain ontology is larger, with a total of 1223 concepts (footnote 9). Table 2 summarizes some facts about the concept hierarchies of the two ontologies: the total number of concepts, the total number of leaf concepts, the average and maximal length of the paths from a leaf to the root node, and the average and maximal number of children of a concept (not counting leaf concepts).

Table 2: Ontology statistics
                Tourism   Finance
No. Concepts        293      1223
No. Leaves          236       861
Avg. Depth         3.99      4.57
Max. Depth            6        13
Max. Children        21        33
Avg. Children      5.26       3.5
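
To make the quantities in Table 2 concrete, the following is a minimal sketch of how such statistics could be computed for a concept hierarchy. It is not the code used to produce the table. It assumes, purely for illustration, that the hierarchy is a tree given as a mapping from each concept to its direct subconcepts, that the root counts as a concept, and that depth is measured as the number of edges from a leaf to the root. The function name `hierarchy_stats` and the toy hierarchy are hypothetical.

```python
from statistics import mean


def hierarchy_stats(children, root):
    """Compute Table-2-style statistics for a tree-shaped concept hierarchy.

    children: dict mapping a concept to the list of its direct subconcepts
              (assumed representation; leaves may be absent from the dict).
    root:     name of the root concept.
    """
    leaf_depths = []  # path length in edges from each leaf to the root
    fanouts = []      # number of children of each non-leaf concept
    n_concepts = 0

    # Iterative depth-first traversal, tracking the depth of each node.
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        n_concepts += 1
        kids = children.get(node, [])
        if kids:
            fanouts.append(len(kids))
            stack.extend((kid, depth + 1) for kid in kids)
        else:
            leaf_depths.append(depth)

    return {
        "concepts": n_concepts,
        "leaves": len(leaf_depths),
        "avg_depth": mean(leaf_depths),
        "max_depth": max(leaf_depths),
        # Averages over non-leaf concepts only, as in the table description.
        "avg_children": mean(fanouts),
        "max_children": max(fanouts),
    }


# Hypothetical toy hierarchy, not taken from either ontology.
toy = {
    "root": ["accommodation", "activity"],
    "accommodation": ["hotel", "hostel", "campsite"],
    "activity": ["excursion"],
    "excursion": ["hike"],
}
stats = hierarchy_stats(toy, "root")
for name, value in stats.items():
    print(f"{name}: {value}")
```

The sketch visits every concept exactly once, so it only applies to hierarchies in which each concept has a single parent; a hierarchy with multiple inheritance would need a different definition of path length and of the concept count.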


As the domain-specific text collection for the tourism domain, we use texts acquired from the above-mentioned web sites, i.e. from http://www.lonelyplanet.com and http://www.all-in-all.de. In addition, we use a general corpus, the British National Corpus (footnote 10). Altogether, the corpus comprises over 118 million tokens. For the finance domain, we use Reuters news from 1987, comprising over 185 million tokens (footnote 11).

Philipp Cimiano 2005-08-04