
Discussion

We have shown that our FCA-based approach is a reasonable alternative to similarity-based clustering approaches, even yielding better results on our datasets with regard to the $F'$ measure defined in Section 5. The main reason is that the concept hierarchies produced by FCA yield a higher recall due to the higher number of concepts, while keeping precision relatively high at the same time. Furthermore, we have shown that the conditional probability performs reasonably well as an information measure compared to more elaborate measures such as PMI or the one used by [51]. Unfortunately, applying a smoothing method based on clustering mutually similar terms does not improve the quality of the automatically learned concept hierarchies.

Table 8 highlights the fact that every approach has its own benefits and drawbacks. The main benefit of FCA is, on the one hand, that on our datasets it performed better than the other algorithms and thus produced better concept hierarchies. On the other hand, it does not only generate clusters -- formal concepts, to be more specific -- but also provides an intensional description for these clusters, thus contributing to better understanding by the ontology engineer (compare Figure 2 (left)). In contrast, similarity-based methods do not provide the same level of traceability: the clustering process is driven by the numerical similarity between two high-dimensional vectors, which remains opaque to the engineer. The agglomerative and divisive approaches differ in this respect: in the agglomerative paradigm, the initial merges of small clusters correspond to high degrees of similarity and are thus relatively easy to follow, while in the divisive paradigm the splitting of clusters aims at minimizing the overall cluster variance and is therefore harder to trace.

A clear disadvantage of FCA is that the size of the lattice can become exponential in the size of the context in the worst case, resulting in exponential time complexity, compared to $O(n^2\log n)$ and $O(n^2)$ for agglomerative clustering and Bi-Section-KMeans, respectively. The implementation of FCA we have used is the concepts tool by Christian Lindig, which basically implements Ganter's Next Closure algorithm [25,26] with the extension of Aloui for computing the covering relation as described by [29].
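To give a concrete impression of the core procedure, the following is a minimal Python sketch of how Next Closure enumerates all concept intents of a small object/attribute context in lectic order. The toy context, the variable names and the closure helper are our own illustrations and are not taken from Lindig's concepts tool; the sketch also omits the computation of the covering relation.

def intents(context, attributes):
    """Enumerate all concept intents of a formal context in lectic order.

    context    -- dict mapping each object to the set of attributes it has
    attributes -- list fixing a linear order a_0 < a_1 < ... < a_{n-1}
    """
    def closure(a_set):
        # A'': attributes shared by all objects that have every attribute in A
        extent = [g for g, row in context.items() if a_set <= row]
        if not extent:
            return set(attributes)      # empty extent -> full attribute set
        return set.intersection(*(context[g] for g in extent))

    def next_closure(a_set):
        # lectically smallest closed set greater than a_set, or None if a_set is the last one
        for i in range(len(attributes) - 1, -1, -1):
            m = attributes[i]
            if m in a_set:
                a_set = a_set - {m}
            else:
                b_set = closure(a_set | {m})
                # accept b_set only if it adds no attribute smaller than m
                if not any(x in b_set and x not in a_set for x in attributes[:i]):
                    return b_set
        return None

    a_set = closure(set())
    while a_set is not None:
        yield a_set
        a_set = next_closure(a_set)

# toy context: terms (objects) and the verbs they occur with (attributes)
context = {
    "hotel":     {"bookable", "rentable"},
    "apartment": {"bookable", "rentable", "furnishable"},
    "car":       {"rentable", "driveable"},
}
attributes = ["bookable", "rentable", "furnishable", "driveable"]

for intent in intents(context, attributes):
    print(sorted(intent))   # 5 concept intents, from {rentable} up to the full attribute set

Next Closure computes each concept with polynomial delay (at most one closure computation per attribute between two successive concepts), so the overall runtime is governed mainly by the number of concepts actually present in the lattice, which for sparse contexts such as ours remains far below the exponential worst case.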
Figure 12 plots, over the number of object/attribute pairs, the number of seconds it took FCA to compute the lattice of formal concepts, compared to the time needed by a naive $O(n^3)$ implementation of the agglomerative algorithm with complete linkage. It can be seen that FCA performs quite efficiently compared to the agglomerative clustering algorithm. This is due to the fact that the object/attribute matrix is sparsely populated. Similar observations have been made before: [29], for example, suspect that the lattice size increases linearly with the number of attributes per object, and [38] presents empirical results for contexts with a fill ratio below 0.1 and concludes that the lattice size grows quadratically with respect to the size of the incidence relation $I$. Similar findings are also reported by [9]. Figure 13 shows the number of attributes over the terms' rank, where the rank is a natural number indicating the position of the word in a list ordered by decreasing term frequency. The number of (non-zero) attributes is distributed in a Zipfian way (compare [65]), i.e. a small number of objects have many attributes, while a large number of them have just a few.

In particular, for the tourism domain, the term with the most attributes is person, with 3077 attributes, while on average a term has approximately 178 attributes. The total number of attributes considered is 9738, so the object/attribute matrix consists of roughly 98% zero values. For the finance domain, the term with the most attributes is percent, with 2870 attributes, the average being approximately 202 attributes. The total number of attributes is 21542, so in this case more than 99% of the matrix entries are zero; these contexts are thus much sparser than the ones considered by [38]. These figures explain why FCA performs efficiently in our experiments. Concluding, although the worst-case time complexity is exponential, FCA is much more efficient than the agglomerative clustering algorithm in our setting.
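The sparsity figures above follow directly from the reported averages: a context with $t$ attributes in total and an average of $a$ attributes per object has a fill ratio of roughly $a/t$. The following small snippet, with the numbers copied from the text, reproduces the arithmetic.

# fill ratio ~ average number of attributes per object / total number of attributes
for domain, avg_attrs, total_attrs in [("tourism", 178, 9738),
                                       ("finance", 202, 21542)]:
    fill = avg_attrs / total_attrs
    print(f"{domain}: fill ratio ~ {fill:.3f}, zero entries ~ {1 - fill:.1%}")
# prints approximately:
#   tourism: fill ratio ~ 0.018, zero entries ~ 98.2%
#   finance: fill ratio ~ 0.009, zero entries ~ 99.1%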
Figure 12: Comparison of the time complexities for FCA and agglomerative clustering for the tourism and finance domains
Figure 13: Distribution of Features: number of (non-zero) features over word rank

Table 8: Trade-offs between different taxonomy construction methods
                            Effectiveness ($F'$)      Worst-Case          Traceability   Size of
                            Tourism     Finance       Time Complexity                    Hierarchies
FCA                         44.69%      38.85%        $O(2^n)$            Good           Large
Agglomerative Clustering:
  Complete Linkage          36.85%      33.35%        $O(n^2\log n)$      Fair           Small
  Average Linkage           36.55%      32.92%        $O(n^2)$
  Single Linkage            38.57%      32.15%        $O(n^2)$
Bi-Section-KMeans           36.42%      32.77%        $O(n^2)$            Weak           Small


