In Table 6, the results of the three methods (Joshi-2005, McInnes-2007, and Stevenson-2008) are taken from Stevenson et al. [1]. These three methods are supervised and use various machine learning algorithms with broad feature sets. For example, Stevenson-2008 used linguistic features, CUIs, MeSH terms, and combinations of these features, with three learners: the vector space model (VSM), Naïve Bayes (NB), and SVM. The results included in Table 6 are their best results, obtained with VSM and the (linguistic + MeSH) features [1]. Joshi-2005 uses five supervised learning methods with collocation features, while McInnes-2007 uses NB [1].

Our evaluation is done on 31 words (as explained in Section 3). We obtained the results of the other methods on these 31 words from the references shown in Table 6 to allow a direct comparison.
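To make the supervised setup concrete, the following is a minimal, hypothetical sketch of a per-word WSD classifier of the kind these methods employ: a Naïve Bayes learner over bag-of-words context features, with one classifier trained per ambiguous word. The instances and CUI labels below are invented placeholders, not NLM-WSD data, and the cited systems use much richer feature sets (collocations, MeSH terms, etc.).

```python
# Sketch: supervised WSD for one ambiguous word ("cold") using
# bag-of-words context features and a Naive Bayes learner.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical tagged instances: context sentence -> sense (CUI) label.
contexts = [
    "patient presented with a common cold and mild fever",
    "samples were stored at a cold temperature overnight",
    "cold symptoms resolved after one week of rest",
    "the cold storage unit kept reagents below freezing",
]
senses = ["C0009443", "C0009264", "C0009443", "C0009264"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(contexts, senses)

# Predict the sense of a new occurrence from its context.
print(clf.predict(["the patient's cold worsened with a cough"]))
```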
The best result reported in their paper is 87.8%, obtained on all words with the VSM model; McInnes-2007 achieved 85.3%, also on the whole set [1]. The best result of Stevenson-2008 on subsets was 85.1%, using a subset of 22 words defined by Stevenson et al. [1].

The results of the three methods (single, subset, full) in Table 6 are taken directly from Agirre et al. [2]. As shown in Table 6, the average accuracies of these three methods on the 31 words (68.8%, 59.7%, and 63.5%) are significantly lower than that of our method (90.3%), as are their average accuracies on the whole set (65.9%, 63.0%, and 65.9%); we note, however, that their methods are unsupervised and do not require tagged instances [2].
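The per-method averages quoted here are macro averages, i.e., the mean of the per-word accuracies over the evaluation set. A minimal sketch of that computation follows; the dictionary entries are invented numbers for illustration, not the actual Table 6 values.

```python
# Macro-average accuracy over the 31-word evaluation set:
# the unweighted mean of per-word accuracies.
per_word_accuracy = {
    "adjustment": 0.74,
    "blood_pressure": 0.46,
    "cold": 0.88,
    # ... one entry per ambiguous word in the 31-word subset
}

macro_avg = sum(per_word_accuracy.values()) / len(per_word_accuracy)
print(f"macro-average accuracy: {macro_avg:.1%}")
```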
In another work, Jimeno-Yepes and Aronson evaluated four unsupervised methods on the whole NLM-WSD set [4], as well as NB and combinations of the four methods. The accuracies of these methods range from 58.3% to 88.3% on the whole set, with NB the best performer, followed by CombSW (76.3%) [4]. The average accuracies of NB and of two combinations (CombSW and CombV) on our 31-word subset are 86%, 73.1%, and 72.1%, respectively, all lower than our results (see Table 6).

When we applied our system to the species disambiguation task, the results are also encouraging, as shown in Table 8. The evaluation results of our method compare very well with those reported in [9], shown in Table 7. From their results (Table 7), we notice that the best overall performance was obtained with the machine learning (ML) method, with precision, recall, and F1 all equal to 82.69. Our results in Table 8 are not directly comparable with those in Table 7 because of the difference in test-set sizes; nevertheless, our method performs reasonably well in terms of precision, recall, and F1.
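A brief note on why precision, recall, and F1 coincide in the figures quoted from [9]: if a system assigns exactly one label to every test instance, micro-averaged precision and recall both reduce to the same value (the fraction of correct assignments), and the F1 score, being the harmonic mean of two equal quantities, equals them as well. This is a general property of such evaluations, not something specific to [9]:

```latex
% F1 is the harmonic mean of precision P and recall R;
% when P = R it collapses to their common value.
\[
  F_1 = \frac{2PR}{P+R}, \qquad
  P = R \;\Rightarrow\; F_1 = \frac{2P^2}{2P} = P .
\]
```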