Probabilistic Topic Models for Learning Terminological Ontologies

Cited by: 51
Authors
Wei, Wang [1]
Barnaghi, Payam [2]
Bargiela, Andrzej [1]
Affiliations
[1] Univ Nottingham, Sch Comp Sci, Semanyih 43500, Selangor Darul, Malaysia
[2] Univ Surrey, Fac Engn & Phys Sci, Ctr Commun Syst Res, Surrey GU2 7XH, England
Keywords
Knowledge acquisition; ontology learning; ontology; probabilistic topic models
DOI
10.1109/TKDE.2009.122
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Probabilistic topic models were originally developed and used for document modeling and topic extraction in Information Retrieval. In this paper, we describe a new approach for automatically learning terminological ontologies from a text corpus based on such models. In our approach, topic models are used as efficient dimension reduction techniques that capture word-topic and topic-document semantic relationships, interpreted in terms of probability distributions. We propose two algorithms for learning terminological ontologies using the principle of topic relationship and exploiting information theory with the learned probabilistic topic models. Experiments with different model parameters were conducted, and the learned ontology statements were evaluated by domain experts. We have also compared the results of our method with two existing concept hierarchy learning methods on the same data set. The study shows that our method outperforms the other methods in terms of recall and precision. The precision of the learned ontology is sufficient for it to be deployed for browsing, navigation, and information search and retrieval in digital libraries.
Pages: 1028-1040
Number of pages: 13