Automatic recognition of multi-word terms: The C-value/NC-value method

Cited by: 140
Authors
Frantzi K. [1]
Ananiadou S. [1]
Mima H. [2]
Affiliations
[1] Centre for Computational Linguistics, UMIST, Manchester, M60 1QD
[2] Dept. of Information Science, University of Tokyo, Bunkyo-ku, Tokyo 113
Keywords
Automatic extraction; Automatic term recognition (ATR); Domain independence; Linguistic and statistical information; Terms
DOI
10.1007/s007999900023
Abstract
Technical terms (henceforth called terms) are important elements for digital libraries. In this paper we present a domain-independent method for the automatic extraction of multi-word terms from machine-readable special language corpora. The method (C-value/NC-value) combines linguistic and statistical information. The first part, C-value, enhances the common statistical measure of frequency of occurrence for term extraction, making it sensitive to a particular type of multi-word term, the nested term. The second part, NC-value, provides: 1) a method for the extraction of term context words (words that tend to appear with terms); 2) a way to incorporate information from term context words into the extraction of terms. © 2000 Springer-Verlag.
Pages: 115-130
Number of pages: 15
Related papers
29 in total
  • [1] Ananiadou S., Towards a Methodology For Automatic Term Recognition, (1988)
  • [2] Ananiadou S., A methodology for automatic term recognition, Proc. 15th International Conference On Computational Linguistics, pp. 1034-1038, (1994)
  • [3] Bourigault D., Surface grammatical analysis for the extraction of terminological noun phrases, Proc. 14th International Conference On Computational Linguistics, pp. 977-981, (1992)
  • [4] Brill E., A rule-based part of speech tagger, Proc. 3rd Conference of Applied Natural Language Processing, (1992)
  • [5] Brill E., A Corpus-Based Approach to Language Learning, (1993)
  • [6] Dagan I., Church K., Termight: Identifying and translating technical terminology, Proc. 7th Conference of the European Chapter of the Association For Computational Linguistics, pp. 34-40, (1995)
  • [7] Dagan I., Pereira F., Lee L., Similarity-based estimation of word cooccurrence probabilities, Proc. 32nd Annual Meeting of the Association For Computational Linguistics, pp. 272-278, (1994)
  • [8] Daille B., Gaussier E., Lange J.-M., Towards automatic extraction of monolingual and bilingual terminology, Proc. 15th International Conference On Computational Linguistics, pp. 515-521, (1994)
  • [9] Damerau F.J., Generating and evaluating domain-oriented multi-word terms from texts, Information Processing & Management, 29, 4, pp. 433-447, (1993)
  • [10] Dunning T., Accurate methods for the statistics of surprise and coincidence, Computational Linguistics, 19, 1, pp. 61-74, (1993)