Automatic recognition of multi-word terms: The C-value/NC-value method

Cited by: 147

Authors:

Frantzi K. [1]

Ananiadou S. [1]

Mima H. [2]

Affiliations:

[1] Centre for Computational Linguistics, UMIST, Manchester, M60 1QD

[2] Dept. of Information Science, University of Tokyo, Bunkyo-ku, Tokyo 113

Source:

International Journal on Digital Libraries | 2000 / Vol. 3 / No. 2

Keywords:

Automatic extraction; Automatic term recognition (ATR); Domain independence; Linguistic and statistical information; Terms

DOI:

10.1007/s007999900023


Abstract:

Technical terms (henceforth called terms) are important elements for digital libraries. In this paper we present a domain-independent method for the automatic extraction of multi-word terms from machine-readable special language corpora. The method (C-value/NC-value) combines linguistic and statistical information. The first part, C-value, enhances the common statistical measure of frequency of occurrence for term extraction, making it sensitive to a particular type of multi-word term, the nested term. The second part, NC-value, gives: 1) a method for the extraction of term context words (words that tend to appear with terms); 2) the incorporation of information from term context words into the extraction of terms. © 2000 Springer-Verlag.
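The abstract describes C-value only in words, so the following is a minimal sketch of the idea rather than the paper's procedure. It assumes the commonly cited formulation, in which a candidate's frequency is reduced by the average frequency of the longer candidates that contain it and then weighted by the base-2 logarithm of its length in words. The function names, the input layout, and the toy counts are hypothetical and chosen for illustration only.

```python
import math


def is_nested(short, long):
    """True if `short` occurs as a contiguous word sequence inside `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def c_value(candidates):
    """Score candidate terms (assumed formulation, not taken from this record).

    candidates: dict mapping a term (tuple of words) to its corpus frequency.
    Single-word candidates score 0 because log2(1) == 0; the measure is
    meant for multi-word terms.
    """
    scores = {}
    for term, freq in candidates.items():
        # Frequencies of longer candidates that contain this term.
        containers = [f for other, f in candidates.items()
                      if len(other) > len(term) and is_nested(term, other)]
        adjusted = freq
        if containers:
            # Discount occurrences explained by the longer terms.
            adjusted = freq - sum(containers) / len(containers)
        scores[term] = math.log2(len(term)) * adjusted
    return scores


# Toy counts, invented for illustration.
toy = {("soft", "contact", "lens"): 5, ("contact", "lens"): 8}
scores = c_value(toy)
```

In this toy input, "contact lens" is nested in "soft contact lens", so its frequency of 8 is discounted by 5 before the length weight is applied, while the longer term keeps its full frequency.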


Pages: 115-130

Page count: 15
