Automatic terminological collocations extraction from large corpus

Cited by: 0
Authors
Suarez, Octavio Santana [1]
Aguiar, Jose Perez [1]
Berriel, Isabel Sanchez [2]
Rodriguez, Virginia Gutierrez [2]
Affiliations
[1] Univ Las Palmas Gran Canaria, Edificio Dept Informat & Matemat, Las Palmas Gran Canaria 35017, Spain
[2] Univ La Laguna, Edificio Fis & Matemat,Campus Univ Anchieta, San Cristobal la Laguna 38271, Spain
Source
PROCESAMIENTO DEL LENGUAJE NATURAL | 2011, Issue 47
Keywords
automatic extraction of collocations; terminology; computational linguistics; text mining
DOI
Not available
Chinese Library Classification (CLC) number
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
Automatic term-extraction systems are an important tool for compiling the lexemes of a specific field or specialty. The textual analysis that such software performs must include strategies for detecting collocations in the field under study. This paper studies the viability of using large text corpora that contain no linguistic information, as is the case for corpora compiled from the internet. The internet is used as a source of information for compiling terminological collocations. To that end, the behavior of several frequency-based indicators is analyzed for a collection of economic terms in a Spanish corpus of 300.000 words.
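As a rough illustration of what a frequency-based indicator computed on raw, unannotated text can look like, the sketch below scores adjacent word pairs by pointwise mutual information (PMI). The abstract does not name the indicators the authors evaluated, so the choice of PMI, the function name `pmi_scores`, the `min_count` threshold, and the toy Spanish sentence are all assumptions made for this example, not the paper's method or data.

```python
import math
from collections import Counter


def pmi_scores(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    Hypothetical helper: uses only raw frequencies, with no tagging or
    parsing, to mirror the idea of a corpus without linguistic information.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # very rare pairs give unreliable PMI estimates
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Toy input invented for this example (not taken from the paper's corpus).
text = ("el tipo de interés sube y el tipo de cambio baja "
        "pero el tipo de interés importa")
tokens = text.split()
scores = pmi_scores(tokens)

# List candidate pairs from the strongest association to the weakest.
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

On a corpus of realistic size, the frequency threshold matters more than in this toy case, because PMI tends to overrate pairs that occur only a handful of times.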
Pages: 145-152
Page count: 8