An Automated Term Definition Extraction System Using the Web Corpus in the Chinese Language

Authors
Leu, Fang-Yie [1]
Ko, Chih-Chieh [1]
Affiliations
[1] Tunghai Univ, Dept Comp Sci, Taichung 407, Taiwan
Funding
Russian Foundation for Basic Research
Keywords
definitions; web corpus; information extraction; Chinese language; text mining
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper proposes a system, named DefExplorer, which analyzes the type of a given Chinese term, extracts term definitions from the Web, and selects answers from noisy Web pages. DefExplorer filters out invalid data with a semantic approach. Two types of candidate sets, common and domain-specific, are employed to cluster similar candidates into groups. Different approaches are also deployed to evaluate candidates' importance, which is the key factor in selecting the best answers from the retrieved candidates. Experimental results show that DefExplorer can effectively extract term definitions from the Web, especially definitions of out-of-vocabulary terms.
Pages: 505-525
Page count: 21