Automatic extraction of acronym definitions from the Web

Cited by: 31
Authors
Sanchez, David [1]
Isern, David [1]
Affiliations
[1] Univ Rovira & Virgili, Dept Comp Sci & Math, Intelligent Technol Adv Knowledge Acquisit ITAKA, Tarragona, Catalonia, Spain
Keywords
Acronyms; Information extraction; Web mining;
DOI
10.1007/s10489-009-0197-4
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Acronyms are widely used to abbreviate and stress important concepts. Discovering the definitions associated with an acronym is important for supporting language processing and knowledge-related tasks such as information retrieval, ontology mapping, or question answering. Acronyms are a very dynamic and unbounded topic that is constantly evolving. Manual attempts to compile a global-scale dictionary of acronym-definition pairs require an overwhelming amount of work and yield limited results. To address these shortcomings, this paper presents an automatic and unsupervised methodology to generate acronyms and extract their potential definitions from the Web. The method has been designed to minimise the set of constraints, offering a domain-independent and (partially) language-independent solution, and to exploit the Web in order to create large and general acronym-definition sets. Results have been manually evaluated against the largest manually built acronym repository: Acronym Finder. The evaluation shows that the proposed approach is able to improve on the coverage of manual attempts while maintaining high precision.
Pages: 311-327
Page count: 17