Multi-lingual geoparsing based on machine translation

Cited by: 2
Authors
Chen, Xu [1, 2]
Gelernter, Judith [2]
Zhang, Han [2]
Liu, Jin [1]
Affiliations
[1] Wuhan Univ, State Key Lab Software Engn, Comp Sch, Wuhan, Hubei, Peoples R China
[2] Carnegie Mellon Univ, Sch Comp Sci, Language Technol Inst, Pittsburgh, PA 15213 USA
Source
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2019 / Vol. 96
Funding
National Natural Science Foundation of China
Keywords
Named entities recognition; Location; Geoparse; Multi-lingual; Machine translation; Word Alignment; NAMED ENTITY RECOGNITION;
DOI
10.1016/j.future.2017.07.057
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Our method for multi-lingual geoparsing uses monolingual tools and resources along with machine translation and alignment to return location words in many languages. Not only does our method save the time and cost of developing geoparsers for each language separately, but it also allows a wide range of language capabilities within a single interface. We evaluated our method in our LanguageBridge prototype on location named entities using newswire, broadcast news and telephone conversation data in English, Arabic and Chinese from the Linguistic Data Consortium (LDC). Our results for geoparsing Chinese and Arabic text using our multi-lingual geoparsing method are comparable to our results for geoparsing English text with our English tools. Furthermore, in our experiments the accuracy of our tools on machine-translated text approaches their accuracy on the same data translated manually, further showing the robustness of locations to machine translation. (C) 2017 Elsevier B.V. All rights reserved.
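The abstract describes translating text into English, geoparsing it with English tools, and using word alignment to return location words in the original language. The following is a minimal sketch of that last projection step only, under stated assumptions: the function name `project_locations`, the alignment format (pairs of source and English token indices), and the hand-made example data are all hypothetical and are not taken from the LanguageBridge prototype.

```python
from typing import List, Tuple


def project_locations(
    source_tokens: List[str],
    alignment: List[Tuple[int, int]],
    english_spans: List[Tuple[int, int]],
) -> List[List[str]]:
    """Map English location spans back to source-language tokens.

    alignment:     (source_index, english_index) pairs (assumed format).
    english_spans: half-open (start, end) token ranges tagged as locations
                   by some English geoparser (not implemented here).
    """
    projected = []
    for start, end in english_spans:
        # Collect every source token aligned to an English token in the span.
        source_indices = sorted({s for s, t in alignment if start <= t < end})
        if source_indices:
            projected.append([source_tokens[i] for i in source_indices])
    return projected


# Hypothetical example: a Chinese sentence and its English translation.
source = ["我", "住", "在", "武汉"]
english = ["I", "live", "in", "Wuhan"]
alignment = [(0, 0), (1, 1), (2, 2), (3, 3)]  # assumed one-to-one alignment
english_spans = [(3, 4)]  # "Wuhan" tagged as a location in the English text

locations = project_locations(source, alignment, english_spans)
```

In this sketch an English span with no aligned source token is silently dropped; a real system would need a policy for unaligned or partially aligned location words.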
Pages: 667-677
Page count: 11