A Bank Information Extraction System Based on Named Entity Recognition with CRFs from Noisy Customer Order Texts in Turkish

Cited by: 4
Authors
Emekligil, Erdem [1]
Arslan, Secil [1]
Agin, Onur [1]
Affiliations
[1] Yapi Kredi Technol, R&D & Special Projects Dept, Istanbul, Turkey
Source
KNOWLEDGE ENGINEERING AND SEMANTIC WEB, KESW 2016 | 2016 / Vol. 649
Keywords
Named entity recognition; Turkish; Conditional random fields; Noisy Text; Banking applications;
DOI
10.1007/978-3-319-45880-9_8
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Each day, hundreds of thousands of customer transactions arrive at banks' operation centers via the fax channel. The information required to complete each transaction (money transfer, salary payment, tax payment, etc.) is extracted manually by operators from the images of customer orders. Our information extraction system uses CRFs (Conditional Random Fields) to obtain the required named entities for each transaction type from the noisy text of customer orders. The difficulty of the problem arises from three facts: customer orders come in different formats; the image resolution of the orders is so low that the OCR-ed (Optical Character Recognition) texts are highly noisy; and Turkish remains challenging for natural language processing techniques due to the structure of the language. This paper describes the difficulties of our problem domain and provides details of the methodology developed for extracting entities such as client name, organization name, bank account number, IBAN number, amount, currency, and explanation.
Pages: 93-102
Page count: 10