TMD-NER: Turkish multi-domain named entity recognition for informal texts

Cited by: 0
Authors
Selim F. Yilmaz
Furkan B. Mutlu
Ismail Balaban
Suleyman S. Kozat
Affiliations
[1] Imperial College London, Department of Electrical and Electronic Engineering
[2] Bilkent University, Department of Electrical and Electronics Engineering
[3] Middle East Technical University, Department of Statistics
Source
Signal, Image and Video Processing | 2024 / Vol. 18
Keywords
Named entity recognition; Turkish language; Bidirectional long short-term memory; Conditional random fields
DOI
Not available
Abstract
We examine named entity recognition (NER), an essential and commonly used first step in many natural language processing tasks, including chatbots and language translation. We focus on applying NER to noisy texts such as tweets, which is difficult because of the casual and unstructured language often used in these media. In this study, we use the largest available labeled datasets for Turkish NER, targeting three informal platforms: Twitter, Facebook and Donanimhaber. We choose Turkish as a representative agglutinative language, whose structure differs significantly from that of other well-known languages such as English, French, and German. We emphasize that the methodologies and insights gained from this study can be extended to other agglutinative languages, such as Finnish, Hungarian, Japanese, and Korean. We apply NER to these datasets with 16 different named entity tags, using a framework that employs bidirectional long short-term memory (BiLSTM) networks followed by conditional random fields (CRF), together known as the BiLSTM-CRF model. Our experiments show an F1 score of 84% on a combined dataset, which indicates that deep learning models can also be used effectively for business applications in informal settings in agglutinative languages such as Turkish.
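To illustrate the CRF stage of a BiLSTM-CRF tagger, the following is a minimal sketch of Viterbi decoding, the standard way to pick the best tag sequence in a linear-chain CRF. It is not the authors' implementation. The function name `viterbi_decode`, the three-tag set, and all scores are hypothetical and chosen only for illustration; in a real model the per-token emission scores would come from the BiLSTM and the transition scores would be learned.

```python
from typing import Dict, List


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag sequence for one sentence.

    emissions[t][tag]: score of `tag` at token position t.
    transitions[prev][cur]: score of moving from tag `prev` to tag `cur`.
    """
    if not emissions:
        return []
    # best[t][tag] is the best score of any path that ends in `tag` at position t.
    best = [{tag: emissions[0][tag] for tag in tags}]
    backptr = []
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            # Pick the previous tag that maximizes path score plus transition score.
            prev = max(tags, key=lambda p: best[t - 1][p] + transitions[p][cur])
            scores[cur] = best[t - 1][prev] + transitions[prev][cur] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        backptr.append(pointers)
    # Follow the back-pointers from the best final tag to recover the path.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(backptr):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical three-tag example (the paper uses 16 entity tags).
tags = ["O", "B-PER", "I-PER"]
transitions = {p: {c: 0.0 for c in tags} for p in tags}
transitions["O"]["I-PER"] = -10.0  # assumed penalty: I-PER should not follow O

# Made-up emission scores for a three-token sentence.
emissions = [
    {"O": 2.0, "B-PER": 0.5, "I-PER": 0.0},
    {"O": 0.2, "B-PER": 1.0, "I-PER": 1.2},
    {"O": 0.1, "B-PER": 0.3, "I-PER": 1.5},
]

path = viterbi_decode(emissions, transitions, tags)
print(path)  # ['O', 'B-PER', 'I-PER']
```

Choosing the top-scoring tag for each token independently would give O, I-PER, I-PER here. The transition penalty steers the decoder to the consistent sequence O, B-PER, I-PER, which is the kind of label dependency a CRF layer is meant to capture.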
Pages: 2255-2263
Number of pages: 8