Named Entity Recognition Utilized to Enhance Text Classification While Preserving Privacy

Cited by: 4
Author
Kutbi, Mohammed [1]
Affiliation
[1] Saudi Elect Univ, Comp Sci Dept, Jeddah 23445, Saudi Arabia
Keywords
Named entities; preprocessing; text classification; privacy
DOI
10.1109/ACCESS.2023.3325895
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Recent developments in Natural Language Processing (NLP) techniques have encouraged NLP-based applications in various fields, including business, law, and health. An important step in any NLP project is text preprocessing, which modifies text data before it is used in a machine learning model. Text preprocessing usually includes cleaning, filtering, removing, and replacing some text to increase model accuracy and robustness, reduce data size, or preserve privacy. A named entity recognizer (NER) is an NLP tool that finds named entities in text, such as names, organizations, addresses, numbers, and dates. In this work, we create a preprocessing approach that uses NER to find named entities and then replaces them with their type (i.e., location, person, or organization name), rather than removing them or letting them become noise in the data, in order to improve accuracy and preserve privacy. Text classification experiments using our approach were conducted on several datasets, some of which were collected in-house. The experiments indicate that this approach enhances classifier accuracy and reduces the dimensionality of the feature representation while also preserving privacy.
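The replacement step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the NER tool or the tag set used, so the toy gazetteer, the labels `PERSON`, `ORG`, and `LOC`, and the helper names `find_entities`, `replace_entities`, and `vocabulary` below are all hypothetical stand-ins; a real pipeline would presumably obtain the entity spans from a trained NER model instead.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, entity type)

# Hypothetical toy gazetteer standing in for a trained NER model (assumption).
TOY_GAZETTEER: Dict[str, str] = {
    "Alice Smith": "PERSON",
    "Bob Jones": "PERSON",
    "Acme Corp": "ORG",
    "Globex": "ORG",
    "Jeddah": "LOC",
    "Riyadh": "LOC",
}


def find_entities(text: str, gazetteer: Dict[str, str] = TOY_GAZETTEER) -> List[Span]:
    """Return non-overlapping entity spans, sorted by start offset."""
    candidates: List[Span] = []
    for surface, label in gazetteer.items():
        pattern = r"\b" + re.escape(surface) + r"\b"
        for match in re.finditer(pattern, text):
            candidates.append((match.start(), match.end(), label))
    # Earlier spans first; on ties, prefer the longer span.
    candidates.sort(key=lambda s: (s[0], -(s[1] - s[0])))
    spans: List[Span] = []
    last_end = 0
    for start, end, label in candidates:
        if start >= last_end:
            spans.append((start, end, label))
            last_end = end
    return spans


def replace_entities(text: str, spans: List[Span]) -> str:
    """Replace each entity span with its type label."""
    pieces: List[str] = []
    cursor = 0
    for start, end, label in spans:
        pieces.append(text[cursor:start])
        pieces.append(label)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


def vocabulary(docs: Iterable[str]) -> Set[str]:
    """Lower-cased word types, a rough proxy for bag-of-words dimensionality."""
    return {tok for doc in docs for tok in re.findall(r"\w+", doc.lower())}


docs = [
    "Alice Smith joined Acme Corp in Jeddah.",
    "Bob Jones joined Globex in Riyadh.",
]
masked = [replace_entities(d, find_entities(d)) for d in docs]

print(masked[0])
print(len(vocabulary(docs)), "->", len(vocabulary(masked)))
```

On this toy input both sentences collapse to the same masked form, so the specific names no longer reach the classifier and the number of distinct word types drops. This only illustrates the mechanism; it says nothing about the accuracy figures reported in the paper.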
Pages: 117576-117581
Page count: 6
Related papers
50 in total
  • [41] HDCNN-CRF for Biomedical Text Named Entity Recognition
    Gao, Mingyuan
    Wei, Hao
    Chen, Fei
    Qu, Wen
    Lu, Mingyu
    PROCEEDINGS OF 2019 IEEE 10TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING AND SERVICE SCIENCE (ICSESS 2019), 2019, : 191 - 194
  • [42] Chinese Named Entity Recognition for Hazard And Operability Analysis Text
    Li, FangGuo
    Zhang, BeiKe
    Gao, Dong
    PROCEEDINGS OF THE 32ND 2020 CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2020), 2020, : 374 - 378
  • [43] A comprehensive study of named entity recognition in Chinese clinical text
    Lei, Jianbo
    Tang, Buzhou
    Lu, Xueqin
    Gao, Kaihua
    Jiang, Min
    Xu, Hua
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2014, 21 (05) : 808 - 814
  • [44] Named Entity Recognition Algorithms Comparison For Judicial Text Data
    Aibek, Kuralbayev
    Bobur, Mukhsimbayev
    Abay, Bekbaganbetov
    Hajiyev, Fuad
    2020 IEEE 14TH INTERNATIONAL CONFERENCE ON APPLICATION OF INFORMATION AND COMMUNICATION TECHNOLOGIES (AICT2020), 2020,
  • [45] Novelty detection for text documents using named entity recognition
    Ng, Kok Wah
    Tsai, Flora S.
    Chen, Lihui
    Goh, Kiat Chong
    2007 6TH INTERNATIONAL CONFERENCE ON INFORMATION, COMMUNICATIONS & SIGNAL PROCESSING, VOLS 1-4, 2007, : 1663 - +
  • [46] Named Entity Recognition and Normalization in Tweets Towards Text Summarization
    Jabeen, Saima
    Shah, Sajid
    Latif, Asma
    2013 EIGHTH INTERNATIONAL CONFERENCE ON DIGITAL INFORMATION MANAGEMENT (ICDIM), 2013, : 223 - 227
  • [47] Persian Automatic Text Summarization Based on Named Entity Recognition
    Khademi, Mohammad Ebrahim
    Fakhredanesh, Mohammad
    IRANIAN JOURNAL OF SCIENCE AND TECHNOLOGY-TRANSACTIONS OF ELECTRICAL ENGINEERING, 2020,
  • [48] Named Entity Recognition in Vietnamese Text Using Label Propagation
    Huong Thanh Le
    Rathany Chan Sam
    Hoan Cong Nguyen
    Thuy Thanh Nguyen
    2013 INTERNATIONAL CONFERENCE OF SOFT COMPUTING AND PATTERN RECOGNITION (SOCPAR), 2013, : 366 - 370
  • [49] Named entity recognition method in health preserving field based on BERT
    Zhang, Qiang
    Sun, Yong
    Zhang, Linlin
    Jiao, Yanfei
    Tian, Yue
    PROCEEDINGS OF THE 10TH INTERNATIONAL CONFERENCE OF INFORMATION AND COMMUNICATION TECHNOLOGY, 2021, 183 : 212 - 220
  • [50] Entity-to-Text based Data Augmentation for various Named Entity Recognition Tasks
    Hu, Xuming
    Jiang, Yong
    Liu, Aiwei
    Huang, Zhongqiang
    Xie, Pengjun
    Huang, Fei
    Wen, Lijie
    Yu, Philip S.
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 9072 - 9087