Nbias: A natural language processing framework for BIAS identification in text

Cited by: 23
Authors
Raza, Shaina [1]
Garg, Muskan [2]
Reji, Deepak John [3]
Bashir, Syed Raza [4]
Ding, Chen [4]
Affiliations
[1] Vector Inst Artificial Intelligence, Toronto, ON, Canada
[2] Mayo Clin, Artificial Intelligence & Informat, Rochester, MN USA
[3] Environm Resources Management, Bengaluru, Karnataka, India
[4] Toronto Metropolitan Univ, Toronto, ON, Canada
Keywords
Bias detection; Dataset; Token classification; Nbias
DOI
10.1016/j.eswa.2023.121542
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Bias in textual data can skew interpretations and outcomes wherever that data is used. Such biases can perpetuate stereotypes, discrimination, or other forms of unfair treatment: an algorithm trained on biased data may make decisions that disproportionately affect a particular group of people. It is therefore crucial to detect and remove these biases to ensure the fair and ethical use of data. To this end, we develop Nbias, a comprehensive and robust framework consisting of four main layers: data, corpus construction, model development, and evaluation. The dataset is constructed by collecting diverse data from various domains, including social media, healthcare, and job hiring portals. We then apply a transformer-based token classification model that identifies biased words and phrases through a unique named entity, BIAS. In the evaluation procedure, we combine quantitative and qualitative measures to gauge the effectiveness of our models. We achieve accuracy improvements ranging from 1% to 8% over the baselines, and we are also able to generate a robust understanding of how the model functions. The proposed approach is applicable to a variety of biases and contributes to the fair and ethical use of textual data.
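The abstract frames bias detection as token classification with a single entity type, BIAS. The sketch below illustrates that framing only; it is not the authors' code. It assumes a BIO tagging scheme (B-BIAS, I-BIAS, O), which the abstract does not specify, and the function names and the example sentence are hypothetical. It shows how annotated bias spans could be turned into per-token labels for training, and how predicted labels could be turned back into bias phrases.

```python
from typing import List, Tuple


def spans_to_bio(tokens: List[str],
                 spans: List[Tuple[int, int]],
                 label: str = "BIAS") -> List[str]:
    """Convert half-open [start, end) token spans into BIO tags.

    Assumption: a BIO scheme with one entity type; the source only
    states that a named entity called BIAS is used.
    """
    tags = ["O"] * len(tokens)
    for start, end in spans:
        tags[start] = f"B-{label}"          # first token of the span
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the span
    return tags


def bio_to_phrases(tokens: List[str],
                   tags: List[str],
                   label: str = "BIAS") -> List[str]:
    """Group consecutive B-/I- tagged tokens back into phrases."""
    phrases: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == f"B-{label}":
            if current:                      # close the previous phrase
                phrases.append(" ".join(current))
            current = [token]
        elif tag == f"I-{label}" and current:
            current.append(token)
        else:                                # "O" or a stray I- tag
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


# Hypothetical example sentence, not taken from the Nbias dataset.
tokens = "Older applicants are too slow to learn new tools".split()
tags = spans_to_bio(tokens, [(3, 5)])        # annotate "too slow"
phrases = bio_to_phrases(tokens, tags)
```

In a full pipeline, the tags produced by spans_to_bio would serve as training targets for a token classifier, and bio_to_phrases would be applied to the classifier's predicted tags.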
Pages: 16