Nbias: A natural language processing framework for BIAS identification in text

Cited by: 23
Authors
Raza, Shaina [1]
Garg, Muskan [2]
Reji, Deepak John [3]
Bashir, Syed Raza [4]
Ding, Chen [4]
Affiliations
[1] Vector Institute for Artificial Intelligence, Toronto, ON, Canada
[2] Mayo Clinic, Artificial Intelligence & Informatics, Rochester, MN, USA
[3] Environmental Resources Management, Bengaluru, Karnataka, India
[4] Toronto Metropolitan University, Toronto, ON, Canada
Keywords
Bias detection; Dataset; Token classification; Nbias
DOI
10.1016/j.eswa.2023.121542
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Bias in textual data can lead to skewed interpretations and outcomes when the data is used. These biases can perpetuate stereotypes, discrimination, or other forms of unfair treatment. An algorithm trained on biased data may end up making decisions that disproportionately impact a certain group of people. Therefore, it is crucial to detect and remove these biases to ensure the fair and ethical use of data. To this end, we develop a comprehensive and robust framework, Nbias, that consists of four main layers: data, corpus construction, model development, and evaluation. The dataset is constructed by collecting diverse data from various domains, including social media, healthcare, and job hiring portals. We then apply a transformer-based token classification model that identifies biased words and phrases through a unique named entity, BIAS. In the evaluation procedure, we incorporate a blend of quantitative and qualitative measures to gauge the effectiveness of our models. We achieve accuracy improvements ranging from 1% to 8% compared to baselines. We also gain a robust understanding of how the model functions. The proposed approach is applicable to a variety of biases and contributes to the fair and ethical use of textual data.
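The abstract describes bias detection as token classification with a single entity type, BIAS. The sketch below illustrates how per-token labels could be turned back into biased phrases. It assumes a BIO labeling scheme (B-BIAS, I-BIAS, O), which the abstract does not specify; the function name, the label strings, and the example sentence are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def extract_bias_spans(tokens: List[str], tags: List[str]) -> List[Tuple[int, int, str]]:
    """Group BIO tags into (start, end, phrase) spans; end is exclusive.

    Assumes the hypothetical labels "B-BIAS", "I-BIAS" and "O".
    An "I-BIAS" tag with no open span is treated like "O".
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    spans: List[Tuple[int, int, str]] = []
    start = None  # index where the currently open span began, if any

    for i, tag in enumerate(tags):
        if tag == "B-BIAS":
            # A new span begins; close any span that is still open.
            if start is not None:
                spans.append((start, i, " ".join(tokens[start:i])))
            start = i
        elif tag == "I-BIAS" and start is not None:
            # Continue the open span.
            continue
        else:
            # "O" (or a stray "I-BIAS") closes any open span.
            if start is not None:
                spans.append((start, i, " ".join(tokens[start:i])))
                start = None

    # Close a span that runs to the end of the sentence.
    if start is not None:
        spans.append((start, len(tokens), " ".join(tokens[start:])))
    return spans


# Made-up example sentence with made-up labels, for illustration only.
example_tokens = ["Older", "workers", "are", "too", "slow", "to", "learn"]
example_tags = ["B-BIAS", "I-BIAS", "O", "B-BIAS", "I-BIAS", "O", "O"]
example_spans = extract_bias_spans(example_tokens, example_tags)
```

In a setup like the one the abstract outlines, the tags would come from the trained token classifier rather than being written by hand; only the span-grouping step is shown here.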
Pages: 16