Demographic Representation in 3 Leading Artificial Intelligence Text-to-Image Generators

Cited by: 42
Authors
Ali, Rohaid [1 ]
Tang, Oliver Y. [1 ,2 ]
Connolly, Ian D. [3 ]
Abdulrazeq, Hael F. [1 ]
Mirza, Fatima N. [4 ]
Lim, Rachel K. [4 ]
Johnston, Benjamin R. [5 ]
Groff, Michael W. [5 ]
Williamson, Theresa [3 ]
Svokos, Konstantina [1 ]
Libby, Tiffany J. [4 ]
Shin, John H. [3 ]
Gokaslan, Ziya L. [1 ]
Doberstein, Curtis E. [1 ]
Zou, James [6 ]
Asaad, Wael F. [1 ,7 ,8 ,9 ]
Affiliations
[1] Brown Univ, Warren Alpert Med Sch, Dept Neurosurg, 593 Eddy St,APC6, Providence, RI 02903 USA
[2] Univ Pittsburgh, Med Ctr, Dept Neurosurg, Pittsburgh, PA USA
[3] Massachusetts Gen Hosp, Dept Radiol, Boston, MA USA
[4] Brown Univ, Warren Alpert Med Sch, Dept Dermatol, Providence, RI USA
[5] Brigham & Womens Hosp, Dept Neurosurg, Boston, MA USA
[6] Stanford Univ, Dept Biomed Data Sci & Courtesy Comp Sci & Elect Engn, Stanford, CA USA
[7] Rhode Isl Hosp, Norman Prince Neurosci Inst, Dept Neurosci, Providence, RI USA
[8] Brown Univ, Dept Neurosci, Providence, RI USA
[9] Brown Univ, Carney Inst Brain Sci, Dept Neurosci, Providence, RI USA
DOI
10.1001/jamasurg.2023.5695
Chinese Library Classification Number
R61 [Operative Surgery];
Abstract
IMPORTANCE The progression of artificial intelligence (AI) text-to-image generators raises concerns about perpetuating societal biases, including profession-based stereotypes.

OBJECTIVE To gauge the demographic accuracy with which 3 prominent AI text-to-image models represent surgeons, compared with real-world attending surgeons and trainees.

DESIGN, SETTING, AND PARTICIPANTS This cross-sectional study, conducted in May 2023, assessed the latest release of 3 leading publicly available AI text-to-image generators, chosen for their popularity at the time of the study. Seven independent reviewers categorized the AI-produced images. A total of 2400 images were analyzed, generated across 8 surgical specialties within each model; an additional 1200 images were evaluated based on geographic prompts for 3 countries. Real-world demographic characteristics were taken from the Association of American Medical Colleges subspecialty report, which references the American Medical Association master file for physician demographic characteristics across 50 states. Because trainee demographic characteristics are shifting relative to those of attending surgeons, the 2 groups were assessed separately. Race (non-White, defined as any race other than non-Hispanic White, and White) and gender (female and male) were assessed to evaluate known societal biases.

EXPOSURES Images were generated using the prompt template "a photo of the face of a [blank]", with the blank replaced by a surgical specialty. Geographic-based prompting was evaluated by specifying the most populous country on each of 3 continents (the US, Nigeria, and China).

MAIN OUTCOMES AND MEASURES Representation of female and non-White surgeons in each model was compared with real demographic data using chi-square, Fisher exact, and proportion tests.

RESULTS Mean representation of female (35.8% vs 14.7%; P < .001) and non-White (37.4% vs 22.8%; P < .001) surgeons was significantly higher among trainees than among attending surgeons. DALL-E 2 reflected attending surgeons' true demographic data for female surgeons (15.9% vs 14.7%; P = .39) and non-White surgeons (22.6% vs 22.8%; P = .92) but underestimated trainees' representation of both female (15.9% vs 35.8%; P < .001) and non-White (22.6% vs 37.4%; P < .001) surgeons. In contrast, Midjourney and Stable Diffusion depicted significantly lower proportions of female (0% and 1.8%, respectively; P < .001) and non-White (0.5% and 0.6%, respectively; P < .001) surgeons than either DALL-E 2 or the true demographic data. Geographic-based prompting increased non-White surgeon representation but did not alter female representation for any model in prompts specifying Nigeria and China.

CONCLUSIONS AND RELEVANCE In this study, 2 of the 3 leading publicly available text-to-image generators amplified societal biases, depicting over 98% of surgeons as White and male. Although 1 model depicted demographic characteristics comparable to those of real attending surgeons, all 3 models underestimated trainee representation. These findings suggest the need for guardrails and robust feedback systems to keep AI text-to-image generators from magnifying stereotypes in professions such as surgery.
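The statistical comparison described in the abstract can be illustrated with a short sketch. The code below is not the authors' analysis code: it runs the chi-square and two-proportion tests named above on hypothetical counts of images categorized as depicting female surgeons versus a real-world attending-surgeon benchmark; the counts, sample sizes, and variable names are illustrative assumptions, not the study's data.

```python
# Minimal sketch of the comparison described in the abstract, assuming
# hypothetical counts; this is not the study's code or raw data.
from scipy.stats import chi2_contingency
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical counts: images categorized as female vs male for one model,
# and a real-world attending-surgeon benchmark (illustrative figures only).
model_female, model_total = 127, 800          # ~15.9% of 800 generated images
bench_female, bench_total = 14_700, 100_000   # ~14.7% of attending surgeons

# Chi-square test of independence on the 2x2 contingency table.
table = [
    [model_female, model_total - model_female],
    [bench_female, bench_total - bench_female],
]
chi2, p_chi2, dof, _ = chi2_contingency(table)

# Two-proportion z-test on the same counts.
z_stat, p_z = proportions_ztest(
    count=[model_female, bench_female],
    nobs=[model_total, bench_total],
)

print(f"chi-square: chi2={chi2:.2f}, p={p_chi2:.3f}")
print(f"two-proportion z-test: z={z_stat:.2f}, p={p_z:.3f}")
```

The Fisher exact test mentioned in the abstract would be the analogous choice when expected cell counts are small (for example, scipy.stats.fisher_exact on the same 2x2 table).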
Pages: 87-95
Number of pages: 9