Measuring Gender: A Machine Learning Approach to Social Media Demographics and Author Profiling

Authors
Kovacs, Erik-Robert [1]
Cotfas, Liviu-Adrian [1]
Delcea, Camelia [1]
Affiliation
[1] Bucharest Univ Econ Studies, Dept Econ Informat & Cybernet, Bucharest 010552, Romania
Source
COMPUTATIONAL COLLECTIVE INTELLIGENCE, ICCCI 2023 | 2023, Vol. 14162
Keywords
author profiling; gender identification; ensemble methods; social media analysis; COVID-19; sentiment analysis; Twitter; networks; tweets
DOI
10.1007/978-3-031-41456-5_26
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Social media has become a preeminent medium of communication in the early 21st century, facilitating dialogue between the political sphere, businesses, scientific experts, and everyday people. Researchers in the social sciences are focusing their attention on social media as a central site of social discourse, but such approaches are hampered by the lack of demographic data that could help them connect phenomena originating in social media spaces to their larger social context. Computational social science methods that use machine learning and deep learning natural language processing (NLP) tools for the task of author profiling (AP) can serve as an essential complement to such research. One of the major demographic categories of interest concerning social media is the gender distribution of users. We propose an ensemble of multiple machine learning classifiers that first determines whether a user is anonymous, with an F1 score of 90.24%, and then predicts the gender of the user from their name, with an F1 score of 89.22%. We apply the classification pipeline to a set of approximately 44,000,000 posts related to COVID-19 extracted from the social media platform Twitter and compare our results to a benchmark classifier trained on the PAN18 Author Profiling dataset, which supports the validity of the proposed approach. We also perform an n-gram analysis of the tweet text to further compare the two methods.
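The two-stage structure described in the abstract (an anonymity check followed by name-based gender prediction, each decided by an ensemble) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the record does not say which classifiers, features, or voting scheme the authors use, so the rule-based functions, the `TOY_NAMES` lookup, and the hard majority vote are hypothetical stand-ins for trained models. The `f1_score` helper shows how the reported metric is computed from precision and recall.

```python
from collections import Counter
from typing import Callable, Sequence

Classifier = Callable[[str], str]

def majority_vote(classifiers: Sequence[Classifier], text: str) -> str:
    """Hard-voting ensemble: return the label predicted by most classifiers."""
    votes = Counter(clf(text) for clf in classifiers)
    return votes.most_common(1)[0][0]

def profile_user(name: str,
                 anonymity_clfs: Sequence[Classifier],
                 gender_clfs: Sequence[Classifier]) -> str:
    """Stage 1 filters out anonymous accounts; stage 2 predicts gender from the name."""
    if majority_vote(anonymity_clfs, name) == "anonymous":
        return "anonymous"
    return majority_vote(gender_clfs, name)

def f1_score(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> float:
    """F1 = harmonic mean of precision and recall for the `positive` label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical rule-based stand-ins for trained models (NOT the paper's classifiers).
TOY_NAMES = {"maria": "female", "john": "male"}

def first_token(name: str) -> str:
    return name.split()[0].lower()

anonymity_clfs = [
    lambda n: "anonymous" if any(ch.isdigit() for ch in n) else "named",
    lambda n: "anonymous" if len(n.split()) < 2 else "named",
    lambda n: "anonymous" if n == n.lower() else "named",
]
gender_clfs = [
    lambda n: TOY_NAMES.get(first_token(n), "male"),
    lambda n: "female" if first_token(n).endswith("a") else "male",
    lambda n: TOY_NAMES.get(first_token(n), "female"),
]

if __name__ == "__main__":
    for display_name in ["Maria Popescu", "John Smith", "user12345"]:
        print(display_name, "->", profile_user(display_name, anonymity_clfs, gender_clfs))
```

Using an odd number of binary classifiers per stage avoids tied votes in this toy setup. A real system would replace the rules with trained models and might weight the votes or average predicted probabilities instead.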
Pages: 337-349
Page count: 13