Toward Identifying Features for Automatic Gender Detection: A Corpus Creation and Analysis

Cited by: 11
Authors
Alanazi, Saad Awadh [1]
Affiliations
[1] Jouf Univ, Coll Comp & Informat Sci, Dept Comp Sci, Sakakah 72441, Saudi Arabia
Keywords
Automatic gender detection; feature extraction; Saudi dialects; IDENTIFICATION; EMOTION; CONSOLATION; LAUGHTER; SARCASM; DIALECT; COLOR
DOI
10.1109/ACCESS.2019.2932026
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The current paper aims to construct an inventory of stylometric and psychometric features for the automatic identification of an author's gender. These features are derived from an analysis of a manually developed Saudi Dialect Twitter Corpus (SDTwittC), consisting of four million words. Because the study seeks to provide machine learning algorithms with an accurate set of features for solving the gender identification problem, word-based, character-based, syntactic, and function-word features are all considered during the selection stage. The word-based features constitute the largest category and represent possible gender discriminators from sociological, psychological, and lexical perspectives. The results show that Saudi males use styles that distinguish them from their female counterparts in terms of politeness (greeting, thanking, apology, congratulation, encouragement, best wishes, etc.), impoliteness (profanity and sarcasm), and the use of intensifiers, hedges, color, emotion, reason, and emoji, among many others.
Pages: 111931-111943
Page count: 13