Text based personality prediction from multiple social media data sources using pre-trained language model and model averaging

Cited by: 45
Authors
Christian, Hans [1]
Suhartono, Derwin [2]
Chowanda, Andry [2]
Zamli, Kamal Z. [3]
Affiliations
[1] Bina Nusantara Univ, Comp Sci Dept, BINUS, Grad Program, Comp Sci, Jakarta 11480, Indonesia
[2] Bina Nusantara Univ, Comp Sci Dept, Sch Comp Sci, Jakarta 11480, Indonesia
[3] Univ Malaysia Pahang, Coll Comp & Appl Sci, Fac Comp, Pahang 26600, Malaysia
Keywords
Personality prediction; Natural language processing; Social media; Deep learning; BERT; Language model; BEHAVIOR; TRAITS; SYSTEM; USER
DOI
10.1186/s40537-021-00459-1
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The ever-increasing number of social media users has contributed to significant growth in the volume of online information. The content that these users post on social media can often give valuable insights into their personalities (e.g., for predicting job satisfaction, specific preferences, and the success of professional and romantic relationships), and these insights can be obtained without the hassle of taking a formal personality test. Termed personality prediction, the process involves extracting features from digital content and mapping them onto a personality model. Owing to its simplicity and proven capability, a well-known personality model, the Big Five personality traits, has often been adopted in the literature as the de facto standard for personality assessment. To date, many algorithms can be used to extract contextualized word embeddings from textual data for personality prediction systems; some of them are based on ensemble models and deep learning. Although useful, existing algorithms such as RNN and LSTM suffer from the following limitations. First, these algorithms take a long time to train owing to their sequential inputs. Second, these algorithms lack the ability to capture the true (semantic) meaning of words; therefore, some context is lost. To address these limitations, this paper introduces a new prediction method that uses a multi-model deep learning architecture combined with multiple pre-trained language models, such as BERT, RoBERTa, and XLNet, as the feature extraction method on social media data sources. Finally, the system makes its prediction based on model averaging.
Unlike earlier work, which adopts a single social media data source with open- and closed-vocabulary extraction methods, the proposed work uses multiple social media data sources, namely Facebook and Twitter, and produces a predictive model for each trait using bidirectional context features combined with the extraction method. Our experience with the proposed work has been encouraging, as it has outperformed similar existing works in the literature. More precisely, our results achieve a maximum accuracy of 86.2% and an F1 score of 0.912 on the Facebook dataset, and 88.5% accuracy and an F1 score of 0.882 on the Twitter dataset.
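The abstract does not spell out how the model-averaging step works, so the following is only a minimal sketch under stated assumptions: each language-model branch is assumed to output, per trait, a probability that the user scores high on that trait, and the final decision is assumed to be an unweighted mean of those probabilities compared against a 0.5 threshold. The function name `average_models`, the threshold, and all probability values are hypothetical illustrations, not figures or identifiers from the paper.

```python
from typing import Dict, Tuple


def average_models(
    probs_by_model: Dict[str, Dict[str, float]],
    threshold: float = 0.5,  # assumed decision threshold, not from the paper
) -> Tuple[Dict[str, float], Dict[str, bool]]:
    """Average per-trait probabilities across models, then threshold.

    probs_by_model maps a model name to {trait: probability of "high"}.
    Every model is assumed to report the same set of traits.
    """
    n_models = len(probs_by_model)
    traits = list(next(iter(probs_by_model.values())).keys())

    # Unweighted mean of the probabilities each model assigns to a trait.
    averaged = {
        trait: sum(probs[trait] for probs in probs_by_model.values()) / n_models
        for trait in traits
    }
    # Final binary decision per trait.
    labels = {trait: averaged[trait] >= threshold for trait in traits}
    return averaged, labels


# Hypothetical outputs of three feature-extractor branches for one user.
example = {
    "bert": {"extraversion": 0.8, "neuroticism": 0.2},
    "roberta": {"extraversion": 0.6, "neuroticism": 0.3},
    "xlnet": {"extraversion": 0.4, "neuroticism": 0.7},
}
averaged, labels = average_models(example)
```

In this sketch a single dissenting model (here the hypothetical XLNet branch) is outvoted when the other two agree, which is the usual motivation for averaging several predictors instead of relying on one.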
Pages: 20