Threshold-based Naive Bayes classifier

Cited by: 11
Authors
Romano, Maurizio [1]
Contu, Giulia [1]
Mola, Francesco [1]
Conversano, Claudio [1]
Affiliations
[1] Univ Cagliari, Dept Econ & Business Sci, Cagliari, Italy
Keywords
Naive Bayes; Booking.com; Customer satisfaction; Sentiment analysis; Natural language processing; Word of mouth; Online reviews; Quality
DOI
10.1007/s11634-023-00536-8
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
The Threshold-based Naive Bayes (Tb-NB) classifier is introduced as a simple, improved version of the original Naive Bayes classifier. Tb-NB extracts the sentiment from a natural-language text corpus and allows the user not only to predict how positive (or negative) a sentence is but also to quantify the sentiment with a numeric value. It is based on the estimation of a single threshold value that contributes to defining a decision rule, which classifies a text as a positive (or negative) opinion based on its content. One of the main advantages of Tb-NB is the possibility of using its results as the input to post-hoc analyses aimed at observing how the quality associated with the different dimensions of a product or service or, in a mirrored fashion, the different dimensions of customer satisfaction evolve over time or change across locations. The effectiveness of Tb-NB is evaluated by analyzing data from the tourism industry, specifically hotel guests' reviews of all hotels located in the Sardinian region and available on Booking.com. Moreover, Tb-NB is compared with other popular classifiers used in sentiment analysis in terms of model accuracy, resistance to noise, and computational efficiency.
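The decision rule described in the abstract (a numeric sentiment score compared against a single estimated threshold) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a score built from per-word log-odds under word presence only, Laplace smoothing with a hypothetical parameter `alpha`, and a threshold `tau` chosen to maximize training accuracy. All function names and the toy reviews are invented for illustration, and the paper's actual estimation procedure may differ.

```python
import math
from collections import Counter


def tokenize(doc):
    # Assumption: lowercase whitespace tokens, each word counted once per review.
    return set(doc.lower().split())


def fit_log_odds(docs, labels, alpha=1.0):
    """Per-word log-odds log P(w | pos) - log P(w | neg) with Laplace smoothing."""
    pos_docs = [tokenize(d) for d, y in zip(docs, labels) if y == 1]
    neg_docs = [tokenize(d) for d, y in zip(docs, labels) if y == 0]
    pos_counts = Counter(w for d in pos_docs for w in d)
    neg_counts = Counter(w for d in neg_docs for w in d)
    log_odds = {}
    for w in set(pos_counts) | set(neg_counts):
        p_pos = (pos_counts[w] + alpha) / (len(pos_docs) + 2 * alpha)
        p_neg = (neg_counts[w] + alpha) / (len(neg_docs) + 2 * alpha)
        log_odds[w] = math.log(p_pos) - math.log(p_neg)
    return log_odds


def score(doc, log_odds):
    """Numeric sentiment value: sum of log-odds of the known words in the review."""
    return sum(log_odds.get(w, 0.0) for w in tokenize(doc))


def fit_threshold(docs, labels, log_odds):
    """Pick the single threshold tau that maximizes accuracy on the given data."""
    vals = sorted({score(d, log_odds) for d in docs})
    # Candidates: below all scores, midpoints between neighbors, above all scores.
    candidates = (
        [vals[0] - 1.0]
        + [(a + b) / 2 for a, b in zip(vals, vals[1:])]
        + [vals[-1] + 1.0]
    )

    def accuracy(tau):
        hits = sum((score(d, log_odds) > tau) == (y == 1) for d, y in zip(docs, labels))
        return hits / len(docs)

    return max(candidates, key=accuracy)


def predict(doc, log_odds, tau):
    """Classify as positive (1) when the score exceeds the threshold, else negative (0)."""
    return int(score(doc, log_odds) > tau)


# Toy data, invented for illustration only.
docs = [
    "clean room friendly staff",
    "great location clean",
    "dirty room rude staff",
    "noisy dirty bad breakfast",
]
labels = [1, 1, 0, 0]

log_odds = fit_log_odds(docs, labels)
tau = fit_threshold(docs, labels, log_odds)
print(predict("clean friendly", log_odds, tau), score("clean friendly", log_odds))
print(predict("dirty rude", log_odds, tau), score("dirty rude", log_odds))
```

The score returned by `score` plays the role of the numeric sentiment value, while `tau` is the only parameter of the decision rule, so the same scores could be reused in a later analysis without retraining.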
Pages: 325-361
Page count: 37