Smoothed Generalized Dirichlet: A Novel Count-Data Model for Detecting Emotional States

Cited by: 2
Authors
Najar F. [1]
Bouguila N. [1]
Affiliations
[1] Concordia University, Concordia Institute for Information and Systems Engineering, Montreal, H3G 1M8, QC
Source
IEEE Transactions on Artificial Intelligence | 2022 / Vol. 3 / No. 5
Keywords
Count-data modeling; emotion prediction; generalized multinomial prior; smoothed Dirichlet simplex
DOI
10.1109/TAI.2021.3120043
Abstract
In this article, we propose novel approaches to deal with the problem of burstiness, the challenge of count-data sparseness, and the curse of dimensionality. We introduce a smoothed generalized Dirichlet distribution that is a smoothed variant of the generalized Dirichlet distribution and a generalization of the smoothed Dirichlet. We provide different learning methods based on mixture models and agglomerative clustering-based geometrical information: Kullback-Leibler divergence, Fisher metric, and Bhattacharyya distance. Moreover, we show that the new smoothed generalized Dirichlet could be considered as a prior to the multinomial, which generates a new distribution for count data that we call the smoothed generalized Dirichlet multinomial. In particular, we present an approximation based on Taylor series expansion for better performance and optimized running time in the case of high-dimensional count data. The proposed models are evaluated through two emotion detection applications: disaster-tweet-related emotions and pain intensity estimation. Experiments show the efficiency and the robustness of our approaches when dealing with texts, videos, and images. © 2020 IEEE.
Pages: 685-698
Page count: 13
Related papers
73 entries in total
[1]  
Katz S.M., Distribution of content words and phrases in text and language modelling, Natural Lang. Eng., 2, 1, pp. 15-59, (1996)
[2]  
Church K.W., Gale W.A., Poisson mixtures, Natural Lang. Eng., 1, 2, pp. 163-190, (1995)
[3]  
Madsen R.E., Kauchak D., Elkan C., Modeling word burstiness using the Dirichlet distribution, Proc. 22nd Int. Conf. Mach. Learn., pp. 545-552, (2005)
[4]  
Bouguila N., Count data modeling and classification using finite mixtures of distributions, IEEE Trans. Neural Netw., 22, 2, pp. 186-198, (2011)
[5]  
Nallapati R., Minka T., Robertson S., The Smoothed-Dirichlet distribution: A new building block for generative models, CIIR Tech. Rep., (2006)
[6]  
Najar F., Bouguila N., Happiness analysis with Fisher information of Dirichlet-multinomial mixture model, Proc. Can. Conf. Artif. Intell., pp. 438-444, (2020)
[7]  
Zamzami N., Bouguila N., A novel scaled Dirichlet-based statistical framework for count data modeling: Unsupervised learning and exponential approximation, Pattern Recognit., 95, pp. 36-47, (2019)
[8]  
Koochemeshkian P., Zamzami N., Bouguila N., Flexible distribution-based regression models for count data: Application to medical diagnosis, Cybern. Syst., 51, 4, pp. 442-466, (2020)
[9]  
Najar F., Zamzami N., Bouguila N., Fake news detection using Bayesian inference, Proc. IEEE 20th Int. Conf. Inf. Reuse Integr. Data Sci., pp. 389-394, (2019)
[10]  
Bouguila N., A data-driven mixture kernel for count data classification using support vector machines, Proc. IEEE Workshop Mach. Learn. Signal Process., pp. 26-31, (2008)