Sequence encoding incorporated CNN model for Email document sentiment classification

Cited by: 20
Authors
Liu, Sisi [1 ]
Lee, Ickjai [1 ]
Affiliations
[1] James Cook Univ, Coll Sci & Engn, Discipline Comp Sci & Informat Technol, POB 6811, Cairns, Qld 4870, Australia
Keywords
Sentiment analysis; CNN model; Sequence encoding; Graph-based position encoding;
DOI
10.1016/j.asoc.2021.107104
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Document sentiment classification is an area of study that has been developed for decades. However, sentiment classification of Email data is a rather specialized field that has not yet been thoroughly studied. Compared to typical social media and review data, Email data is characterized by length variance, duplication caused by reply and forward messages, and implicitness in sentiment indicators. Owing to these characteristics, existing techniques cannot fully capture the complex syntactic and relational structure among words and phrases in Email documents. In this study, we introduce a dependency graph-based position encoding technique enhanced with weighted sentiment features and incorporate it into the feature representation process. We combine the encoded sentiment sequence features with traditional word embedding features as input to a revised deep CNN model for Email sentiment classification. Experiments are conducted on three sets of real Email data with adequate label conversion processes. Empirical results indicate that our proposed SSE CNN model achieved the highest accuracy rates of 88.6%, 74.3%, and 82.1% on the three experimental Email datasets, outperforming comparative state-of-the-art algorithms. Furthermore, our performance evaluations of the preprocessing and the sentiment sequence encoding confirm that Email preprocessing and sentiment sequence encoding with dependency graph-based position and SWN (SentiWordNet) features improve Email document sentiment classification. (C) 2021 Elsevier B.V. All rights reserved.
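To illustrate the general idea sketched in the abstract, the following is a minimal, hypothetical sketch (not the authors' implementation) of a text CNN in which per-token word embeddings are fused with an additional per-token "sentiment sequence encoding" channel before convolution. The framework (PyTorch), class and parameter names, feature dimensions, and the simple concatenation-based fusion are all assumptions made for illustration only.

```python
# Hedged sketch: word embeddings concatenated with per-token sentiment
# sequence encodings (e.g. graph-position + sentiment scores), fed to a
# multi-kernel 1-D CNN classifier. All names/dimensions are illustrative.
import torch
import torch.nn as nn


class SentimentSequenceCNN(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, senti_dim=8,
                 num_filters=100, kernel_sizes=(3, 4, 5), num_classes=3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # One convolution branch per kernel size over the fused channels.
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim + senti_dim, num_filters, k) for k in kernel_sizes
        )
        self.dropout = nn.Dropout(0.5)
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids, senti_features):
        # token_ids:      (batch, seq_len) word indices
        # senti_features: (batch, seq_len, senti_dim) per-token encodings
        x = torch.cat([self.embedding(token_ids), senti_features], dim=-1)
        x = x.transpose(1, 2)                      # (batch, channels, seq_len)
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.fc(self.dropout(torch.cat(pooled, dim=1)))


# Toy usage with random inputs, just to show the expected shapes.
model = SentimentSequenceCNN(vocab_size=10000)
ids = torch.randint(1, 10000, (2, 50))     # 2 documents, 50 tokens each
senti = torch.rand(2, 50, 8)               # matching sentiment encodings
logits = model(ids, senti)                 # (2, num_classes) class scores
```

The concatenation of embedding and sentiment channels is only one plausible fusion choice; the paper's actual encoding and network revisions are described in the full text.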
Pages: 14