A convolutional neural network-based linguistic steganalysis for synonym substitution steganography

Times cited: 63
Authors
Xiang, Lingyun [1 ,2 ,3 ]
Guo, Guoqing [2 ]
Yu, Jingming [2 ]
Sheng, Victor S. [4 ]
Yang, Peng [5 ]
Affiliations
[1] Changsha Univ Sci & Technol, Hunan Prov Key Lab Intelligent Proc Big Data Tran, Changsha 410114, Hunan, Peoples R China
[2] Changsha Univ Sci & Technol, Sch Comp & Commun Engn, Changsha 410114, Hunan, Peoples R China
[3] Changsha Univ Sci & Technol, Hunan Prov Key Lab Smart Roadway & Cooperat Vehic, Changsha 410114, Hunan, Peoples R China
[4] Univ Cent Arkansas, Dept Comp Sci, Conway, AR 72035 USA
[5] Hunan Branch CNCERT CC, Changsha 410004, Hunan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
steganalysis; steganography; synonym substitution; word embedding; convolutional neural network; NATURAL-LANGUAGE WATERMARKING;
DOI
10.3934/mbe.2020055
Chinese Library Classification
Q [Biological Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
In this paper, a linguistic steganalysis method based on two-level cascaded convolutional neural networks (CNNs) is proposed to improve the system's ability to detect stego texts, which are generated via synonym substitutions. The first-level network, sentence-level CNN, consists of one convolutional layer with multiple convolutional kernels in different window sizes, one pooling layer to deal with variable sentence lengths, and one fully connected layer with dropout as well as a softmax output, such that two final steganographic features are obtained for each sentence. The unmodified and modified sentences, along with their words, are represented in the form of pre-trained dense word embeddings, which serve as the input of the network. Sentence-level CNN provides the representation of a sentence, and can thus be utilized to predict whether a sentence is unmodified or has been modified by synonym substitutions. In the second level, a text-level CNN exploits the predicted representations of sentences obtained from the sentence-level CNN to determine whether the detected text is a stego text or cover text. Experimental results indicate that the proposed sentence-level CNN can effectively extract sentence features for sentence-level steganalysis tasks and reaches an average accuracy of 82.245%. Moreover, the proposed steganalysis method achieves greatly improved detection performance when distinguishing stego texts from cover texts.
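The two-level cascade described in the abstract can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the weights are random and untrained, and the embedding size, window sizes, kernel count, ReLU activation, and max-over-time pooling are all assumptions chosen for illustration (the abstract only states that a pooling layer handles variable sentence lengths). Dropout is omitted because it applies only during training. The class name `TinyCNN` and the helper `conv_max_pool` are hypothetical.

```python
import math
import random

def conv_max_pool(seq, kernels, window):
    """Slide each kernel over `seq` (a list of vectors) and keep its max response."""
    dim = len(seq[0])
    # Pad short inputs with zero vectors so at least one window exists.
    padded = seq + [[0.0] * dim] * max(0, window - len(seq))
    pooled = []
    for weights, bias in kernels:
        responses = []
        for start in range(len(padded) - window + 1):
            flat = [x for vec in padded[start:start + window] for x in vec]
            act = sum(w * x for w, x in zip(weights, flat)) + bias
            responses.append(max(0.0, act))  # ReLU (assumed nonlinearity)
        pooled.append(max(responses))  # max-over-time pooling (assumed)
    return pooled

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

class TinyCNN:
    """One conv layer with several window sizes, pooling, and a 2-way softmax."""

    def __init__(self, input_dim, windows=(2, 3), kernels_per_window=4, seed=0):
        rng = random.Random(seed)
        # Random, untrained weights: hypothetical values for shape checking only.
        self.banks = {
            w: [([rng.uniform(-0.5, 0.5) for _ in range(w * input_dim)], 0.0)
                for _ in range(kernels_per_window)]
            for w in windows
        }
        n_features = len(windows) * kernels_per_window
        self.fc = [[rng.uniform(-0.5, 0.5) for _ in range(n_features)]
                   for _ in range(2)]

    def forward(self, seq):
        features = []
        for window, kernels in self.banks.items():
            features.extend(conv_max_pool(seq, kernels, window))
        logits = [sum(w * f for w, f in zip(row, features)) for row in self.fc]
        return softmax(logits)  # two features per input sequence

rng = random.Random(1)
emb_dim = 5  # assumed embedding size; real pre-trained embeddings are larger
sentence_cnn = TinyCNN(input_dim=emb_dim)  # level 1: word embeddings -> 2 features
text_cnn = TinyCNN(input_dim=2)            # level 2: sentence features -> text decision

# A toy "text" of three sentences with 4, 7, and 1 words (random stand-in embeddings).
text = [[[rng.gauss(0, 1) for _ in range(emb_dim)] for _ in range(n)]
        for n in (4, 7, 1)]
sentence_scores = [sentence_cnn.forward(s) for s in text]
text_score = text_cnn.forward(sentence_scores)
print(sentence_scores)
print(text_score)
```

The sketch shows only how data flows through the cascade: each sentence, whatever its length, is reduced to two softmax features, and the sequence of those two-dimensional vectors becomes the input to the second network. Which of the two outputs corresponds to "modified" or "stego" is a labeling choice that the abstract does not specify.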
Pages: 1041-1058
Page count: 18