Detection of substitution-based linguistic steganography by relative frequency analysis

Cited by: 19
Authors
Chen, Zhili [1, 2]
Huang, Liusheng [1, 2]
Yang, Wei [1, 2]
Affiliations
[1] USTC, NHPCC, Sch CS & Tech, Hefei 230027, Peoples R China
[2] USTC, Suzhou Inst Adv Study, Suzhou 215123, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Information hiding; Linguistic steganography; Relative frequency analysis; Detection; Substitution-based; Synonym-substitution;
DOI
10.1016/j.diin.2011.03.001
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Linguistic steganography hides information in natural language texts. As natural language texts grow in importance and quantity, linguistic steganography plays an increasingly important role in Information Security (IS). Substitution-based linguistic steganography is one of the most commonly used linguistic steganography methods, offering considerable security and simplicity. In this paper, we propose a straightforward method based on Relative Frequency Analysis (RFA), which uses the frequency characteristics of the texts being tested, to detect substitution-based linguistic steganography. We formally prove several properties of relative frequency that can be used in the detection process and propose a detection scheme. As an example, we examine an existing synonym-substitution system, T-Lex, and carry out a detection experiment. In the experiment with pure literature texts, the accuracy, precision and recall of the detection reach 98.64%, 97.77% and 99.55%, respectively, when the substitution count is 90; in the experiment with balanced texts, the highest detection accuracy is 95%. These results indicate that the detection scheme is promising. (C) 2011 Elsevier Ltd. All rights reserved.
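The abstract does not define relative frequency, so the following is only a minimal sketch of one plausible reading: a word's frequency divided by the highest frequency in its synonym set. The synonym sets, the reference counts, and the function names `relative_frequency` and `mean_relative_frequency` are all hypothetical and are not taken from the paper or from T-Lex.

```python
from collections import Counter

# Hypothetical synonym sets and reference-corpus counts, invented for illustration.
SYNONYM_SETS = [
    {"big", "large", "sizable"},
    {"quick", "fast", "speedy"},
]
REFERENCE_COUNTS = Counter(
    {"big": 80, "large": 100, "sizable": 5, "quick": 40, "fast": 50, "speedy": 2}
)


def relative_frequency(word):
    """Return count(word) / max count in its synonym set, or None if the
    word belongs to no synonym set (i.e. it is not substitutable)."""
    for syn_set in SYNONYM_SETS:
        if word in syn_set:
            top = max(REFERENCE_COUNTS[w] for w in syn_set)
            return REFERENCE_COUNTS[word] / top
    return None


def mean_relative_frequency(text):
    """Average relative frequency over the substitutable words of a text,
    or None if the text contains no substitutable word."""
    values = [
        rf
        for rf in (relative_frequency(w) for w in text.lower().split())
        if rf is not None
    ]
    return sum(values) / len(values) if values else None


print(mean_relative_frequency("a large fast car"))
print(mean_relative_frequency("a sizable speedy car"))
```

Under this assumed definition, a text that consistently uses the most common member of each synonym set scores near 1, while a text with many rare synonyms scores lower. Whether and how the paper's detection scheme uses such a statistic is not stated in this record.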
Pages: 68 / 77
Page count: 10