A network-based CNN model to identify the hidden information in text data

Cited by: 8
Authors
Liu, Yanyan [1]
Li, Keping [1]
Yan, Dongyang [1]
Gu, Shuang [1]
Affiliations
[1] Beijing Jiaotong University, State Key Laboratory of Rail Traffic Control and Safety, Beijing 100044, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Text data; Hidden information detection; Network model; Random walk; CNN; Bayesian networks; Complex networks; Language
DOI
10.1016/j.physa.2021.126744
Chinese Library Classification number
O4 [Physics]
Discipline classification code
0702
Abstract
With the growth of the internet and big data, identifying missing or hidden information in text data has become an important task. At present, the challenge in studying hidden information is determining whether it exists and where it is located. In this paper, hidden information refers to words that do not appear in a sentence but are correlated with the existing words or with the sentence, and that strongly influence the comprehension of a sentence or of part of the text. This paper focuses on discovering the key, influential hidden information in text data. A keyword-based hidden information extraction framework is proposed to search for hidden entities, under the assumption that the importance of hidden objects is reflected by the keywords in the text. A network-based Convolutional Neural Network (CNN) model is developed to identify the hidden information related to keywords. The model builds on the CNN results, and cosine similarity is used to judge whether the source text contains hidden information. We first construct the word co-occurrence network of the text, select the words with the highest degree as keywords, and generate random walk paths on the network. We then train the CNN on the random walk paths whose last word is a keyword. In the experimental section, the proposed model is applied to the 20 Newsgroups dataset. The results show that the proposed model can effectively identify the hidden information associated with the keywords in the source text, and that the CNN detects keywords with an accuracy of 98%-99%. (C) 2021 Elsevier B.V. All rights reserved.
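The preprocessing steps named in the abstract (co-occurrence network, highest-degree keywords, random walks ending in a keyword, cosine similarity) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the co-occurrence window, walk length, number of walks per node, and the toy sentences are all assumptions chosen for illustration. CNN training is omitted, and the abstract does not say which vectors the cosine similarity compares, so the function below is the generic formula only.

```python
import math
import random
from collections import defaultdict


def build_cooccurrence_network(sentences, window=2):
    """Link two words when they occur within `window` tokens (assumed value)."""
    graph = defaultdict(set)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for other in tokens[i + 1 : i + 1 + window]:
                if other != word:
                    graph[word].add(other)
                    graph[other].add(word)
    return graph


def top_degree_keywords(graph, k=1):
    """Return the k highest-degree words; ties are broken alphabetically."""
    ranked = sorted(graph, key=lambda w: (-len(graph[w]), w))
    return ranked[:k]


def random_walk(graph, start, length, rng):
    """Uniform random walk over neighbours that visits `length` nodes."""
    path = [start]
    while len(path) < length:
        # Sorting makes the walk reproducible for a fixed seed.
        path.append(rng.choice(sorted(graph[path[-1]])))
    return path


def walks_ending_in_keyword(graph, keywords, length=4, walks_per_node=20, seed=0):
    """Generate walks from every node and keep those whose last word is a keyword."""
    rng = random.Random(seed)
    kept = []
    for node in sorted(graph):
        for _ in range(walks_per_node):
            path = random_walk(graph, node, length, rng)
            if path[-1] in keywords:
                kept.append(path)
    return kept


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy, pre-tokenized sentences (hypothetical data, not from 20 Newsgroups).
sentences = [
    ["network", "model", "detects", "hidden", "information"],
    ["hidden", "information", "affects", "text", "comprehension"],
    ["keywords", "reflect", "hidden", "information"],
]
graph = build_cooccurrence_network(sentences)
keywords = top_degree_keywords(graph, k=1)
paths = walks_ending_in_keyword(graph, set(keywords))
```

In a full pipeline the kept paths would be encoded (for example as word-embedding sequences) and passed to a CNN as training samples; that encoding step is left out here because the abstract does not describe it.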
Pages: 16