A Hierarchical Information Compression Approach for Knowledge Discovery From Social Multimedia

Times Cited: 0
Authors
Liu, Zheng [1 ]
Weng, Yu [1 ]
Xu, Ruiyang [1 ]
Chaomurilige [1 ]
Gao, Honghao [2 ]
Affiliations
[1] Minzu Univ China, Key Lab Ethn Language Intelligent Anal & Secur Go, Minist Educ, Beijing 100081, Peoples R China
[2] Shanghai Univ, Sch Comp Engn & Sci, Shanghai 200444, Peoples R China
Keywords
Cross-document learning; deduplication; key feature distillation; knowledge discovery
DOI
10.1109/TCSS.2024.3440997
CLC Number
TP3 [Computing Technology; Computer Technology]
Discipline Code
0812
Abstract
Knowledge discovery aims to uncover valuable insights and patterns from the large volumes of data produced in massive social systems (MSSs). Although recent advances in deep learning have driven significant progress in knowledge discovery, the "data dimensionality reduction" problem still poses practical challenges. To address this, we introduce a hierarchical information compression (IC) approach that eliminates redundant and irrelevant features and generates high-quality knowledge representations, thereby increasing the information density of the knowledge discovery process. Our approach compresses data in two stages, coarse-grained and fine-grained. In the coarse-grained stage, a key feature distiller based on a Siamese network identifies irrelevant features and latent redundancies across coarse-grained data blocks. In the fine-grained stage, the model further compresses the internal features of the data, extracting the most crucial knowledge through cross-block learning. Together, the two stages achieve both inter- and intra-block IC while preserving essential knowledge. To validate the proposed model, we conducted experiments on WikiSum, a large knowledge corpus built from English Wikipedia in MSSs. The model achieved a 2.38% increase in ROUGE-2 (recall-oriented understudy for gisting evaluation) and improvements of over 7% on the informativeness and conciseness metrics in both automatic and human evaluations. These results show that our model effectively selects the most pertinent and meaningful content and reduces redundancy to generate better knowledge representations.
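The record itself contains no code, but the coarse-grained stage described above can be sketched concretely. Below is a minimal, hypothetical PyTorch sketch of a Siamese-style key feature distiller: one shared-weight encoder embeds every data block, and a block is kept only if it is not a near-duplicate of a block already kept. All names and parameters here (BlockEncoder, distill_blocks, the embedding size, the 0.9 similarity threshold) are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch of the coarse-grained "key feature distiller":
# a shared-weight (Siamese) encoder scores pairs of text blocks so that
# redundant blocks can be dropped before fine-grained compression.
# Names and hyperparameters are assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlockEncoder(nn.Module):
    """Shared-weight encoder: both branches of each Siamese pair use it."""
    def __init__(self, vocab_size: int = 30_000, embed_dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Mean-pool token embeddings into one unit-norm vector per block.
        pooled = self.embed(token_ids).mean(dim=1)
        return F.normalize(self.proj(pooled), dim=-1)

def distill_blocks(encoder: BlockEncoder, blocks, threshold: float = 0.9):
    """Greedily keep a block only if its cosine similarity to every
    already-kept block stays below `threshold`."""
    kept, kept_vecs = [], []
    with torch.no_grad():
        for block in blocks:
            vec = encoder(block.unsqueeze(0)).squeeze(0)
            if all(torch.dot(vec, v).item() < threshold for v in kept_vecs):
                kept.append(block)
                kept_vecs.append(vec)
    return kept

if __name__ == "__main__":
    torch.manual_seed(0)
    encoder = BlockEncoder()
    # Two toy "blocks" of token ids, plus an injected exact duplicate,
    # which the distiller should filter out (identical inputs give
    # identical embeddings, so their cosine similarity is 1.0).
    blocks = [torch.randint(0, 30_000, (16,)) for _ in range(2)]
    blocks.insert(1, blocks[0].clone())
    print(f"kept {len(distill_blocks(encoder, blocks))} of {len(blocks)} blocks")

Sharing a single encoder across both sides of every comparison is what makes the scorer Siamese; the paper's actual distiller architecture and training objective are not given in this record.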
Pages: 7754-7765
Number of Pages: 12