Sparse multiple instance learning as document classification

Cited by: 0
Authors
Shengye Yan
Xiaodong Zhu
Guoqing Liu
Jianxin Wu
Affiliations
[1] NUIST, B-DAT, CICAEET, School of Information and Control
[2] Minieye, Youjia Innovation LLC
[3] National Key Laboratory for Novel Software Technology, Nanjing University
Source
Multimedia Tools and Applications | 2017 / Vol. 76
Keywords
Sparse multiple instance learning; Low witness rate; Structural representation; Document classification
DOI
Not available
Abstract
This work focuses on multiple instance learning (MIL) with sparse positive bags, which we call sparse MIL. A structural representation is presented to encode both instances and bags. This representation leads to a non-i.i.d. MIL algorithm, miStruct, which uses a structural similarity to compare bags. Furthermore, MIL with this representation is shown to be equivalent to a document classification problem. Document classification faces the same difficulty: only a few paragraphs or words are useful in revealing the category of a document. Using the TF-IDF representation, which has excellent empirical performance in document classification, the miDoc method is proposed. The proposed methods achieve significantly higher accuracy and AUC (area under the ROC curve) than the state of the art on a large number of sparse MIL problems, and the document classification analogy explains their efficacy on these problems.
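The bag-as-document analogy in the abstract can be illustrated with a minimal sketch. This is not the authors' miStruct or miDoc implementation: the fixed `codebook`, the hard nearest-neighbour assignment of instances to "words", the smoothed IDF formula, and the helper names `nearest_word` and `bags_to_tfidf` are all assumptions introduced here for illustration.

```python
import math
from collections import Counter

def nearest_word(instance, codebook):
    """Index of the codebook entry closest to the instance (squared Euclidean)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(instance, codebook[k])),
    )

def bags_to_tfidf(bags, codebook):
    """Treat each non-empty bag as a document and return its TF-IDF vector."""
    n_words = len(codebook)
    n_bags = len(bags)
    # Each instance becomes one "word"; each bag becomes a bag of words.
    counts = [Counter(nearest_word(x, codebook) for x in bag) for bag in bags]
    # Document frequency: number of bags in which each word occurs.
    df = [sum(1 for c in counts if c[k] > 0) for k in range(n_words)]
    # Smoothed IDF (one common variant, assumed here): rare words weigh more.
    idf = [math.log((1 + n_bags) / (1 + df[k])) + 1.0 for k in range(n_words)]
    # Term frequency is the fraction of the bag's instances mapped to the word.
    return [
        [c[k] / len(bag) * idf[k] for k in range(n_words)]
        for bag, c in zip(bags, counts)
    ]

# Hypothetical toy data: word 2 plays the role of the rare positive instance.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
bags = [
    [(0.1, 0.0), (0.9, 1.1), (0.0, 0.2), (1.0, 0.9)],  # no rare instance
    [(0.0, 0.1), (1.1, 1.0), (0.2, 0.0), (4.9, 5.1)],  # one rare instance
    [(0.0, 0.0), (1.0, 1.0), (0.1, 0.1), (0.9, 0.9)],  # no rare instance
]
vectors = bags_to_tfidf(bags, codebook)
```

In this toy setting the single rare instance in the second bag receives a larger weight than a common word that occurs equally often, which is the intuition behind borrowing TF-IDF for bags with a low witness rate. The resulting vectors could then be passed to any standard classifier.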
Pages: 4553-4570
Page count: 17