Metadata Generation for Multi-Text Classification in Structured Data

Authors
Trejo, Karla [1]
Garcia, Pere [1]
Puyol-Gruart, Josep [1]
Affiliations
[1] IIIA CSIC, UAB Campus, E-08193 Bellaterra, Catalonia, Spain
Source
ARTIFICIAL INTELLIGENCE RESEARCH AND DEVELOPMENT | 2019 / Vol. 319
Keywords
text analysis; text mining; data formatting; multi-text classification; topology; metadata; structured data;
DOI
10.3233/FAIA190154
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In today's information-saturated world, text analysis has become an indispensable resource for extracting useful data from massive amounts of text. A large portion of this information is unstructured. This has created a need for methodologies such as Named Entity Recognition (NER), Part-of-Speech (PoS) Tagging, N-grams, and Term Frequency - Inverse Document Frequency (TF-IDF), which can read and understand information based on its meaning, context and linguistic cohesion. However, these approaches on their own fall short when applied to already structured data. This paper proposes generating metadata that can simultaneously provide situational information from structured text data. The abstraction of text as a "group of concepts" can boost the relevance of a word in a collection of documents, which allows a more refined separation of classes and better performance in multi-text classification tasks.
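The abstract does not describe the metadata scheme in detail, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that each piece of text comes with a field name from the structured source, and that the field name can serve as a "concept" tag prefixed to each token. The records, the `field:token` tagging format, and the `tf_idf` helper are all hypothetical and introduced purely for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return weights


# Hypothetical structured records: (field name, text value).
records = [
    ("company", "apple releases phone"),
    ("fruit", "apple grows on trees"),
]

# Plain tokens: "apple" occurs in every document, so its IDF is zero.
plain = [text.split() for _, text in records]

# Assumed tagging scheme: prefix each token with its field name as metadata.
tagged = [[f"{field}:{tok}" for tok in text.split()] for field, text in records]

plain_w = tf_idf(plain)
tagged_w = tf_idf(tagged)

print(plain_w[0]["apple"])
print(tagged_w[0]["company:apple"])
```

In this toy case the untagged word "apple" carries no weight because it appears in both documents, while the tagged token "company:apple" is unique to one document and receives a positive weight. Whether such tagging helps on real data depends on the dataset and the classifier.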
Pages: 417-421
Number of pages: 5