Metadata Generation for Multi-Text Classification in Structured Data

Cited by: 0
Authors
Trejo, Karla [1]
Garcia, Pere [1]
Puyol-Gruart, Josep [1]
Affiliation
[1] IIIA CSIC, UAB Campus, E-08193 Bellaterra, Catalonia, Spain
Source
ARTIFICIAL INTELLIGENCE RESEARCH AND DEVELOPMENT | 2019 / Vol. 319
Keywords
text analysis; text mining; data formatting; multi-text classification; topology; metadata; structured data
DOI
10.3233/FAIA190154
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In today's information-saturated world, text analysis has become an indispensable resource for extracting useful data from massive amounts of text. A large portion of this information is unstructured, which has created a need for methodologies, such as Named Entity Recognition (NER), Part-of-Speech (PoS) tagging, N-grams, and Term Frequency-Inverse Document Frequency (TF-IDF), that can read and understand information based on its meaning, context and linguistic cohesion. However, these approaches on their own fall short when applied to already structured data. This paper proposes generating metadata that can simultaneously provide situational information from structured text data. Abstracting text as a "group of concepts" can boost the relevance of a word in a collection of documents, which allows a more refined separation of classes and better performance in multi-text classification tasks.
Pages: 417-421
Page count: 5