Metadata Generation for Multi-Text Classification in Structured Data

Cited by: 0
Authors
Trejo, Karla [1]
Garcia, Pere [1]
Puyol-Gruart, Josep [1]
Affiliations
[1] IIIA CSIC, UAB Campus, E-08193 Bellaterra, Catalonia, Spain
Source
ARTIFICIAL INTELLIGENCE RESEARCH AND DEVELOPMENT | 2019 / Vol. 319
Keywords
text analysis; text mining; data formatting; multi-text classification; topology; metadata; structured data
DOI
10.3233/FAIA190154
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In today's information-saturated world, text analysis has become an indispensable resource for extracting useful data from massive amounts of text. A large portion of this information is unstructured, which has created a need for methodologies such as Named Entity Recognition (NER), Part-of-Speech (PoS) tagging, N-grams, and Term Frequency-Inverse Document Frequency (TF-IDF), which can read and understand information based on its meaning, context and linguistic cohesion. However, these approaches on their own fall short when applied to data that is already structured. This paper proposes generating metadata that can simultaneously provide situational information from structured text data. Abstracting text as a "group of concepts" can boost the relevance of a word in a collection of documents, which allows a more refined separation of classes and better performance in multi-text classification tasks.
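
The abstract does not spell out how the metadata is built, so the following is only an illustrative sketch, not the authors' method. It assumes (hypothetically) that the metadata is simply the name of the field a word came from, attached as a prefix, and it uses a plain TF-IDF weight of tf * log(N / df). The helper names `tf_idf`, `plain_tokens` and `tag_with_field`, and the toy records, are invented for this example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {token: weight} dict per document, using tf * log(N / df)."""
    n = len(docs)
    # Document frequency: in how many documents each token occurs.
    df = Counter(tok for doc in docs for tok in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return weights


def plain_tokens(record):
    """Flatten a structured record into words, discarding the field names."""
    return [w for text in record.values() for w in text.lower().split()]


def tag_with_field(record):
    """Hypothetical metadata step: prefix each word with its source field."""
    return [
        f"{field}:{w}"
        for field, text in record.items()
        for w in text.lower().split()
    ]


# Toy structured records (invented for illustration).
records = [
    {"name": "Paris Hilton", "city": "London"},
    {"name": "John Smith", "city": "Paris"},
    {"name": "Anna Lopez", "city": "Paris"},
    {"name": "Mark Jones", "city": "Madrid"},
]

plain = tf_idf([plain_tokens(r) for r in records])
tagged = tf_idf([tag_with_field(r) for r in records])

# "paris" occurs in three records once field names are discarded, but
# "name:paris" occurs in only one, so its IDF (and weight) is higher.
plain_weight = plain[0]["paris"]
tagged_weight = tagged[0]["name:paris"]
print(plain_weight, tagged_weight)
```

Under these assumptions, the field-tagged token is rarer across the collection than the bare word, so it carries more weight in the first record. That is one way to read the abstract's claim that added context can boost a word's relevance and help separate classes.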
Pages: 417-421
Page count: 5