Metadata Generation for Multi-Text Classification in Structured Data

Cited by: 0
Authors
Trejo, Karla [1]
Garcia, Pere [1]
Puyol-Gruart, Josep [1]
Affiliations
[1] IIIA CSIC, UAB Campus, E-08193 Bellaterra, Catalonia, Spain
Source
ARTIFICIAL INTELLIGENCE RESEARCH AND DEVELOPMENT | 2019, Vol. 319
Keywords
text analysis; text mining; data formatting; multi-text classification; topology; metadata; structured data
DOI
10.3233/FAIA190154
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In today's information-saturated world, text analysis has become an indispensable resource for extracting useful data from massive amounts of text. A large portion of this information is unstructured, which has created a need for methodologies, such as Named Entity Recognition (NER), Part-of-Speech (PoS) Tagging, N-grams, and Term Frequency-Inverse Document Frequency (TF-IDF), that can read and understand information based on its meaning, context and linguistic cohesion. However, these approaches on their own fall short when applied to already structured data. This paper proposes generating metadata that can simultaneously provide situational information from structured text data. Abstracting text as a "group of concepts" can boost the relevance of a word in a collection of documents, which allows a more refined separation of classes and better performance in multi-text classification tasks.
Pages: 417-421
Page count: 5