Nested Dolls: Towards Unsupervised Clustering of Web Tables

Cited by: 0
Authors
Khan, Rituparna [1]
Gubanov, Michael [1]
Affiliations
[1] Florida State Univ, Dept Comp Sci, Tallahassee, FL 32306 USA
Source
2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2018
Funding
National Science Foundation (USA);
Keywords
Web-search; Large-scale Data Management; Big Data; Data Fusion; Data Integration; Data Cleaning; Summarization; Human-Computer Interaction; Machine Learning; Natural Language Processing (NLP);
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Here we discuss our initial efforts towards unsupervised clustering of a large-scale Web tables dataset. We improve on our previous weakly supervised clustering approach, in which an operator provides a few descriptive keywords to generate an entity-identifying classifier, which is then applied to the corpus to form a cohesive entity-centric cluster [1]. We take the next step towards a fully unsupervised algorithm by generating these descriptive keywords automatically. The keywords can then be used to generate high-precision training data and train a classifier to form a cluster. We describe and evaluate this new unsupervised keyword generation algorithm and apply it to a large-scale Web tables corpus to form initial small, high-precision clusters.
Pages: 5357-5359
Number of pages: 3
Related papers
50 in total
  • [21] Emergent unsupervised clustering paradigms with potential application to bioinformatics
    Miller, David J.
    Wang, Yue
    Kesidis, George
    FRONTIERS IN BIOSCIENCE-LANDMARK, 2008, 13 : 677 - 690
  • [22] Hypercluster: a flexible tool for parallelized unsupervised clustering optimization
    Lili Blumenberg
    Kelly V. Ruggles
    BMC Bioinformatics, 21
  • [23] Blanket Clusterer: A Tool for Automating the Clustering in Unsupervised Learning
    Bogdanoski, Konstantin
    Mishev, Kostadin
    Trajanov, Dimitar
    DELTA: PROCEEDINGS OF THE 3RD INTERNATIONAL CONFERENCE ON DEEP LEARNING THEORY AND APPLICATIONS, 2022, : 125 - 131
  • [24] A Novel Unsupervised Spectral Clustering for Pure-Tone Audiograms towards Hearing Aid Filter Bank Design and Initial Configurations
    Elkhouly, Abeer
    Andrew, Allan Melvin
    Rahim, Hasliza A.
    Abdulaziz, Nidhal
    Abdulmalek, Mohamedfareq
    Mohd Yasin, Mohd Najib
    Jusoh, Muzammil
    Sabapathy, Thennarasan
    Siddique, Shafiquzzaman
    APPLIED SCIENCES-BASEL, 2022, 12 (01):
  • [25] Towards the ubiquitous Web
    Hotho, Andreas
    Stumme, Gerd
    SEMANTIC WEB, 2010, 1 (1-2) : 117 - 119
  • [26] Regression on imperfect class labels derived by unsupervised clustering
    Brondum, Rasmus Froberg
    Michaelsen, Thomas Yssing
    Bogsted, Martin
    BRIEFINGS IN BIOINFORMATICS, 2021, 22 (02) : 2012 - 2019
  • [27] Unsupervised view and rate invariant clustering of video sequences
    Turaga, Pavan
    Veeraraghavan, Ashok
    Chellappa, Rama
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2009, 113 (03) : 353 - 371
  • [28] Hypercluster: a flexible tool for parallelized unsupervised clustering optimization
    Blumenberg, Lili
    Ruggles, Kelly V.
    BMC BIOINFORMATICS, 2020, 21 (01)
  • [29] Web Usage Classification and Clustering Approach for Web Search Personalization
    Vijayalakshmi, K.
    Jena, Sudarson
    6TH INTERNATIONAL CONFERENCE ON COMPUTER & COMMUNICATION TECHNOLOGY (ICCCT-2015), 2015, : 376 - 383
  • [30] An Interactive Approach to Region of Interest Selection in Cytologic Analysis of Uveal Melanoma Based on Unsupervised Clustering
    Chen, Haomin
    Liu, T. Y. Alvin
    Correa, Zelia
    Unberath, Mathias
    OPHTHALMIC MEDICAL IMAGE ANALYSIS, OMIA 2020, 2020, 12069 : 114 - 124