Nested Dolls: Towards Unsupervised Clustering of Web Tables

Authors
Khan, Rituparna [1]
Gubanov, Michael [1]
Affiliation
[1] Florida State Univ, Dept Comp Sci, Tallahassee, FL 32306 USA
Source
2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2018
Funding
U.S. National Science Foundation (NSF);
Keywords
Web-search; Large-scale Data Management; Big Data; Data Fusion; Data Integration; Data Cleaning; Summarization; Human-Computer Interaction; Machine Learning; Natural Language Processing (NLP);
DOI
Not available
Chinese Library Classification (CLC) code
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We discuss our initial efforts towards unsupervised clustering of a large-scale Web tables dataset. We improve on our previous weakly supervised clustering approach [1], in which an operator provides a few descriptive keywords that are used to generate an entity-identifying classifier; the classifier is then applied to the corpus to form a cohesive, entity-centric cluster. Here, we take a step towards a fully unsupervised algorithm by generating these descriptive keywords automatically. The keywords can then be used to generate high-precision training data and to train a classifier that forms a cluster. We describe and evaluate this new unsupervised keyword generation algorithm and apply it to a large-scale Web tables corpus to form small initial high-precision clusters.
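The abstract does not specify how the keywords are generated or which classifier is trained, so the following is only a minimal sketch of the pipeline's shape (keywords, then high-precision training labels, then a classifier, then a cluster) under stated assumptions. Each Web table is reduced to a hypothetical set of lowercase header terms. The keyword step is a simple co-occurrence heuristic substituted for the authors' algorithm, and the classifier is a hand-rolled multinomial Naive Bayes. All function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical representation: each Web table is reduced to the set of
# lowercase terms in its column headers.
Table = Set[str]


def _top(counter: Counter, n: int) -> List[str]:
    # Most frequent terms first; ties broken alphabetically for determinism.
    return [t for t, _ in sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]


def generate_keywords(tables: List[Table], k: int = 3) -> List[str]:
    """Stand-in keyword generator (an assumption, NOT the paper's algorithm):
    take the most common header term as an anchor, then add the k-1 terms
    that most often co-occur with it in the same table."""
    df = Counter(term for t in tables for term in t)
    anchor = _top(df, 1)[0]
    co = Counter(term for t in tables if anchor in t for term in t if term != anchor)
    return [anchor] + _top(co, k - 1)


def label_training_data(tables: List[Table], keywords: List[str],
                        min_hits: int = 2) -> Tuple[List[Table], List[Table]]:
    """High-precision labels: positive if a table matches >= min_hits keywords,
    negative if it matches none; ambiguous tables are left unlabeled."""
    kw = set(keywords)
    pos = [t for t in tables if len(t & kw) >= min_hits]
    neg = [t for t in tables if not (t & kw)]
    return pos, neg


class NaiveBayes:
    """Multinomial Naive Bayes over header terms with Laplace smoothing."""

    def fit(self, pos: List[Table], neg: List[Table]) -> "NaiveBayes":
        if not pos or not neg:
            raise ValueError("need at least one positive and one negative table")
        self.vocab: Set[str] = set().union(*pos, *neg)
        self.counts: Dict[int, Counter] = {
            1: Counter(x for t in pos for x in t),
            0: Counter(x for t in neg for x in t),
        }
        self.totals = {c: sum(cnt.values()) for c, cnt in self.counts.items()}
        n = len(pos) + len(neg)
        self.log_prior = {1: math.log(len(pos) / n), 0: math.log(len(neg) / n)}
        return self

    def _log_score(self, table: Table, c: int) -> float:
        v = len(self.vocab)
        return self.log_prior[c] + sum(
            math.log((self.counts[c][x] + 1) / (self.totals[c] + v))
            for x in table if x in self.vocab)  # terms unseen in training are ignored

    def predict(self, table: Table) -> bool:
        # Ties go to the negative class, favoring precision over recall.
        return self._log_score(table, 1) > self._log_score(table, 0)


def cluster(tables: List[Table], k: int = 3,
            min_hits: int = 2) -> Tuple[List[str], List[int]]:
    """Keywords -> high-precision labels -> classifier -> indices of cluster members."""
    keywords = generate_keywords(tables, k)
    pos, neg = label_training_data(tables, keywords, min_hits)
    clf = NaiveBayes().fit(pos, neg)
    return keywords, [i for i, t in enumerate(tables) if clf.predict(t)]


# Hypothetical toy corpus; the last table matches only one keyword, so it is
# excluded from training and classified afterwards.
TOY_CORPUS: List[Table] = [
    {"title", "year", "director"},
    {"title", "year", "director", "rating"},
    {"title", "director", "studio"},
    {"title", "year", "genre"},
    {"city", "population", "country"},
    {"city", "country", "area"},
    {"player", "team", "goals"},
    {"title", "rating", "studio"},
]

if __name__ == "__main__":
    print(cluster(TOY_CORPUS))  # → (['title', 'director', 'year'], [0, 1, 2, 3, 7])
```

Requiring at least two keyword hits for a positive label and zero hits for a negative label trades recall for precision, in the spirit of the high-precision training data the abstract describes. Ambiguous tables, such as the last one in the toy corpus, are kept out of training and left for the classifier to decide.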
Pages: 5357-5359
Number of pages: 3