Nested Dolls: Towards Unsupervised Clustering of Web Tables

Citations: 0
Authors
Khan, Rituparna [1]
Gubanov, Michael [1]
Affiliations
[1] Florida State Univ, Dept Comp Sci, Tallahassee, FL 32306 USA
Source
2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2018
Funding
National Science Foundation (USA);
Keywords
Web-search; Large-scale Data Management; Big Data; Data Fusion; Data Integration; Data Cleaning; Summarization; Human-Computer Interaction; Machine Learning; Natural Language Processing (NLP);
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Here we discuss our initial efforts towards unsupervised clustering of a large-scale Web tables dataset. We improve on our previous weakly supervised clustering approach, in which an operator provides a few descriptive keywords to generate an entity-identifying classifier that is then applied to the corpus to form a cohesive entity-centric cluster [1]. We take the next step towards a fully unsupervised algorithm by generating these descriptive keywords automatically. The keywords can then be used to generate high-precision training data and to train a classifier that forms a cluster. We describe and evaluate this new unsupervised keyword generation algorithm and apply it to a large-scale Web tables corpus to form initial small, high-precision clusters.
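The abstract does not specify how the keywords are generated, so the following is only a minimal illustrative sketch of the general idea (keywords first, then a small high-precision cluster), not the authors' algorithm. It assumes each table is reduced to a set of header terms, picks the most frequently co-occurring pair of terms as the "descriptive keywords", and groups the tables that contain all of them. The corpus `TABLES`, the function names, and the parameters `size` and `min_support` are all hypothetical. The classifier-training step mentioned in the abstract is omitted.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each Web table is reduced to its set of header terms.
TABLES = {
    "t1": {"song", "artist", "album", "year"},
    "t2": {"song", "artist", "duration"},
    "t3": {"artist", "album", "song", "label"},
    "t4": {"city", "population", "country"},
    "t5": {"country", "city", "area"},
    "t6": {"song", "year", "chart"},
}


def generate_keywords(tables, size=2):
    """Return (keywords, support): the set of `size` header terms that
    co-occur in the most tables, and the number of tables containing it.
    This co-occurrence heuristic is an assumption, not the paper's method."""
    counts = Counter()
    for terms in tables.values():
        for combo in combinations(sorted(terms), size):
            counts[combo] += 1
    if not counts:
        return None, 0
    # Highest count wins; ties are broken alphabetically for determinism.
    best = min(counts, key=lambda combo: (-counts[combo], combo))
    return set(best), counts[best]


def seed_cluster(tables, keywords):
    """High-precision seed: only tables whose headers contain every keyword."""
    return sorted(name for name, terms in tables.items() if keywords <= terms)


def cluster_corpus(tables, size=2, min_support=2):
    """Repeatedly generate keywords, peel off the matching tables, and
    continue on the remainder until no keyword set has enough support."""
    remaining = dict(tables)
    clusters = []
    while remaining:
        keywords, support = generate_keywords(remaining, size)
        if keywords is None or support < min_support:
            break
        members = seed_cluster(remaining, keywords)
        clusters.append((sorted(keywords), members))
        for name in members:
            del remaining[name]
    return clusters


if __name__ == "__main__":
    for keywords, members in cluster_corpus(TABLES):
        print(keywords, members)
```

Requiring every keyword to be present favors precision over recall, which matches the abstract's description of small high-precision clusters; tables left unassigned (here `t6`) would be candidates for the later classifier step.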
Pages: 5357-5359
Number of pages: 3
Related papers
50 in total
  • [11] Unsupervised training of Bayesian networks for data clustering
    Pham, Duc Truong
    Ruz, Gonzalo A.
    PROCEEDINGS OF THE ROYAL SOCIETY A-MATHEMATICAL PHYSICAL AND ENGINEERING SCIENCES, 2009, 465 (2109): : 2927 - 2948
  • [12] Optimization of unsupervised affinity propagation clustering method
    Alameddine, Jihan
    Chehdi, Kacem
    Cariou, Claude
    IMAGE AND SIGNAL PROCESSING FOR REMOTE SENSING XXV, 2019, 11155
  • [13] Quantum spectral clustering algorithm for unsupervised learning
    Li, Qingyu
    Huang, Yuhan
    Jin, Shan
    Hou, Xiaokai
    Wang, Xiaoting
    SCIENCE CHINA-INFORMATION SCIENCES, 2022, 65 (10)
  • [14] Clustering and evolutionary approach for longitudinal web traffic analysis
    Morichetta, Andrea
    Mellia, Marco
    PERFORMANCE EVALUATION, 2019, 135
  • [15] Predicting climate types for the Continental United States using unsupervised clustering techniques
    Sathiaraj, D.
    Huang, X.
    Chen, J.
    ENVIRONMETRICS, 2019, 30 (04)
  • [17] Combination clustering for Web correlation
    Takahashi, K
    Miura, T
    Shioya, I
    2005 IEEE PACIFIC RIM CONFERENCE ON COMMUNICATIONS, COMPUTERS AND SIGNAL PROCESSING (PACRIM), 2005, : 434 - 437
  • [18] Unsupervised Behavioural Mining and Clustering for Malware Family Identification
    Khanh Huu The Dam
    Given-Wilson, Thomas
    Legay, Axel
    36TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2021, 2021, : 374 - 383
  • [19] Glaucoma monitoring using manifold learning and unsupervised clustering
    Yousefi, Siamak
    Elze, Tobias
    Pasquale, Louis R.
    Boland, Michael
    2018 INTERNATIONAL CONFERENCE ON IMAGE AND VISION COMPUTING NEW ZEALAND (IVCNZ), 2018
  • [20] Unsupervised clustering of Roman potsherds via Variational Autoencoders
    Parisotto, Simone
    Leone, Ninetta
    Schonlieb, Carola-Bibiane
    Launaro, Alessandro
    JOURNAL OF ARCHAEOLOGICAL SCIENCE, 2022, 142