Nested Dolls: Towards Unsupervised Clustering of Web Tables

Cited by: 0
Authors
Khan, Rituparna [1]
Gubanov, Michael [1]
Affiliations
[1] Florida State Univ, Dept Comp Sci, Tallahassee, FL 32306 USA
Source
2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2018
Funding
National Science Foundation (USA);
Keywords
Web-search; Large-scale Data Management; Big Data; Data Fusion; Data Integration; Data Cleaning; Summarization; Human-Computer Interaction; Machine Learning; Natural Language Processing (NLP);
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We report our initial efforts towards unsupervised clustering of a large-scale Web tables dataset. Our previous approach was weakly supervised: an operator provided a few descriptive keywords, which were used to generate an entity-identifying classifier that was then applied to the corpus to form a cohesive entity-centric cluster [1]. Here we take a step towards a fully unsupervised algorithm by generating these descriptive keywords automatically. The keywords can then be used to generate high-precision training data and to train a classifier that forms a cluster. We describe and evaluate this new unsupervised keyword generation algorithm and apply it to a large-scale Web tables corpus to form initial small, high-precision clusters.
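The abstract does not specify how the keywords are generated, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that keywords can be approximated by header terms that frequently co-occur with the most common header term, and that a seed cluster is the set of tables matching at least `min_hits` keywords. All names (`generate_keywords`, `seed_cluster`, `TABLES`) and the toy data are hypothetical; classifier training is left out.

```python
from collections import Counter

def table_terms(table):
    """Lower-cased alphabetic tokens taken from a table's header row."""
    terms = set()
    for header in table["headers"]:
        terms.update(t for t in header.lower().split() if t.isalpha())
    return terms

def generate_keywords(tables, top_k=2):
    """Assumed heuristic: seed with the term found in the most tables,
    then add the top_k terms that co-occur with it most often."""
    doc_freq = Counter()
    for table in tables:
        doc_freq.update(table_terms(table))
    # Break ties alphabetically so the result is deterministic.
    seed = min(doc_freq, key=lambda term: (-doc_freq[term], term))

    co_occurrence = Counter()
    for table in tables:
        terms = table_terms(table)
        if seed in terms:
            co_occurrence.update(terms - {seed})
    ranked = sorted(co_occurrence, key=lambda term: (-co_occurrence[term], term))
    return [seed] + ranked[:top_k]

def seed_cluster(tables, keywords, min_hits=2):
    """Keep only tables whose headers match at least min_hits keywords,
    trading recall for precision."""
    wanted = set(keywords)
    return [t["id"] for t in tables if len(table_terms(t) & wanted) >= min_hits]

# Toy corpus (hypothetical data, for illustration only).
TABLES = [
    {"id": "t1", "headers": ["Song", "Artist", "Album"]},
    {"id": "t2", "headers": ["Song", "Artist", "Year"]},
    {"id": "t3", "headers": ["Player", "Team", "Year"]},
    {"id": "t4", "headers": ["Artist", "Album", "Label"]},
]

keywords = generate_keywords(TABLES, top_k=2)
cluster = seed_cluster(TABLES, keywords, min_hits=2)
print(keywords, cluster)
```

In a fuller pipeline, the tables in such a seed cluster would serve as high-precision positive examples for training the classifier that the abstract mentions.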
Pages: 5357-5359
Number of pages: 3