Nested Dolls: Towards Unsupervised Clustering of Web Tables

Times cited: 0
Authors
Khan, Rituparna [1]
Gubanov, Michael [1]
Affiliation
[1] Florida State Univ, Dept Comp Sci, Tallahassee, FL 32306 USA
Source
2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2018
Funding
U.S. National Science Foundation (NSF)
Keywords
Web-search; Large-scale Data Management; Big Data; Data Fusion; Data Integration; Data Cleaning; Summarization; Human-Computer Interaction; Machine Learning; Natural Language Processing (NLP)
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Here we discuss our initial efforts towards unsupervised clustering of a large-scale Web tables dataset. We improve on our previous weakly-supervised clustering approach, in which an operator provides a few descriptive keywords that are used to generate an entity-identifying classifier; the classifier is then applied to the corpus to form a cohesive, entity-centric cluster [1]. We take the next step towards a fully unsupervised algorithm by generating these descriptive keywords automatically. The keywords can then be used to generate high-precision training data and to train a classifier that forms a cluster. We describe and evaluate this new unsupervised keyword generation algorithm and apply it to a large-scale Web tables corpus to form small, high-precision initial clusters.
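The pipeline in the abstract (keywords, then high-precision training data, then a classifier, then a cluster) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not describe the keyword generation algorithm, so `generate_keywords` substitutes a hypothetical heuristic (the header words that occur in the most tables), and the classifier is a hand-written multinomial naive Bayes chosen only for self-containment. The table representation and the names `label_training_data`, `NaiveBayes`, `cluster`, `k`, and `min_hits` are all hypothetical.

```python
import math
from collections import Counter

# Assumption: a table is a dict with "header" (column names) and "rows"
# (lists of cell strings). The paper's actual data model may differ.


def header_words(table):
    """Set of lower-cased words appearing in a table's column names."""
    return {w for h in table["header"] for w in h.lower().split()}


def tokens(table):
    """Lower-cased words from the header and every cell of a table."""
    cells = table["header"] + [c for row in table["rows"] for c in row]
    return [w for c in cells for w in c.lower().split()]


def generate_keywords(tables, k):
    """HYPOTHETICAL stand-in for unsupervised keyword generation:
    the k header words that occur in the most tables."""
    counts = Counter(w for t in tables for w in header_words(t))
    return [w for w, _ in counts.most_common(k)]


def label_training_data(tables, keywords, min_hits):
    """High-precision labels: positive if the header holds >= min_hits
    keywords, negative if it holds none; the rest stay unlabeled."""
    pos, neg = [], []
    for t in tables:
        hits = len(header_words(t) & set(keywords))
        if hits >= min_hits:
            pos.append(t)
        elif hits == 0:
            neg.append(t)
    return pos, neg


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (a swapped-in
    classifier; the abstract does not name one)."""

    def fit(self, pos, neg):
        if not pos or not neg:
            raise ValueError("need both positive and negative examples")
        self.counts = {True: Counter(), False: Counter()}
        for label, group in ((True, pos), (False, neg)):
            for t in group:
                self.counts[label].update(tokens(t))
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        n = len(pos) + len(neg)
        self.prior = {True: math.log(len(pos) / n), False: math.log(len(neg) / n)}
        self.total = {lab: sum(c.values()) for lab, c in self.counts.items()}
        return self

    def _log_score(self, table, label):
        score = self.prior[label]
        for w in tokens(table):
            if w in self.vocab:  # ignore words never seen in training
                score += math.log((self.counts[label][w] + 1)
                                  / (self.total[label] + len(self.vocab)))
        return score

    def predict(self, table):
        return self._log_score(table, True) > self._log_score(table, False)


def cluster(tables, k=2, min_hits=2):
    """Keywords -> high-precision training data -> classifier -> cluster."""
    keywords = generate_keywords(tables, k)
    pos, neg = label_training_data(tables, keywords, min_hits)
    clf = NaiveBayes().fit(pos, neg)
    return keywords, [t for t in tables if clf.predict(t)]


# Toy corpus (invented for illustration only).
corpus = [
    {"header": ["Movie", "Director", "Year"], "rows": [["Alien", "Ridley Scott", "1979"]]},
    {"header": ["Movie", "Director", "Studio"], "rows": [["Jaws", "Steven Spielberg", "Universal"]]},
    {"header": ["Movie", "Director", "Year"], "rows": [["Heat", "Michael Mann", "1995"]]},
    {"header": ["City", "Country", "Population"], "rows": [["Paris", "France", "2100000"]]},
    {"header": ["City", "State", "Population"], "rows": [["Austin", "Texas", "960000"]]},
    {"header": ["Film", "Director"], "rows": [["Alien", "Ridley Scott"]]},
]
keywords, found = cluster(corpus)
print(keywords)    # ['director', 'movie']
print(len(found))  # 4
```

The `min_hits` threshold trades recall for precision in the generated training labels: the "Film, Director" table matches only one keyword, so it is left out of the training data, yet the classifier still places it in the cluster because its cell contents resemble the positive examples.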
Pages: 5357-5359
Page count: 3