Nested Dolls: Towards Unsupervised Clustering of Web Tables

Cited by: 0
Authors
Khan, Rituparna [1]
Gubanov, Michael [1]
Affiliation
[1] Florida State Univ, Dept Comp Sci, Tallahassee, FL 32306 USA
Source
2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2018
Funding
U.S. National Science Foundation;
Keywords
Web-search; Large-scale Data Management; Big Data; Data Fusion; Data Integration; Data Cleaning; Summarization; Human-Computer Interaction; Machine Learning; Natural Language Processing (NLP);
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
We discuss our initial efforts towards unsupervised clustering of a large-scale Web tables dataset. Our previous approach was weakly supervised: an operator provided a few descriptive keywords, which were used to generate an entity-identifying classifier that was then applied to the corpus to form a cohesive entity-centric cluster [1]. Here, we take the next step towards a fully unsupervised algorithm by generating these descriptive keywords automatically. The keywords can then be used to generate high-precision training data and train a classifier that forms a cluster. We describe and evaluate this new unsupervised keyword generation algorithm and apply it to a large-scale Web tables corpus to form initial small, high-precision clusters.
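The abstract does not specify how the keywords are generated, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes each Web table is reduced to a list of column headers, picks the most frequent header term as an anchor, takes the terms that co-occur with it most often as the descriptive keywords, and then keeps only tables matching all of those keywords as a small high-precision seed set. The function names, the `top_k` and `min_hits` parameters, and the toy data are all hypothetical; training a classifier on the seed set is omitted.

```python
from collections import Counter


def generate_keywords(tables, top_k=3):
    """Pick an anchor term and its most frequent co-occurring header terms.

    `tables` is a list of header lists (a hypothetical representation).
    This frequency heuristic is an assumption, not the paper's method.
    """
    # Document frequency of each lower-cased header term.
    doc_freq = Counter()
    for headers in tables:
        doc_freq.update({h.lower() for h in headers})

    # Anchor: highest document frequency, ties broken alphabetically.
    anchor = min(doc_freq, key=lambda t: (-doc_freq[t], t))

    # Count the terms that appear alongside the anchor.
    co_counts = Counter()
    for headers in tables:
        terms = {h.lower() for h in headers}
        if anchor in terms:
            co_counts.update(terms - {anchor})

    ranked = sorted(co_counts, key=lambda t: (-co_counts[t], t))
    return [anchor] + ranked[: top_k - 1]


def select_seed_tables(tables, keywords, min_hits):
    """Return indices of tables whose headers match at least `min_hits` keywords."""
    wanted = set(keywords)
    return [
        i
        for i, headers in enumerate(tables)
        if len(wanted & {h.lower() for h in headers}) >= min_hits
    ]


# Toy corpus (hypothetical data).
tables = [
    ["Title", "Author", "Publisher", "Year"],
    ["Title", "Author", "ISBN"],
    ["Title", "Author", "Publisher"],
    ["Player", "Team", "Goals"],
    ["Team", "City", "Year"],
]

keywords = generate_keywords(tables, top_k=3)
seed_cluster = select_seed_tables(tables, keywords, min_hits=len(keywords))
print(keywords)
print(seed_cluster)
```

Requiring every keyword to match trades recall for precision, which mirrors the abstract's emphasis on small high-precision clusters; a trained classifier would then be needed to extend the cluster beyond the seed tables.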
Pages: 5357-5359
Page count: 3