Nested Dolls: Towards Unsupervised Clustering of Web Tables

Citations: 0
Authors
Khan, Rituparna [1]
Gubanov, Michael [1]
Affiliations
[1] Florida State Univ, Dept Comp Sci, Tallahassee, FL 32306 USA
Source
2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2018
Funding
National Science Foundation (USA)
Keywords
Web-search; Large-scale Data Management; Big Data; Data Fusion; Data Integration; Data Cleaning; Summarization; Human-Computer Interaction; Machine Learning; Natural Language Processing (NLP)
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We discuss our initial efforts towards unsupervised clustering of a large-scale Web tables dataset. This work improves on our previous weakly supervised approach, in which an operator provided a few descriptive keywords to generate an entity-identifying classifier that was then applied to the corpora to form a cohesive entity-centric cluster [1]. Here, we take a further step towards a fully unsupervised algorithm by generating these descriptive keywords automatically. The keywords can then be used to generate high-precision training data and to train a classifier that forms a cluster. We describe and evaluate this new unsupervised keyword generation algorithm and apply it to a large-scale Web tables corpus to form initial small, high-precision clusters.
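The abstract does not specify how the keywords are generated, so the following is only a rough sketch of the general pipeline shape (keywords, then high-precision training tables), not the authors' algorithm. It assumes tables are represented by their header cells, ranks a seed table's header tokens with a simple inverse-document-frequency score, and keeps tables that match several keywords as candidate training data. The function names, the IDF scoring, and the `min_hits` threshold are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def tokenize(headers):
    """Lowercase header cells and split them into a set of word tokens."""
    return {tok for cell in headers for tok in cell.lower().split()}


def generate_keywords(seed_headers, corpus, k=3):
    """Hypothetical keyword generator: rank the seed table's header tokens
    by inverse document frequency over the corpus, ignoring tokens that
    occur in fewer than two tables (they cannot link tables together)."""
    doc_freq = Counter()
    for headers in corpus:
        doc_freq.update(tokenize(headers))
    n = len(corpus)
    scored = {
        tok: math.log(n / doc_freq[tok])
        for tok in tokenize(seed_headers)
        if doc_freq[tok] >= 2
    }
    # Sort by descending score, then alphabetically for a stable order.
    ranked = sorted(scored, key=lambda tok: (-scored[tok], tok))
    return ranked[:k]


def select_training_tables(keywords, corpus, min_hits=2):
    """Return indices of tables whose headers contain at least `min_hits`
    keywords; a strict threshold trades recall for precision."""
    wanted = set(keywords)
    return [
        i for i, headers in enumerate(corpus)
        if len(wanted & tokenize(headers)) >= min_hits
    ]


# Toy corpus of header rows (made-up data, for illustration only).
corpus = [
    ["Song Title", "Artist", "Album"],
    ["Artist", "Album", "Year"],
    ["Player", "Team", "Year"],
    ["Team", "City", "Year"],
    ["Album", "Artist", "Label"],
]

keywords = generate_keywords(corpus[0], corpus)
training_ids = select_training_tables(keywords, corpus)
print(keywords, training_ids)
```

In a fuller pipeline, the selected tables would serve as positive examples for training a classifier that is then applied to the rest of the corpus to grow the cluster.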
Pages: 5357-5359
Number of pages: 3