Clustering-Based Predictive Analytics to Improve Scientific Data Discovery

Cited by: 1
Authors
Devarakonda, Ranjeet [1]
Kumar, Jitendra [1]
Prakash, Giri [1]
Affiliations
[1] Oak Ridge Natl Lab, Environm Sci Div, Oak Ridge, TN 37830 USA
Source
2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2020
Keywords
clustering; content-based filtering; collaborative filtering; data recommended system; data discovery;
DOI
10.1109/BigData50022.2020.9377797
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Given the sheer volume of scientific data archived within the data-intensive projects at the US Department of Energy's Oak Ridge National Laboratory, finding precisely what data we are looking for may not be a trivial task; conversely, we may also miss a more prominent data product. To address such issues, we propose improving the data discovery system and using data analytics methods to comprehend what specific users might be interested in based on their physiological state, search patterns, and past data usage history. This work's primary goal is to prune the complexity, increase the visibility of popular data products, and direct users toward the data that best meet their needs. The proposed algorithm constructs a user profile based on the user's explicit or implicit interactions with the system, such as items they are currently looking at on-site and the key metadata mappings related to the data set. The pattern is then used to build a training data set, which will help find relevant data to recommend to the user.
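The profile-building step described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it shows only a plain content-based match between a user profile and data set metadata, and it omits the clustering and collaborative-filtering parts entirely. The catalog, the data set identifiers, and the function names (build_profile, cosine, recommend) are all hypothetical and chosen for illustration.

```python
from collections import Counter
from math import sqrt

# Hypothetical catalog: data set id -> metadata keywords (illustrative only).
CATALOG = {
    "ds_aerosol": ["aerosol", "optical", "lidar"],
    "ds_cloud": ["cloud", "radar", "lidar"],
    "ds_soil": ["soil", "moisture", "carbon"],
    "ds_precip": ["precipitation", "radar", "cloud"],
}


def build_profile(viewed_ids, catalog):
    """Sum the keyword counts of every data set the user interacted with."""
    profile = Counter()
    for ds_id in viewed_ids:
        profile.update(catalog[ds_id])
    return profile


def cosine(a, b):
    """Cosine similarity between two sparse keyword-count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(viewed_ids, catalog, top_n=2):
    """Rank data sets the user has not viewed by similarity to the profile."""
    profile = build_profile(viewed_ids, catalog)
    scores = {
        ds_id: cosine(profile, Counter(keywords))
        for ds_id, keywords in catalog.items()
        if ds_id not in viewed_ids
    }
    # Highest score first; ties are broken by id so the order is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [ds_id for ds_id, _ in ranked[:top_n]]


print(recommend(["ds_cloud"], CATALOG))
```

In this toy setup a user who viewed only ds_cloud is pointed first to ds_precip, which shares two keywords with the profile, and then to ds_aerosol, which shares one. A real system would presumably weight implicit and explicit interactions differently and draw on richer metadata mappings than keyword lists.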
Pages: 5658-5661
Page count: 4