Clustering-Based Predictive Analytics to Improve Scientific Data Discovery

Citations: 0
Authors
Devarakonda, Ranjeet [1]
Kumar, Jitendra [1]
Prakash, Giri [1]
Affiliations
[1] Oak Ridge Natl Lab, Environm Sci Div, Oak Ridge, TN 37830 USA
Source
2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2020
Keywords
clustering; content-based filtering; collaborative filtering; data recommended system; data discovery
DOI
10.1109/BigData50022.2020.9377797
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Given the sheer volume of scientific data archived within the data-intensive projects at the US Department of Energy's Oak Ridge National Laboratory, finding precisely what data we are looking for may not be a trivial task; conversely, we may also miss a more prominent data product. To address such issues, we propose improving the data discovery system and using data analytics methods to comprehend what specific users might be interested in based on their physiological state, search patterns, and past data usage history. This work's primary goal is to prune the complexity, increase the visibility of popular data products, and direct users toward the data that best meet their needs. The proposed algorithm constructs a user profile based on the user's explicit or implicit interactions with the system, such as items they are currently looking at on-site and the key metadata mappings related to the data set. The pattern is then used to build a training data set, which will help find relevant data to recommend to the user.
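The profile-then-recommend idea described in the abstract can be illustrated with a minimal content-based filtering sketch. This is not the authors' algorithm: the catalog, data set identifiers, keywords, interaction weights, and function names below are all hypothetical, and cosine similarity over keyword counts is simply one common choice assumed here for illustration. The clustering step named in the title is not shown.

```python
from collections import Counter
from math import sqrt

# Hypothetical catalog: data set id -> metadata keywords (illustrative only).
CATALOG = {
    "ds_soil_moisture": ["soil", "moisture", "hydrology"],
    "ds_soil_carbon": ["soil", "carbon", "biogeochemistry"],
    "ds_cloud_radar": ["cloud", "radar", "atmosphere"],
    "ds_precipitation": ["precipitation", "hydrology", "atmosphere"],
}


def build_profile(interactions, catalog):
    """Build a keyword-weight profile from (dataset_id, weight) interactions."""
    profile = Counter()
    for dataset_id, weight in interactions:
        for keyword in catalog[dataset_id]:
            profile[keyword] += weight
    return profile


def cosine(profile, keywords):
    """Cosine similarity between a user profile and one data set's keywords."""
    item = Counter(keywords)
    dot = sum(profile[k] * item[k] for k in item)
    norm = sqrt(sum(v * v for v in profile.values())) * sqrt(
        sum(v * v for v in item.values())
    )
    return dot / norm if norm else 0.0


def recommend(interactions, catalog, top_n=2):
    """Rank data sets the user has not interacted with by similarity to the profile."""
    profile = build_profile(interactions, catalog)
    seen = {dataset_id for dataset_id, _ in interactions}
    scored = [
        (cosine(profile, keywords), dataset_id)
        for dataset_id, keywords in catalog.items()
        if dataset_id not in seen
    ]
    # Highest score first; ties broken by data set id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [dataset_id for score, dataset_id in scored[:top_n] if score > 0]


if __name__ == "__main__":
    # Assumed weights: a download counts more than a page view.
    history = [("ds_soil_moisture", 2.0), ("ds_precipitation", 1.0)]
    print(recommend(history, CATALOG))
```

In this sketch, implicit signals such as page views and explicit signals such as downloads are both reduced to a single numeric weight per interaction; a real system would likely need a more careful weighting scheme.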
Pages: 5658-5661
Number of pages: 4