Runtime Data Layout Scheduling for Machine Learning Dataset

Cited by: 5
Authors
You, Yang [1]
Demmel, James [1]
Affiliation
[1] Univ Calif Berkeley, Div Comp Sci, Berkeley, CA 94720 USA
Source
2017 46th International Conference on Parallel Processing (ICPP), 2017
Keywords
parallel auto-tuning; machine learning
DOI
10.1109/ICPP.2017.54
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Machine learning (ML) approaches are widely used classification and regression methods for data mining applications. However, the time-consuming training process greatly limits their efficiency. In this paper we use SVM (a traditional ML algorithm) and DNN (a state-of-the-art ML algorithm) as examples to illustrate the idea. For SVM, a major performance bottleneck of current tools is that they use a single, uniform data storage format, even though the data format can significantly influence storage and computation complexity, memory bandwidth, and the efficiency of parallel processing. To address this problem, we study the factors that influence the algorithm's performance and apply auto-tuning to speed up SVM training. DNN training is even slower than SVM training. For example, training the AlexNet model on the CIFAR-10 dataset with an 8-core CPU takes 8.2 hours. CIFAR-10 is only 170 MB, which is too small to be processed efficiently in a distributed setting. Moreover, because of a limitation of the algorithm, only a small batch of data can be processed at each iteration. We focus on finding the right algorithmic parameters and on using auto-tuning techniques to make the algorithm run faster. For SVM training, our implementation achieves a 1.7-16.3x speedup (6.8x on average) over the non-adaptive case (using the worst data format) on various datasets. For DNN training on the CIFAR-10 dataset, we reduce the time from 8.2 hours to roughly 1 minute. We use dollars per speedup as a benchmark to help users select the right deep learning hardware.
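To make the data-format point concrete, the following is a minimal sketch of choosing a storage layout at runtime. It is not the authors' auto-tuner: the abstract does not describe their selection procedure, so the footprint-based rule in `pick_format`, the two candidate formats (dense and CSR), and all function names here are assumptions chosen purely for illustration.

```python
from typing import List, Tuple

Dense = List[List[float]]
# CSR triple: non-zero values, their column indices, and row pointers.
CSR = Tuple[List[float], List[int], List[int]]


def to_csr(matrix: Dense) -> CSR:
    """Convert a dense row-major matrix to compressed sparse row form."""
    values, cols, row_ptr = [], [], [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0.0:
                values.append(v)
                cols.append(j)
        row_ptr.append(len(values))
    return values, cols, row_ptr


def dense_matvec(matrix: Dense, x: List[float]) -> List[float]:
    """Matrix-vector product that touches every stored entry."""
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]


def csr_matvec(csr: CSR, x: List[float]) -> List[float]:
    """Matrix-vector product that touches only the non-zero entries."""
    values, cols, row_ptr = csr
    out = []
    for i in range(len(row_ptr) - 1):
        start, end = row_ptr[i], row_ptr[i + 1]
        out.append(sum(values[k] * x[cols[k]] for k in range(start, end)))
    return out


def pick_format(matrix: Dense) -> str:
    """Hypothetical rule: pick the format that stores fewer elements.

    Dense stores rows * cols values; CSR stores one value and one column
    index per non-zero, plus rows + 1 row pointers.
    """
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    nnz = sum(1 for row in matrix for v in row if v != 0.0)
    dense_cost = rows * cols
    csr_cost = 2 * nnz + rows + 1
    return "csr" if csr_cost < dense_cost else "dense"
```

A caller would run `pick_format` once per dataset and then dispatch to `dense_matvec` or `csr_matvec`. A real auto-tuner would more likely time candidate kernels on the target hardware than rely on a static element count.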
Pages: 452-461
Page count: 10