Big data in multi-block data analysis: An approach to parallelizing Partial Least Squares Mode B algorithm

Cited by: 2
Authors
Martinez-Ruiz, Alba [1]
Montanola-Sales, Cristina [2, 3]
Affiliations
[1] Univ Catolica Santisima Concepcion, Alonso Ribera 2850, Concepcion, Chile
[2] URL, IQS, Via Augusta 390, Barcelona 08017, Spain
[3] CNS, BSC, Jordi Girona 29, Barcelona 08034, Spain
Keywords
Computer science; Computational mathematics; VARIABLES;
DOI
10.1016/j.heliyon.2019.e01451
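Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09
Abstract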
Partial Least Squares (PLS) Mode B is a multi-block method and a tightly coupled algorithm for estimating structural equation models (SEMs). Describing key aspects of parallel computing, we approach the parallelization of the PLS Mode B algorithm so that it can operate on large distributed data. We show the scalability and performance of the algorithm at a very fine-grained level thanks to the versatility of pbdR, an R-project library for parallel computing. We vary several factors under different data distribution schemes in a supercomputing environment. Shorter elapsed times are obtained for the square blocking factor 16 x 16 using a grid of processors that is as square as possible, and for the non-square blocking factors 1000 x 4 and 10000 x 4 using a one-column grid of processors. Depending on the configuration, distributing data across a larger number of cores allows reaching speedups of up to 121 over the CPU implementation. Moreover, we show that SEMs can be estimated with big data sets using current state-of-the-art algorithms for multi-block data analysis.
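The blocking factors and processor grids named in the abstract describe how a matrix is split across processes. The following is a minimal sketch, assuming a ScaLAPACK-style two-dimensional block-cyclic layout; the function names (`owner`, `local_counts`, `speedup`) and the example matrix sizes are hypothetical illustrations and are not taken from the paper or from pbdR.

```python
from collections import Counter


def owner(i, j, mb, nb, nprow, npcol):
    """Grid coordinates (row, col) of the process owning global entry (i, j).

    Indices are 0-based. mb x nb is the blocking factor and
    nprow x npcol is the process grid (assumed block-cyclic layout).
    """
    return ((i // mb) % nprow, (j // nb) % npcol)


def local_counts(n_rows, n_cols, mb, nb, nprow, npcol):
    """Count how many matrix entries each process holds."""
    counts = Counter()
    for i in range(n_rows):
        for j in range(n_cols):
            counts[owner(i, j, mb, nb, nprow, npcol)] += 1
    return counts


def speedup(t_serial, t_parallel):
    """Speedup as the ratio of serial to parallel elapsed time."""
    return t_serial / t_parallel


if __name__ == "__main__":
    # Square blocking factor 16 x 16 on a square 2 x 2 grid (sizes are illustrative).
    print(local_counts(64, 64, 16, 16, 2, 2))
    # Non-square blocking factor 1000 x 4 on a one-column 4 x 1 grid.
    print(local_counts(4000, 4, 1000, 4, 4, 1))
```

In this sketch a tall, narrow matrix with blocking factor 1000 x 4 on a one-column grid gives each process whole rows, while a square blocking factor on a square grid spreads both rows and columns across processes.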
Pages: 29
Related papers
50 in total
  • [31] Understanding cold maceration in red winemaking: A batch processing and multi-block data analysis approach
    Aleixandre-Tudo, Jose Luis
    du Toit, Wessel
    LWT-FOOD SCIENCE AND TECHNOLOGY, 2019, 111 : 147 - 157
  • [32] Multi-way partial least squares modeling of water quality data
    Singh, Kunwar P.
    Malik, Amrita
    Basant, Nikita
    Saxena, Puneet
    ANALYTICA CHIMICA ACTA, 2007, 584 (02) : 385 - 396
  • [33] Integrating Partial Least Squares Correlation and Correspondence Analysis for Nominal Data
    Beaton, Derek
    Filbey, Francesca
    Abdi, Herve
    NEW PERSPECTIVES IN PARTIAL LEAST SQUARES AND RELATED METHODS, 2013, 56 : 81 - 94
  • [34] Use of correspondence analysis partial least squares on linear and unimodal data
    Frisvad, JC
    Norsker, M
    JOURNAL OF CHEMOMETRICS, 1996, 10 (5-6) : 677 - 685
  • [36] Partial Least Squares Regression Analysis: Example of Motor Fitness Data
    Serbetar, Ivan
    CROATIAN JOURNAL OF EDUCATION-HRVATSKI CASOPIS ZA ODGOJ I OBRAZOVANJE, 2012, 14 (04) : 917 - 932
  • [37] Data Analysis of Roadway Attributes through Partial Least Squares Regression
    Li, Weiguo
    Zhang, Hanjie
    Du, Xiaoping
    Qian, Kun
    Li, Cuiying
    2010 2ND IEEE INTERNATIONAL CONFERENCE ON INFORMATION AND FINANCIAL ENGINEERING (ICIFE), 2010 : 466 - 468
  • [38] Application of partial least squares regression in data analysis of mining subsidence
    Feng, Zun-de
    Transactions of Nonferrous Metals Society of China, 2005, (S1) : 156 - 158
  • [40] Multi-block PLS discriminant analysis for the joint analysis of metabolomic and epidemiological data
    Brandolini-Bunlon, Marion
    Petera, Melanie
    Gaudreau, Pierrette
    Comte, Blandine
    Bougeard, Stephanie
    Pujos-Guillot, Estelle
    METABOLOMICS, 2019, 15 (10)