Regularization Methods for High-Dimensional Data as a Tool for Seafood Traceability

Times cited: 0
Authors
Yokochi, Clara [1]
Bispo, Regina [1, 2]
Ricardo, Fernando [3]
Calado, Ricardo [3]
Affiliations
[1] Univ NOVA Lisboa, NOVA Sch Sci & Technol, NOVAMath Ctr Math & Applicat, Caparica, Portugal
[2] Univ NOVA Lisboa, NOVA Sch Sci & Technol, Dept Math, Caparica, Portugal
[3] Univ Aveiro, ECOMARE, Dept Biol, CESAM Ctr Environm & Marine Studies, Santiago Univ Campus, Aveiro, Portugal
Keywords
Elastic net; LASSO; Regularization; Ridge regression; Traceability; REGRESSION; SELECTION; MODELS; PATHS
DOI
10.1007/s42519-023-00341-8
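Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract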
Seafood traceability is needed to regulate food safety, control fisheries, combat fraud, and protect public health from seafood harvested in polluted locations, and it depends heavily on predicting the geographic origin of seafood. When the datasets available to study traceability are high-dimensional, standard classical statistical models fail. Under these circumstances, alternative methods are needed to predict the geographic origin of seafood accurately. In this study, we propose an analytical approach that combines regularization methods and resampling techniques to overcome the high-dimensionality problem. In particular, we compare three penalty-based approaches: Ridge regression, LASSO, and Elastic net. These methods were applied to predict the origin of the saltwater clam Ruditapes philippinarum, a non-indigenous and commercially important marine bivalve species that commonly occurs in European estuaries. Further, Monte Carlo Cross-Validation was used as the resampling method to overcome challenges related to the small sample size. The results of the three methods were compared. For full reproducibility, an R Markdown file and the dataset used are provided. We conclude by highlighting the insights this methodology may bring to modeling a multi-categorical response based on a high-dimensional dataset with highly correlated explanatory variables, and to combating the mislabeling of the geographic origin of seafood.
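For orientation, the three penalties named in the abstract can be written as one objective. The block below is a sketch under assumptions, not the authors' stated formulation: it uses the widely used glmnet-style parameterization for a multinomial (multi-categorical) response, and the symbols N (samples), K (origin classes), x_i (predictor vector), g_i (observed class), \lambda (penalty strength), and \alpha (mixing parameter) are introduced here for illustration only.

```latex
% Assumed glmnet-style penalized multinomial log-likelihood (illustrative notation).
\[
\min_{\{\beta_{0k},\,\beta_k\}_{k=1}^{K}}
\; -\frac{1}{N}\sum_{i=1}^{N}\log \Pr(G = g_i \mid x_i)
\; + \; \lambda \sum_{k=1}^{K}\left[\frac{1-\alpha}{2}\,\lVert \beta_k \rVert_2^2
\; + \; \alpha\,\lVert \beta_k \rVert_1\right],
\]
\[
\Pr(G = k \mid x) =
\frac{\exp\!\left(\beta_{0k} + x^{\top}\beta_k\right)}
     {\sum_{l=1}^{K}\exp\!\left(\beta_{0l} + x^{\top}\beta_l\right)}.
\]
% alpha = 0 gives the Ridge penalty, alpha = 1 gives the LASSO penalty,
% and 0 < alpha < 1 gives the Elastic net.
```

Under this parameterization, Ridge shrinks correlated coefficients together without setting them to zero, LASSO can set coefficients exactly to zero, and the Elastic net mixes the two behaviors, which is why it is often considered when explanatory variables are highly correlated.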
Pages: 21