Data pre-processing to improve the mining of large feed databases

Cited by: 1
Authors
Maroto-Molina, F. [1]
Gomez-Cabrera, A. [2]
Guerrero-Ginel, J. E. [2]
Garrido-Varo, A. [2]
Sauvant, D. [3]
Tran, G. [4]
Heuze, V. [4]
Perez-Marin, D. C. [2]
Affiliations
[1] Univ Cordoba, Serv Informac Alimentos, Cordoba 14014, Spain
[2] Univ Cordoba, Dept Anim Prod, ETS Ingn Agron & Montes, Cordoba 14014, Spain
[3] AgroParisTech, UMR Physiol Nutr & Alimentat 791, F-75231 Paris 05, France
[4] AgroParisTech, Assoc Francaise Zootechnie, F-75231 Paris 05, France
Keywords
chemical composition; nutritive value; data integration; outlier mining; QUALITY
DOI
10.1017/S1751731113000293
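Chinese Library Classification number
S8 [Animal husbandry, veterinary medicine, hunting, sericulture, apiculture]
Subject classification code
0905
Abstract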
The information stored in animal feed databases is highly variable, in terms of both provenance and quality; therefore, data pre-processing is essential to ensure reliable results. Yet, pre-processing at best tends to be unsystematic; at worst, it may even be wholly ignored. This paper sought to develop a systematic approach to the various stages involved in pre-processing to improve feed database outputs. The database used contained analytical and nutritional data on roughly 20 000 alfalfa samples. A range of techniques were examined for integrating data from different sources, for detecting duplicates and, particularly, for detecting outliers. Special attention was paid to the comparison of univariate and multivariate solutions. Major issues relating to the heterogeneous nature of data contained in this database were explored, the observed outliers were characterized and ad hoc routines were designed for error control. Finally, a heuristic diagram was designed to systematize the various aspects involved in the detection and management of outliers and errors.
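The contrast between univariate and multivariate outlier detection mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the variables (crude protein, CP, and neutral detergent fibre, NDF), the synthetic values, the z-score cutoff of 3 and the chi-square cutoff on the squared Mahalanobis distance are all assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical alfalfa-like records: CP and NDF in % of dry matter.
# The values are synthetic and are NOT taken from the paper's database.
idx = np.arange(20)
cp = np.linspace(14.0, 24.0, 20)
ndf = 70.0 - 1.5 * cp + 0.5 * (-1.0) ** idx  # NDF falls as CP rises

# One suspect record appended last: each value lies inside its own
# observed range, but the combination contradicts the CP/NDF trend.
X = np.vstack([np.column_stack([cp, ndf]), [23.0, 48.0]])


def univariate_flags(data, z_max=3.0):
    """Flag rows where any single variable has |z-score| above z_max."""
    z = np.abs((data - data.mean(axis=0)) / data.std(axis=0, ddof=1))
    return (z > z_max).any(axis=1)


def multivariate_flags(data, d2_max):
    """Flag rows whose squared Mahalanobis distance exceeds d2_max."""
    diff = data - data.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(data, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return d2 > d2_max


# For 2 variables the chi-square quantile has the closed form -2*ln(1 - p);
# p = 0.975 is an assumed cutoff, not one reported in the paper.
cutoff = -2.0 * np.log(1.0 - 0.975)

uni_flags = univariate_flags(X)
multi_flags = multivariate_flags(X, cutoff)

print("univariate flags:  ", np.flatnonzero(uni_flags))
print("multivariate flags:", np.flatnonzero(multi_flags))
```

In this sketch the suspect record passes the per-variable check but is caught by the multivariate one, because only the latter takes the correlation between variables into account. The mean and covariance here are estimated from all records, including the suspect one; with many outliers a robust estimator would likely be preferable.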
Pages: 1128-1136
Number of pages: 9