Appropriate Data Quality Checks Improve the Reliability of Values Predicted from Milk Mid-Infrared Spectra

Bibliographic Details
Main Authors: Lei Zhang, Chunfang Li, Frédéric Dehareng, Clément Grelet, Frédéric Colinet, Nicolas Gengler, Yves Brostaux, Hélène Soyeurt
Format: Article
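Language: English
Published: MDPI AG 2021-02-01
Series: Animals
Subjects: milk-component prediction; mid-infrared spectrum; Mahalanobis distance; quality-assurance system; Holstein cow
Online Access: https://www.mdpi.com/2076-2615/11/2/533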
collection DOAJ
description The use of abnormal milk mid-infrared (MIR) spectra strongly degrades prediction quality, even when the prediction equations are accurate. Such records must therefore be detected before or after the prediction process so that dairy herd improvement (DHI) organizations avoid erroneous spectral extrapolation and the use of poor-quality spectral data. For financial or practical reasons, adapting the quality protocol currently used to improve the accuracy of fat and protein contents is unfeasible. This study proposed three statistical methods that DHI organizations could easily implement to solve this issue: the deletion of 1% of the extreme high and low predicted values (M1), the deletion of records based on the Global-H (GH) distance (M2), and the deletion of records based on the absolute fat residual value (M3). Combinations of these three methods were also investigated. A total of 346,818 milk samples were analyzed by MIR spectrometry to predict the contents of fat, protein, and fatty acids. The same traits were then predicted externally from the corresponding standardized MIR spectra. The value of the cleaning procedures was assessed by estimating the root mean square differences (RMSDs) between the internal and external predicted phenotypes. All methods decreased the RMSD, with gains ranging from 0.32% to 41.39%. Based on these results, the “M1 and M2” combination should be preferred when data loss must be kept low, as it had the highest ratio of RMSD gain to data loss. This combination deleted records based on the 2% extreme predictions and a GH threshold of 5. However, to ensure the lowest RMSD, the “M2 or M3” combination, with a GH threshold of 5 and an absolute fat residual difference of 0.30 g/dL of milk, was the most relevant. Both combinations involved M2, confirming the value of calculating the GH distance for all samples to be predicted. If the GH distance cannot be estimated because the information needed to compute it is lacking, the results support using M1 combined with M3. The threshold used in M3 must be adapted by each DHI organization, as it depends on the spectral data and the equation used. The proposed methodology can be generalized to other MIR-based phenotypes.
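The three cleaning rules and the RMSD criterion described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the GH distances are assumed to be precomputed, the fat residual is assumed to be the difference between the internal and external fat predictions, M1 is assumed to trim 1% in each tail, thresholds are treated as inclusive, and the "gain" is assumed to be the relative RMSD reduction in percent.

```python
import numpy as np

def rmsd(internal, external):
    """Root mean square difference between two sets of predictions."""
    diff = np.asarray(internal, dtype=float) - np.asarray(external, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def keep_m1(pred, tail_pct=1.0):
    """M1 (assumed form): keep records inside the central percentile range,
    dropping tail_pct percent at each extreme."""
    pred = np.asarray(pred, dtype=float)
    low, high = np.percentile(pred, [tail_pct, 100.0 - tail_pct])
    return (pred >= low) & (pred <= high)

def keep_m2(gh, threshold=5.0):
    """M2: keep records whose precomputed GH distance is within the threshold."""
    return np.asarray(gh, dtype=float) <= threshold

def keep_m3(fat_internal, fat_external, threshold=0.30):
    """M3 (assumed form): keep records whose absolute fat residual (g/dL)
    is within the threshold."""
    residual = np.asarray(fat_internal, dtype=float) - np.asarray(fat_external, dtype=float)
    return np.abs(residual) <= threshold

def rmsd_gain_and_loss(internal, external, keep):
    """Return (RMSD gain in %, data loss in %) for a boolean keep mask."""
    internal = np.asarray(internal, dtype=float)
    external = np.asarray(external, dtype=float)
    keep = np.asarray(keep, dtype=bool)
    before = rmsd(internal, external)
    after = rmsd(internal[keep], external[keep])
    gain = 100.0 * (before - after) / before
    loss = 100.0 * (1.0 - keep.mean())
    return gain, loss
```

Because each rule returns a boolean mask, masks can be combined with `&` or `|` before calling `rmsd_gain_and_loss`. How the paper's "and" and "or" combinations map onto those operators is not stated in the abstract, so that choice is left to the reader.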
id doaj.art-2b07751b24b64f1babb3bee97de4518e
institution Directory Open Access Journal
issn 2076-2615
doi 10.3390/ani11020533
volume 11
issue 2
article_number 533
affiliation Lei Zhang: TERRA Teaching and Research Centre, University of Liège—Gembloux Agro-Bio Tech, 5030 Gembloux, Belgium
Chunfang Li: Hebei Livestock Breeding Station, Shijiazhuang 050000, China
Frédéric Dehareng: Valorisation of Agricultural Products Department, Walloon Agricultural Research Centre, 5030 Gembloux, Belgium
Clément Grelet: Valorisation of Agricultural Products Department, Walloon Agricultural Research Centre, 5030 Gembloux, Belgium
Frédéric Colinet: TERRA Teaching and Research Centre, University of Liège—Gembloux Agro-Bio Tech, 5030 Gembloux, Belgium
Nicolas Gengler: TERRA Teaching and Research Centre, University of Liège—Gembloux Agro-Bio Tech, 5030 Gembloux, Belgium
Yves Brostaux: TERRA Teaching and Research Centre, University of Liège—Gembloux Agro-Bio Tech, 5030 Gembloux, Belgium
Hélène Soyeurt: TERRA Teaching and Research Centre, University of Liège—Gembloux Agro-Bio Tech, 5030 Gembloux, Belgium
topic milk-component prediction
mid-infrared spectrum
Mahalanobis distance
quality-assurance system
Holstein cow