Machine learning evaluation for identification of M-proteins in human serum.

Serum electrophoresis (SPEP) is a method used to analyze the distribution of the most important proteins in the blood. The major clinical question is the presence of monoclonal fraction(s) of antibodies (M-protein/paraprotein), which is essential for the diagnosis and follow-up of hematological dise...


Bibliographic Details
Main Authors: Alexandros Sopasakis, Maria Nilsson, Mattias Askenmo, Fredrik Nyholm, Lillemor Mattsson Hultén, Victoria Rotter Sopasakis
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2024-01-01
Series: PLoS ONE
Online Access: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0299600&type=printable
author Alexandros Sopasakis
Maria Nilsson
Mattias Askenmo
Fredrik Nyholm
Lillemor Mattsson Hultén
Victoria Rotter Sopasakis
author_sort Alexandros Sopasakis
collection DOAJ
description Serum electrophoresis (SPEP) is a method used to analyze the distribution of the most important proteins in the blood. The major clinical question is the presence of monoclonal fraction(s) of antibodies (M-protein/paraprotein), which is essential for the diagnosis and follow-up of hematological diseases, such as multiple myeloma. Recent studies have shown that machine learning can be used to assess protein electrophoresis by, for example, examining protein glycan patterns to follow up tumor surgery. In this study we compared 26 different decision tree algorithms to identify the presence of M-proteins in human serum by using numerical data from serum protein capillary electrophoresis. For the automated detection and clustering of data, we used an anonymized data set consisting of 67,073 samples. We found five methods with superior ability to detect M-proteins: Extra Trees (ET), Random Forest (RF), Histogram Gradient Boosting Regressor (HGBR), Light Gradient Boosting Machine (LGBM), and Extreme Gradient Boosting (XGB). Additionally, we implemented a game theoretic approach to identify which features in the data set were indicative of the resulting M-protein diagnosis. The results verified the gamma globulin fraction and part of the beta globulin fraction as the most important features of the electrophoresis analysis, thereby further strengthening the reliability of our approach. Finally, we tested the algorithms for classifying the M-protein isotypes, where ET and XGB showed the best performance out of the five algorithms tested. Our results show that serum capillary electrophoresis combined with decision tree algorithms has great potential for rapid and accurate identification of M-proteins. Moreover, these methods would be applicable to a variety of blood analyses, such as hemoglobinopathies, indicating a wide range of diagnostic uses. However, for M-protein isotype classification, combining machine learning solutions for numerical data from capillary electrophoresis with gel electrophoresis image data would be most advantageous.
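The "game theoretic approach" to feature attribution mentioned in the description is not named in this record. One common technique of that kind is the Shapley value, and the following is only a minimal sketch of it, not the authors' implementation. Everything in it is hypothetical: the five feature names (the classical electrophoresis fractions), the baseline and sample values, and the `toy_score` function, which stands in for a trained tree model.

```python
from itertools import combinations
from math import factorial

# Hypothetical feature names: the five classical electrophoresis fractions.
FEATURES = ["albumin", "alpha1", "alpha2", "beta", "gamma"]

# Hypothetical reference profile and one sample (percent of total protein).
BASELINE = {"albumin": 60.0, "alpha1": 4.0, "alpha2": 9.0, "beta": 11.0, "gamma": 16.0}
SAMPLE = {"albumin": 48.0, "alpha1": 4.0, "alpha2": 9.0, "beta": 13.0, "gamma": 26.0}


def toy_score(x):
    """Toy stand-in for a trained model; a higher score means more suspicious."""
    score = 0.08 * (x["gamma"] - 16.0) + 0.03 * (x["beta"] - 11.0)
    # Interaction term: elevated gamma together with low albumin.
    if x["gamma"] > 20.0 and x["albumin"] < 55.0:
        score += 0.2
    return score


def coalition_value(subset):
    """Score when only the features in `subset` take the sample's values;
    all other features stay at the baseline."""
    x = dict(BASELINE)
    for name in subset:
        x[name] = SAMPLE[name]
    return toy_score(x)


def shapley_values():
    """Exact Shapley value of every feature, enumerating all coalitions."""
    n = len(FEATURES)
    phi = {}
    for feature in FEATURES:
        others = [g for g in FEATURES if g != feature]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain `feature`.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                gain = coalition_value(subset + (feature,)) - coalition_value(subset)
                total += weight * gain
        phi[feature] = total
    return phi


if __name__ == "__main__":
    phi = shapley_values()
    for name, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
        print(f"{name:8s} {value:+.3f}")
```

In this toy setup the gamma fraction receives the largest attribution, and the attributions sum to the difference between the sample's score and the baseline's score. Exact enumeration is only feasible for a handful of features; with many features, approximations for tree models are typically used instead.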
first_indexed 2024-04-24T11:54:42Z
format Article
id doaj.art-a970f38f35b1497ba275787fc5abfa6e
institution Directory Open Access Journal
issn 1932-6203
language English
last_indexed 2024-04-24T11:54:42Z
publishDate 2024-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj.art-a970f38f35b1497ba275787fc5abfa6e; 2024-04-09T05:31:53Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2024-01-01; 19; 4; e0299600; 10.1371/journal.pone.0299600; Machine learning evaluation for identification of M-proteins in human serum.; Alexandros Sopasakis; Maria Nilsson; Mattias Askenmo; Fredrik Nyholm; Lillemor Mattsson Hultén; Victoria Rotter Sopasakis; https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0299600&type=printable
title Machine learning evaluation for identification of M-proteins in human serum.
title_sort machine learning evaluation for identification of m proteins in human serum
url https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0299600&type=printable