Characterization of the effects of outliers on ComBat harmonization for removing inter-site data heterogeneity in multisite neuroimaging studies

Data harmonization is a key step widely used in multisite neuroimaging studies to remove inter-site heterogeneity of data distribution. However, data harmonization may even introduce additional inter-site differences in neuroimaging data if outliers are present in the data of one or more sites. It r...

Bibliographic Details
Main Authors: Qichao Han, Xiaoxiao Xiao, Sijia Wang, Wen Qin, Chunshui Yu, Meng Liang
Format: Article
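Language: English
Published: Frontiers Media S.A. 2023-05-01
Series: Frontiers in Neuroscience
Subjects: outliers; site effect; ComBat harmonization; imaging derived phenotypes; magnetic resonance imaging; multisite
Online Access: https://www.frontiersin.org/articles/10.3389/fnins.2023.1146175/full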
author Qichao Han
Xiaoxiao Xiao
Sijia Wang
Wen Qin
Chunshui Yu
Meng Liang
collection DOAJ
description Data harmonization is a key step widely used in multisite neuroimaging studies to remove inter-site heterogeneity of data distribution. However, data harmonization may itself introduce additional inter-site differences in neuroimaging data if outliers are present in the data of one or more sites. It remains unclear how the presence of outliers affects the effectiveness of data harmonization and, consequently, the results of analyses using harmonized data. To address this question, we generated a normal simulation dataset without outliers and a series of simulation datasets with outliers of varying properties (e.g., outlier location, outlier quantity, and outlier score) based on a real large-sample neuroimaging dataset. We first verified the effectiveness of the most commonly used ComBat harmonization method in removing inter-site heterogeneity using the normal simulation data, and then used the simulation datasets with outliers to characterize the effects of outliers on the effectiveness of ComBat harmonization and on the results of association analyses between brain imaging-derived phenotypes and a simulated behavioral variable. We found that, although ComBat harmonization effectively removed the inter-site heterogeneity in multisite data and consequently improved the detection of the true brain-behavior relationships, the presence of outliers could severely impair the effectiveness of ComBat harmonization in removing data heterogeneity, or even introduce extra heterogeneity into the data. Moreover, we found that the effects of outliers on the improvement in detecting brain-behavior associations by ComBat harmonization depended on how such associations were assessed (i.e., by Pearson correlation or Spearman correlation) and on the outlier location, quantity, and score. These findings help us better understand the influence of outliers on data harmonization and highlight the importance of detecting and removing outliers prior to data harmonization in multisite neuroimaging studies.
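The mechanism summarized in the abstract can be illustrated with a small simulation. The sketch below is not ComBat and not the authors' simulation: it assumes a much simpler per-site location-scale adjustment (no covariates, no empirical Bayes shrinkage) as a stand-in, and every number in it (sample size, site offset and scale, outlier fraction and shift, effect size, random seed) is a hypothetical choice made for illustration. The helper names `location_scale_harmonize`, `pearson`, and `spearman` are likewise invented here; the Spearman helper is hand-rolled and does not handle ties.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only


def location_scale_harmonize(values, site):
    """Simplified stand-in for ComBat (an assumption of this sketch):
    rescale each site so its mean and SD match the pooled mean and SD."""
    values = np.asarray(values, dtype=float)
    site = np.asarray(site)
    out = np.empty_like(values)
    grand_mean, grand_sd = values.mean(), values.std(ddof=1)
    for s in np.unique(site):
        m = site == s
        z = (values[m] - values[m].mean()) / values[m].std(ddof=1)
        out[m] = z * grand_sd + grand_mean
    return out


def pearson(x, y):
    return float(np.corrcoef(x, y)[0, 1])


def spearman(x, y):
    # Pearson correlation of ranks; valid here because the data have no ties.
    rank = lambda v: np.argsort(np.argsort(v))
    return pearson(rank(x), rank(y))


# Hypothetical two-site dataset: one phenotype, one behavioral variable.
n_per_site = 1000
site = np.repeat(["A", "B"], n_per_site)
is_b = site == "B"
idp_true = rng.normal(size=site.size)
behavior = 0.5 * idp_true + rng.normal(size=site.size)

# Site effect: site B measures the phenotype with an offset and a larger scale.
idp_clean = np.where(is_b, 2.0 * idp_true + 5.0, idp_true)

# Outliers: 5% of site B receives a large shift unrelated to behavior.
is_outlier = np.zeros(site.size, dtype=bool)
is_outlier[np.flatnonzero(is_b)[:50]] = True
idp_outl = idp_clean + 16.0 * is_outlier

harm_clean = location_scale_harmonize(idp_clean, site)
harm_outl = location_scale_harmonize(idp_outl, site)

for label, idp in [
    ("raw, no outliers", idp_clean),
    ("harmonized, no outliers", harm_clean),
    ("raw, with outliers", idp_outl),
    ("harmonized, with outliers", harm_outl),
]:
    print(f"{label:27s} Pearson={pearson(idp, behavior):.3f} "
          f"Spearman={spearman(idp, behavior):.3f}")

# Heterogeneity left among ordinary (non-outlier) subjects after harmonization.
keep_b = is_b & ~is_outlier
print("site A mean/SD:        ", harm_outl[~is_b].mean(), harm_outl[~is_b].std(ddof=1))
print("site B (clean) mean/SD:", harm_outl[keep_b].mean(), harm_outl[keep_b].std(ddof=1))
```

In this toy setting the outliers inflate the estimated mean and SD of site B, so the adjustment shifts and compresses the ordinary site B subjects relative to site A. That is one way harmonization can introduce heterogeneity rather than remove it, and it is why screening for outliers before harmonization matters. How the real ComBat estimator behaves will differ in detail.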
first_indexed 2024-03-13T09:41:50Z
format Article
id doaj.art-c3ef13f65f5c4cc29735614d9c9b94fb
institution Directory of Open Access Journals
issn 1662-453X
language English
last_indexed 2024-03-13T09:41:50Z
publishDate 2023-05-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Neuroscience
spelling doaj.art-c3ef13f65f5c4cc29735614d9c9b94fb
2023-05-25T04:24:36Z
eng
Frontiers Media S.A.
Frontiers in Neuroscience
1662-453X
2023-05-01
17
10.3389/fnins.2023.1146175
1146175
Characterization of the effects of outliers on ComBat harmonization for removing inter-site data heterogeneity in multisite neuroimaging studies
Qichao Han [0], Xiaoxiao Xiao [1], Sijia Wang [2], Wen Qin [3], Chunshui Yu [4, 5], Meng Liang [6]
[0, 1, 2, 4, 6] School of Medical Technology, School of Medical Imaging, Tianjin Key Laboratory of Functional Imaging, Tianjin Medical University, Tianjin, China
[3, 5] Department of Radiology and Tianjin Key Laboratory of Functional Imaging, Tianjin Medical University General Hospital, Tianjin, China
https://www.frontiersin.org/articles/10.3389/fnins.2023.1146175/full
outliers; site effect; ComBat harmonization; imaging derived phenotypes; magnetic resonance imaging; multisite
title Characterization of the effects of outliers on ComBat harmonization for removing inter-site data heterogeneity in multisite neuroimaging studies
topic outliers
site effect
ComBat harmonization
imaging derived phenotypes
magnetic resonance imaging
multisite
url https://www.frontiersin.org/articles/10.3389/fnins.2023.1146175/full