Performance and Information Leakage in Splitfed Learning and Multi-Head Split Learning in Healthcare Data and Beyond
Machine learning (ML) in healthcare data analytics is attracting much attention because of the unprecedented power of ML to extract knowledge that improves the decision-making process. At the same time, laws and ethics codes drafted by countries to govern healthcare data are becoming stringent. Although healthcare practitioners are struggling with an enforced governance framework, we see the emergence of distributed learning-based frameworks disrupting traditional-ML-model development. ...
Main Authors: | Praveen Joshi, Chandra Thapa, Seyit Camtepe, Mohammed Hasanuzzaman, Ted Scully, Haithem Afli |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-07-01 |
Series: | Methods and Protocols |
Subjects: | distributed collaborative machine learning; split learning; multi-head split learning; parameter transmission-based distributed machine learning; privacy-preserving machine learning; information leakage in distributed learning |
Online Access: | https://www.mdpi.com/2409-9279/5/4/60 |
_version_ | 1797408627340869632 |
---|---|
author | Praveen Joshi, Chandra Thapa, Seyit Camtepe, Mohammed Hasanuzzaman, Ted Scully, Haithem Afli |
author_facet | Praveen Joshi, Chandra Thapa, Seyit Camtepe, Mohammed Hasanuzzaman, Ted Scully, Haithem Afli |
author_sort | Praveen Joshi |
collection | DOAJ |
description | Machine learning (ML) in healthcare data analytics is attracting much attention because of the unprecedented power of ML to extract knowledge that improves the decision-making process. At the same time, laws and ethics codes drafted by countries to govern healthcare data are becoming stringent. Although healthcare practitioners are struggling with an enforced governance framework, we see the emergence of distributed learning-based frameworks disrupting traditional-ML-model development. Splitfed learning (SFL) is one of the recent developments in distributed machine learning that empowers healthcare practitioners to preserve the privacy of input data and enables them to train ML models. However, SFL incurs extra communication and computation overhead at the client side because it requires client-side model synchronization. For a resource-constrained client side (hospitals with limited computational power), removing this requirement improves learning efficiency. In this regard, this paper studies SFL without client-side model synchronization; the resulting architecture is known as multi-head split learning (MHSL). At the same time, it is important to investigate information leakage, which indicates how much information the server gains about the raw data directly from the smashed data—the output of the client-side model portion—passed to it by the client. Our empirical studies examine ResNet-18 and Conv1-D model architectures on the ECG and HAM-10000 datasets under IID data distribution. The results show that SFL provides 1.81% and 2.36% better accuracy than MHSL on the ECG and HAM-10000 datasets, respectively (with the cut layer set to 1). Experiments with various client-side model portions demonstrate that the choice of portion affects overall performance: as the number of layers in the client-side model portion increases, SFL performance improves while MHSL performance degrades.
Experimental results also show that the information leakage, measured by mutual information score, is higher in SFL than in MHSL on the ECG and HAM-10000 datasets by 2 × 10⁻⁵ and 4 × 10⁻³, respectively. |
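The difference between SFL and MHSL described in the abstract comes down to one step: SFL averages (synchronizes) the client-side model portions each round, while MHSL skips that step entirely. A minimal FedAvg-style sketch of the synchronization step, using made-up toy layers rather than the paper's actual models:

```python
import numpy as np

def synchronize_client_heads(clients):
    """FedAvg-style synchronization of client-side model portions,
    as splitfed learning (SFL) performs each round.

    clients: list of client models, each a list of per-layer weight
    arrays. Returns the element-wise average, which every client then
    adopts. Multi-head split learning (MHSL) simply omits this call,
    leaving each client's head independent.
    """
    return [np.mean(np.stack(layers), axis=0) for layers in zip(*clients)]

# Two toy clients, each with a 2x2 weight layer and a bias layer.
client_a = [np.array([[1.0, 2.0], [3.0, 4.0]]), np.array([0.0, 0.0])]
client_b = [np.array([[3.0, 2.0], [1.0, 0.0]]), np.array([2.0, 2.0])]
avg = synchronize_client_heads([client_a, client_b])
# avg[0] is the element-wise mean of the two weight layers,
# avg[1] the mean of the two bias layers.
```

This averaging is exactly the per-round communication and computation cost that MHSL removes for resource-constrained clients.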
first_indexed | 2024-03-09T04:01:30Z |
format | Article |
id | doaj.art-6baad551885c4865b5cb3a6f337d5ff7 |
institution | Directory Open Access Journal |
issn | 2409-9279 |
language | English |
last_indexed | 2024-03-09T04:01:30Z |
publishDate | 2022-07-01 |
publisher | MDPI AG |
record_format | Article |
series | Methods and Protocols |
spelling | doaj.art-6baad551885c4865b5cb3a6f337d5ff7 | indexed 2023-12-03T14:12:31Z | eng | MDPI AG | Methods and Protocols | ISSN 2409-9279 | 2022-07-01 | vol. 5, iss. 4, art. 60 | doi:10.3390/mps5040060 | Performance and Information Leakage in Splitfed Learning and Multi-Head Split Learning in Healthcare Data and Beyond | Praveen Joshi, Chandra Thapa, Seyit Camtepe, Mohammed Hasanuzzaman, Ted Scully, Haithem Afli | Affiliations: Department of Computer Sciences, Munster Technological University, MTU, T12 P928 Cork, Ireland (Joshi, Hasanuzzaman, Scully, Afli); CSIRO Data61, Marsfield, NSW 2122, Australia (Thapa, Camtepe) | Subjects: distributed collaborative machine learning; split learning; multi-head split learning; parameter transmission-based distributed machine learning; privacy-preserving machine learning; information leakage in distributed learning | https://www.mdpi.com/2409-9279/5/4/60 |
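The mutual information score that the abstract uses to quantify how much the smashed data reveals about raw inputs can be approximated with a simple histogram-based estimator. This is an illustrative sketch on synthetic data; the paper's exact estimator, binning, and preprocessing may differ:

```python
import numpy as np

def mutual_information(raw, smashed, bins=16):
    """Histogram-based mutual information (in nats) between a raw
    1-D signal and the corresponding smashed-data activations."""
    joint, _, _ = np.histogram2d(raw, smashed, bins=bins)
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)   # marginal of raw
    py = p.sum(axis=0, keepdims=True)   # marginal of smashed
    nz = p > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(p[nz] * np.log(p[nz] / (px * py)[nz])))

rng = np.random.default_rng(0)
x = rng.normal(size=4000)         # stand-in for raw input samples
smashed = np.tanh(0.5 * x)        # stand-in for a cut-layer output
noise = rng.normal(size=4000)     # unrelated signal, as a baseline

# The smashed data leaks far more about x than independent noise does.
leak = mutual_information(x, smashed)
baseline = mutual_information(x, noise)
```

A higher score means the server can infer more about the raw data from the smashed data alone, which is how the paper compares leakage between SFL and MHSL.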
title | Performance and Information Leakage in Splitfed Learning and Multi-Head Split Learning in Healthcare Data and Beyond |
title_sort | performance and information leakage in splitfed learning and multi head split learning in healthcare data and beyond |
topic | distributed collaborative machine learning split learning multi-head split learning parameter transmission-based distributed machine learning privacy-preserving machine learning information leakage in distributed learning |
url | https://www.mdpi.com/2409-9279/5/4/60 |