Use of unsupervised machine learning to characterise HIV predictors in sub-Saharan Africa

Bibliographic Details
Main Authors: Charles K. Mutai, Patrick E. McSharry, Innocent Ngaruye, Edouard Musabanganji
Format: Article
Language: English
Published: BMC 2023-07-01
Series: BMC Infectious Diseases
Online Access: https://doi.org/10.1186/s12879-023-08467-7
author Charles K. Mutai
Patrick E. McSharry
Innocent Ngaruye
Edouard Musabanganji
collection DOAJ
description Abstract
Introduction: Significant regional variations in the HIV epidemic hinder effective common interventions in sub-Saharan Africa. It is crucial to analyse HIV positivity distributions within clusters and to assess the homogeneity of countries. We aim to identify clusters of countries based on socio-behavioural predictors of HIV for screening.
Method: We used agglomerative hierarchical clustering, an unsupervised machine learning approach, to analyse data for 146,733 male and 155,622 female respondents from 13 sub-Saharan African countries, with 20 and 26 features, respectively, drawn from Population-based HIV Impact Assessment (PHIA) surveys conducted in 2015–2019. The optimal silhouette index criterion was used to identify clusters of countries based on the similarity of their socio-behavioural characteristics. We then analysed the distribution of HIV positivity against the socio-behavioural predictors of HIV within each cluster.
Results: Two principal components were obtained: the first explained 62.3% and 70.1%, and the second 18.3% and 20.6%, of the total socio-behavioural variance in females and males, respectively. Two clusters per sex were identified, and the most predictive features across both sexes were: relationship with family head, enrolled in school, circumcision status (for males), delayed pregnancy, work for payment in the last 12 months, urban area indicator, and known HIV status. The distribution of HIV positivity across these variables was significant within each cluster.
Conclusions: The findings demonstrate the potential of unsupervised machine learning approaches for identifying clusters of countries based on their underlying socio-behavioural characteristics.
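The clustering step named in the abstract (agglomerative hierarchical clustering, with the number of clusters chosen by the silhouette index) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the 13 "countries" below are synthetic points with four hypothetical features, and the choice of average linkage and Euclidean distance is an assumption, since the abstract does not state which linkage or metric was used. The principal component step is not shown. In practice a library such as scikit-learn (`AgglomerativeClustering`, `silhouette_score`) would normally replace the hand-written functions.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative(points, k):
    """Average-linkage agglomerative clustering; returns one label per point."""
    clusters = [[i] for i in range(len(points))]  # start from singletons
    while len(clusters) > k:
        best = None  # (distance, i, j) for the closest pair of clusters
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(euclidean(points[a], points[b])
                            for a in clusters[i] for b in clusters[j])
                dist = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # merge the closest pair
        del clusters[j]                  # j > i, so index i is unaffected
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for m in members:
            labels[m] = label
    return labels


def silhouette(points, labels):
    """Mean silhouette width over all points (singletons score 0)."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(euclidean(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                                 # cohesion
        b = min(sum(d) / len(d) for d in by_cluster.values())   # separation
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Synthetic stand-in for 13 countries x 4 features (hypothetical values):
# 7 points scattered around 0.2 and 6 points scattered around 0.8.
rng = random.Random(0)
centres = [0.2] * 7 + [0.8] * 6
countries = [[c + rng.gauss(0, 0.05) for _ in range(4)] for c in centres]

# Try several cluster counts and keep the one with the highest silhouette.
scores = {k: silhouette(countries, agglomerative(countries, k))
          for k in range(2, 6)}
best_k = max(scores, key=scores.get)
best_labels = agglomerative(countries, best_k)
print(best_k, best_labels)
```

On this deliberately well-separated toy data the silhouette criterion selects two clusters, which mirrors the form of the reported result (two clusters per sex) without reproducing the study's data or findings.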
first_indexed 2024-03-12T22:20:00Z
format Article
id doaj.art-a1f3a048ca094d4cbc45cd6ac70ffa07
institution Directory of Open Access Journals
issn 1471-2334
language English
last_indexed 2024-03-12T22:20:00Z
publishDate 2023-07-01
publisher BMC
record_format Article
series BMC Infectious Diseases
spelling doaj.art-a1f3a048ca094d4cbc45cd6ac70ffa07 2023-07-23T11:07:55Z eng BMC BMC Infectious Diseases 1471-2334 2023-07-01 23 1 1 13 10.1186/s12879-023-08467-7
Use of unsupervised machine learning to characterise HIV predictors in sub-Saharan Africa
Charles K. Mutai (African Center of Excellence in Data Science, University of Rwanda)
Patrick E. McSharry (African Center of Excellence in Data Science, University of Rwanda)
Innocent Ngaruye (College of Science and Technology, University of Rwanda)
Edouard Musabanganji (College of Business and Economics, University of Rwanda)
https://doi.org/10.1186/s12879-023-08467-7
title Use of unsupervised machine learning to characterise HIV predictors in sub-Saharan Africa
url https://doi.org/10.1186/s12879-023-08467-7