Automated subtyping of HIV-1 genetic sequences for clinical and surveillance purposes: performance evaluation of the new REGA version 3 and seven other tools.

BACKGROUND: To investigate differences in pathogenesis, diagnosis and resistance pathways between HIV-1 subtypes, an accurate subtyping tool for large datasets is needed. We aimed to evaluate the performance of automated subtyping tools to classify the different subtypes and circulating recombinant...

Bibliographic Details
Main Authors: Pineda-Peña, A, Faria, N, Imbrechts, S, Libin, P, Abecasis, AB, Deforche, K, Gómez-López, A, Camacho, R, de Oliveira, T, Vandamme, A
Format: Journal article
Language: English
Published: 2013
collection OXFORD
description BACKGROUND: To investigate differences in pathogenesis, diagnosis and resistance pathways between HIV-1 subtypes, an accurate subtyping tool for large datasets is needed. We aimed to evaluate the performance of automated subtyping tools in classifying the different subtypes and circulating recombinant forms using pol, the most sequenced region in clinical practice. We also present the upgraded version 3 of the Rega HIV subtyping tool (REGAv3). METHODOLOGY: HIV-1 pol sequences (PR+RT) for 4674 patients retrieved from the Portuguese HIV Drug Resistance Database, and 1872 pol sequences trimmed from full-length genomes retrieved from the Los Alamos database, were classified with statistical-based tools such as COMET, jpHMM and STAR; similarity-based tools such as NCBI and Stanford; and phylogenetic-based tools such as REGA version 2 (REGAv2), REGAv3, and SCUEAL. The performance of these tools, for pol and for PR and RT separately, was compared in terms of reproducibility, sensitivity and specificity with respect to the gold standard, which was manual phylogenetic analysis of the pol region. RESULTS: The sensitivity and specificity for subtypes B and C were more than 96% for seven tools, but were variable for other subtypes such as A, D, F and G. With regard to the most common circulating recombinant forms (CRFs), the sensitivity and specificity for CRF01_AE were ~99% with statistical-based tools, with phylogenetic-based tools and with Stanford, one of the similarity-based tools. CRF02_AG was correctly identified in more than 96% of cases by COMET, REGAv3, Stanford and STAR. All the tools reached a specificity of more than 97% for most of the subtypes and the two main CRFs (CRF01_AE and CRF02_AG). Other CRFs were identified only by COMET, REGAv2, REGAv3, and SCUEAL, and with variable sensitivity. When sequences were analyzed for PR and RT separately, the performance for PR was generally lower and variable between the tools. Similarity-based and statistical-based tools were 100% reproducible, but reproducibility was lower for phylogenetic-based tools such as REGA (~99%) and SCUEAL (~96%). CONCLUSIONS: REGAv3 had an improved performance for subtype B and CRF02_AG compared to REGAv2 and is now also able to identify all epidemiologically relevant CRFs. In general, the best-performing tools, in alphabetical order, were COMET, jpHMM, REGAv3, and SCUEAL when analyzing pure subtypes in the pol region, and COMET and REGAv3 when analyzing most of the CRFs. Based on this study, we recommend confirming subtyping with two well-performing tools and interpreting short sequences with caution.
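The evaluation described above scores each tool per subtype against a gold-standard assignment. A minimal sketch of that kind of one-vs-rest calculation follows, assuming each sequence carries a single label from the gold standard and a single label from the tool. The function names, the label lists, and the rule of accepting a call only when two tools agree are illustrative assumptions, not the authors' code or data.

```python
def sensitivity_specificity(gold, predicted, subtype):
    """One-vs-rest sensitivity and specificity for a single subtype label.

    gold and predicted are equal-length lists of labels (hypothetical inputs).
    """
    tp = fn = fp = tn = 0
    for g, p in zip(gold, predicted):
        if g == subtype and p == subtype:
            tp += 1          # gold says subtype, tool agrees
        elif g == subtype:
            fn += 1          # gold says subtype, tool missed it
        elif p == subtype:
            fp += 1          # tool called subtype, gold disagrees
        else:
            tn += 1          # neither assigns this subtype
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def concordant_call(call_a, call_b):
    """Accept a subtype only when two tools agree; otherwise return None.

    This is one possible reading of "confirm subtyping with two tools".
    """
    return call_a if call_a == call_b else None


# Made-up toy labels, for illustration only.
gold = ["B", "B", "C", "CRF02_AG", "G", "B"]
tool = ["B", "B", "C", "G", "G", "C"]

sens_b, spec_b = sensitivity_specificity(gold, tool, "B")
sens_g, spec_g = sensitivity_specificity(gold, tool, "G")
print(f"B: sensitivity={sens_b:.2f}, specificity={spec_b:.2f}")
print(f"G: sensitivity={sens_g:.2f}, specificity={spec_g:.2f}")
```

In this toy data the tool misses one of three subtype B sequences and labels a CRF02_AG sequence as G, so sensitivity drops for B while specificity drops for G. Discordant calls returned as None by concordant_call would be the ones to check manually.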
first_indexed 2024-03-06T18:17:43Z
id oxford-uuid:0530ff09-bc99-4e48-853c-6a390c097a24
institution University of Oxford
last_indexed 2024-03-06T18:17:43Z