Protocol to identify functional doppelgängers and verify biomedical gene expression data using doppelgangerIdentifier

Functional doppelgängers (FDs) are independently derived sample pairs that confound machine learning (ML) model performance when assorted across training and validation sets. Here, we detail the use of doppelgangerIdentifier (DI), providing software installation, data preparation, doppelgänger ident...

Bibliographic Details
Main Authors: Wang, Li Rong, Fan, Xiuyi, Goh, Wilson Wen Bin
Other Authors: School of Computer Science and Engineering
Format: Journal Article
Language: English
Published: 2023
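Subjects: Engineering::Computer science and engineering; Science::Biological sciences; Gene Expression; Machine Learning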
Online Access: https://hdl.handle.net/10356/164598
author Wang, Li Rong
Fan, Xiuyi
Goh, Wilson Wen Bin
author2 School of Computer Science and Engineering
collection NTU
description Functional doppelgängers (FDs) are independently derived sample pairs that confound machine learning (ML) model performance when assorted across training and validation sets. Here, we detail the use of doppelgangerIdentifier (DI), providing software installation, data preparation, doppelgänger identification, and functional testing steps. We demonstrate examples with biomedical gene expression data. We also provide guidelines for the selection of user-defined function arguments. For complete details on the use and execution of this protocol, please refer to Wang et al. (2022).
first_indexed 2024-10-01T03:09:31Z
format Journal Article
id ntu-10356/164598
institution Nanyang Technological University
language English
last_indexed 2024-10-01T03:09:31Z
publishDate 2023
record_format dspace
spelling ntu-10356/164598
2023-02-28T17:13:49Z
Protocol to identify functional doppelgängers and verify biomedical gene expression data using doppelgangerIdentifier
Wang, Li Rong
Fan, Xiuyi
Goh, Wilson Wen Bin
School of Computer Science and Engineering
Lee Kong Chian School of Medicine (LKCMedicine)
School of Biological Sciences
Centre for Biomedical Informatics
Engineering::Computer science and engineering
Science::Biological sciences
Gene Expression
Machine Learning
Functional doppelgängers (FDs) are independently derived sample pairs that confound machine learning (ML) model performance when assorted across training and validation sets. Here, we detail the use of doppelgangerIdentifier (DI), providing software installation, data preparation, doppelgänger identification, and functional testing steps. We demonstrate examples with biomedical gene expression data. We also provide guidelines for the selection of user-defined function arguments. For complete details on the use and execution of this protocol, please refer to Wang et al. (2022).
Ministry of Education (MOE)
National Research Foundation (NRF)
Published version
This research/project is supported by the National Research Foundation, Singapore under its Industry Alignment Fund – Pre-positioning (IAF-PP) Funding Initiative. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of National Research Foundation, Singapore. W.W.B.G. also acknowledges support from a Ministry of Education (MOE), Singapore Tier 1 grant (grant no. RG35/20).
2023-02-06T05:37:02Z
2023-02-06T05:37:02Z
2022
Journal Article
Wang, L. R., Fan, X. & Goh, W. W. B. (2022). Protocol to identify functional doppelgängers and verify biomedical gene expression data using doppelgangerIdentifier. STAR Protocols, 3(4), 101783.
https://dx.doi.org/10.1016/j.xpro.2022.101783
2666-1667
https://hdl.handle.net/10356/164598
10.1016/j.xpro.2022.101783
36317174
2-s2.0-85140458047
4
3
101783
en
RG35/20
STAR Protocols
© 2022 The Author(s). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
application/pdf
title Protocol to identify functional doppelgängers and verify biomedical gene expression data using doppelgangerIdentifier
topic Engineering::Computer science and engineering
Science::Biological sciences
Gene Expression
Machine Learning
url https://hdl.handle.net/10356/164598