Federated learning of molecular properties with graph neural networks in a heterogeneous setting

Summary: Conducting experiments in chemistry research carries high material and computational costs. Institutions are interested in differing classes of molecules, creating heterogeneous data that cannot be easily joined by conventional methods. This work introduces federated heterogeneous molecular lea...

Bibliographic Details
Main Authors: Wei Zhu, Jiebo Luo, Andrew D. White
Format: Article
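Language: English
Published: Elsevier 2022-06-01
Series: Patterns
Subjects: DSML 3: Development/pre-production: Data science output has been rolled out/validated across multiple domains/problem
Online Access: http://www.sciencedirect.com/science/article/pii/S2666389922001180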
collection DOAJ
description Summary: Conducting experiments in chemistry research carries high material and computational costs. Institutions are interested in differing classes of molecules, creating heterogeneous data that cannot be easily joined by conventional methods. This work introduces federated heterogeneous molecular learning. Federated learning allows end users to build a global model collaboratively while keeping their training data isolated. We first simulate a heterogeneous federated-learning benchmark (FedChem) by jointly performing scaffold splitting and latent Dirichlet allocation on existing datasets. Our results on FedChem show that significant learning challenges arise when working with heterogeneous molecules across clients. We then propose a method to alleviate the problem: Federated Learning by Instance reweighTing (FLIT(+)). FLIT(+) can align local training across clients. Experiments conducted on FedChem validate the advantages of this method. This work should enable a new type of collaboration for improving artificial intelligence (AI) in chemistry that mitigates concerns about sharing valuable chemical data.
The bigger picture: Generating datasets with thousands of molecules for machine learning in chemistry is cost-prohibitive due to the high material and/or computational costs. Additionally, chemical data’s intrinsic value makes institutions reluctant to contribute to a centralized dataset. Recent studies suggest that deep learning has the potential to accelerate molecule discovery, but there are few large datasets for chemistry. Instead, individual institutions gather their data privately, which leads to under-trained models with poor generalization performance. Even worse, the local models can be biased because institutions often focus on certain regions of chemical space important for their interests and expertise. We propose a federated-learning method with graph neural networks that can treat this heterogeneity and enable accurate federated learning on molecular-property prediction. We propose a heterogeneous federated-learning benchmark and show that our method is state of the art.
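The description mentions two ideas that a short sketch may help make concrete: simulating heterogeneous clients from scaffold groups with a Dirichlet prior, and combining client models into a global one. The Python below is only an illustration under stated assumptions, not the authors' FedChem or FLIT(+) code. The function names (`sample_dirichlet`, `partition_by_scaffold`, `fedavg`) are hypothetical, scaffold labels are assumed to be precomputed integers, and the aggregation shown is plain size-weighted averaging in the style of FedAvg rather than the instance reweighting the article proposes.

```python
import random
from collections import defaultdict


def sample_dirichlet(alpha, k, rng):
    """Draw one sample from a symmetric Dirichlet(alpha) over k categories."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def partition_by_scaffold(scaffold_ids, num_clients, alpha, seed=0):
    """Assign molecule indices to clients, one Dirichlet draw per scaffold.

    scaffold_ids[i] is the (assumed, precomputed) scaffold label of
    molecule i. A smaller alpha concentrates each scaffold on fewer
    clients, which makes the simulated clients more heterogeneous.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for idx, scaffold in enumerate(scaffold_ids):
        groups[scaffold].append(idx)

    clients = [[] for _ in range(num_clients)]
    for scaffold in sorted(groups):
        # Per-scaffold client proportions drawn from the Dirichlet prior.
        probs = sample_dirichlet(alpha, num_clients, rng)
        for idx in groups[scaffold]:
            owner = rng.choices(range(num_clients), weights=probs)[0]
            clients[owner].append(idx)
    return clients


def fedavg(client_weights, client_sizes):
    """Size-weighted average of client parameter vectors (FedAvg-style)."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


if __name__ == "__main__":
    # Toy data: 100 molecules spread over 5 hypothetical scaffold labels.
    scaffolds = [i % 5 for i in range(100)]
    parts = partition_by_scaffold(scaffolds, num_clients=4, alpha=0.5, seed=1)
    print([len(p) for p in parts])
    print(fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

In this sketch only parameter vectors and client sizes reach the aggregation step, which mirrors the point in the description that training data stays isolated on each client.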
first_indexed 2024-04-13T21:49:41Z
id doaj.art-1de9c8f43ea64ee4913229a1a328b99f
institution Directory Open Access Journal
issn 2666-3899
last_indexed 2024-04-13T21:49:41Z
spelling doaj.art-1de9c8f43ea64ee4913229a1a328b99f 2022-12-22T02:28:27Z eng Elsevier Patterns 2666-3899 2022-06-01 3 6 100521
Federated learning of molecular properties with graph neural networks in a heterogeneous setting
Wei Zhu (Department of Computer Science, University of Rochester, Rochester, NY, USA)
Jiebo Luo (Department of Computer Science, University of Rochester, Rochester, NY, USA)
Andrew D. White (Department of Chemical Engineering, University of Rochester, Rochester, NY, USA; corresponding author)
http://www.sciencedirect.com/science/article/pii/S2666389922001180
DSML 3: Development/pre-production: Data science output has been rolled out/validated across multiple domains/problem
topic DSML 3: Development/pre-production: Data science output has been rolled out/validated across multiple domains/problem