Federated learning of molecular properties with graph neural networks in a heterogeneous setting
Summary: Chemistry research has both high material and computational costs to conduct experiments. Institutions are interested in differing classes of molecules, creating heterogeneous data that cannot be easily joined by conventional methods. This work introduces federated heterogeneous molecular lea...
Main Authors: | Wei Zhu, Jiebo Luo, Andrew D. White |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier 2022-06-01 |
Series: | Patterns |
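Subjects: | DSML 3: Development/pre-production: Data science output has been rolled out/validated across multiple domains/problem |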
Online Access: | http://www.sciencedirect.com/science/article/pii/S2666389922001180 |
_version_ | 1817975416865947648 |
---|---|
author | Wei Zhu; Jiebo Luo; Andrew D. White |
author_facet | Wei Zhu; Jiebo Luo; Andrew D. White |
author_sort | Wei Zhu |
collection | DOAJ |
description | Summary: Chemistry research has both high material and computational costs to conduct experiments. Institutions are interested in differing classes of molecules, creating heterogeneous data that cannot be easily joined by conventional methods. This work introduces federated heterogeneous molecular learning. Federated learning allows end users to build a global model collaboratively while keeping their training data isolated. We first simulate a heterogeneous federated-learning benchmark (FedChem) by jointly performing scaffold splitting and latent Dirichlet allocation on existing datasets. Our results on FedChem show that significant learning challenges arise when working with heterogeneous molecules across clients. We then propose a method to alleviate the problem: Federated Learning by Instance reweighTing (FLIT(+)). FLIT(+) can align local training across clients. Experiments conducted on FedChem validate the advantages of this method. This work should enable a new type of collaboration for improving artificial intelligence (AI) in chemistry that mitigates concerns about sharing valuable chemical data. The bigger picture: Generating datasets with thousands of molecules for machine learning in chemistry is cost-prohibitive due to the high material and/or computational costs. Additionally, chemical data’s intrinsic value makes institutions reluctant to contribute to a centralized dataset. Recent studies suggest that deep learning has the potential to accelerate molecule discovery, but there are few large datasets for chemistry. Instead, individual institutions gather their data privately, which leads to under-trained models with poor generalization performance. Even worse, the local models can be biased because institutions often focus on certain regions of chemical space important for their interests and expertise. We propose a federated-learning method with graph neural networks that can treat this heterogeneity and enable accurate federated learning on molecular-property prediction. We also propose a heterogeneous federated-learning benchmark and show that our method is state of the art. |
first_indexed | 2024-04-13T21:49:41Z |
format | Article |
id | doaj.art-1de9c8f43ea64ee4913229a1a328b99f |
institution | Directory of Open Access Journals |
issn | 2666-3899 |
language | English |
last_indexed | 2024-04-13T21:49:41Z |
publishDate | 2022-06-01 |
publisher | Elsevier |
record_format | Article |
series | Patterns |
spelling | doaj.art-1de9c8f43ea64ee4913229a1a328b99f; 2022-12-22T02:28:27Z; eng; Elsevier; Patterns; 2666-3899; 2022-06-01; 3; 6; 100521; Federated learning of molecular properties with graph neural networks in a heterogeneous setting; Wei Zhu (Department of Computer Science, University of Rochester, Rochester, NY, USA); Jiebo Luo (Department of Computer Science, University of Rochester, Rochester, NY, USA); Andrew D. White (Department of Chemical Engineering, University of Rochester, Rochester, NY, USA; corresponding author); http://www.sciencedirect.com/science/article/pii/S2666389922001180; DSML 3: Development/pre-production: Data science output has been rolled out/validated across multiple domains/problem |
title | Federated learning of molecular properties with graph neural networks in a heterogeneous setting |
topic | DSML 3: Development/pre-production: Data science output has been rolled out/validated across multiple domains/problem |
url | http://www.sciencedirect.com/science/article/pii/S2666389922001180 |