Graph inductive biases in transformers without message passing
Transformers for graph data are increasingly widely studied and successful in numerous learning tasks. Graph inductive biases are crucial for Graph Transformers, and previous works incorporate them using message-passing modules and/or positional encodings. However, Graph Transformers that use message-passing inherit known issues of message-passing, and differ significantly from Transformers used in other domains, thus making transfer of research advances more difficult. On the other hand, Graph Transformers without message-passing often perform poorly on smaller datasets, where inductive biases are more crucial. To bridge this gap, we propose the Graph Inductive bias Transformer (GRIT) -- a new Graph Transformer that incorporates graph inductive biases without using message passing. GRIT is based on several architectural changes that are each theoretically and empirically justified, including: learned relative positional encodings initialized with random walk probabilities, a flexible attention mechanism that updates node and node-pair representations, and injection of degree information in each layer. We prove that GRIT is expressive -- it can express shortest path distances and various graph propagation matrices. GRIT achieves state-of-the-art empirical performance across a variety of graph datasets, thus showing the power that Graph Transformers without message-passing can deliver.
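The abstract's most concrete architectural claim is the use of relative positional encodings initialized with random walk probabilities. The sketch below is a minimal illustration of that initialization step only, not the authors' implementation: the function name `random_walk_probabilities` and the NumPy-based setup are assumptions for exposition. It stacks the k-step random-walk probabilities (D^-1 A)^k for every node pair into a per-pair feature tensor, which is the kind of quantity such an encoding could be initialized from.

```python
import numpy as np

def random_walk_probabilities(adj: np.ndarray, num_steps: int) -> np.ndarray:
    """k-step random-walk probabilities (D^-1 A)^k for every node pair,
    stacked into a (num_nodes, num_nodes, num_steps) tensor."""
    deg = adj.sum(axis=1, keepdims=True)     # degree of each node
    deg[deg == 0] = 1.0                      # guard isolated nodes against division by zero
    transition = adj / deg                   # row-stochastic random-walk matrix D^-1 A
    powers = []
    current = np.eye(adj.shape[0])
    for _ in range(num_steps):
        current = current @ transition       # (D^-1 A)^k after k iterations
        powers.append(current)
    return np.stack(powers, axis=-1)         # one channel per step for each node pair

# Example: 4-node cycle graph, 3 random-walk steps.
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
pe = random_walk_probabilities(adj, num_steps=3)
print(pe.shape)   # -> (4, 4, 3)
```

In the paper these per-pair probability channels serve only as the initialization of learned encodings that the model then updates; the sketch stops at the initialization described in the abstract.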
Main Authors: | Ma, L; Lin, C; Lim, D; Romero-Soriano, A; Dokania, PK; Coates, M; Torr, PHS; Lim, S-N |
---|---|
Format: | Internet publication |
Language: | English |
Published: | 2023 |
author | Ma, L; Lin, C; Lim, D; Romero-Soriano, A; Dokania, PK; Coates, M; Torr, PHS; Lim, S-N |
collection | OXFORD |
description | Transformers for graph data are increasingly widely studied and successful in numerous learning tasks. Graph inductive biases are crucial for Graph Transformers, and previous works incorporate them using message-passing modules and/or positional encodings. However, Graph Transformers that use message-passing inherit known issues of message-passing, and differ significantly from Transformers used in other domains, thus making transfer of research advances more difficult. On the other hand, Graph Transformers without message-passing often perform poorly on smaller datasets, where inductive biases are more crucial. To bridge this gap, we propose the Graph Inductive bias Transformer (GRIT) -- a new Graph Transformer that incorporates graph inductive biases without using message passing. GRIT is based on several architectural changes that are each theoretically and empirically justified, including: learned relative positional encodings initialized with random walk probabilities, a flexible attention mechanism that updates node and node-pair representations, and injection of degree information in each layer. We prove that GRIT is expressive -- it can express shortest path distances and various graph propagation matrices. GRIT achieves state-of-the-art empirical performance across a variety of graph datasets, thus showing the power that Graph Transformers without message-passing can deliver. |
format | Internet publication |
id | oxford-uuid:485881ef-df45-41b8-893e-eb5e62f87b33 |
institution | University of Oxford |
language | English |
publishDate | 2023 |
title | Graph inductive biases in transformers without message passing |