Graph inductive biases in transformers without message passing


Bibliographic Details
Main Authors: Ma, L; Lin, C; Lim, D; Romero-Soriano, A; Dokania, PK; Coates, M; Torr, PHS; Lim, S-N
Format: Internet publication
Language: English
Published: 2023
Institution: University of Oxford
Collection: OXFORD
Identifier: oxford-uuid:485881ef-df45-41b8-893e-eb5e62f87b33

Full description
Transformers for graph data are increasingly widely studied and successful in numerous learning tasks. Graph inductive biases are crucial for Graph Transformers, and previous works incorporate them using message-passing modules and/or positional encodings. However, Graph Transformers that use message-passing inherit known issues of message-passing, and differ significantly from Transformers used in other domains, thus making transfer of research advances more difficult. On the other hand, Graph Transformers without message-passing often perform poorly on smaller datasets, where inductive biases are more crucial. To bridge this gap, we propose the Graph Inductive bias Transformer (GRIT) -- a new Graph Transformer that incorporates graph inductive biases without using message passing. GRIT is based on several architectural changes that are each theoretically and empirically justified, including: learned relative positional encodings initialized with random walk probabilities, a flexible attention mechanism that updates node and node-pair representations, and injection of degree information in each layer. We prove that GRIT is expressive -- it can express shortest path distances and various graph propagation matrices. GRIT achieves state-of-the-art empirical performance across a variety of graph datasets, thus showing the power that Graph Transformers without message-passing can deliver.
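
The architectural ingredients listed in the abstract can be made concrete with a small example. The sketch below is not the authors' implementation; it is a minimal NumPy illustration of the k-step random walk probabilities that the abstract says are used to initialize the learned relative positional encodings. The dense adjacency-matrix input, the `random_walk_encodings` helper, and the choice of `num_steps` are assumptions made for this example.

```python
# Minimal sketch (not the released GRIT code): node-pair random walk
# probabilities used as an initialization for relative positional encodings.
# Assumes an undirected graph given as a dense NumPy adjacency matrix.
import numpy as np

def random_walk_encodings(adj: np.ndarray, num_steps: int = 4) -> np.ndarray:
    """Return an (n, n, num_steps) array whose (i, j, k) entry is the
    probability that a (k+1)-step random walk starting at node i ends at j."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    # Row-normalized transition matrix M = D^{-1} A; isolated nodes get zero rows.
    inv_deg = np.divide(1.0, deg, out=np.zeros_like(deg, dtype=float), where=deg > 0)
    transition = inv_deg[:, None] * adj
    encodings = np.empty((n, n, num_steps))
    walk = np.eye(n)
    for k in range(num_steps):
        walk = walk @ transition          # probabilities after k+1 steps
        encodings[:, :, k] = walk
    return encodings

# Example: a 4-node path graph 0-1-2-3.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
pe = random_walk_encodings(adj, num_steps=3)
print(pe[0, :, 0])  # one-step probabilities from node 0: [0. 1. 0. 0.]
```

In the architecture described above, these node-pair probabilities are only a starting point: the encodings are subsequently learned, and the attention mechanism that updates node and node-pair representations, together with the per-layer degree injection, operates on top of them.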