Graph inductive biases in transformers without message passing


Bibliographic Details
Main Authors: Ma, L; Lin, C; Lim, D; Romero-Soriano, A; Dokania, PK; Coates, M; Torr, PHS; Lim, S-N
Format: Internet publication
Language: English
Published: 2023
Institution: University of Oxford
Collection: OXFORD
Identifier: oxford-uuid:485881ef-df45-41b8-893e-eb5e62f87b33

Full description
Transformers for graph data are increasingly widely studied and successful in numerous learning tasks. Graph inductive biases are crucial for Graph Transformers, and previous works incorporate them using message-passing modules and/or positional encodings. However, Graph Transformers that use message-passing inherit known issues of message-passing, and differ significantly from Transformers used in other domains, thus making transfer of research advances more difficult. On the other hand, Graph Transformers without message-passing often perform poorly on smaller datasets, where inductive biases are more crucial. To bridge this gap, we propose the Graph Inductive bias Transformer (GRIT) -- a new Graph Transformer that incorporates graph inductive biases without using message passing. GRIT is based on several architectural changes that are each theoretically and empirically justified, including: learned relative positional encodings initialized with random walk probabilities, a flexible attention mechanism that updates node and node-pair representations, and injection of degree information in each layer. We prove that GRIT is expressive -- it can express shortest path distances and various graph propagation matrices. GRIT achieves state-of-the-art empirical performance across a variety of graph datasets, thus showing the power that Graph Transformers without message-passing can deliver.
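
The architectural ingredients listed in the abstract can be made concrete with a small example. The sketch below is not the authors' implementation; it is a minimal NumPy illustration of the k-step random walk probabilities that the abstract says are used to initialize the learned relative positional encodings. The dense adjacency-matrix input, the `random_walk_encodings` helper, and the choice of `num_steps` are assumptions made for this example.

```python
# Minimal sketch (not the released GRIT code): node-pair random walk
# probabilities used as an initialization for relative positional encodings.
# Assumes an undirected graph given as a dense NumPy adjacency matrix.
import numpy as np

def random_walk_encodings(adj: np.ndarray, num_steps: int = 4) -> np.ndarray:
    """Return an (n, n, num_steps) array whose (i, j, k) entry is the
    probability that a (k+1)-step random walk starting at node i ends at j."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    # Row-normalized transition matrix M = D^{-1} A; isolated nodes get zero rows.
    inv_deg = np.divide(1.0, deg, out=np.zeros_like(deg, dtype=float), where=deg > 0)
    transition = inv_deg[:, None] * adj
    encodings = np.empty((n, n, num_steps))
    walk = np.eye(n)
    for k in range(num_steps):
        walk = walk @ transition          # probabilities after k+1 steps
        encodings[:, :, k] = walk
    return encodings

# Example: a 4-node path graph 0-1-2-3.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
pe = random_walk_encodings(adj, num_steps=3)
print(pe[0, :, 0])  # one-step probabilities from node 0: [0. 1. 0. 0.]
```

In the architecture described above, these node-pair probabilities are only a starting point: the encodings are subsequently learned, and the attention mechanism that updates node and node-pair representations, together with the per-layer degree injection, operates on top of them.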