Graph inductive biases in transformers without message passing
Transformers for graph data are increasingly widely studied and successful in numerous learning tasks. Graph inductive biases are crucial for Graph Transformers, and previous works incorporate them using message-passing modules and/or positional encodings. However, Graph Transformers that use message-passing inherit known issues of message-passing, and differ significantly from Transformers used in other domains, thus making transfer of research advances more difficult. On the other hand, Graph Transformers without message-passing often perform poorly on smaller datasets, where inductive biases are more important. To bridge this gap, we propose the Graph Inductive bias Transformer (GRIT) — a new Graph Transformer that incorporates graph inductive biases without using message passing. GRIT is based on several architectural changes that are each theoretically and empirically justified, including: learned relative positional encodings initialized with random walk probabilities, a flexible attention mechanism that updates node and node-pair representations, and injection of degree information in each layer. We prove that GRIT is expressive — it can express shortest path distances and various graph propagation matrices. GRIT achieves state-of-the-art empirical performance across a variety of graph datasets, thus showing the power that Graph Transformers without message-passing can deliver.
Main Authors: | Ma, L; Lin, C; Lim, D; Romero-Soriano, A; Dokania, PK; Coates, M; Torr, PHS; Lim, S-N |
---|---|
Format: | Conference item |
Language: | English |
Published: | Proceedings of Machine Learning Research, 2023 |
Institution: | University of Oxford |
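The abstract notes that GRIT initializes learned relative positional encodings with random walk probabilities. As a rough illustration (not the authors' code), the sketch below builds such node-pair features by stacking powers of the standard row-normalized random-walk matrix M = D⁻¹A over K steps; the function name, the dense-matrix representation, and the choice of K are illustrative assumptions.

```python
import numpy as np

def random_walk_positional_encodings(A: np.ndarray, K: int = 8) -> np.ndarray:
    """Stack K random-walk probability matrices as node-pair features.

    Assumes the standard random-walk matrix M = D^{-1} A. The result P has
    shape (n, n, K), with P[:, :, 0] = I and P[:, :, k] = M^k, so that
    P[i, j, k] is the probability of reaching node j from node i in k steps.
    """
    n = A.shape[0]
    deg = A.sum(axis=1)
    deg[deg == 0] = 1.0                  # avoid division by zero on isolated nodes
    M = A / deg[:, None]                 # row-normalized random-walk matrix
    P = np.empty((n, n, K))
    Mk = np.eye(n)                       # M^0 = identity
    for k in range(K):
        P[:, :, k] = Mk
        Mk = Mk @ M
    return P

# Toy example: a 4-cycle
A = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 0., 1., 0.]])
P = random_walk_positional_encodings(A, K=4)
print(P[0, :, 1])  # one-step probabilities from node 0 -> [0.  0.5 0.  0.5]
```

Per the abstract, these probabilities only initialize encodings that are then learned within the Transformer; that learnable part is omitted from this sketch.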