# How Attentive are Graph Attention Networks?

```bibtex
@article{Brody2021HowAA,
  title   = {How Attentive are Graph Attention Networks?},
  author  = {Shaked Brody and Uri Alon and Eran Yahav},
  journal = {ArXiv},
  year    = {2021},
  volume  = {abs/2105.14491}
}
```

Graph Attention Networks (GATs) are one of the most popular GNN architectures and are considered the state-of-the-art architecture for representation learning on graphs. In GAT, every node attends to its neighbors given its own representation as the query. However, in this paper we show that GATs can only compute a restricted kind of attention, where the ranking of attended nodes is unconditioned on the query node. We formally define this restricted kind of attention as static attention and…
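The static-vs-dynamic distinction comes down to where the nonlinearity sits in the scoring function. A minimal NumPy sketch of the two scoring functions (dimensions and variable names are illustrative, not taken from the authors' code) makes the restriction concrete:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 4, 8  # illustrative input / hidden dimensions

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

# --- Original GAT: e(h_i, h_j) = LeakyReLU(a^T [W h_i || W h_j]) ---
W_gat = rng.normal(size=(d_out, d_in))
a = rng.normal(size=2 * d_out)
a1, a2 = a[:d_out], a[d_out:]  # split a over the two halves of the concat

def gat_score(h_i, h_j):
    # The query term a1^T W h_i and key term a2^T W h_j are summed *before*
    # the monotone LeakyReLU, so the ranking over neighbours j is the same
    # for every query i: the attention is "static".
    return leaky_relu(a1 @ (W_gat @ h_i) + a2 @ (W_gat @ h_j))

# --- GATv2: e(h_i, h_j) = a^T LeakyReLU(W [h_i || h_j]) ---
W_v2 = rng.normal(size=(d_out, 2 * d_in))
a_v2 = rng.normal(size=d_out)

def gatv2_score(h_i, h_j):
    # The nonlinearity now sits between W and a, so the score no longer
    # decomposes into separate query and key terms: the neighbour ranking
    # can depend on the query ("dynamic" attention).
    return a_v2 @ leaky_relu(W_v2 @ np.concatenate([h_i, h_j]))

# Demo: under GAT, any two queries rank the same neighbours identically.
neighbours = rng.normal(size=(5, d_in))
q1, q2 = rng.normal(size=d_in), rng.normal(size=d_in)
rank_q1 = np.argsort([gat_score(q1, n) for n in neighbours])
rank_q2 = np.argsort([gat_score(q2, n) for n in neighbours])
assert (rank_q1 == rank_q2).all()  # static: ranking is query-independent
```

The GAT demonstration holds for any choice of weights, since LeakyReLU is strictly monotone and the query contributes only an additive constant per neighbour; the GATv2 score has no such decomposition.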

#### 5 Citations

A-GHN: Attention-based Fusion of Multiple GraphHeat Networks for Structural to Functional Brain Mapping

- Biology
- 2021

It is argued that the proposed deep learning method overcomes scalability and computational-inefficiency issues while still learning the SC-FC mapping successfully, and experiments demonstrate that A-GHN outperforms existing methods in learning the complex nature of human brain function.

Learning Dynamic Graph Representation of Brain Connectome with Spatio-Temporal Attention

- Computer Science, Biology
- ArXiv
- 2021

STAGIN is proposed, a method for learning dynamic graph representations of the brain connectome with spatio-temporal attention that combines novel READOUT functions and the Transformer encoder to provide spatial and temporal explainability.

Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction

- Computer Science
- ArXiv
- 2021

A novel Graph2SMILES model is described that combines the power of Transformer models for text generation with the permutation invariance of molecular graph encoders, mitigating the need for input data augmentation.

Enel: Context-Aware Dynamic Scaling of Distributed Dataflow Jobs using Graph Propagation

- Computer Science
- ArXiv
- 2021

This paper presents Enel, a novel dynamic scaling approach that uses message propagation on an attributed graph to model dataflow jobs, allowing effective rescaling decisions to be derived and reused across different execution contexts.

FDGATII: Fast Dynamic Graph Attention with Initial Residual and Identity Mapping

- Computer Science
- 2021

While Graph Neural Networks have gained popularity in multiple domains, graph-structured input remains a major challenge due to (a) oversmoothing, (b) noisy neighbours (heterophily), and (c) the…

#### References

Showing 1–10 of 73 references

Improving Graph Attention Networks with Large Margin-based Constraints

- Computer Science, Mathematics
- ArXiv
- 2019

This work first theoretically demonstrates the over-smoothing behavior of GATs and then develops an approach that constrains the attention weights according to the class boundary and feature-aggregation pattern, leading to significant improvements over previous state-of-the-art graph attention methods on all datasets.

How to Find Your Friendly Neighborhood: Graph Attention Design with Self-Supervision

- Computer Science
- ICLR
- 2021

A self-supervised graph attention network (SuperGAT) is proposed, an improved graph attention model for noisy graphs that exploits two attention forms compatible with a self-supervised task of predicting edges, whose presence and absence contain inherent information about the importance of the relationships between nodes.

Graph Attention Networks

- Mathematics, Computer Science
- ICLR
- 2018

We present graph attention networks (GATs), novel neural network architectures that operate on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior…

Attention-based Graph Neural Network for Semi-supervised Learning

- Computer Science, Mathematics
- ArXiv
- 2018

A novel graph neural network is proposed that removes all the intermediate fully-connected layers and replaces the propagation layers with attention mechanisms that respect the structure of the graph; this approach is demonstrated to outperform competing methods on benchmark citation-network datasets.

Graph Representation Learning via Hard and Channel-Wise Attention Networks

- Mathematics, Computer Science
- KDD
- 2019

Compared to GAO, hGAO improves performance and saves computational cost by attending only to important nodes, and efficiency comparisons show that cGAO leads to dramatic savings in computational resources, making these operators applicable to large graphs.

Learning Attention-based Embeddings for Relation Prediction in Knowledge Graphs

- Computer Science, Mathematics
- ACL
- 2019

This paper proposes a novel attention-based feature embedding that captures both entity and relation features in any given entity's neighborhood, encapsulating relation clusters and multi-hop relations in the model.

GaAN: Gated Attention Networks for Learning on Large and Spatiotemporal Graphs

- Computer Science, Mathematics
- UAI
- 2018

The effectiveness of GaAN on the inductive node classification problem is demonstrated, and the Graph Gated Recurrent Unit (GGRU) is constructed with GaAN as a building block to address the traffic-speed forecasting problem.

How Powerful are Graph Neural Networks?

- Computer Science, Mathematics
- ICLR
- 2019

This work characterizes the discriminative power of popular GNN variants, such as Graph Convolutional Networks and GraphSAGE, shows that they cannot learn to distinguish certain simple graph structures, and develops a simple architecture that is provably the most expressive among the class of GNNs.

PairNorm: Tackling Oversmoothing in GNNs

- Computer Science, Mathematics
- ICLR
- 2020

PairNorm is a novel normalization layer based on a careful analysis of the graph convolution operator; it prevents all node embeddings from becoming too similar and significantly boosts performance in a new problem setting that benefits from deeper GNNs.
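The core of PairNorm is a centre-and-rescale step that keeps the total pairwise distance between node embeddings roughly constant across layers. A minimal NumPy sketch of that idea (a simplified reading, not the paper's reference implementation; `s` stands in for its scale hyperparameter, and `X` is assumed non-constant):

```python
import numpy as np

def pair_norm(X, s=1.0):
    # Centre the (n, d) node-embedding matrix, then rescale it to a fixed
    # Frobenius norm s * sqrt(n), so embeddings cannot all collapse toward
    # a single point as graph-convolution layers are stacked.
    Xc = X - X.mean(axis=0, keepdims=True)
    return s * np.sqrt(len(X)) * Xc / np.linalg.norm(Xc)

X = np.random.default_rng(1).normal(size=(6, 3))
Xn = pair_norm(X)
# After normalization the embeddings are centred and have Frobenius norm
# s * sqrt(n), which fixes the average squared pairwise distance.
```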

DropEdge: Towards Deep Graph Convolutional Networks on Node Classification

- Computer Science
- ICLR
- 2020

DropEdge is a general technique that can be combined with many other backbone models (e.g., GCN, ResGCN, GraphSAGE, and JKNet) for enhanced performance and consistently improves results across a variety of both shallow and deep GCNs.
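The mechanism behind DropEdge is simple enough to sketch: each training epoch, every edge is kept independently with some probability, so the model sees a different random subgraph each time. A hypothetical NumPy helper illustrating this (the `(2, E)` edge-index layout is an assumption, borrowed from common GNN libraries, not the authors' code):

```python
import numpy as np

def drop_edge(edge_index, p, rng):
    # edge_index: (2, E) array of (source, target) node indices.
    # Keep each edge independently with probability 1 - p; a fresh mask is
    # drawn on every call, so each epoch trains on a different subgraph.
    keep = rng.random(edge_index.shape[1]) >= p
    return edge_index[:, keep]

rng = np.random.default_rng(0)
edges = np.array([[0, 1, 2, 3],
                  [1, 2, 3, 0]])  # a 4-cycle
assert drop_edge(edges, 0.0, rng).shape[1] == 4  # p = 0 keeps every edge
assert drop_edge(edges, 1.0, rng).shape[1] == 0  # p = 1 drops every edge
```

At test time the full graph is used unchanged, analogous to how Dropout is disabled at inference.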