Centrality in Graph Theory


Introduction

For a given node in a social network, centrality aims to propose methods that represent the importance of graph vertices in terms of a number. Have you ever had that group of friends you’re only friends with because of one person (and also everyone else seems to be in the group because of this one person)? That person would have high centrality.

How to Measure?

Let’s assume we have access to an undirected graph \(G\) defined as a dictionary mapping vertices to other vertices in the graph, i.e., \(G := \{A : (B, C), B : (A, D, C), ...\}\).

One simple measure of centrality is simply the degree of the node. This is the number of edges leading out of the node. Node \(A\) has degree 2 in \(G\) for instance. Another metric for centrality in a webpage ranking setting could be the PageRank of the node.

Closeness Centrality

A potentially useful way to measure the centrality of a node is the closeness centrality. We can define closeness centrality as,

$$ \textrm{Centrality}(v) = \frac{1}{\textrm{Distance to other vertices}} $$

Continuing with the example from the introduction, the person who “holds the group together” would be close to many people and have high closeness centrality. To operationalize this formula, we could do something like run breadth first search to compute the pairs distances between \(v\) and every other node and take the average (or some other statistic). Where \(m\) is the number of edges and \(n\) is the number of vertices, we could run this algorithm in \(O(mn)\) because we could simply run BFS once.

Betweeness Centrality

A potential issue with Closeness Centrality is that it will assign high centrality to vertices that are close to other vertices but only through a single vertex. Let’s say I’m connected to many people in a group but only through one friend. In some sense, I’m less critical to the network, though I’m closely connected to many people.

We can capture this notion using betweeness centrality. Let’s consider all pairs of vertices in the graph, \(u,w\). If the shortest path between all pairs of vertices goes through \(v\) often, the vertex \(v\) is more important. We express between centrality as follows,

$$\textrm{centrality}(v) = \sum_{u,w \neq v} \frac{\textrm{Number shortest paths from }u\textrm{ to }w \textrm{ through }v}{\textrm{Number shortest paths from } u \textrm{ to } w }$$

A simple (but non-optimal) way to compute this metric is to run Floyd-Warshall generate all the shortest paths between vertices on the graph. This algorithm runs in \(O(n^3)\). An algorithm than runs in \(O(mn)\) was proposed in Brandes 2001.

Takeaways

Depending on the goal, there are many possible ways to express the importance of graph vertices.

Related Posts

Convexity

Short discussion on convexity

Paper discussion, "You are grounded!" Latent Name Artifacts in Pre-trained Language Models

TL;DR Pre-trained language models learn artifacts in the training data associated with names and pass this on to downstream tasks.