Positions and Influence#

Readings#

In this part, you’ll explore ways of measuring the importance or centrality of a node in a network, using measures such as Degree, Closeness, and Betweenness centrality, and Page Rank. You’ll learn about the assumptions each measure makes, the algorithms we can use to compute them, and the different functions available on NetworkX to measure centrality.

Node Importance#

Social network analysis is usually interested in answering this question: Based on the structure of the network, which are the most important nodes?

SNA offers different ways of thinking about “importance”.

Let’s load the Karate Club network from NetworkX to explore different measures of node importance.

import networkx as nx
import matplotlib.pyplot as plt

# G = nx.karate_club_graph()
G = nx.krackhardt_kite_graph()

nx.draw_networkx(G)
_images/a657d02b11eb37f79b2320d301e3182c1bf1d1f48004c047ffeca9277ca14690.png

Centrality#

Centrality measures identify the most important nodes in a network:

  • Influential nodes in a social network

  • Nodes that disseminate information to many nodes or prevent epidemics

  • Hubs in a transportation network

  • Important pages on the Web

  • Nodes that prevent the network from breaking up

Degree centrality#

Assumption: important nodes have many connections.

Degree: number of neighbors.

Undirected networks: use degree

Directed networks: use in-degree or out-degree

Degree centrality: controls for the size of the network

nx.degree(G)
DegreeView({0: 4, 1: 4, 2: 3, 3: 6, 4: 3, 5: 5, 6: 5, 7: 3, 8: 2, 9: 1})

The degree centrality of node 0 would be: 16 / (34-1) = 0.48

A faster way to calculate degree centrality of all nodes is to use the degree_centrality function:

degree_cen = nx.degree_centrality(G)
degree_cen[0]
0.4444444444444444

For directed networks, in-degree and out-degree are differentiated.

Use in_degree_centrality and out_degree_centrality instead.

Closeness Centrality#

Assumption: important nodes are close to other nodes.

close_cen = nx.closeness_centrality(G)
close_cen[0]
0.5294117647058824
sum(nx.shortest_path_length(G, 0).values())
17
(len(G.nodes())-1) / sum(nx.shortest_path_length(G, 0).values())
0.5294117647058824

In a triangle, the closeness centrality of each node is 1, because each node is one hop away from the other two nodes.

In a square, the closeness centrality of each node is 0.75, because each node is one hop away from two other nodes and two hops away from the fourth node. The closeness centrality = (4 - 1) / (1 + 1 + 2) = 3/4.

square = nx.cycle_graph(4)
nx.draw_spring(square)
print(nx.closeness_centrality(square))
{0: 0.75, 1: 0.75, 2: 0.75, 3: 0.75}
_images/76a2e1fddd3885b10d1c7471cf0c6750525cbdab52e231f9461321201cd9e1f5.png

Betweenness Centrality#

Assumption: important nodes connect other nodes.

Betweeness centrality for node N is computed as:

  • The percent of cases where

  • For each pair of nodes M and P (which are not N)

  • The shortest path from M to P passes through N

Normalization: betwenness centrality values will be larger in graphs with many nodes. To control for this, we divide centrality values by the number of pairs of nodes in the graph (excluding the considered node).

btwn_cen = nx.betweenness_centrality(G, normalized = True)
btwn_cen[0]
0.023148148148148143

We can identify the top 5 nodes with the highest betweenness centrality:

import operator
sorted(btwn_cen.items(), key=operator.itemgetter(1), reverse = True)[0:5]
[(7, 0.38888888888888884),
 (5, 0.23148148148148148),
 (6, 0.23148148148148148),
 (8, 0.2222222222222222),
 (3, 0.10185185185185183)]

Eigenvector Centrality#

Complex math, but assigns centrality to nodes through recursive process where

  • More and stronger connections are positive

  • Connections to nodes with higher eigenvector centrality contribute more than connections to nodes with lower eigenvector centrality

A key part of the original PageRank in Google

eigen_cen = nx.eigenvector_centrality(G)
sorted(eigen_cen.items(), key=operator.itemgetter(1), reverse = True)[0:5]
[(3, 0.48102048812210046),
 (5, 0.3976910106255469),
 (6, 0.3976910106255469),
 (0, 0.35220898139203594),
 (1, 0.35220898139203594)]

Here, we see node 33 has higher eigenvector centrality than node 0, which has higher betweenness centrality. One reason is node 0 is connected to a bunch of nodes that are not as “central” as nodes connected with node 33.

Extract Node-level Features from the Network#

These centality measures can be meaningful for different purposes.

They can be used as features for other analyses.

It is trivial to compute a range of node-level features and put them in a data frame.

import pandas as pd

df = pd.DataFrame(index = G.nodes)
df['club'] = pd.Series(nx.get_node_attributes(G, 'club')) # extract node attribute 'club' in the network
/var/folders/ws/bwvb7bg1107311r6961dd1y80000gp/T/ipykernel_53355/1921708752.py:4: FutureWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
  df['club'] = pd.Series(nx.get_node_attributes(G, 'club')) # extract node attribute 'club' in the network
df['degree'] = pd.Series(nx.degree(G))
df['degree_centrality'] = pd.Series(nx.degree_centrality(G))
df['closeness_centrality'] = pd.Series(nx.closeness_centrality(G))
df['betweenness_centrality'] = pd.Series(nx.betweenness_centrality(G, normalized = True))
df['eigenvector_centrality'] = pd.Series(nx.eigenvector_centrality(G))

df
club degree degree_centrality closeness_centrality betweenness_centrality eigenvector_centrality
0 NaN (0, 4) 0.444444 0.529412 0.023148 0.352209
1 NaN (1, 4) 0.444444 0.529412 0.023148 0.352209
2 NaN (2, 3) 0.333333 0.500000 0.000000 0.285835
3 NaN (3, 6) 0.666667 0.600000 0.101852 0.481020
4 NaN (4, 3) 0.333333 0.500000 0.000000 0.285835
5 NaN (5, 5) 0.555556 0.642857 0.231481 0.397691
6 NaN (6, 5) 0.555556 0.642857 0.231481 0.397691
7 NaN (7, 3) 0.333333 0.600000 0.388889 0.195862
8 NaN (8, 2) 0.222222 0.428571 0.222222 0.048075
9 NaN (9, 1) 0.111111 0.310345 0.000000 0.011164

Summary#

Social network analysis provides a range of measures to depict the importance of a node.

The interpretation of each measure depends on the particular context in which it’s applied.

Similar to measures of network connectivity, these node importance measures could also be further enrichsed by considering direction, weight, and temporality. Please consult additional resources if interested.