Ego Networks

Ego Networks#

Ego-centric analysis shifts the analytical lens onto a sole ego actor and concentrates on the local pattern of relations in which that ego is embedded as well as the types of resources to which those relations provide access. (Carolan, 2014, ch. 7)

Ego-centric networks are usually contrasted with the sociocentric networks that we’ve been exploring in the past few weeks. Some researchers also refer to ego-centric networks as ego networks or personal networks.

Ego-centric and sociocentric networks are distinct in several important ways:

Unbounded versus bounded networks. Sociocentric SNA attempts to collect data on ties between all members of a socially or geographically-bounded group and has limited inference beyond that group. Egocentric SNA assesses individuals’ personal networks across any number of social settings using name generators, and is therefore less limited in theoretical and substantive scope.
Focus on individual rather than group outcomes. Sociocentric SNA often focuses on network structures of groups as predictors of group-level outcomes (e.g. concentration of power, resource distribution, information diffusion). In contrast, egocentric SNA is concerned with how people’s patterns of interaction shape their individual-level outcomes (e.g. health, voting behavior, employment opportunities).
Flexibility in data collection. Because sociocentric SNA must use a census of a particular bounded group as its sampling frame, data collection is very time-consuming, expensive, and targeted to a specific set of research questions. In contrast, because egocentric SNA uses individuals as cases, potential sampling frames and data collection strategies are virtually limitless. Egocentric data collection tools can easily be incorporated into large-scale or nationally-representative surveys being fielded for a variety of other purposes.

Ego-centric networks are useful when the research focuses on individuals in a network, when capturing the complete network is less important, and/or when the researcher plans to correlate individuals’ attribute data with their relational characteristics in a network. Examples of ego-centric networks’ applications abound.

Readings#

Carolan, B. V. (2014). Social network analysis and education: Theory, methods & applications (Ch. 7). SAGE Publications.
Lukács J., Á., & Dávid, B. (2023). Connecting for success: Egocentric network types among underrepresented minority students at college. Social Networks, 72, 35–43. https://doi.org/10.1016/j.socnet.2022.09.002
[Optional] Marsden, P. V. (2002). Egocentric and sociocentric measures of network centrality. Social Networks, 24(4), 407–422. https://doi.org/10.1016/S0378-8733(02)00016-3

When reading, consider this question: How could ego-centric networks be applied to your research projects? You do not need to focus on your class project; projects in your field in general are fine.

Collecting Ego Network Data#

As discussed in the reading (Carolan, 2014), there are basically two ways to construct ego-centric networks:

Ego-centric networks by design: When a research project starts from ego-centric questions, ego-centric data are usually collected directly. For example, when a name generator questionnaire is distributed to a sample of students in a large high school to study students’ in-school friendships, each student’s response is used directly to construct a network.
Derived ego-centric networks: When a complete network can be captured, we can also derive ego-centric networks by filtering network data. For example, if we’re analyzing our own Slack discussions, we can also create an ego-centric network for each one of us to investigate our connectedness in the class.
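
As a minimal sketch of the by-design case, the snippet below turns name-generator responses into one network per respondent. The response format (`alters`, `alter_ties`), the names, and the helper `build_ego_network` are hypothetical and made up for illustration; real survey data will need its own cleaning steps.

```python
import networkx as nx

# Hypothetical name-generator responses: each ego lists the alters they named,
# plus any alter-alter ties the ego reported.
responses = {
    'ana': {'alters': ['ben', 'cai', 'dee'], 'alter_ties': [('ben', 'cai')]},
    'eli': {'alters': ['fay', 'ben'], 'alter_ties': []},
}

def build_ego_network(ego, response):
    """Build one ego network from a single respondent's answers."""
    G = nx.Graph()
    G.add_node(ego, role='ego')
    for alter in response['alters']:
        G.add_node(alter, role='alter')
        G.add_edge(ego, alter)  # ego-alter tie
    G.add_edges_from(response['alter_ties'])  # alter-alter ties reported by the ego
    return G

ego_networks = {ego: build_ego_network(ego, r) for ego, r in responses.items()}
for ego, G in ego_networks.items():
    print(ego, G.number_of_nodes(), G.number_of_edges())
```

Each respondent gets a separate graph, so the same person (here `ben`) can appear as an alter in several ego networks without those networks being merged.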

In either case, an important decision is how you define the neighborhood of the ego-centric network, that is, how many steps away from the ego the network should reach. This decision will again be informed by the theories and contextual information you bring to bear.
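
To make the idea of a neighborhood concrete, here is a small sketch using `nx.ego_graph` on a made-up toy graph (the node labels are hypothetical). It prints which nodes fall inside the ego network of `'a'` as the radius grows.

```python
import networkx as nx

# Toy graph (hypothetical): 'a' is the ego; 'd' and 'e' are further away
G_toy = nx.Graph([('a', 'b'), ('a', 'c'), ('b', 'c'), ('c', 'd'), ('d', 'e')])

neighborhoods = {}
for radius in (1, 2, 3):
    eg = nx.ego_graph(G_toy, 'a', radius=radius)
    neighborhoods[radius] = sorted(eg.nodes())
    print(radius, neighborhoods[radius])
```

A radius of 1 keeps the ego and its direct alters; each extra step also pulls in the alters of the nodes already included.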

Consider this question: What definition of the neighborhood will make sense for your research projects?

Building ego networks from egos#

Have you ever noticed that Google provides autocomplete suggestions when you search for something?

Below, I type ‘chatgpt vs’ and Google provides a list of suggestions.

If we consider the initial search term (e.g. ‘chatgpt’) as the ego, the suggested counterparts are its alters. The ego, the alters, and their ties form an ego network.

Using the code below, we can construct such an ego network for ‘chatgpt’. Notice that we are interested not only in identifying the alters but also in their inter-connections. So after getting the initial list of suggestions, we need to run the Google queries again for each alter to identify their connections.

import requests
import pandas as pd
import networkx as nx

# This is the ego
search_term = 'chatgpt'

# Build the Google autocomplete query and send the request
url = f'https://suggestqueries.google.com/complete/search?&client=firefox&gl=us&hl=en&q={search_term}%20vs%20'
response = requests.get(url)

# Parse the JSON response if the request succeeded
if response.status_code == 200:
    data = response.json()
    # Now 'data' contains the JSON data from the URL
    print(data)
else:
    print('Failed to retrieve data:', response.status_code)

['chatgpt vs ', ['chatgpt vs deepseek', 'chatgpt vs gemini', 'chatgpt vs copilot', 'chatgpt vs grok', 'chatgpt vs claude', 'chatgpt vs openai', 'chatgpt vs chatgpt plus', 'chatgpt vs gemini vs copilot', 'chatgpt vs claude vs gemini', 'chatgpt vs grok 3'], [], {'google:suggestsubtypes': [[512, 433], [512, 433], [512, 433, 131], [512, 433, 131], [512, 433], [512, 433, 131], [512], [512], [512], [512, 433, 131]]}]

# Extract a list of suggestions
initial_suggestions = [element.replace('chatgpt vs ', '') for element in data[1]]
# Remove combo items such as 'gemini vs copilot'
initial_suggestions = [element for element in initial_suggestions if 'vs' not in element]
# print(initial_suggestions)

# Create an edge list data frame
df_edge_list = pd.DataFrame({'source': ['chatgpt'] * len(initial_suggestions), 'target': initial_suggestions})

print(df_edge_list)

    source        target
0  chatgpt      deepseek
1  chatgpt        gemini
2  chatgpt       copilot
3  chatgpt          grok
4  chatgpt        claude
5  chatgpt        openai
6  chatgpt  chatgpt plus
7  chatgpt        grok 3
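
The extraction steps above can also be wrapped in a small helper so that the same logic is reused for every alter. This is a sketch, not the code used in this page: the function name `extract_alters` is hypothetical, and unlike the `replace`-based version it ignores suggestions that do not start with `'<ego> vs '` and compares whole words when filtering out combined items. The sample payload is a shortened copy of the response printed earlier.

```python
def extract_alters(payload, ego):
    """Return single-term alters from a payload shaped like [query, [suggestions], ...]."""
    prefix = f'{ego} vs '
    alters = []
    for suggestion in payload[1]:
        if not suggestion.startswith(prefix):
            continue  # ignore suggestions that do not follow the 'ego vs alter' pattern
        alter = suggestion[len(prefix):].strip()
        # Skip combined items such as 'gemini vs copilot' and avoid duplicates
        if alter and 'vs' not in alter.split() and alter not in alters:
            alters.append(alter)
    return alters

# Shortened sample of the response shown above (no network request needed)
sample_payload = ['chatgpt vs ',
                  ['chatgpt vs deepseek', 'chatgpt vs gemini',
                   'chatgpt vs gemini vs copilot', 'chatgpt vs grok 3'],
                  [], {}]

alters = extract_alters(sample_payload, 'chatgpt')
print(alters)
```

Keeping the parsing separate from the HTTP request makes it possible to check the logic on a saved response before querying the service repeatedly.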

# Now iterate on the list of alters, and identify their own alters
# When doing so, iteratively add edge lists to the main edge list
for term in initial_suggestions:
    if 'vs' in term: # skip alters that have two parts (such as gemini vs copilot)
        continue
    
    print(f'querying {term}')
    url = f"https://suggestqueries.google.com/complete/search?&client=firefox&gl=us&hl=en&q={term}%20vs%20"
    response = requests.get(url)

    if response.status_code == 200:
        data = response.json()
    else:
        print('Failed to retrieve data:', response.status_code)
        continue  # skip this alter rather than reuse the previous response
    
    suggestions = [element.replace(f'{term} vs ', '') for element in data[1]]
    suggestions = [element for element in suggestions if 'vs' not in element]

    # Create a DataFrame
    df_tmp = pd.DataFrame({'source': [term] * len(suggestions), 'target': suggestions})

    # Concatenate the current DataFrame with the final DataFrame
    df_edge_list = pd.concat([df_edge_list, df_tmp], ignore_index=True)

# Let's check out the final edge list
df_edge_list

querying deepseek
querying gemini
querying copilot
querying grok
querying claude
querying openai
querying chatgpt plus
querying grok 3

	source	target
0	chatgpt	deepseek
1	chatgpt	gemini
2	chatgpt	copilot
3	chatgpt	grok
4	chatgpt	claude
...	...	...
77	grok 3	gpt 4
78	grok 3	claude 3.7
79	grok 3	chatgpt reddit
80	grok 3	o3
81	grok 3	chatgpt comparison

82 rows × 2 columns

# Create a network object
G = nx.from_pandas_edgelist(df_edge_list)

# We can visualize this network, which has more than the ego network we want
nx.draw_networkx(G)

# Extract an ego network for chatgpt (radius 1: the ego and its direct alters)
# The edges carry no 'distance' attribute, so no distance argument is needed
EG = nx.ego_graph(G, 'chatgpt', radius=1)

# Visualize the ego network
nx.draw_networkx(EG)

# Run network analysis algorithms on the ego network
print(nx.density(EG))  # share of possible ties that are present

print(EG.size())  # number of edges in the ego network

If you’d like to build an ego network for another term, say Gemini, start from that ego and follow the same approach to collect the alters and ties within the neighborhood you intend to cover.
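
Beyond overall density and size, a few ego-level measures are often useful. The sketch below uses a made-up toy ego network (node labels are hypothetical) to compute the number of alters, the density among alters only, and Burt's effective size via `nx.effective_size`. The density of the full ego network includes the ego-alter ties, which exist by construction, so the alter-only density is frequently more informative.

```python
import networkx as nx

# Toy ego network (hypothetical): 'ego' tied to four alters, with two alter-alter ties
EG_toy = nx.Graph([('ego', 'a'), ('ego', 'b'), ('ego', 'c'), ('ego', 'd'),
                   ('a', 'b'), ('b', 'c')])

# Number of alters equals the ego's degree
n_alters = EG_toy.degree('ego')

# Density among alters only: drop the ego with center=False
alters_only = nx.ego_graph(EG_toy, 'ego', radius=1, center=False)
alter_density = nx.density(alters_only)

# Effective size: alters minus the redundancy created by alter-alter ties
eff_size = nx.effective_size(EG_toy, nodes=['ego'])['ego']

print(n_alters, round(alter_density, 3), eff_size)
```

Here two of the six possible alter-alter ties are present, so the ego's four contacts are only partly redundant with one another.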

Extracting Ego Networks from Larger Networks#

Another approach is to extract ego networks from a much larger network.

Below, let’s consider an open dataset about Primary school dynamic contacts. Here is the description of the dataset:

Two temporal networks of contacts among students and teachers at a primary school in Lyon, France, on consecutive days in October 2009. Each network accumulates all contacts over the course of a single day; contacts were sampled at 20-second intervals.

Using the code below, we first read the network data from the website, construct the whole network, and then derive ego networks for the egos of interest.

import zipfile
import io

# This URL is from the dataset's webpage: https://networks.skewed.de/net/sp_primary_school
zip_url = 'https://networks.skewed.de/net/sp_primary_school/files/sp_primary_school.csv.zip'
sub_dir = 'data/sp_primary_school' # where you'd like to extract files
edges_file = 'edges.csv' # name of edges file
nodes_file = 'nodes.csv' # name of nodes file

# Download the zip file
response = requests.get(zip_url)
if response.status_code == 200:
    # Extract the zip file
    with zipfile.ZipFile(io.BytesIO(response.content)) as zip_ref:
        zip_ref.extractall(sub_dir)
    print('Extraction complete.')
else:
    print('Failed to download the zip file:', response.status_code)

Extraction complete.

import os

nodes = pd.read_csv(os.path.join(sub_dir, nodes_file), sep=',')
# nodes = nodes.rename(columns={'class': 'group', '# index': 'index'})
nodes = nodes.set_axis(['ind', 'raw_id', 'group', 'gender', 'pos'], axis='columns') # rename the columns
nodes

	ind	raw_id	group	gender	pos
0	0	1558	3B	M	array([-1.74555536, -3.28500536])
1	1	1567	3B	M	array([-1.72030544, -3.26474249])
2	2	1560	3B	F	array([-1.71955628, -3.25207655])
3	3	1570	3B	F	array([-1.67965811, -3.2940906 ])
4	4	1574	3B	F	array([-1.7003117 , -3.28849509])
...	...	...	...	...	...
237	237	1750	5B	M	array([-1.80754116, -3.07109776])
238	238	1715	2B	F	array([-1.91644382, -3.36835294])
239	239	1744	3B	M	array([-1.67574526, -3.31140817])
240	240	1799	1A	Unknown	array([-1.75033375, -3.37851526])
241	241	1647	2A	M	array([-1.77879562, -3.38107451])

242 rows × 5 columns

edges = pd.read_csv(os.path.join(sub_dir, edges_file), sep=',')
# edges = edges.rename(columns={'# source': 'source'})
edges = edges.set_axis(['source', 'target', 'time'], axis='columns') # rename the columns
edges

	source	target	time
0	0	1	31220
1	0	1	31240
2	0	16	31260
3	0	1	31260
4	0	16	31280
...	...	...	...
125768	241	112	148040
125769	241	112	148060
125770	241	112	148080
125771	241	112	148100
125772	241	112	148120

125773 rows × 3 columns

Construct the full network with nodes and edges#

G_sp = nx.Graph()

# Add nodes -- let's add each node one by one and store 'position' as a node attribute
for _, row in nodes.iterrows():
    G_sp.add_node(row['ind'], group=row['group'], gender=row['gender'], position = row['pos'])

# Add edges with attributes from the DataFrame to the graph
# Note: nx.Graph keeps one edge per node pair, so repeated contacts overwrite 'time'
for _, row in edges.iterrows():
    G_sp.add_edge(row['source'], row['target'], time = row['time'])
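
Because a simple graph stores only one edge per pair of nodes, the repeated contact records collapse into a single tie. If contact frequency matters for your analysis, one option is to count the records per pair and store the count as an edge weight. The sketch below assumes a small made-up table in the same `source`, `target`, `time` layout; the column names `u`, `v`, and `weight` are choices made for this example.

```python
import pandas as pd
import networkx as nx

# Hypothetical contact records in the same (source, target, time) layout
contacts = pd.DataFrame({
    'source': [0, 0, 0, 1, 1],
    'target': [1, 1, 16, 0, 16],
    'time':   [31220, 31240, 31260, 31280, 31300],
})

# Treat (0, 1) and (1, 0) as the same undirected pair
pairs = contacts.assign(
    u=contacts[['source', 'target']].min(axis=1),
    v=contacts[['source', 'target']].max(axis=1),
)

# Count the number of contact records per pair
weighted = pairs.groupby(['u', 'v']).size().reset_index(name='weight')

# Build a weighted graph from the aggregated edge list
G_w = nx.from_pandas_edgelist(weighted, source='u', target='v', edge_attr='weight')
print(list(G_w.edges(data=True)))
```

Since contacts were sampled at 20-second intervals, a count of this kind can be read as a rough proxy for contact duration.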

import matplotlib.pyplot as plt

# Get unique group names (sorted so the color assignment is reproducible)
groups = sorted(set(nx.get_node_attributes(G_sp, 'group').values()))

# Generate a color map (plt.get_cmap avoids the deprecated cm.get_cmap)
cmap = plt.get_cmap('tab10', len(groups))

# Create a color mapping based on group names
node_colors = [cmap(groups.index(G_sp.nodes[node]['group'])) for node in G_sp.nodes()]

# Draw the graph
pos = nx.spring_layout(G_sp)
nx.draw(G_sp, pos, node_color=node_colors, with_labels=True)

# Display the graph
plt.show()

Let’s focus on ego 76#

First, set radius to 1.

# Extract an ego network for node 76
EG = nx.ego_graph(G_sp, 76, radius = 1)

# Visualize the ego network
node_colors = [cmap(list(groups).index(EG.nodes[node]['group'])) for node in EG.nodes()]
nx.draw_networkx(EG, node_color=node_colors, with_labels=True)

We can also set radius to 2 to reach a larger neighborhood of the ego.

# Extract an ego network for node 76 with a two-step neighborhood
EG = nx.ego_graph(G_sp, 76, radius = 2)
# Visualize the ego network
node_colors = [cmap(list(groups).index(EG.nodes[node]['group'])) for node in EG.nodes()]
nx.draw_networkx(EG, node_color=node_colors, with_labels=True)

Because the sample network is dense, the ego network gets large quickly when increasing the radius.
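
When ego networks become too large to read visually, summarizing each one numerically can help, and it also supports relating node attributes to relational characteristics. The sketch below runs on a made-up toy graph with a `group` attribute (the data and the column names `n_alters`, `density`, and `same_group_share` are hypothetical); the same loop could be adapted to a graph such as `G_sp`.

```python
import networkx as nx
import pandas as pd

# Toy graph with a 'group' node attribute (hypothetical data)
G_toy = nx.Graph([(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)])
nx.set_node_attributes(
    G_toy, {1: '1A', 2: '1A', 3: '1B', 4: '1B', 5: '1B'}, name='group')

rows = []
for ego in G_toy.nodes():
    eg = nx.ego_graph(G_toy, ego, radius=1)
    alters = [n for n in eg.nodes() if n != ego]
    # Count alters that share the ego's group
    same = sum(G_toy.nodes[a]['group'] == G_toy.nodes[ego]['group'] for a in alters)
    rows.append({
        'ego': ego,
        'group': G_toy.nodes[ego]['group'],
        'n_alters': len(alters),
        'density': nx.density(eg),
        'same_group_share': same / len(alters) if alters else float('nan'),
    })

# One row per ego, ready to be joined with other attribute data
summary = pd.DataFrame(rows).set_index('ego')
print(summary)
```

Each row describes one ego, so the table can be merged with individual-level outcomes for further analysis.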

Please feel free to adapt the code samples to your own projects.