{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Hands-on\n", "\n", "This Jupyter Notebook demonstrates a few examples of reading and preparing data for network analysis. \n", "\n", "Python code is provided below to help you get started, but you may need to revise it to work with the particular dataset you use. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reading and Preparing Data for NetworkX\n", "\n", "NetworkX can read network data in several formats, such as `adjacency list`, `adjacency matrix`, and `edge list`. \n", "\n", "To read data into NetworkX, you first need to transform it into one of the formats NetworkX accepts. The same preparation step applies to other SNA (social network analysis) software packages. \n", "\n", "See [this NetworkX reference page](https://networkx.org/documentation/stable/reference/readwrite/index.html) for details. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example 1: Reading Well-Formatted Network Data from CSV\n", "\n", "In this example, we read an open dataset named [Vickers 7th Graders (1981)](https://networks.skewed.de/net/7th_graders). See the dataset page for more information. \n", "\n", "The network data can be downloaded as a ZIP file containing CSV files for the nodes and the edges. \n", "\n", "Below, we read the ZIP file directly from the website and extract the CSV files to construct the network. " ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | ID | name | position |
|---|---|---|---|
| 0 | 0 | NaN | array([-1.25328431, -5.19582831]) |
| 1 | 1 | NaN | array([-1.17175257, -5.39938108]) |
| 2 | 2 | NaN | array([-1.04017448, -5.48106062]) |
| 3 | 3 | NaN | array([-1.07294196, -5.61418605]) |
| 4 | 4 | NaN | array([-1.36638259, -5.48876888]) |
| 5 | 5 | NaN | array([-1.26042538, -5.47973283]) |
| 6 | 6 | NaN | array([-0.94831916, -5.36563054]) |
| 7 | 7 | NaN | array([-1.33077439, -5.4211611 ]) |
| 8 | 8 | NaN | array([-1.33161904, -5.59167846]) |
| 9 | 9 | NaN | array([-1.17888983, -5.62070043]) |
| 10 | 10 | NaN | array([-1.25979885, -5.36942177]) |
| 11 | 11 | NaN | array([-1.16099435, -5.26591003]) |
| 12 | 12 | NaN | array([-1.47551501, -5.56898687]) |
| 13 | 13 | NaN | array([-1.44937462, -5.27609678]) |
| 14 | 14 | NaN | array([-1.47503673, -5.19523579]) |
| 15 | 15 | NaN | array([-1.34305148, -5.30624285]) |
| 16 | 16 | NaN | array([-1.57709797, -5.191396 ]) |
| 17 | 17 | NaN | array([-1.38360822, -5.76487804]) |
| 18 | 18 | NaN | array([-1.56234605, -5.28175874]) |
| 19 | 19 | NaN | array([-1.51655974, -5.3567369 ]) |
| 20 | 20 | NaN | array([-1.3609869, -5.2106605]) |
| 21 | 21 | NaN | array([-1.4339152 , -5.36211952]) |
| 22 | 22 | NaN | array([-1.47547163, -5.43990738]) |
| 23 | 23 | NaN | array([-1.60111353, -5.39505365]) |
| 24 | 24 | NaN | array([-1.52556491, -5.74985098]) |
| 25 | 25 | NaN | array([-1.67574528, -5.44971572]) |
| 26 | 26 | NaN | array([-1.68245577, -5.31040503]) |
| 27 | 27 | NaN | array([-1.55409141, -5.49705225]) |
| 28 | 28 | NaN | array([-1.61097712, -5.57691642]) |
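A node table like the one above can be pulled straight out of the downloaded ZIP archive with Python's standard `zipfile` module and `pandas.read_csv`. This is a minimal sketch, not the notebook's own code: the member names `nodes.csv` and `edges.csv` are assumptions (check the real archive with `namelist()`), and the archive is built in memory so the example runs offline.

```python
import io
import zipfile

import pandas as pd


def read_csvs_from_zip(zip_bytes: bytes) -> dict[str, pd.DataFrame]:
    """Return a DataFrame for every .csv member of a ZIP archive, keyed by member name."""
    frames = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            if name.endswith(".csv"):
                with zf.open(name) as fh:
                    frames[name] = pd.read_csv(fh)
    return frames


# Build a tiny in-memory archive so the example runs without a network connection.
# The member names and contents are illustrative, not the real dataset files.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("nodes.csv", "ID,name\n0,\n1,\n")
    zf.writestr("edges.csv", "source,target,weight,layer\n0,1,1,1\n")

frames = read_csvs_from_zip(buffer.getvalue())
nodes, edges = frames["nodes.csv"], frames["edges.csv"]
print(nodes.shape, edges.shape)
```

With the real file, the bytes could come from `urllib.request.urlopen(url).read()`, where `url` is the download link on the dataset page. Going through `zipfile` matters here because `pd.read_csv(..., compression="zip")` only handles archives that contain a single data file.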
| | source | target | weight | layer |
|---|---|---|---|---|
| 0 | 0 | 5 | 1 | 1 |
| 1 | 0 | 7 | 1 | 1 |
| 2 | 0 | 10 | 1 | 1 |
| 3 | 0 | 11 | 1 | 1 |
| 4 | 0 | 13 | 1 | 1 |
| ... | ... | ... | ... | ... |
| 735 | 28 | 12 | 1 | 3 |
| 736 | 28 | 22 | 1 | 3 |
| 737 | 28 | 23 | 1 | 3 |
| 738 | 28 | 26 | 1 | 3 |
| 739 | 28 | 27 | 1 | 3 |

740 rows × 4 columns
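Once the node and edge tables are loaded, `networkx.from_pandas_edgelist` turns the edge table into a graph. The `layer` column suggests the same pair of students can be linked in more than one layer, so this sketch uses a `MultiDiGraph` to keep parallel edges. Treating the ties as directed is an assumption to verify on the dataset page. The rows are copied from the first and last rows shown above; in the notebook you would pass the full DataFrames.

```python
import networkx as nx
import pandas as pd

# Node IDs 0-28, matching the 29 rows of the node table above.
nodes = pd.DataFrame({"ID": range(29)})

# A subset of the edge table shown above (first five and last five rows).
edges = pd.DataFrame(
    {
        "source": [0, 0, 0, 0, 0, 28, 28, 28, 28, 28],
        "target": [5, 7, 10, 11, 13, 12, 22, 23, 26, 27],
        "weight": [1] * 10,
        "layer": [1, 1, 1, 1, 1, 3, 3, 3, 3, 3],
    }
)

# Keep weight and layer as edge attributes; a MultiDiGraph allows the same
# (source, target) pair to appear once per layer.
G = nx.from_pandas_edgelist(
    edges,
    source="source",
    target="target",
    edge_attr=["weight", "layer"],
    create_using=nx.MultiDiGraph,
)
# Add every node from the node table, including any that have no edges.
G.add_nodes_from(nodes["ID"])

# A single layer can be extracted as its own simple directed graph.
layer1 = nx.DiGraph()
layer1.add_edges_from(
    (u, v, d) for u, v, d in G.edges(data=True) if d["layer"] == 1
)
print(G.number_of_nodes(), G.number_of_edges(), layer1.number_of_edges())
```

Adding the nodes separately matters for centrality or density calculations: a student who names no one and is named by no one would otherwise be missing from the graph.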
| | ID | poster | thread | toID |
|---|---|---|---|---|
| 0 | 1 | 69497 | 1 | `<NA>` |
| 1 | 2 | 44591 | 2 | `<NA>` |
| 2 | 3 | 24601 | 2 | 2 |
| 3 | 4 | 74570 | 3 | `<NA>` |
| 4 | 5 | 29022 | 3 | 4 |
| 5 | 6 | 12345 | 3 | 5 |
| 6 | 7 | 24601 | 3 | 5 |
| 7 | 8 | 29022 | 3 | 7 |
| 8 | 9 | 24601 | 3 | 8 |
| 9 | 10 | 74577 | 4 | `<NA>` |
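In this forum data, each post has an `ID`, a `poster`, a `thread`, and a `toID` pointing to the post it replies to. Thread-opening posts have no `toID`, which pandas displays as `<NA>` in a nullable integer column. A minimal sketch, assuming the posts come from a CSV with these column names, reads `toID` as the nullable `Int64` dtype and keeps only the replies. The inline CSV repeats the ten rows shown above.

```python
import io

import pandas as pd

# The ten posts shown above; an empty toID marks a post that starts a thread.
csv_text = """ID,poster,thread,toID
1,69497,1,
2,44591,2,
3,24601,2,2
4,74570,3,
5,29022,3,4
6,12345,3,5
7,24601,3,5
8,29022,3,7
9,24601,3,8
10,74577,4,
"""

# "Int64" (capital I) is pandas' nullable integer dtype: missing values become
# <NA> instead of forcing the whole column to float64.
posts = pd.read_csv(io.StringIO(csv_text), dtype={"toID": "Int64"})

# Replies are the posts that point at another post.
replies = posts[posts["toID"].notna()]
print(len(posts), len(replies))
```

Keeping `toID` as an integer type avoids IDs such as `5.0`, which would otherwise have to be converted back before they can be matched against the integer `ID` column.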
| | ID | poster |
|---|---|---|
| 2 | 3 | 24601 |
| 4 | 5 | 29022 |
| 5 | 6 | 12345 |
| 6 | 7 | 24601 |
| 7 | 8 | 29022 |
| ... | ... | ... |
| 192 | 193 | 24601 |
| 193 | 194 | 73263 |
| 194 | 195 | 68491 |
| 195 | 196 | 26362 |
| 196 | 197 | 4582 |

150 rows × 2 columns
| | ID | poster | thread | toID | receiver |
|---|---|---|---|---|---|
| 0 | 6 | 12345 | 3 | 5 | 29022 |
| 1 | 7 | 24601 | 3 | 5 | 29022 |
| 2 | 8 | 29022 | 3 | 7 | 24601 |
| 3 | 9 | 24601 | 3 | 8 | 29022 |
| 4 | 12 | 12345 | 4 | 11 | 24601 |
| ... | ... | ... | ... | ... | ... |
| 103 | 193 | 24601 | 47 | 192 | 62306 |
| 104 | 194 | 73263 | 47 | 193 | 24601 |
| 105 | 195 | 68491 | 47 | 193 | 24601 |
| 106 | 196 | 26362 | 47 | 195 | 68491 |
| 107 | 197 | 4582 | 47 | 195 | 68491 |

108 rows × 5 columns
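The `receiver` of a reply is the poster of the post it points to, so it can be found by merging the posts table with itself on `toID` = `ID`. In this sketch the lookup side is built from all posts, thread starters included. If the lookup is restricted to replies, replies addressed to a thread-opening post find no match and drop out of an inner merge; whether those ties belong in the network is an analysis decision. The data are the ten posts shown earlier.

```python
import pandas as pd

# The ten posts shown earlier; None marks a thread-opening post.
posts = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
        "poster": [69497, 44591, 24601, 74570, 29022, 12345, 24601, 29022, 24601, 74577],
        "thread": [1, 2, 2, 3, 3, 3, 3, 3, 3, 4],
        "toID": pd.array([None, None, 2, None, 4, 5, 5, 7, 8, None], dtype="Int64"),
    }
)

# Keep replies only; casting to int64 is safe once the missing values are gone.
replies = posts[posts["toID"].notna()].astype({"toID": "int64"})

# Lookup table: who wrote each post. Renaming the columns lines the key up with
# the replies' toID and names the result column "receiver".
authors = posts[["ID", "poster"]].rename(columns={"ID": "toID", "poster": "receiver"})

edges = replies.merge(authors, on="toID", how="inner")
print(edges[["ID", "poster", "toID", "receiver"]])
```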
| | ID | poster | thread | toID | receiver |
|---|---|---|---|---|---|
| 71 | 129 | 8639 | 30 | 128 | 12345 |
| 73 | 131 | 8639 | 30 | 130 | 12345 |
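The two rows above show the same poster replying to the same receiver twice. One way to list every repeated `(poster, receiver)` pair before aggregating is `DataFrame.duplicated` with `keep=False`, which flags all occurrences rather than only the later ones. The three rows here are copied from the tables above.

```python
import pandas as pd

# Reply edges copied from the tables above: two replies from 8639 to 12345,
# plus one reply from 12345 to 29022.
edges = pd.DataFrame(
    {
        "ID": [129, 131, 6],
        "poster": [8639, 8639, 12345],
        "receiver": [12345, 12345, 29022],
    }
)

# keep=False marks every row whose (poster, receiver) pair occurs more than once.
repeated = edges[edges.duplicated(subset=["poster", "receiver"], keep=False)]
print(repeated)
```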
| | poster | receiver | weight |
|---|---|---|---|
| 9 | 8639 | 12345 | 2 |
| 10 | 9061 | 3903 | 2 |
| 20 | 12345 | 24601 | 2 |
| 22 | 12345 | 47634 | 2 |
| 30 | 21588 | 50718 | 2 |
| ... | ... | ... | ... |
| 93 | 87525 | 21731 | 1 |
| 94 | 89206 | 86620 | 1 |
| 95 | 97784 | 73263 | 1 |
| 96 | 98582 | 10827 | 1 |
| 97 | 98582 | 80045 | 1 |

98 rows × 3 columns
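Collapsing repeated pairs into one weighted edge takes a `groupby` followed by `size`. The count becomes the `weight` column, and the result can go straight into `networkx.from_pandas_edgelist` as a directed graph. This sketch reuses a few reply rows from the tables above. Counting each reply as one unit of weight is one reasonable choice, not the only one.

```python
import networkx as nx
import pandas as pd

# Reply edges taken from the tables above (one row per reply).
edges = pd.DataFrame(
    {
        "poster": [8639, 8639, 12345, 24601, 29022],
        "receiver": [12345, 12345, 29022, 29022, 24601],
    }
)

# One row per (poster, receiver) pair; the number of replies becomes the weight.
weighted = (
    edges.groupby(["poster", "receiver"])
    .size()
    .reset_index(name="weight")
    .sort_values("weight", ascending=False, kind="stable")
)

# Direction matters: an edge runs from the person replying to the person replied to.
G = nx.from_pandas_edgelist(
    weighted, source="poster", target="receiver", edge_attr="weight", create_using=nx.DiGraph
)
print(weighted)
print(G[8639][12345]["weight"])
```

Because the graph is directed, 24601 → 29022 and 29022 → 24601 stay separate edges. Use `nx.Graph` instead only if reciprocal replies should be merged into a single tie.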
| | from_user | text | created_at | from_user_id | geo_coordinates | iso_language_code | to_user_id | id | to_user_id_str | source | from_user_id_str | id_str | profile_image_url | status_url |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 36 | Anna_De_Liddo | @houshuang Hi Stian! we are still in Banf, but... | Thu, 03 Mar 2011 05:31:57 +0000 | 9526430 | NaN | en | 172290.0 | 4.318180e+16 | 172290.0 | `<a href="http://www.tweetdeck.com"...` | 9526430 | 4.318180e+16 | `http://a3.twimg.com/profile_images/20659452/IM...` | `http://twitter.com/Anna_De_Liddo/statuses/4318...` |
| 48 | psychemedia | @andymcg what edu related activity data projec... | Wed, 02 Mar 2011 21:03:37 +0000 | 69223 | NaN | en | 2316588.0 | 4.305387e+16 | 2316588.0 | `<a href="http://www.tweetdeck.com"...` | 69223 | 4.305387e+16 | `http://a1.twimg.com/profile_images/1195013164/...` | `http://twitter.com/psychemedia/statuses/430538...` |
| 67 | NicolaAvery | @Anna_De_Liddo thank you, no idea how difficul... | Wed, 02 Mar 2011 17:11:29 +0000 | 13181495 | NaN | en | 9526430.0 | 4.299545e+16 | 9526430.0 | `<a href="http://twitter.com/">...` | 13181495 | 4.299545e+16 | `http://a3.twimg.com/profile_images/797991041/n...` | `http://twitter.com/NicolaAvery/statuses/429954...` |
| 70 | Anna_De_Liddo | @NicolaAvery and btw this is a nice idea/featu... | Wed, 02 Mar 2011 16:51:48 +0000 | 9526430 | NaN | en | 13181495.0 | 4.299050e+16 | 13181495.0 | `<a href="http://www.tweetdeck.com"...` | 9526430 | 4.299050e+16 | `http://a3.twimg.com/profile_images/20659452/IM...` | `http://twitter.com/Anna_De_Liddo/statuses/4299...` |
| 71 | Anna_De_Liddo | @NicolaAvery you can see, edit/manage your tag... | Wed, 02 Mar 2011 16:50:53 +0000 | 9526430 | NaN | en | 13181495.0 | 4.299027e+16 | 13181495.0 | `<a href="http://www.tweetdeck.com"...` | 9526430 | 4.299027e+16 | `http://a3.twimg.com/profile_images/20659452/IM...` | `http://twitter.com/Anna_De_Liddo/statuses/4299...` |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1327 | houshuang | @sbskmi @psychemedia Looking forward to seeing... | Sat, 26 Feb 2011 01:55:04 +0000 | 172290 | NaN | en | 19130020.0 | 4.131530e+16 | 19130020.0 | `<a href="http://www.tweetdeck.com"...` | 172290 | 4.131530e+16 | `http://a3.twimg.com/profile_images/52634283/51...` | `http://twitter.com/houshuang/statuses/41315277...` |
| 1328 | houshuang | @sebpaquet in banff for #LAK11 back wednesday... | Sat, 26 Feb 2011 01:50:49 +0000 | 172290 | NaN | en | 386316.0 | 4.131421e+16 | 386316.0 | `<a href="http://www.tweetdeck.com"...` | 172290 | 4.131421e+16 | `http://a3.twimg.com/profile_images/52634283/51...` | `http://twitter.com/houshuang/statuses/41314208...` |
| 1338 | houshuang | @dougclow I'm here now, it's actually not that... | Fri, 25 Feb 2011 21:47:36 +0000 | 172290 | NaN | en | 308967.0 | 4.125300e+16 | 308967.0 | `<a href="http://www.nambu.com/" r...` | 172290 | 4.125300e+16 | `http://a3.twimg.com/profile_images/52634283/51...` | `http://twitter.com/houshuang/statuses/41253000...` |
| 1358 | gsiemens | @JonElmSherrill :). Aparrently, it's supposed ... | Thu, 24 Feb 2011 20:30:42 +0000 | 5748 | NaN | en | 237016721.0 | 4.087126e+16 | 237016721.0 | `<a href="http://www.tweetdeck.com"...` | 5748 | 4.087126e+16 | `http://a3.twimg.com/profile_images/1238005253/...` | `http://twitter.com/gsiemens/statuses/408712594...` |
| 1361 | weisblatt | @davecormier your blog and videos make partici... | Thu, 24 Feb 2011 16:15:23 +0000 | 2290211 | NaN | en | 153639.0 | 4.080701e+16 | 153639.0 | `<a href="http://twitter.com/">...` | 2290211 | 4.080701e+16 | `http://a0.twimg.com/profile_images/907710761/s...` | `http://twitter.com/weisblatt/statuses/40807007...` |

219 rows × 14 columns
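In the tweet table, `to_user_id` appears as `172290.0` and `id` as `4.318180e+16`, which shows these columns were parsed as floating point. Integers above 2**53 cannot all be stored exactly in a float64, so long tweet IDs can be silently rounded. A minimal sketch, assuming a CSV with the column names shown above, reads every column as a string and then builds a directed reply network from `from_user_id_str` to `to_user_id_str`. The user IDs are copied from the table; the `id_str` values are made up, and the last row has no recipient.

```python
import io

import networkx as nx
import pandas as pd

# Illustrative CSV: user IDs come from the tweet table above; the id_str values
# are made up, and the last row has no recipient.
csv_text = """from_user,from_user_id_str,to_user_id_str,id_str
Anna_De_Liddo,9526430,172290,43181800000000001
NicolaAvery,13181495,9526430,2
gsiemens,5748,,3
"""

# Default parsing: the missing recipient forces to_user_id_str to float64.
default = pd.read_csv(io.StringIO(csv_text))
print(default.dtypes)

# Reading every column as str keeps IDs exact; empty fields still become NaN.
tweets = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Keep only tweets addressed to someone, then link sender -> recipient.
replies = tweets.dropna(subset=["to_user_id_str"])
G = nx.from_pandas_edgelist(
    replies, source="from_user_id_str", target="to_user_id_str", create_using=nx.DiGraph
)
print(tweets["id_str"].iloc[0], sorted(G.edges()))
```

Using the numeric user IDs rather than the `from_user` screen names as node keys is a design choice: the data here gives recipients only as IDs, so both ends of each edge use the same kind of key.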
\n", "