I am using networkx to build an email network from a txt file in which each row represents an "edge." I first loaded the txt file (3 columns: {'#Sender', 'Recipient', 'time'}) into Python and then converted it to a networkx object using the following code:
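import networkx as nx
import pandas as pd
# Each row is one email: sender, recipient and timestamp, separated by '->'
email_df = pd.read_csv('email_network.txt', delimiter='->', engine='python')
# Build the graph, storing 'time' as an edge attribute
email = nx.from_pandas_dataframe(email_df, '#Sender', 'Recipient', edge_attr='time')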
The email_network.txt data can be accessed here.
However, email_df (a pandas DataFrame) has 82927 rows, while email (a networkx graph) has only 3251 edges.
In [1]: len(email_df)
Out[1]: 82927
In [2]: len(email.edges())
Out[2]: 3251
This confuses me. Even when several rows of email_df contain the same two nodes in the first two columns, in the same direction (say, '1' to '2'), the third column ('time', the timestamp) should distinguish them from one another, so I expected no rows to be treated as duplicate edges. Why, then, does the number of edges drop so dramatically, from 82927 to 3251, after I use nx.from_pandas_dataframe to read from email_df?
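To make the problem reproducible without the full file, here is a minimal sketch on made-up rows. The data and the names toy_df, toy, n_rows, n_edges and n_pairs are hypothetical and only for illustration. It also assumes a newer networkx release, where the same conversion is exposed as from_pandas_edgelist instead of from_pandas_dataframe. It compares the number of rows, the number of edges in the resulting graph, and the number of distinct sender/recipient pairs when direction is ignored.

```python
import networkx as nx
import pandas as pd

# Made-up rows (not taken from the real file): three emails between the same
# two people, differing in direction and timestamp, plus one email to a third.
toy_df = pd.DataFrame({
    '#Sender':   [1, 1, 2, 1],
    'Recipient': [2, 2, 1, 3],
    'time':      [100, 200, 300, 400],
})

# Assumption: networkx >= 2.0, where from_pandas_edgelist replaces the
# from_pandas_dataframe call used above. Default graph type is kept.
toy = nx.from_pandas_edgelist(toy_df, '#Sender', 'Recipient', edge_attr='time')

n_rows = len(toy_df)
n_edges = toy.number_of_edges()

# Distinct sender/recipient pairs, ignoring direction and timestamp.
n_pairs = len({frozenset(pair)
               for pair in zip(toy_df['#Sender'], toy_df['Recipient'])})

print(n_rows, n_edges, n_pairs)  # prints: 4 2 2
```

On this toy input the edge count follows the number of distinct pairs rather than the number of rows, so the differing timestamps do not seem to keep the rows apart. This looks like the same effect I see on the real data.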
Would anyone help explain this to me?
Thank you.