
I'm trying to find the ids that have the same number of occurrences in both data frames. This is a follow-up to my previous question.
I have two data frames:

df1=pd.DataFrame([[1,None],[1,None,],[1,None],[1,'item_a'],[2,'item_a'],[2,'item_b'],[2,'item_f'],[3,'item_e'],[3,'item_e'],[3,'item_g'],[3,'item_h']],columns=['id','A'])
df2=pd.DataFrame([[1,'item_a'],[1,'item_b'],[1,'item_c'],[1,'item_d'],[2,'item_a'],[2,'item_b'],[2,'item_c'],[2,'item_d'],[3,'item_e'],[3,'item_f'],[3,'item_g'],[3,'item_h']],columns=['id','A'])

 df1
        id  A
    0   1   None
    1   1   None
    2   1   None
    3   1   item_a # id 1 has 1 non-null occurrence in total in df1
    4   2   item_a
    5   2   item_b
    6   2   item_f # id 2 has 3 occurrences in total in df1
    7   3   item_e
    8   3   item_e
    9   3   item_g
    10  3   item_h # id 3 has 4 occurrences in total in df1

 df2
        id  A
    0   1   item_a
    1   1   item_b
    2   1   item_c
    3   1   item_d
    4   2   item_a
    5   2   item_b
    6   2   item_c
    7   2   item_d
    8   3   item_e
    9   3   item_f
    10  3   item_g
    11  3   item_h


I got an answer on how to find the rows common to both data frames by using an inner merge.

Previous result:
d=pd.merge(df1,df2,how='inner')
    id  A
3   1   item_a # id 1 has 1 occurrence in total in d
4   2   item_a
5   2   item_b # id 2 has 2 occurrences in total in d, which does not match its 3 occurrences in df1
7   3   item_e
8   3   item_e
9   3   item_g
10  3   item_h # id 3 has 4 occurrences in total in d

What I've tried in order to find the ids with the same number of occurrences in both data frames:
d[d['id'].value_counts()==df1['id'].value_counts()]
This gave me an error: Can only compare identically-labeled Series objects
I've also tried using rename to give the value_counts results a column name and then merging them, but that failed too.
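
For reference, this error usually means the two Series do not carry the same index labels in the same order; `value_counts()` sorts by count, so the two results here are ordered differently. A minimal sketch of a label-aligned comparison follows, assuming the `None` rows in df1 should be ignored. The names `cnt_df1`, `cnt_d`, `same` and `matching_ids` are illustrative and not from the original post.

```python
import pandas as pd

df1 = pd.DataFrame(
    [[1, None], [1, None], [1, None], [1, 'item_a'],
     [2, 'item_a'], [2, 'item_b'], [2, 'item_f'],
     [3, 'item_e'], [3, 'item_e'], [3, 'item_g'], [3, 'item_h']],
    columns=['id', 'A'])
df2 = pd.DataFrame(
    [[1, 'item_a'], [1, 'item_b'], [1, 'item_c'], [1, 'item_d'],
     [2, 'item_a'], [2, 'item_b'], [2, 'item_c'], [2, 'item_d'],
     [3, 'item_e'], [3, 'item_f'], [3, 'item_g'], [3, 'item_h']],
    columns=['id', 'A'])
d = pd.merge(df1, df2, how='inner')

# Count rows per id; drop the None rows in df1 first (assumption: they do not count)
cnt_df1 = df1.dropna(subset=['A'])['id'].value_counts()
cnt_d = d['id'].value_counts()

# Series.eq aligns the two Series on their index labels before comparing,
# whereas == raises when the labels (or their order) differ
same = cnt_df1.eq(cnt_d)

# ids whose counts agree in df1 and d
matching_ids = sorted(same[same].index.tolist())
print(matching_ids)
```

Note that the boolean result is indexed by id, not by row, so it cannot be used directly as a row mask on d.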

Match: the count of occurrences of an id in df1 equals its count of occurrences in the result data frame d.

           cnt_in_df1 | cnt_in_d
for id 1:      1      |    1      # match    => id 1 should be in the desired output
for id 2:      3      |    2      # mismatch => id 2 should not be in the desired output
for id 3:      4      |    4      # match    => id 3 should be in the desired output

My desired output for this question:

    id  count 
0   1    1
1   3    4

1 Answer

EDIT: Thanks for clarifying the question. The problem is now to check whether the count of each id is the same in the two data frames.

Here is how you could go about it:

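# Count the non-null rows per id in df1 and in the merged frame d
d1 = df1[df1['A'].notna()].groupby("id").size().to_frame("cnt_df1")
d2 = d[d['A'].notna()].groupby("id").size().to_frame("cnt_d")

# Join the two counts on id (the index level); a new name keeps d intact
counts = pd.merge(d1, d2, on="id")

# Keep the ids whose counts agree
ids_ = counts[counts["cnt_df1"] == counts["cnt_d"]].index.values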

RETURN: array([1, 3])

This will now give an array of ids where the counts in both df1 and d are the same.
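
To get the table asked for in the question rather than a bare array, one option is to keep the counts as named columns and filter on them. This is a sketch along those lines, not the only way to do it; it rebuilds d as the inner merge from the question, and the names `cnt_df1`, `cnt_d`, `both` and `result` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame(
    [[1, None], [1, None], [1, None], [1, 'item_a'],
     [2, 'item_a'], [2, 'item_b'], [2, 'item_f'],
     [3, 'item_e'], [3, 'item_e'], [3, 'item_g'], [3, 'item_h']],
    columns=['id', 'A'])
df2 = pd.DataFrame(
    [[1, 'item_a'], [1, 'item_b'], [1, 'item_c'], [1, 'item_d'],
     [2, 'item_a'], [2, 'item_b'], [2, 'item_c'], [2, 'item_d'],
     [3, 'item_e'], [3, 'item_f'], [3, 'item_g'], [3, 'item_h']],
    columns=['id', 'A'])
d = pd.merge(df1, df2, how='inner')  # inner merge from the question

# Per-id counts as ordinary columns (None rows in df1 are dropped first)
cnt_df1 = df1.dropna(subset=['A']).groupby('id').size().reset_index(name='count')
cnt_d = d.groupby('id').size().reset_index(name='count_d')

# One row per id that appears in both, with both counts side by side
both = cnt_df1.merge(cnt_d, on='id')

# Keep only the ids whose counts agree, in the requested id/count layout
result = both.loc[both['count'] == both['count_d'], ['id', 'count']].reset_index(drop=True)
print(result)
```

An id that is missing from d entirely drops out at the merge step, which matches the intent here since its count cannot equal a non-zero count in df1.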

shepan6