Given a test dataset as follows:
id city district quantity price
0 1 bj hd 12.0 23.0
1 2 bj cy 23.0 45.0
2 3 bj hd NaN NaN
3 4 bj cy NaN NaN
4 5 sh hp 12.0 NaN
5 6 sh hp NaN NaN
6 7 sh pd NaN NaN
I would like to drop duplicated rows based on city and district, then drop rows if its quantity is NaN, but if city and district are not duplicated, then even if quantity is NaN, it's not necessary to drop rows.
Code based on link from here:
m1 = df['quantity'].notna()
m2 = ~df[['city', 'district']].duplicated()
df1 = df[m1 & m2]
print(df1)
Out:
id city district quantity price
0 1 bj hd 12.0 23.0
1 2 bj cy 23.0 45.0
4 5 sh hp 12.0 NaN
But I want to keep the last row since it's not duplicated with any rows. How could I do that?
id city district quantity price
0 1 bj hd 12.0 23.0
1 2 bj cy 23.0 45.0
2 5 sh hp 12.0 NaN
3 7 sh pd NaN NaN