2

I am trying to add quantiles based on column J1 within each ID group in the below dataframe.

import pandas as pd
try_df = pd.DataFrame({'ID':['1','1','1','1','1','2','2','2','2','2','3','3','3','3','3'], 'J1': range(15)})
print(try_df)    
try_df["quantiles"] = try_df.groupby("ID")["J1"].transform(pd.qcut,4,["Q1","Q2","Q3","Q4"])

   ID  J1
0   1   0
1   1   1
2   1   2
3   1   3
4   1   4
5   2   5
6   2   6
7   2   7
8   2   8
9   2   9
10  3  10
11  3  11
12  3  12
13  3  13
14  3  14

The above code gives a value error: invalid literal for long() with base 10: 'Q4'

In the real data it throws this error:

ValueError: could not convert string to float: Q2

Any suggestions on how to resolve it?

cs95
  • 379,657
  • 97
  • 704
  • 746
Apoorv
  • 177
  • 1
  • 3
  • 15

2 Answers2

3

Try groupby + apply:

try_df.groupby("ID")["J1"].apply(lambda x: pd.qcut(x, 4, ["Q1","Q2","Q3","Q4"]))

0     Q1
1     Q1
2     Q2
3     Q3
4     Q4
5     Q1
6     Q1
7     Q2
8     Q3
9     Q4
10    Q1
11    Q1
12    Q2
13    Q3
14    Q4
Name: J1, dtype: category
Categories (4, object): [Q1 < Q2 < Q3 < Q4]
cs95
  • 379,657
  • 97
  • 704
  • 746
  • Thanks for the quick reply, It worked fine on the given data, when i do the same thing on real data it gives a ValueError: Bin edges must be unique: array([ nan, nan, nan, nan, nan, nan]) . I have one blank value in the data. Any idea what could be the possible reason? I tried replacing it with some number too. – Apoorv Oct 24 '17 at 07:37
  • @Apoorv I cannot pinpoint the reason for your error given this information, but take a look at https://stackoverflow.com/questions/36880490/why-use-pandas-qcut-return-valueerror-bin-edges-must-be-unique which might help. – cs95 Oct 24 '17 at 07:40
  • Thanks for the link! I removed that blank value and it worked fine. I think value is somehow erroneous and not getting treated. – Apoorv Oct 24 '17 at 07:47
  • @Apoorv Makes sense, although I always believed pandas should be adept at handling missing data without erroring out like that. – cs95 Oct 24 '17 at 07:48
0

For me your code working very nice in pandas 0.20.3:

try_df["quantiles"] = try_df.groupby("ID")["J1"].transform(pd.qcut,4,["Q1","Q2","Q3","Q4"])
print (try_df)
   ID  J1 quantiles
0   1   0        Q1
1   1   1        Q1
2   1   2        Q2
3   1   3        Q3
4   1   4        Q4
5   2   5        Q1
6   2   6        Q1
7   2   7        Q2
8   2   8        Q3
9   2   9        Q4
10  3  10        Q1
11  3  11        Q1
12  3  12        Q2
13  3  13        Q3
14  3  14        Q4

Another solution is define function with lambda:

try_df["quantiles"] = (try_df.groupby("ID")["J1"]
                             .transform(lambda x: pd.qcut(x,4,["Q1","Q2","Q3","Q4"])))
print (try_df)
   ID  J1 quantiles
0   1   0        Q1
1   1   1        Q1
2   1   2        Q2
3   1   3        Q3
4   1   4        Q4
5   2   5        Q1
6   2   6        Q1
7   2   7        Q2
8   2   8        Q3
9   2   9        Q4
10  3  10        Q1
11  3  11        Q1
12  3  12        Q2
13  3  13        Q3
14  3  14        Q4
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252