10

I am trying to calculate cumulative sum with groupby using Pandas's DataFrame. However, I don't get expected output.

My Source Code:

import pandas as pd
Employee = [['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
        ['CSE', 'CSE', 'EEE', 'EEE', 'CE', 'CE', 'ME', 'ME'],
        ['Cat-1', 'Cat-2', 'Cat-1', 'Cat-2', 'Cat-1', 'Cat-2', 'Cat-1', 'Cat-2']]

index = pd.MultiIndex.from_arrays(Employee, names=['Name', 'Dept', 'Category'])

Scale = [1, 2, 2, 3, 3, 1, 2, 3]
Salary = [100, 200, 200, 300, 300, 100, 200, 300]

df = pd.DataFrame({'scale': Scale,
               'salary': Salary},
              index=index)

df1 = df.groupby(['Category', 'scale']).cumsum()
print(df1)

Expected Output:

Cat-1    1         100
         2         500
         3         800
Cat-2    1         100
         2         300
         3         900

Obtained Result:

Name Dept Category        
A    CSE  Cat-1        100
B    CSE  Cat-2        200
C    EEE  Cat-1        300
D    EEE  Cat-2        500
E    CE   Cat-1        600
F    CE   Cat-2        600
G    ME   Cat-1       1000
H    ME   Cat-2       1200

Groupby doesn't work. However, if I use sum() (i.e. df1 = df.groupby(['Category', 'scale']).sum()) instead of cumsum(), groupby works perfectly.

Shaido
  • 652
  • 6
  • 13
sakamoto sato
  • 103
  • 1
  • 1
  • 5

1 Answers1

12

There are multiple entries for each group so you need to aggregate the data twice, in other words, use groupby twice. Once to get the sum for each group and once to calculate the cumulative sum of these sums.

It can be done as follows:

df.groupby(['Category','scale']).sum().groupby('Category').cumsum()

Note that the cumsum should be applied on groups as partitioned by the Category column only to get the desired result.

Shaido
  • 652
  • 6
  • 13