So again, I have another question related to this. I'm processing a DataFrame with the columns contributor_id, timestamp and nEdits, shown in the table below.
I now want to add an additional column, called 'position', that numbers the rows of each contributor_id. The count should not start until the value in nEdits is greater than 0 (it stays at 0 before that), and it must restart at 1 whenever the contributor_id changes. The expected result is:
contributor_id timestamp nEdits Position
0 8 2018-01-01 1 1
1 8 2018-02-01 1 2
2 8 2018-03-01 1 3
3 8 2018-04-01 1 4
4 8 2018-05-01 1 5
5 8 2018-06-01 1 6
6 8 2018-07-01 1 7
7 8 2018-08-01 1 8
8 26424341 2018-01-01 0 0
9 26424341 2018-02-01 0 0
10 26424341 2018-03-01 11 1
11 26424341 2018-04-01 34 2
12 26424341 2018-05-01 42 3
13 26424341 2018-06-01 46 4
14 26424341 2018-07-01 50 5
15 26424341 2018-08-01 54 6
16 26870381 2018-01-01 465 1
17 26870381 2018-02-01 566 2
18 26870381 2018-03-01 601 3
The idea I got from some answers for computing the position column is to do something like: df.groupby("contributor_id").position.cumsum()
But I don't know how to include the condition that nEdits must be greater than 0 before the count starts.
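
To make the expected behaviour concrete, here is a minimal sketch that reproduces the Position column from the table above. It rests on two assumptions that are not stated in the question: the rows are already sorted by timestamp within each contributor_id, and once a contributor has a row with nEdits greater than 0 the count keeps increasing on every later row. The helper names `keys`, `has_edits` and `started` are made up for illustration.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "contributor_id": [8] * 8 + [26424341] * 8 + [26870381] * 3,
    "timestamp": pd.to_datetime(
        [f"2018-{m:02d}-01" for m in range(1, 9)] * 2
        + [f"2018-{m:02d}-01" for m in range(1, 4)]
    ),
    "nEdits": [1] * 8 + [0, 0, 11, 34, 42, 46, 50, 54] + [465, 566, 601],
})

# Assumption: rows are already sorted by timestamp within each contributor.
keys = df["contributor_id"]

# 1 where the row has at least one edit, 0 otherwise.
has_edits = df["nEdits"].gt(0).astype(int)

# Assumption: counting starts at the first row with nEdits > 0 and never
# stops afterwards. The per-group running maximum is 0 before that row
# and 1 from that row onwards.
started = has_edits.groupby(keys).cummax()

# The per-group running sum of the 0/1 flag gives 0 before the first edit,
# then 1, 2, 3, ... and starts over for each contributor_id.
df["position"] = started.groupby(keys).cumsum()

print(df)
```

If the count should instead skip later rows where nEdits is 0, the `cummax` step would have to be replaced by different logic; the sample data does not say which behaviour is intended.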
