Create Period column based on a date column where the first month is 1, second 2, etc

Question

I have a dataset with many project's monthly expendituries (cost curve), like this one:

Project	Date	Expenditure(USD)
Project A	12-2020	500
Project A	01-2021	1257
Project A	02-2021	125889
Project A	03-2021	102447
Project A	04-2021	1248
Project A	05-2021	1222
Project A	06-2021	856
Project B	01-2021	5589
Project B	02-2021	52874
Project B	03-2021	5698745
Project B	04-2021	2031487
Project B	05-2021	2359874
Project B	06-2021	25413
Project B	07-2021	2014
Project B	08-2021	2569

Using python, I want to create a "Period" column that replace the month value for a integer that represents the count of months of the project, like this: Where the line is the first month of the Project A (12-2020) the code should put 1 in the "Period" column, the second month (01-2021) is 2, the third (02-2021) is 3, etc. because I need to focus on the number of months that the projects of my dataframe had an expediture (month 1, month 2, month 3...)

Project	Date	Period	Expenditure(USD)
Project A	12-2020	1	500
Project A	01-2021	2	1257
Project A	02-2021	3	125889
Project A	03-2021	4	102447
Project A	04-2021	5	1248
Project A	05-2021	6	1222
Project A	06-2021	7	856
Project B	01-2021	1	5589
Project B	02-2021	2	52874
Project B	03-2021	3	5698745
Project B	04-2021	4	2031487
Project B	05-2021	5	2359874
Project B	06-2021	6	25413
Project B	07-2021	7	2014
Project B	08-2021	8	2569

score 0 · Accepted Answer · answered Nov 10 '21 at 07:48

The easiest thing is for you to calculate for each row:

The start date of the corresponding project.
The months since current date and start date of the project.

Below is a sample code that does that for you:

import pandas as pd
import numpy as np
df = pd.DataFrame(
    [
        ["Project A", "12-2020", 500],
        ["Project A", "01-2021", 1257],
        ["Project A", "02-2021", 125889],
        ["Project A", "03-2021", 102447],
        ["Project A", "04-2021", 1248],
        ["Project A", "05-2021", 1222],
        ["Project A", "06-2021", 856],
        ["Project B", "01-2021", 5589],
        ["Project B", "02-2021", 52874],
        ["Project B", "03-2021", 5698745],
        ["Project B", "04-2021", 2031487],
        ["Project B", "05-2021", 2359874],
        ["Project B", "06-2021", 25413],
        ["Project B", "07-2021", 2014],
        ["Project B", "08-2021", 2569],
    ],
    columns=["Project", "Date", "Expenditure(USD)"],
)
df["Date"] = pd.to_datetime(df["Date"], format="%m-%Y")  # Convert date column type
get the start date of the project
i.e find the lowest date of rows that have the same project as the current row
df["Project Start Date"] = df.apply(lambda row: min(df[df["Project"] == row["Project"]]["Date"]), axis=1)
calculate the period
i.e. the number of months of the current date since the start of the project + 1
df["Period"] = ((df["Date"] - df["Project Start Date"]) / np.timedelta64(1, "M") + 1).round().astype(int)
print(df)

It gives you the following:

      Project        Date  Expenditure(USD) Project Start Date  Period
0   Project A  2020-12-01               500         2020-12-01       1
1   Project A  2021-01-01              1257         2020-12-01       2
2   Project A  2021-02-01            125889         2020-12-01       3
3   Project A  2021-03-01            102447         2020-12-01       4
4   Project A  2021-04-01              1248         2020-12-01       5
5   Project A  2021-05-01              1222         2020-12-01       6
6   Project A  2021-06-01               856         2020-12-01       7
7   Project B  2021-01-01              5589         2021-01-01       1
8   Project B  2021-02-01             52874         2021-01-01       2
9   Project B  2021-03-01           5698745         2021-01-01       3
10  Project B  2021-04-01           2031487         2021-01-01       4
11  Project B  2021-05-01           2359874         2021-01-01       5
12  Project B  2021-06-01             25413         2021-01-01       6
13  Project B  2021-07-01              2014         2021-01-01       7
14  Project B  2021-08-01              2569         2021-01-01       8

Create Period column based on a date column where the first month is 1, second 2, etc

1 Answers1

get the start date of the project

i.e find the lowest date of rows that have the same project as the current row

calculate the period

i.e. the number of months of the current date since the start of the project + 1