I have two datasets that look like this:
df1:
| Date | City | State | Quantity |
|---|---|---|---|
| 2019-01 | Chicago | IL | 35 |
| 2019-01 | Orlando | FL | 322 |
| ... | .... | ... | ... |
| 2021-07 | Chicago | IL | 334 |
| 2021-07 | Orlando | FL | 4332 |
df2:
| Date | City | State | Sales |
|---|---|---|---|
| 2020-03 | Chicago | IL | 30 |
| 2020-03 | Orlando | FL | 319 |
| ... | ... | ... | ... |
| 2021-07 | Chicago | IL | 331 |
| 2021-07 | Orlando | FL | 4000 |
My date is in format period[M] for both datasets. I have tried using the df1.join(df2,how='outer') and (df2.join(df1,how='outer') commands but they don't add up correctly, essentially, in 2019-01, I have sales for 2020-03. How can I join these two datasets such that my output is as follows:
I have not been able to use merge() because I would have to merge with a combination of City and State and Date
| Date | City | State | Quantity | Sales |
|---|---|---|---|---|
| 2019-01 | Chicago | IL | 35 | NaN |
| 2019-01 | Orlando | FL | 322 | NaN |
| ... | ... | ... | ... | ... |
| 2021-07 | Chicago | IL | 334 | 331 |
| 2021-07 | Orlando | FL | 4332 | 4000 |