
I have a data.frame to which I want to add a new column categorising the period each sample comes from. Each period starts on August 1st and ends on July 31st of the following year. For instance, period 1 is 01/Aug/2001 to 31/Jul/2002 and period 2 is 01/Aug/2002 to 31/Jul/2003.

I have been trying with dplyr as I want a tidy solution.

The following is an example of the data.frame:

Date <- seq(as.Date("2001/08/01"), by = "month", length.out = 60)
AHD <- rnorm(60, mean = 12, sd = 1)
df <- data.frame(Date=Date, AHD = AHD)

I can subset the data.frame into one of the periods using:

df %>%
  group_by(dr = cut(Date, breaks = c(range(Date), 
                                     as.Date(c("2001-08-01", "2002-07-31"))), include.lowest=TRUE) )

However, I do not know how to do this for a sequence of periods nor how to build the new category column. I need a new column saying whether each row belongs to period 1, period 2 etc.

2 Answers


We could create a sequence of yearly dates from the minimum to the maximum of Date and use it as the breaks in cut:

period <- seq(min(df$Date), max(df$Date), by = "1 year")
df$period <- cut(df$Date, breaks =  c(period, Inf), 
              labels = paste0("period", seq_along(period)))


df
#         Date          AHD  period
#1  2001-08-01 10.792934251 period1
#2  2001-09-01 12.277429242 period1
#3  2001-10-01 13.084441177 period1
#4  2001-11-01  9.654302297 period1
#5  2001-12-01 12.429124689 period1
#6  2002-01-01 12.506055892 period1
#7  2002-02-01 11.425260040 period1
#8  2002-03-01 11.453368144 period1
#9  2002-04-01 11.435548001 period1
#10 2002-05-01 11.109962171 period1
#11 2002-06-01 11.522807300 period1
#12 2002-07-01 11.001613555 period1
#13 2002-08-01 11.223746105 period2
#....
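
To check how the boundary dates are classified, here is a minimal sketch on a few hypothetical daily dates (they are not part of the question's monthly data). It assumes the default right = FALSE of cut for Date objects, so each interval includes its start date and excludes the next start date. The closing break is written as a real date rather than Inf so that every break is a Date.

```r
# Hypothetical daily dates around the first period boundary
d <- as.Date(c("2002-07-30", "2002-07-31", "2002-08-01", "2002-08-02"))

# Three period starts plus one closing break, all kept as Date objects
breaks <- seq(as.Date("2001-08-01"), by = "1 year", length.out = 4)

# Intervals are [start, next start), so 31 July stays in the earlier period
p <- cut(d, breaks = breaks, labels = paste0("period", 1:3))
data.frame(Date = d, period = p)
```

Under these assumptions 31 July 2002 falls in period1 and 1 August 2002 in period2.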

We can also use findInterval, which gives the same period labels as cut (as a character vector rather than a factor):

df$period <- paste0("period", findInterval(df$Date,  period))
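
Since the question asks for a tidy solution, the same idea can probably be written inside mutate. This is a sketch assuming dplyr is installed; the name starts is introduced here for illustration and the data are rebuilt as in the data section so the block runs on its own.

```r
library(dplyr)

set.seed(1234)
df <- data.frame(
  Date = seq(as.Date("2001-08-01"), by = "month", length.out = 60),
  AHD  = rnorm(60, mean = 12, sd = 1)
)

# Yearly period starts, beginning at the earliest date in the data
starts <- seq(min(df$Date), max(df$Date), by = "1 year")

# findInterval() returns the index of the last start date <= each Date
df <- df %>%
  mutate(period = paste0("period", findInterval(Date, starts)))

# Each period should contain 12 monthly rows
count(df, period)
```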

data

set.seed(1234)
Date <- seq(as.Date("2001/08/01"), by = "month", length.out = 60)
AHD <- rnorm(60, mean = 12, sd = 1)
df <- data.frame(Date=Date, AHD = AHD)
Ronak Shah
  • Yep, that works great! Thanks. However, is there a way to get it to work without specifying the date range? The first row of the data.frame is the 1st August of a given year, and you need to group up to the 31st July of the following year. – Christopher Kavazos Aug 16 '19 at 00:38
  • @ChristopherKavazos No need to group by; we can use the `min` and `max` of `Date` as the interval limits, and it should work for all date ranges. I have updated the answer. – Ronak Shah Aug 16 '19 at 00:44

Here is a solution based on an answer to a similar question:

library(tidyverse)
library(zoo)


Date <- seq(as.Date("2001/08/01"), by = "month", length.out = 60)
AHD <- rnorm(60, mean = 12, sd = 1)
df <- data.frame(Date=Date, AHD = AHD)

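# Shifting back 7 months maps Aug-Jul onto one calendar year;
# the +1 and -2001 then number the period starting in Aug 2001 as 1
df <- df %>% mutate(period = as.integer(as.yearmon(Date) - 7/12 + 1) - 2001)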
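
If the first year should not be hard-coded, the same shift can be sketched in base R without zoo. The helper period_index and its start_month argument are hypothetical names introduced here, not part of the answer above; the sketch assumes periods always begin on the 1st of start_month and numbers them from the earliest period present in the data.

```r
# Hypothetical helper: period number for years that start on the 1st of
# `start_month`, counted from the earliest period found in `dates`
period_index <- function(dates, start_month = 8L) {
  lt <- as.POSIXlt(dates)
  year  <- lt$year + 1900L   # POSIXlt stores years since 1900
  month <- lt$mon + 1L       # POSIXlt stores months as 0-11
  # Calendar year in which each date's period began
  start_year <- year - (month < start_month)
  start_year - min(start_year) + 1L
}

Date <- seq(as.Date("2001-08-01"), by = "month", length.out = 60)
res <- period_index(Date)
res[c(1, 12, 13, 60)]  # Aug 2001, Jul 2002, Aug 2002, Jul 2006: periods 1, 1, 2, 5
```

It could be used inside mutate as period = period_index(Date); because it works on whole months with integer arithmetic, it does not rely on truncating a fractional year.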
hghg hghg