I am trying to drop observations with prices in the top and bottom one percent, by year. I have been attempting to use dplyr's group_by function to group by year_sold and then mutate()to create a variable to_drop whose value is conditional on the variable price being between the 1st and 99th percentile. Here's what I have so far:
df <- df %>% dplyr::group_by(year_sold) %>%
mutate(to_drop = ifelse(price <= quantile(price,0.01) | price >= quantile(price,0.99),1,0))
However, I am not getting the quantiles of price grouped by year_sold. Removing dplyr::group_by(year_sold) %>%
doesn't seem to change my results.
I'm trying to find alternatives to Stata's very useful bysort command. Here's how I would have done this in Stata:
gen to_drop = 0
foreach y in year_sold {
quietly bysort `y': summarize price, detail // To get r(p1) and r(p99)
bysort `y': replace to_drop = 1 if ! inrange(price, r(p1), r(p99))
}
Can someone please help me either figure out why group_by isn't working as I would expect it to, or alternatively, help me figure out another way of achieving this task in R?