I have a dataframe like so:
df <- data.frame(region = c("1","1","1","1","1","2","3","3","3"),
                 loc = c("104","104","104","105","105","106","107","108","109"),
                 interact = c("A_B","A_B","B_C","C_D","A_B","E_F","E_F","F_G","A_B"))
I would like to make a dataframe that:
1) Counts the incidence frequency of each interaction among the loc levels within each region. In the example above, region 1 has two loc levels (104 and 105) that both contain the interact A_B, so the incidence frequency of A_B for region 1 is 2. Duplicate interact levels within the same loc are not counted: A_B occurs 3 times in region 1, but in only two unique loc levels. In other words, the incidence frequency is the number of unique loc levels in which an interact occurs.
2) Lists every interact level found across all regions and counts its incidence in each region. As a consequence, 0's should be included for any interact level that did not occur in a given region.
3) Has a first row giving the number of unique loc levels in each region. Region 1 has 2 loc levels (104, 105), region 2 has 1 loc level (106), and region 3 has 3 loc levels (107-109).
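To make the counting rules in 1) to 3) concrete, here is a minimal base R sketch on the example data. It is only meant to illustrate the definitions, not the final wide layout; the object names `pairs`, `ab_region1`, `inc`, and `n_loc` are placeholders introduced here for illustration.

```r
# Example data from above
df <- data.frame(region = c("1","1","1","1","1","2","3","3","3"),
                 loc = c("104","104","104","105","105","106","107","108","109"),
                 interact = c("A_B","A_B","B_C","C_D","A_B","E_F","E_F","F_G","A_B"))

# Rule 1: drop duplicate interact levels within the same loc,
# so each region/loc/interact combination is counted once
pairs <- unique(df[, c("region", "loc", "interact")])

# Incidence frequency of A_B in region 1 = number of unique loc containing it
ab_region1 <- sum(pairs$region == "1" & pairs$interact == "A_B")
ab_region1  # 2, matching the worked example in 1)

# Rule 2: cross-tabulate interact by region; absent combinations come out as 0
inc <- table(interact = pairs$interact, region = pairs$region)

# Rule 3: number of unique loc levels per region
n_loc <- tapply(df$loc, df$region, function(x) length(unique(x)))
```

Deduplicating first and tabulating second keeps the "unique loc" rule separate from the counting step, which makes each rule easy to check on its own.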
The final output will look like:
output <- data.frame(interact = c("","A_B","B_C","C_D","E_F","F_G"),
                     region1 = c("2","2","1","1","0","0"),
                     region2 = c("1","0","0","0","1","0"),
                     region3 = c("3","1","0","0","1","1"))
I do not know where to start with this. Here is what I have adapted from @akrun's answer to a similar question, Convert from long to wide format counting frequency of eliminated factor level (Prepping dataframe for input into iNEXT Online), but it throws errors:
library(tidyverse)
df %>%
  group_by(region = paste0('region', region)) %>%
  summarise(interact = "", V1 = n_distinct(loc)) %>%
  spread(region, V1),
df %>%
  group_by(region = paste0('region', region) & loc),
  interact = as.character(interact)) %>%
  summarise(V1 = length(unique((interact)) %>%
  spread(region, V1, fill = 0))
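For comparison, here is a hedged sketch of what the attempt above appears to be aiming for. It assumes the two pipelines are meant to be stacked with `bind_rows()` (the attempt has no wrapper around them), that grouping should be by region and interact rather than `paste0(...) & loc`, and that the quantity to count is the number of unique loc per region and interact. The names `loc_counts`, `incidence`, and `result` are placeholders, and this is one possible approach rather than a definitive one.

```r
library(dplyr)
library(tidyr)

df <- data.frame(region = c("1","1","1","1","1","2","3","3","3"),
                 loc = c("104","104","104","105","105","106","107","108","109"),
                 interact = c("A_B","A_B","B_C","C_D","A_B","E_F","E_F","F_G","A_B"))

# First row: number of unique loc levels per region, with a blank interact label
loc_counts <- df %>%
  mutate(region = paste0("region", region)) %>%
  group_by(region) %>%
  summarise(V1 = n_distinct(loc)) %>%
  ungroup() %>%
  mutate(interact = "") %>%
  spread(region, V1)

# Remaining rows: for each interact, the number of unique loc per region
incidence <- df %>%
  mutate(region = paste0("region", region),
         interact = as.character(interact)) %>%
  distinct(region, loc, interact) %>%   # duplicates within a loc count once
  count(region, interact) %>%           # one row per region/interact, count in n
  spread(region, n, fill = 0)           # regions become columns, missing -> 0

# Stack the loc counts on top of the incidence table
result <- bind_rows(loc_counts, incidence)
result
```

Note that the counts here are numeric rather than the character strings used in the desired output; wrapping the region columns in `as.character()` would be needed if strings are required.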