
I have a dataset that looks like this:

conifer.abundance <- c(6,7,8,2,3,4,5,1,7,8,9,8,7,6,5,1)
lily.abundance <- c(5,5,5,5,4,4,4,4,6,7,8,2,3,4,5,1)
type <- c("Control","Control","Control","Control","Control","Control","Control","Control","Treatment","Treatment","Treatment","Treatment","Treatment","Treatment","Treatment","Treatment")
class <- c("City","Rural","City","Rural","City","Rural","City","Rural","City","Rural","City","Rural","City","Rural","City","Rural")
climate <- c("wet","wet","dry","dry","wet","wet","dry","dry","wet","wet","dry","dry","wet","wet","dry","dry")
all.abundance <- conifer.abundance + lily.abundance
dat88 <- data.frame(climate,type,class,conifer.abundance, lily.abundance,all.abundance)

This is a 2x2x2 design. I want to draw bar plots in which the mean of all.abundance is shown as the sum of mean conifer.abundance and mean lily.abundance (stacked), and the stacking has a legend of its own. I tried following this code, but it seems to use fill to stack the bars, and I need fill for a different purpose here (climate). If I had several more data points, I would also need to plot a bootstrapped confidence interval (as below). Any suggestions? Here is my current code for the plot:

library(ggplot2)  # mean_cl_boot also requires the Hmisc package

pd <- position_dodge(0.82)
ggplot(dat88, aes(x = class, y = all.abundance, fill = climate)) +
  theme_bw() +
  # ggplot2 >= 3.3.0 uses `fun`; older versions used `fun.y`
  stat_summary(geom = "bar", fun = mean, position = "dodge") +
  stat_summary(geom = "errorbar", fun.data = mean_cl_boot, position = pd) +
  ylab("Total Abundance") +
  facet_grid(~type)

Please note that I have slightly changed the dataset to represent a more biologically fitting scenario.

Share
  • I'm not sure I understand the updated requirements. If you want to plot the bootstrapped mean against the y-axis, does it still make sense to plot total abundance there? Or do you mean the sum of *average* abundance for conifers & *average* abundance for lily? – Z.Lin Sep 19 '17 at 05:09
  • You are right. The sum of the average abundance of conifers and of lily makes more sense. – Share Sep 19 '17 at 12:57

1 Answer


If you want to stack the height values for female & male, you'll need to melt / gather them into a single variable.

The following two methods for reshaping the data frame are equivalent. Which one to use depends on which package you are more familiar with:

# data.table package
dat2 <- data.table::melt(dat, measure.vars = c("male.height", "female.height"),
                         variable.name = "Gender", value.name = "height")

# tidyr package
dat3 <- tidyr::gather(dat, key = Gender, value = height, 
                      male.height, female.height, factor_key = TRUE)

> all.equal(dat2, dat3)
[1] TRUE
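
For the abundance data in the question, the same reshaping step might look like the sketch below. It uses tidyr::pivot_longer() (tidyr 1.0.0 or later), which the tidyr documentation now recommends over gather(). The names dat88_long, taxon and abundance are my own choices for illustration and do not come from the question.

```r
library(tidyr)

# Data from the question, written compactly with rep()
dat88 <- data.frame(
  climate = rep(c("wet", "wet", "dry", "dry"), times = 4),
  type    = rep(c("Control", "Treatment"), each = 8),
  class   = rep(c("City", "Rural"), times = 8),
  conifer.abundance = c(6, 7, 8, 2, 3, 4, 5, 1, 7, 8, 9, 8, 7, 6, 5, 1),
  lily.abundance    = c(5, 5, 5, 5, 4, 4, 4, 4, 6, 7, 8, 2, 3, 4, 5, 1)
)

# Gather the two abundance columns into a key column (taxon)
# and a value column (abundance): one row per observation per taxon
dat88_long <- pivot_longer(
  dat88,
  cols      = c(conifer.abundance, lily.abundance),
  names_to  = "taxon",
  values_to = "abundance"
)
```

Each of the 16 original rows becomes two rows, so taxon can then be mapped to fill in the same way Gender is in this answer.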

Since this is a 2 x 2 x 2 design, I added a dimension to facet_grid to show both type and species. If that's not needed, simply revert to facet_grid(~type):

ggplot(dat2,
       aes(x = class, y = height, fill = Gender)) +
  geom_col() +
  ylab("Total Height") +
  facet_grid(species~type) +
  scale_fill_discrete(breaks = c("female.height", "male.height"),
                      labels = c("female", "male"))
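
For the updated question (stacked mean abundances plus a bootstrapped interval on the mean total), one possible adaptation is sketched below. It relies on the fact that the mean of all.abundance equals mean conifer.abundance plus mean lily.abundance, so an interval computed on all.abundance sits at the top of the stacked bar. The assumptions here are mine: climate is moved into the facet rows (mirroring species ~ type above) so that fill stays free for the taxon, the names dat88_long, taxon and p are illustrative, the fun argument needs ggplot2 3.3.0 or later, and mean_cl_boot needs the Hmisc package installed.

```r
library(ggplot2)  # mean_cl_boot() additionally requires the Hmisc package

# Data from the question, written compactly with rep()
dat88 <- data.frame(
  climate = rep(c("wet", "wet", "dry", "dry"), times = 4),
  type    = rep(c("Control", "Treatment"), each = 8),
  class   = rep(c("City", "Rural"), times = 8),
  conifer.abundance = c(6, 7, 8, 2, 3, 4, 5, 1, 7, 8, 9, 8, 7, 6, 5, 1),
  lily.abundance    = c(5, 5, 5, 5, 4, 4, 4, 4, 6, 7, 8, 2, 3, 4, 5, 1)
)
dat88$all.abundance <- dat88$conifer.abundance + dat88$lily.abundance

# Long format (base R): one row per observation per taxon
keys <- dat88[c("climate", "type", "class")]
dat88_long <- rbind(
  data.frame(keys, taxon = "conifer", abundance = dat88$conifer.abundance),
  data.frame(keys, taxon = "lily",    abundance = dat88$lily.abundance)
)

p <- ggplot(dat88_long, aes(x = class, y = abundance)) +
  # Mean per taxon in each class and panel, stacked: bar height = mean total
  stat_summary(aes(fill = taxon), fun = mean, geom = "bar",
               position = "stack") +
  # Bootstrapped CI of the mean total, computed from the wide data
  stat_summary(data = dat88, aes(y = all.abundance),
               fun.data = mean_cl_boot, geom = "errorbar", width = 0.2) +
  ylab("Mean abundance") +
  facet_grid(climate ~ type) +
  theme_bw()
```

Printing p draws the figure. With only two observations per combination of factors in this sample, the bootstrapped intervals are not very informative; the layer is there to show where it would go once more data points are available.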

[Figure: facet plot]

Z.Lin
  • Imagine I have several samples like this and I also want to plot the bootstrap CI on the mean total height. How can I do it with this code? I know that in this example it does not make biological sense to take total.height and then take its mean, but I'm just trying to provide a decent extension of what I have. – Share Sep 19 '17 at 03:38
  • Can you include a richer data sample? At present there's exactly one data point for each combination of factors, so it's hard to understand the motivation. – Z.Lin Sep 19 '17 at 03:40