0

Data looks as follows:

df1=data.frame(Date=as.Date(c('8/27/2001','8/27/2001','8/27/2001','11/13/2001','11/13/2001','11/13/2001','8/3/2012','8/3/2012'),format="%m/%d/%Y"),
    Name=c('Joe', 'Joe', 'Joe', 'Billy', 'Billy', 'Billy','Emma','Emma'),
    Sample=c('Pre','Post','Discard','Pre','Post','Discard','Bone','Pre'),
    Cells=c(15,7,3,12,5,2,14,NA))
    Date        Name    Sample Cells
1   2001-08-27  Joe     Pre     15
2   2001-08-27  Joe     Post    7
3   2001-08-27  Joe     Discard 3
4   2001-11-13  Billy   Pre     12
5   2001-11-13  Billy   Post    5
6   2001-11-13  Billy   Discard 2
7   2012-08-03  Emma    Bone    14
8   2012-08-03  Emma    Pre     NA

I would like to add a calculated column called "Yield" based on unique groupings of date and name (e.g. entries 1-3, 4-6 or 7-8 would all represent distinct groups). Real data can be incomplete (see entries 7-8).

The "yield" column should be:

Cells where Sample="Post" divided by Cells where Sample="Pre"

Desired output:

    Date        Name    Sample Cells Yield
1   2001-08-27  Joe     Pre     15   NA
2   2001-08-27  Joe     Post    7    0.46
3   2001-08-27  Joe     Discard 3    NA
4   2001-11-13  Billy   Pre     12   NA
5   2001-11-13  Billy   Post    5    0.41
6   2001-11-13  Billy   Discard 2    NA
7   2012-08-03  Emma    Bone    14   NA
8   2012-08-03  Emma    Pre     NA   NA

I am new to R, and would like to use it efficiently (e.g. with dplyr). The above can be done through loops, but I am looking for a more elegant solution. I've consulted the following threads for guidance, but so far haven't found a solution:

Assign value to group based on condition in column

R create column from another column, depending on row

Conditional calculation in R based on Row values and categories

2 Answers2

2

If you are not too attached to that particular table format you can do the following:

library(dplyr)
library(tidyr)

df1 %>% 
    spread(Sample, Cells) %>% 
    mutate(Pre_Post_Yield = Post/Pre)

Which would return a slightly more human-readable table:

        Date  Name Bone Discard Post Pre Pre_Post_Yield
1 2001-08-27   Joe   NA       3    7  15      0.4666667
2 2001-11-13 Billy   NA       2    5  12      0.4166667
3 2012-08-03  Emma   14      NA   NA  NA             NA

To return to long format you can add gather(Sample, Cells, Bone:Pre). Note that the result will look quite different from your sample output because R will fill in variable combinations that weren't there before. It might look a little weird at first, but you'll learn that it's actually quite useful, for example because it makes your missing data explicit:

         Date  Name Pre_Post_Yield  Sample Cells
1  2001-08-27   Joe      0.4666667    Bone    NA
2  2001-11-13 Billy      0.4166667    Bone    NA
3  2012-08-03  Emma             NA    Bone    14
4  2001-08-27   Joe      0.4666667 Discard     3
5  2001-11-13 Billy      0.4166667 Discard     2
6  2012-08-03  Emma             NA Discard    NA
7  2001-08-27   Joe      0.4666667    Post     7
8  2001-11-13 Billy      0.4166667    Post     5
9  2012-08-03  Emma             NA    Post    NA
10 2001-08-27   Joe      0.4666667     Pre    15
11 2001-11-13 Billy      0.4166667     Pre    12
12 2012-08-03  Emma             NA     Pre    NA
  • 1
    That's a neat trick as well (spread and gather). I'll be sure to experiment with it. Apologies, but I've already accepted the first answer. Otherwise yours would have also been very good. Thanks! – Angry_at_Linux Jul 12 '19 at 05:19
1

You can do:

library(dplyr)

df1 %>%
  group_by(Date, Name) %>%
  mutate(Yield = ifelse(Sample == "Post", Cells[Sample == "Post"]/Cells[Sample == "Pre"], NA))

# A tibble: 8 x 5
# Groups:   Name [3]
  Date       Name  Sample  Cells  Yield
  <date>     <fct> <fct>   <dbl>  <dbl>
1 2001-08-27 Joe   Pre        15 NA    
2 2001-08-27 Joe   Post        7  0.467
3 2001-08-27 Joe   Discard     3 NA    
4 2001-11-13 Billy Pre        12 NA    
5 2001-11-13 Billy Post        5  0.417
6 2001-11-13 Billy Discard     2 NA    
7 2012-08-03 Emma  Bone       14 NA    
8 2012-08-03 Emma  Pre        NA NA    
Ritchie Sacramento
  • 29,890
  • 4
  • 48
  • 56