
I have a dataset like the one below. I would like to remove all characters after the character ©. How can I do that in R?

data_clean_phrase <- c("Copyright © The Society of Geomagnetism and Earth", 
"© 2013 Chinese National Committee ")

data_clean_df <- as.data.frame(data_clean_phrase)

Ethan
Hamideh

2 Answers


For instance:

 rs<-c("copyright @ The Society of mo","I want you to meet me @ the coffeshop")
 s<-gsub("@.*","",rs)
 s
 [1] "copyright "             "I want you to meet me "

Or, if you want to keep the @ character:

 s<-gsub("(@).*","\\1",rs)
 s
 [1] "copyright @"             "I want you to meet me @"

EDIT: If you want to remove everything from the last @ onward, adapt the previous example so that a greedy capture group keeps everything before the final @. Example:

rs<-c("copyright @ The Society of mo located @ my house","I want you to meet me @ the coffeshop")
s<-gsub("(.*)@.*","\\1",rs)
s
[1] "copyright @ The Society of mo located " "I want you to meet me "

Because each of these patterns ends in .* and so consumes the rest of the string, there is at most one match per element, and sub and gsub give the same result here.
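
Applied to the © data from the question, the same idea might look like the sketch below. It assumes a UTF-8 session so that the literal © in the pattern matches; the names `without_symbol`, `with_symbol` and `same` are illustrative and not from the original post.

```r
# Data from the question
data_clean_phrase <- c("Copyright © The Society of Geomagnetism and Earth",
                       "© 2013 Chinese National Committee ")
data_clean_df <- data.frame(data_clean_phrase = data_clean_phrase,
                            stringsAsFactors = FALSE)

# Drop the © and everything after it
without_symbol <- sub("©.*", "", data_clean_df$data_clean_phrase)

# Keep the ©, drop only what follows it
with_symbol <- sub("(©).*", "\\1", data_clean_df$data_clean_phrase)

# sub and gsub agree here because ".*" consumes the rest of the string
same <- identical(without_symbol,
                  gsub("©.*", "", data_clean_df$data_clean_phrase))
```

To change the data frame itself, assign either result back to `data_clean_df$data_clean_phrase`.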

MASL

For the sake of completeness: You could use the stringr package to extract what you want.

library(stringr)
data_clean_phrase <- c("Copyright © The Society of Geomagnetism and Earth", 
                       "© 2013 Chinese National Committee ")

str_extract(data_clean_phrase, "^(.*?©)")    # including the ©
str_extract(data_clean_phrase, "^.*(?=(©))") # excluding the ©

Note: I chose str_extract here; you could use str_remove instead.
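
A minimal sketch of the str_remove variant, assuming stringr is installed and the session is UTF-8. The variable names `removed_with_symbol` and `removed_keep_symbol` are illustrative only.

```r
library(stringr)

data_clean_phrase <- c("Copyright © The Society of Geomagnetism and Earth",
                       "© 2013 Chinese National Committee ")

# Remove the © and everything after it
removed_with_symbol <- str_remove(data_clean_phrase, "©.*")

# Remove only what follows the ©, using a lookbehind so the © itself stays
removed_keep_symbol <- str_remove(data_clean_phrase, "(?<=©).*")
```

One difference worth keeping in mind: for a string that contains no ©, str_extract returns NA, whereas str_remove leaves the string unchanged.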

ToWii