52

I've been trying to get a specific regex working but I can't get it to do what I need.

Basically, I want it to look for ROCKET. The regex should match ROCKET in upper or lower case, with or without punctuation, but not when it is part of another word. So, the regex would trigger on any of these:

rocket
RoCKEt
hi Rocket
This is a rocket.
ROCKET's engine

but NOT trigger on ROCKET when it is found in something like

Rocketeer
Sprocket

I've been trying to get it right using a regex generator online but I can't get it to match exactly.

Kefka

7 Answers

33

I suggest bookmarking the MSDN Regular Expression Quick Reference.

You want a case-insensitive match for the word "rocket" surrounded by non-alphanumeric characters. A regex that would work is:

\W*((?i)rocket(?-i))\W*

It looks for zero or more (*) non-alphanumeric (\W) characters, followed by a case-insensitive version of rocket ( (?i)rocket(?-i) ), followed again by zero or more (*) non-alphanumeric (\W) characters. The extra parentheses around the rocket-matching term assign the match to a separate group, so the word rocket will be in match group 1.

UPDATE 1: Matt said in the comments that this regex is to be used in Python. Python has a slightly different syntax. To achieve the same result in Python, use this regex and pass the re.IGNORECASE option to the compile or match function:

\W*(rocket)\W*

On Regex101 this can be simulated by entering "i" in the textbox next to the regex input.
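
As a rough illustration of the Python variant, here is a minimal sketch showing how the capture group and re.IGNORECASE behave. The sample strings and variable names are my own choices, not part of the original answer. It also shows that, because \W* may match nothing, the pattern still finds "rocket" inside "1rocket1".

```python
import re

# Pattern from UPDATE 1; the flag replaces the inline (?i) syntax.
pattern = re.compile(r"\W*(rocket)\W*", re.IGNORECASE)

match = pattern.search("This is a RoCKEt.")
whole = match.group(0)     # full match, including surrounding non-word chars
captured = match.group(1)  # only the word itself

# \W* can match zero characters, so this is also found.
embedded = pattern.search("1rocket1")

print(repr(whole), repr(captured), embedded is not None)
```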

UPDATE 2: Ismael pointed out that this regex is not quite correct, as it might match "1rocket1". He posted a much better solution, namely:

(?:^|\W)rocket(?:$|\W)
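
To check this pattern against the examples from the question, something like the following sketch could be used in Python. The helper name `triggers` is hypothetical, and re.IGNORECASE is assumed for case-insensitivity as in UPDATE 1.

```python
import re

# Pattern from UPDATE 2, compiled case-insensitively.
pattern = re.compile(r"(?:^|\W)rocket(?:$|\W)", re.IGNORECASE)

def triggers(text):
    """Return True if the pattern finds 'rocket' as a standalone word."""
    return pattern.search(text) is not None

should_match = ["rocket", "RoCKEt", "hi Rocket", "This is a rocket.", "ROCKET's engine"]
should_not_match = ["Rocketeer", "Sprocket", "1rocket1"]

for sample in should_match + should_not_match:
    print(f"{sample!r}: {triggers(sample)}")
```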

Xaser
24

I think the look-aheads are overkill in this case, and you would be better off using word boundaries with the ignorecase option:

\brocket\b

In other words, in python:

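>>> import re
>>> x="rocket's"
>>> y="rocket1."
>>> c=re.compile(r"\brocket\b",re.I)  # with the ignorecase option
>>> c.findall(y)  # no match: the "1" after "rocket" is a word character
[]
>>> c.findall(x)  # match: the apostrophe is a word boundary
['rocket']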
beroe
4

I don't have enough reputation to comment, so I have to make a post to share why I think user beroe's solution is the best way to solve this problem. Take, for example, this string of text from the Codewars challenge 'Most frequently used words in a text':

a a a b c c d d d d e e e e e

The goal of this challenge is to count the occurrences of words in the text. If we go with the most popular solution:

(?:^|\W)rocket(?:$|\W)

If we search our string of text for 'a' instead of 'rocket' using Python's re.findall, it returns only two matches (the first and last a): the \W that ends the first match consumes the space that the middle a needs in front of it, so the middle a cannot match. Using \b for the word boundary, on the other hand, returns all three a's as matches:

\brocket\b

Again, credit to user beroe's solution above.
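
The difference can be reproduced with a short sketch like this one. The variable names are mine, and the patterns are the two above with 'rocket' swapped for 'a'.

```python
import re

text = "a a a b c c d d d d e e e e e"

# Consuming version: each match swallows the neighbouring non-word character.
consuming = re.compile(r"(?:^|\W)a(?:$|\W)")
# Zero-width version: \b asserts a boundary without consuming anything.
boundary = re.compile(r"\ba\b")

consuming_matches = consuming.findall(text)
boundary_matches = boundary.findall(text)

print(len(consuming_matches), consuming_matches)
print(len(boundary_matches), boundary_matches)
```

Because \b is zero-width, adjacent words can each match, which is what a word-counting task needs.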

Cyb3rKo
Rob R
1

With grep and sed, you can use \<rocket\>.  With grep, the -i option will make it case-insensitive (ignore case):

grep -i '\<rocket\>'

I don't know any way to make all sed regexes case-insensitive, but there's always the caveman way:

sed -n '/\<[Rr][Oo][Cc][Kk][Ee][Tt]\>/p'
0

Use the Search for whole words only option.

As for punctuation, that can't be answered until you know the regex flavour/flavor.

It's a very old thread, so this is posted for anyone who might visit later with the same need. Those who originated the thread might have moved on to something else... No?

0

I think you can use something like this to specify the word that you want: /^(rocket|RoCKEt)$/g

0

For online regex generators (if the text is constant):

/\brocket\b/gi

And if you need to use a variable in a regular expression, then, for example:

let inputStr = "I need to check the following text: rocket RoCKEt hi Rocket This is a rocket. ROCKET's engine Rocketeer Sprocket";

let replaceThis = "ROCKET";
// Build the pattern from the variable; the template literal needs backticks
let re = new RegExp(`\\b${replaceThis}\\b`, 'gi');
console.log(inputStr.replace(re, "****"));
// "I need to check the following text: **** **** hi **** This is a ****. ****'s engine Rocketeer Sprocket"
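
For comparison, a Python version of the same idea might look like the sketch below. The function name `mask_word` and the sample sentence are hypothetical. It additionally assumes the variable could contain regex metacharacters, so it is passed through re.escape before being placed between the word boundaries.

```python
import re

def mask_word(text, word, mask="****"):
    """Replace whole-word, case-insensitive occurrences of `word` with `mask`."""
    # re.escape guards against metacharacters in the variable (an added assumption).
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return pattern.sub(mask, text)

masked = mask_word("hi Rocket, meet Rocketeer and Sprocket", "ROCKET")
print(masked)
```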

Kishor