I am using StormCrawler 1.10 and Elasticsearch 6.3.x. For example, I have a main website, https://www.abce.org, which has subpages such as https://abce.org/def and https://abce.org/ghi. I want to crawl only the pages under https://www.abce.org/ghi.
My seed URL is https://www.abce.org/ghi/.
So far I have tried each of the following regex filters, one at a time:
+^https:\/\/www.abce.org\/ghi*
+^(?:https?:\/\/)www.abce.org\/ghi(.+)*$
+^(?:https?:\/\/)?(?:www\.)?abce\.[a-zA-Z0-9.\S]+$
I tested my regular expressions on RegExr and they show as valid. But when I check the status index, it shows only the discovered seed URL and nothing else.
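For reference, here is a minimal sketch of how the same check can be reproduced locally with plain java.util.regex instead of RegExr. It assumes the filter applies each pattern with find() semantics after stripping the leading + sign, which I have not confirmed against the StormCrawler source. The class name RegexFilterCheck, the helper matches, and the extra test URL https://www.abce.org/ghi/page1 are made up for illustration.

```java
import java.util.List;
import java.util.regex.Pattern;

class RegexFilterCheck {
    // The three patterns from the filter file, without the leading '+' marker.
    static final List<String> PATTERNS = List.of(
        "^https:\\/\\/www.abce.org\\/ghi*",
        "^(?:https?:\\/\\/)www.abce.org\\/ghi(.+)*$",
        "^(?:https?:\\/\\/)?(?:www\\.)?abce\\.[a-zA-Z0-9.\\S]+$");

    // True if the pattern is found anywhere in the URL.
    // Assumption: the URL filter uses find() rather than a full match.
    static boolean matches(String pattern, String url) {
        return Pattern.compile(pattern).matcher(url).find();
    }

    public static void main(String[] args) {
        // The seed URL, a hypothetical page under it, and a page outside the target path.
        List<String> urls = List.of(
            "https://www.abce.org/ghi/",
            "https://www.abce.org/ghi/page1",
            "https://www.abce.org/def");
        for (String pattern : PATTERNS) {
            for (String url : urls) {
                System.out.println(pattern + "  " + url + "  " + matches(pattern, url));
            }
        }
    }
}
```

Under these assumptions, the first two patterns accept the URLs under /ghi and reject /def, while the third pattern also accepts https://www.abce.org/def, so it is broader than what I want.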