We have already fetched the URLs using the jsoup library and stored them in the database. Now we want to extract data from the pages and store it in the database, but only specific fields rather than the whole page. For example, when we fetch http://www.flipkart.com/shoes/ we need fields such as brands, prices, reviews, etc. How can we do this in Java? Please help!
Viewed 509 times
1 Answer
-2
There are two ways you can extract specific fields from the whole content:
- Apply a regex to the response content and extract the needed fields.
- Use XPath to extract the needed fields (the preferred and recommended way of parsing).
Ex: 1 - Regex
- Generate the regex pattern for your selected page.
- Get the response as a String, apply the pattern, and retrieve the data.
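The regex steps above could be sketched roughly as follows, using only java.util.regex. The markup in the sample string, the `price` class name, and the `RegexFieldExtractor` class are hypothetical and do not reflect the real structure of any site; a pattern like this only holds up while the fragment it targets stays very predictable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexFieldExtractor {
    // Hypothetical pattern: assumes prices appear as <span class="price">Rs. 1,299</span>.
    private static final Pattern PRICE =
            Pattern.compile("<span class=\"price\">\\s*Rs\\.\\s*([0-9,]+)\\s*</span>");

    // Returns the captured price text of every match, in document order.
    static List<String> extractPrices(String html) {
        List<String> prices = new ArrayList<>();
        Matcher matcher = PRICE.matcher(html);
        while (matcher.find()) {
            prices.add(matcher.group(1));
        }
        return prices;
    }

    public static void main(String[] args) {
        // Made-up response body standing in for the fetched page.
        String html = "<div><span class=\"price\">Rs. 1,299</span>"
                + "<span class=\"price\">Rs. 899</span></div>";
        System.out.println(extractPrices(html));
    }
}
```

Any change in attribute order, quoting, or whitespace inside the tag would make this pattern miss, which is the main weakness of the regex route.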
Ex: 2 - XPath
- Identify a way to locate each HTML element (or list of elements) uniquely.
- Get the response in HTML/XML form, apply the XPath to the retrieved content, and get the data.
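A minimal sketch of the XPath steps, using the JDK's built-in javax.xml.xpath API. It assumes the content is already well-formed XML/XHTML; real pages usually are not, so they would first need to be cleaned up by an HTML parser. The sample markup, the `product`, `brand`, and `price` class names, and the `XPathFieldExtractor` class are all hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathFieldExtractor {
    // Evaluates an XPath expression against well-formed markup and
    // returns the trimmed text of every matching node.
    static List<String> extract(String xhtml, String expression) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent().trim());
            }
            return values;
        } catch (Exception e) {
            // Malformed markup or an invalid expression ends up here.
            throw new IllegalArgumentException("Could not evaluate XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Made-up, well-formed fragment standing in for the fetched page.
        String page = "<div>"
                + "<div class=\"product\"><span class=\"brand\">BrandA</span>"
                + "<span class=\"price\">1299</span></div>"
                + "<div class=\"product\"><span class=\"brand\">BrandB</span>"
                + "<span class=\"price\">899</span></div>"
                + "</div>";
        System.out.println(extract(page, "//div[@class='product']/span[@class='brand']"));
        System.out.println(extract(page, "//div[@class='product']/span[@class='price']"));
    }
}
```

Each field gets its own expression, so adding reviews or ratings would mean adding one more XPath string rather than rewriting the parsing logic.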
Vikrant Kashyap
Hakuna Matata
Regex should not be used to parse HTML. http://stackoverflow.com/a/6751339/1176178 – Zack Aug 02 '16 at 13:08