I have written a webcrawler that crawls a website with keyward but i want to login to my specified website and filter information by keyword.How to achive that. i posting my code so far i have done .
public class DB {
public Connection conn = null;
public DB() {
try {
Class.forName("com.mysql.jdbc.Driver");
String url = "jdbc:mysql://localhost:3306/test";
conn = DriverManager.getConnection(url, "root","root");
System.out.println("conn built");
} catch (SQLException e) {
e.printStackTrace();
} catch (ClassNotFoundException e) {
e.printStackTrace();
}
}
public ResultSet runSql(String sql) throws SQLException {
Statement sta = conn.createStatement();
return sta.executeQuery(sql);
}
public boolean runSql2(String sql) throws SQLException {
Statement sta = conn.createStatement();
return sta.execute(sql);
}
@Override
protected void finalize() throws Throwable {
if (conn != null || !conn.isClosed()) {
conn.close();
}
}
}
public class Main {
public static DB db = new DB();
public static void main(String[] args) throws SQLException, IOException {
db.runSql2("TRUNCATE Record;");
processPage("http://m.naukri.com/login");
}
public static void processPage(String URL) throws SQLException, IOException{
//check if the given URL is already in database;
String sql = "select * from Record where URL = '"+URL+"'";
ResultSet rs = db.runSql(sql);
if(rs.next()){
}else{
//store the URL to database to avoid parsing again
sql = "INSERT INTO `test`.`Record` " + "(`URL`) VALUES " + "(?);";
PreparedStatement stmt = db.conn.prepareStatement(sql, Statement.RETURN_GENERATED_KEYS);
stmt.setString(1, URL);
stmt.execute();
//get useful information
Connection.Response res = Jsoup.connect("http://www.naukri.com/").data("username","jeet.chatterjee.88@gmail.com","password","Letmein321")
.method(Method.POST)
.execute();
//http://m.naukri.com/login
Map<String, String> loginCookies = res.cookies();
Document doc = Jsoup.connect("http://m.naukri.com/login")
.cookies(loginCookies)
.get();
if(doc.text().contains("")){
System.out.println(URL);
}
//get all links and recursively call the processPage method
Elements questions = doc.select("a[href]");
for(Element link: questions){
if(link.attr("abs:href").contains("naukri.com"))
processPage(link.attr("abs:href"));
}
}
}
}
And the table structure also
CREATE TABLE IF NOT EXISTS `Record` (
`RecordID` INT(11) NOT NULL AUTO_INCREMENT,
`URL` text NOT NULL,
PRIMARY KEY (`RecordID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
Now i want to use my username and password for that crawling so that crawler can log in to the site dynamically and crawl infomation on the basis of keyword.. Lets say my username is lucifer & password is lucifer123
