I have a function that logs into a site and searches for a string in the following page. The process currently takes 10 seconds, and I wanted to see if there is anything I could do to speed it up. I wonder if it is possible to have the curl login persist across the client's session, or maybe to search the document more efficiently.

public function curlLogin($url, $post_values, $cookieJar) {

        $timeout = 30;

        $curl_connection = curl_init();
        curl_setopt($curl_connection, CURLOPT_URL, $url);
        curl_setopt($curl_connection, CURLOPT_TIMEOUT, $timeout);
        curl_setopt($curl_connection, CURLOPT_USERAGENT,"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
        curl_setopt($curl_connection, CURLOPT_COOKIEJAR, $cookieJar);
        curl_setopt($curl_connection, CURLOPT_COOKIEFILE, $cookieJar);
        curl_setopt($curl_connection, CURLOPT_COOKIESESSION, 0);
        curl_setopt($curl_connection, CURLOPT_HEADER, 1);
        curl_setopt($curl_connection, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($curl_connection, CURLOPT_SSL_VERIFYPEER, 0);
        curl_setopt($curl_connection, CURLOPT_POST, 1);
        curl_setopt($curl_connection, CURLOPT_POSTFIELDS, $post_values);
        curl_setopt($curl_connection, CURLOPT_HTTPHEADER,
        array("Content-type: application/x-www-form-urlencoded"));
        curl_exec($curl_connection);
        return $curl_connection;

    }

    public function curlPost($curl_connection, $url, $post_values, $cookieJar) {

        $timeout = 30;

        curl_setopt($curl_connection, CURLOPT_URL, $url);
        curl_setopt($curl_connection, CURLOPT_TIMEOUT, $timeout);
        curl_setopt($curl_connection, CURLOPT_USERAGENT,"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
        curl_setopt($curl_connection, CURLOPT_COOKIEJAR, $cookieJar);
        curl_setopt($curl_connection, CURLOPT_COOKIEFILE, $cookieJar);
        curl_setopt($curl_connection, CURLOPT_COOKIESESSION, 0);
        curl_setopt($curl_connection, CURLOPT_HEADER, 1);
        curl_setopt($curl_connection, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($curl_connection, CURLOPT_SSL_VERIFYPEER, 0);
        curl_setopt($curl_connection, CURLOPT_POST, 1);
        curl_setopt($curl_connection, CURLOPT_POSTFIELDS, $post_values);
        curl_setopt($curl_connection, CURLOPT_HTTPHEADER,
        array("Content-type: application/x-www-form-urlencoded"));
        $result = curl_exec($curl_connection);
        return $result;

    }

$cookieJar = tempnam ("/tmp", "CURLCOOKIE");

$curl_connection = $this->curlLogin($login_url, $post_values, $cookieJar);

$result = $this->curlPost($curl_connection, $next_url, $params, $cookieJar);

// strpos() returns false when the needle is missing and 0 when it is at the
// very start, so compare against false instead of testing "> 0".
if (strpos($result, 'string 1') !== false) {
    $success = true;
    $message = 'string 1 is present';
} else if (strpos($result, 'string 2') !== false) {
    $success = false;
    $message = 'string 2 is present';
} else if (strpos($result, 'string 3') !== false) {
    $success = false;
    $message = 'string 3 is present';
} else {
    $success = false;
    $message = 'None of the above strings are present.';
}

curl_close($curl_connection);
unlink($cookieJar);
madphp
  • possible duplicate of [php - Fastest way to check presence of text in many domains (above 1000)](http://stackoverflow.com/questions/12891689/php-fastest-way-to-check-presence-of-text-in-many-domains-above-1000) , [How to prevent server from overloading during Curl requests in PHP](http://stackoverflow.com/questions/13461194/how-to-prevent-server-from-overloading-during-curl-requests-in-php/13461652) , [php get all the images from url which width and height >=200 more quicker](http://stackoverflow.com/a/10036599/1226894) – Baba May 02 '13 at 18:07
  • when you load those pages in firefox monitoring them with firebug, what do your page load times say? – Zak May 02 '13 at 18:07
  • Searching for substrings is so fast compared to the rest, that you won't gain anything by looking in that direction. – mzedeler May 02 '13 at 18:07
  • Remember, EVERY curl operation is a separate HTTP request. Do you really have to design the login like that? Your login runs 3 HTTP requests minimum. – Hermann Stephane Ntsamo May 02 '13 at 18:10
  • It takes 10 seconds for the page to load according to Firebug. Those other answers say to use curl_multi, but I'm accessing the same site and I have to wait until the login is successful before scraping the other page. Hermann - where is it doing those 3 requests exactly? – madphp May 02 '13 at 18:15
  • I also wondered how I could skip the login after a cookie was saved. – madphp May 02 '13 at 18:17
  • The curl function 'curl_exec' initiates an HTTP request. When somebody presses login, that is 1 HTTP request; when you call '$this->curlLogin', that initiates another HTTP request; and finally, when you call '$this->curlPost', that is another HTTP request. That makes 3 as far as I can see. If the URLs you use in the curl operations are already slow to load, that might be your problem. Test them separately first, then improve the slow ones. – Hermann Stephane Ntsamo May 02 '13 at 18:21
  • From what I can see, the initial login is the slowest one. – madphp May 02 '13 at 18:29

1 Answer

You can avoid logging in every time by reusing your cookiejar.

Create a file called cookies.txt in the directory containing your script and assign: $cookieJar = 'cookies.txt'.

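After running the script for the first time, simply remove the call to the curlLogin() function; your curlPost() function should then send the saved cookies and return data as if you were logged in. You also need to remove the unlink($cookieJar) call at the end of the script, otherwise the saved cookies are deleted after every run.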

Remember, CURLOPT_COOKIEFILE is to specify where to "read" cookies from and CURLOPT_COOKIEJAR is where you want the response cookies to be written.

So you could probably do without CURLOPT_COOKIEJAR in your curlPost() function.
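
Rather than removing the login call by hand, the script could decide at run time whether the saved cookies are still usable. The following is only a minimal sketch of that idea, not a tested drop-in: the helper names `needsLogin()`, `postWithCookies()` and `fetchLoggedInPage()` are hypothetical, and the 20-minute `$maxAge` is an assumption about how long the site keeps a session alive, so adjust it to the real site. It also assumes a fixed cookie file path instead of `tempnam()`, since `tempnam()` creates a new, empty file on every run.

```php
<?php
// Hypothetical helper: decide whether a fresh login is needed.
// ASSUMPTION: the site's session stays valid for at least $maxAge seconds.
function needsLogin(string $cookieJar, int $maxAge = 1200): bool
{
    clearstatcache(true, $cookieJar);
    if (!is_file($cookieJar) || filesize($cookieJar) === 0) {
        return true; // no cookies have been saved yet
    }
    // Treat a jar that has not been written recently as expired.
    return (time() - filemtime($cookieJar)) > $maxAge;
}

// Hypothetical helper: one POST that reads from and writes to the same jar.
function postWithCookies(string $url, $postValues, string $cookieJar)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $postValues,
        CURLOPT_COOKIEFILE     => $cookieJar, // cookies are read from here
        CURLOPT_COOKIEJAR      => $cookieJar, // cookies are written here
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 30,
    ]);
    $result = curl_exec($ch);
    curl_close($ch); // the jar is written once the handle is closed or freed
    return $result;
}

// Hypothetical wrapper: log in only when the saved cookies look stale.
function fetchLoggedInPage($loginUrl, $loginValues, $nextUrl, $params, string $cookieJar)
{
    if (needsLogin($cookieJar)) {
        postWithCookies($loginUrl, $loginValues, $cookieJar);
    }
    return postWithCookies($nextUrl, $params, $cookieJar);
}
```

With this approach the slow login request only happens when the jar is missing, empty, or older than `$maxAge`; every other run makes a single request. A time check cannot tell whether the server has already ended the session, so it may still be worth checking the response for a "logged out" marker and retrying with a login when it appears.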

imlokesh