How to send mail when Googlebot crawls a webpage of your website

How do you send an email, in PHP, as soon as Googlebot crawls a webpage of your website?

Many of us have wished for some kind of notification as soon as Google (Googlebot, to be specific) crawls our websites, haven’t we?

I don’t know whether any tools exist for this, but what about developing a small PHP script that does it for us?

The idea is to include (with the PHP include function) a small PHP script in the webpage you want to be notified about (probably the homepage, such as index.php), so that it alerts you whenever that page is accessed by Google’s crawler, the great Googlebot. :)

Googlebot’s identity can be confirmed (not 100%, though) by checking its “HTTP User Agent” along with a reverse and forward DNS lookup of the Googlebot IP address.

For more information, see http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=80553
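
The reverse and forward lookup check described above can be sketched with PHP's built-in gethostbyaddr() and gethostbynamel(). This is only a minimal sketch: the function names crawler_host_allowed() and verify_crawler_ip() and their parameters are made up for illustration, the built-in resolvers have no timeout, and gethostbynamel() returns IPv4 addresses only. The resolvers are passed in as parameters so the logic can be tried without touching real DNS.

```php
<?php
// Sketch only: function and parameter names are illustrative assumptions.

// True when $hostname ends with one of the allowed domain suffixes.
function crawler_host_allowed($hostname, array $suffixes)
{
    $hostname = strtolower(rtrim($hostname, '.'));
    foreach ($suffixes as $suffix) {
        if (substr($hostname, -strlen($suffix)) === $suffix) {
            return true;
        }
    }
    return false;
}

// Reverse lookup, suffix check, then a forward lookup that must contain the original IP.
// $reverse and $forward default to the built-in resolvers but can be swapped for fakes.
function verify_crawler_ip($ip, array $suffixes, $reverse = 'gethostbyaddr', $forward = 'gethostbynamel')
{
    $hostname = call_user_func($reverse, $ip);
    if (!is_string($hostname) || $hostname === $ip) {
        return false; // gethostbyaddr() returns the IP unchanged when there is no PTR record
    }
    if (!crawler_host_allowed($hostname, $suffixes)) {
        return false; // hostname is not under an allowed crawler domain
    }
    $ips = call_user_func($forward, $hostname);
    return is_array($ips) && in_array($ip, $ips, true);
}
```

A call such as verify_crawler_ip($_SERVER['REMOTE_ADDR'], array('.googlebot.com')) would then return true only when both lookups agree. The suffix is compared against the end of the hostname, so a name like crawl.googlebot.com.evil.example does not pass.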

Note: This is not a foolproof solution.

<?php
// www.expertsguide.info
// Put this code in a file of its own and include that file from your other webpages.
// For other crawlers, add entries to the $crawlers array below.
/* To run only for Google, comment out the $crawlers array and the foreach line,
   then uncomment the two lines below:
//$crawler_name = 'Googlebot' ;
//$crawler_host_str = '.googlebot.com' ; */
$crawlers = array('Googlebot'=>'.googlebot.com',
                  'Yahoo! Slurp'=>'.crawl.yahoo.net',
                  'MSNBot'=> '.search.msn.com');
foreach($crawlers as $crawler_name => $crawler_host_str)
{
  if( isset($_SERVER['HTTP_USER_AGENT']) &&
      ( strpos($_SERVER['HTTP_USER_AGENT'], $crawler_name) !== false ) &&
      isset($_SERVER['REMOTE_ADDR']) )
  {
     $ip = $_SERVER['REMOTE_ADDR'];              //assume Googlebot never comes with fake IP
     $hostname = gethostbyaddr_timeout($ip);     //get the hostname (reverse DNS lookup)
     $hostip = gethostbyaddr_timeout($hostname); //get the IP address (forward DNS lookup)

     //the hostname must END with the crawler domain, so "x.googlebot.com.evil.example" is rejected
     if( (substr($hostname, -strlen($crawler_host_str)) === $crawler_host_str) && (strcmp($ip,$hostip)==0) )   //no regex burden
     {
        $to       = "john.martin@example.com";               //put your email here
        $subject  = "$crawler_name just crawled your webpage";   //mail subject you want to see
        $message  = "";
        $message .= " $crawler_name HTTP User Agent: " . $_SERVER['HTTP_USER_AGENT'] . PHP_EOL;
        $message .= " IP: ".$ip. PHP_EOL;
        $message .= " Host name (reverse DNS lookup): ".$hostname . PHP_EOL;
        $message .= " Host IP (forward DNS lookup): ".$hostip . PHP_EOL;
        $message .= " Visited: ".$_SERVER['HTTP_HOST'].$_SERVER['SCRIPT_NAME'].PHP_EOL;   //SCRIPT_NAME already starts with "/"

        mailit($to, $subject, $message);
     }
   }
}
///////////////////////////////////////////////////////////////////////////////////
function mailit($to, $subject, $message)
{
  $headers = '';
  // To send HTML mail, the Content-type header must be set, uncomment below two lines
  //$headers = 'MIME-Version: 1.0' . "\r\n";
  //$headers .= 'Content-type: text/html; charset=iso-8859-1' . "\r\n";
  //$headers .= 'To: receiver@gmail.com' . "\r\n";
  //$headers .= 'Cc: somebody@gmail.com' . "\r\n";
  //$headers .= 'Bcc: somebody@yahoo.co.in' . "\r\n";
  $headers .= 'From: donotreply@example.com' . "\r\n";

  $message = "Email was sent from http://www.mydomain.com \r\n ".$message."\r\n";

  if( ! mail($to, $subject, $message, $headers) ) {
    //echo "Message delivery failed...";
  }
}

function gethostbyaddr_timeout($ip)
{
   //The -W option makes host wait for wait seconds.
   //If wait is less than one, the wait interval is set to one second.
   //When the -w option is used, host will effectively wait forever for a reply.
   //DONT USE IT HERE.
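  //escapeshellarg() keeps the argument from being interpreted by the shell
  $output = (string) shell_exec('host -W 1 ' . escapeshellarg($ip));
  //ereg() was removed in PHP 7, so preg_match() is used instead
  if( preg_match('/ pointer ([A-Za-z0-9.-]+)\./', $output, $regs) ) {
    return $regs[1];   //hostname
  }
  elseif( preg_match('/ address ([A-Za-z0-9.-]+)/', $output, $regs) )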
  {
    return $regs[1];  //IP address
  }
  return $ip;
}
?>
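
To see what gethostbyaddr_timeout pulls out of the host command's reply, the parsing step can be tried on its own. The sketch below assumes the usual output format of the BIND host utility. The helper parse_host_output() and the two sample lines are illustrative, not captured from a real crawl.

```php
<?php
// Sketch: parse_host_output() is a hypothetical helper, not part of the script above.
function parse_host_output($output)
{
    if (preg_match('/ pointer ([A-Za-z0-9.-]+)\./', $output, $m)) {
        return $m[1]; // hostname from a reverse (PTR) answer, trailing dot dropped
    }
    if (preg_match('/ has address ([0-9.]+)/', $output, $m)) {
        return $m[1]; // IPv4 address from a forward (A) answer
    }
    return null; // lookup failed or the output was not recognised
}

// Assumed sample replies for a reverse and a forward lookup
$reverse = "1.66.249.66.in-addr.arpa domain name pointer crawl-66-249-66-1.googlebot.com.\n";
$forward = "crawl-66-249-66-1.googlebot.com has address 66.249.66.1\n";

echo parse_host_output($reverse), PHP_EOL; // crawl-66-249-66-1.googlebot.com
echo parse_host_output($forward), PHP_EOL; // 66.249.66.1
```

Unlike the script, this helper returns null when nothing matches rather than falling back to its input, which makes a failed lookup easier to spot while experimenting.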
