How to send mail when Googlebot crawls a webpage of your website

How do you send an email from PHP as soon as Googlebot crawls a webpage of your website?

Many of us have wished for a notification as soon as Google (Googlebot, to be specific) crawls our websites.
I don’t know whether any tools exist for this, but how about writing a small PHP script that does it for us?

The idea is to include (with the PHP include function) a small PHP script in every webpage you want to be notified about (probably the homepage, such as index.php). The script emails you when that page is accessed by Google’s crawler, the great Googlebot. :)
Googlebot’s identity can be confirmed (though not with 100% certainty) by checking its “HTTP User Agent” together with a reverse and forward DNS lookup of the Googlebot IP address.

For more info, refer to http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=80553

Note: This is not a foolproof solution.
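
The reverse-then-forward DNS check described above can be sketched on its own, separately from the mailing logic. This is only a minimal sketch, assuming PHP 7 or later: the function name is_verified_crawler and its two optional resolver parameters are hypothetical and exist only so the logic can be tried with stubs instead of real DNS traffic. By default it falls back to PHP's built-in gethostbyaddr() and gethostbynamel(), which offer no timeout control and return IPv4 addresses only.

```php
<?php
// Hypothetical helper, not part of the original script.
// $reverse and $forward are injectable so the logic can be exercised with stubs.
function is_verified_crawler(string $ip, string $suffix,
                             ?callable $reverse = null, ?callable $forward = null): bool
{
    $reverse = $reverse ?? 'gethostbyaddr';   // IP -> hostname
    $forward = $forward ?? 'gethostbynamel';  // hostname -> list of IPv4 addresses

    // Step 1: reverse lookup. gethostbyaddr() returns the IP unchanged on failure.
    $hostname = $reverse($ip);
    if (!is_string($hostname) || $hostname === $ip) {
        return false;
    }

    // Step 2: the hostname must end with the crawler's domain suffix.
    $host   = strtolower($hostname);
    $suffix = strtolower($suffix);
    if ($suffix === '' || substr($host, -strlen($suffix)) !== $suffix) {
        return false;
    }

    // Step 3: the forward lookup must lead back to the original IP.
    $ips = $forward($hostname);
    return is_array($ips) && in_array($ip, $ips, true);
}
```

Called as is_verified_crawler($_SERVER['REMOTE_ADDR'], '.googlebot.com'), it would use the real resolvers; passing two closures instead lets you try the three steps offline.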

<?php
// www.expertsguide.info
// Put this code in a file of its own and include that file from your other webpages.
// To watch for other crawlers, add entries to the $crawlers array.
/* To run only for Google, comment out the $crawlers array and the foreach line
   and uncomment the two lines below:
//$crawler_name = 'Googlebot' ;
//$crawler_host_str = '.googlebot.com' ; */
$crawlers = array('Googlebot'=>'.googlebot.com',
                  'Yahoo! Slurp'=>'.crawl.yahoo.net',
                  'MSNBot'=> '.search.msn.com');
foreach($crawlers as $crawler_name => $crawler_host_str)
{
  if( isset($_SERVER['HTTP_USER_AGENT']) &&
      ( strpos($_SERVER['HTTP_USER_AGENT'], $crawler_name) !== false ) &&
      isset($_SERVER['REMOTE_ADDR']) )
  {
     $ip = $_SERVER['REMOTE_ADDR'];              //assume Googlebot never comes with a fake IP
     $hostname = gethostbyaddr_timeout($ip);     //get the hostname (reverse DNS lookup)
     $hostip = gethostbyaddr_timeout($hostname); //get the IP address (forward DNS lookup)

     //if( preg_match("/$crawler_host_str/i",$hostname,$matches) && (strcmp($ip,$hostip)==0) )
     if( (strpos($hostname, $crawler_host_str)!==false) && (strcmp($ip,$hostip)==0) )   //no regex burden
     {
        $to       = "john.martin@example.com";               //put your email here
        $subject  = "$crawler_name just crawled your webpage";   //mail subject you want to see
        $message  = "";
        $message .= " $crawler_name HTTP User Agent: " . $_SERVER['HTTP_USER_AGENT'] . PHP_EOL;
        $message .= " IP: ".$ip. PHP_EOL;
        $message .= " Host name (reverse DNS lookup): ".$hostname . PHP_EOL;
        $message .= " Host IP (forward DNS lookup): ".$hostip . PHP_EOL;
        $message .= " Visited: ".$_SERVER['HTTP_HOST'].$_SERVER['SCRIPT_NAME'].PHP_EOL;   //SCRIPT_NAME already starts with "/"

        mailit($to, $subject, $message);
     }
   }
}
///////////////////////////////////////////////////////////////////////////////////
function mailit($to, $subject, $message)
{
  $headers = '';
  // To send HTML mail, the Content-type header must be set, uncomment below two lines
  //$headers = 'MIME-Version: 1.0' . "\r\n";
  //$headers .= 'Content-type: text/html; charset=iso-8859-1' . "\r\n";
  //$headers .= 'To: receiver@gmail.com' . "\r\n";
  //$headers .= 'Cc: somebody@gmail.com' . "\r\n";
  //$headers .= 'Bcc: somebody@yahoo.co.in' . "\r\n";
  $headers .= 'From: donotreply@example.com '."\r\n";

  $message = "Email was sent from http://www.mydomain.com \r\n ".$message."\r\n";

  if( ! mail($to, $subject, $message, $headers) ) {
    //echo "Message delivery failed...";
  }
}

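function gethostbyaddr_timeout($ip)
{
   //Despite its name, this helper resolves both ways: given an IP it returns the
   //hostname, and given a hostname it returns the IP address.
   //The -W option makes host wait for wait seconds.
   //If wait is less than one, the wait interval is set to one second.
   //When the -w option is used, host will effectively wait forever for a reply.
   //DONT USE IT HERE.
  $output = shell_exec('host -W 1 ' . escapeshellarg($ip));   //escape the argument before it reaches the shell
  if( !is_string($output) ) {
    return $ip;   //host could not be run or printed nothing
  }
  //ereg() was removed in PHP 7, so preg_match() is used instead
  if( preg_match('/ pointer ([A-Za-z0-9.-]+)\./', $output, $regs) ) {
    return $regs[1];   //hostname
  }
  elseif( preg_match('/ address ([A-Za-z0-9.-]+)/', $output, $regs) )
  {
    return $regs[1];  //IP address
  }
  return $ip;
}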
?>
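
One caveat about the script above: strpos() only checks that the crawler's domain appears somewhere in the hostname. A hostname such as crawl.googlebot.com.evil.example (a made-up name) would pass that test, and someone who controls both the reverse DNS of an IP and a domain of their own could make the forward lookup match as well. A stricter check could anchor the match at the end of the hostname. The helper below is a hypothetical sketch, not part of the original script, and assumes PHP 7 or later.

```php
<?php
// Hypothetical helper, not part of the original script.
// Returns true only when $hostname ends with $suffix (case-insensitive).
function hostname_ends_with(string $hostname, string $suffix): bool
{
    $hostname = strtolower(rtrim($hostname, '.'));  // drop a trailing root dot, if any
    $suffix   = strtolower($suffix);
    $len      = strlen($suffix);
    if ($len === 0 || strlen($hostname) < $len) {
        return false;
    }
    return substr($hostname, -$len) === $suffix;
}

var_dump(hostname_ends_with('crawl-66-249-67-184.googlebot.com', '.googlebot.com')); // bool(true)
var_dump(hostname_ends_with('crawl.googlebot.com.evil.example', '.googlebot.com'));  // bool(false)
```

In the script, a call like hostname_ends_with($hostname, $crawler_host_str) could take the place of the strpos() test, with the strcmp() comparison of the two IPs left as it is.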

Keep checking this article for more updates.

38 comments to How to send mail when Googlebot crawls a webpage of your website

  • Wendi Movius

    Hello man, I was searching google and searching for an article to see and stumbled on your weblog. I am really glad I did, you posted some great info. Did a fast bookmark on this article and will be checking back every once in a while to see if you submit any more stuff. Cool stuff keep up the wonderful work.

  • how to get rid stomach fat

    Aw, this was a really quality post. In theory I’d like to write like this too – taking time and real effort to make a good article… but what can I say… I procrastinate alot and never seem to get something done.

  • there cheats for farmville

    Hey admin, very informative blog post! Pleasee continue this awesome work..

  • Conservative

    Please tell me it worked right? I dont want to sumit it again if i do not have to! Either the blog glitced out or i am an idiot, the second option doesnt surprise me lol. thanks for a great blog!

    • admin

      Of course, it works. You might need to change some variables, such as your email address. I can help you set them correctly, if you want.

  • usedcarsinkerala

    Usually I do not post on blogs, but I would like to say that this article really forced me to do so! Thanks, really nice article.

  • Gravura

    Wow – now that’s perspective! I think we often react in agreement or disagreement because of our emotions, but hearing another side, passionately presented, really makes us think!

  • key west motels

    I love everything about this! Woo!

  • takapuna doctor

    Hello I was just checking If somebody could help me out with this , I view this blog a fair bit but sometimes the background keeps messing up and I cant read the text. PLease help me

    • admin

      What problem are you facing? Can you please give me some more details, such as your browser and its version, your screen size, etc.?

  • sam ciin

    hi,now, i saw this site,is good

  • Pankaj Pratap

    Now that’s what exactly I was looking for. If it is really working then you need applause. I’ll try running it and give you feedback.

  • Pankaj Pratap

    One more thing…Will it work for Yahoo and Bing crawlers too?

  • car covers

    As a Newbie, I’m usually searching online for articles that can help me. Thank you.

  • tinnitis

    Glad to see that this site works well on my Google phone , everything I want to do is functional. Thanks for keeping it up to date with the latest.

  • Mirtha Batis

    Your blog is very interesting. Thanks for the information.

  • Luigi Fulk

    Good work! Your post is an excellent example of why I keep comming back to read your excellent quality content that is forever updated. Thank you!

  • annuity

    Great post. I will be bookmarking and sharing it with my social networks.

  • Antoine Arnerich

    I’ve just subscribed to your RSS feed. I love your content.

  • opony

    That is some inspirational stuff. Never knew that opinions could be this varied. Thanks for all the enthusiasm to offer such helpful information here.

  • Zona Palczynski

    Loved it, thanks.

  • Keenan Richberg

    thanks a lot the information

  • Wilbur Glucksman

    wow… thanks a lot writing this

  • Elena Standfield

    howdy, I view all your blog posts, keep them coming.

  • Patricia Oravec

    Wow, thanks for publishing this great article. I’m a real enthusiast and get pleasure from reading your web-site. Believe it or not it’s generally rather tough to expose this information penned with any actual sense of quailty like it is here. Thanks.

  • Benjamin

    You made some good points there. I did a search on the topic and found most people will agree with your blog.

  • Alva Oaks

    gr8 resrch bro…

  • Pointer Men's Basketball

    You you should make changes to the blog name title Send mail when Googlebot crawls a webpage in PHP | Experts Guide to more suited for your blog post you create. I liked the post nevertheless.

  • Laura J. Simon

    i want it

  • graphic design careers

    I really don’t accept this article. Nevertheless, I did researched in Google and I’ve found out you are correct and I seemed to be thinking in the wrong way. Keep on producing good quality articles like this.

  • paul

    Hi thanks for this. If I put this at the beginning of my current index.php I get errors about session headers already having been sent. I understand what this means but am not sure where to relocate this code. Thanks!

    • admin

      This code doesn’t send any headers or any output, so it should not give you any error.
      Try putting it somewhere else (e.g. at the bottom of index.php or in some other page).

  • paul

    Hey thanks admin. Silly me, I had placed an echo in there when I was testing and my subsequent coding tries to send headers.

    That being said, I cannot get the script to work at all in terms of functionaility. The hostname lookups work fine, so does the mailit function but I have had hits from at least these two without anything happening: crawl-66-249-67-184.googlebot.com
    b3091002.crawl.yahoo.net

    I added ‘Test’=>’.wp.shawcable.net’ to the array definition and an echo before the mailit call and nothing happens. No error, nothing is triggered.

  • paul

    ooops my test didn’t make sense that’s my host not useragent. Will do another test against:

    Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)
    Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)

  • paul

    Ahhh yes so your code is of course fine, however your array definitions ought to be tweaked ;


    '.googlebot.com');
    foreach($crawlers as $crawler_name)
    {
    if (strpos($test, $crawler_name) !== false )
    {
    echo "BOOM!";
    }
    }
    ?>

    does indeed go boom

    • admin

      You should check the IP as well, as I did. Only then can you be sure that it was a genuine Googlebot. That is what the reverse DNS lookup is for. For more info, refer to google.com/support/webmasters/bin/answer.py?hl=en&answer=80553

  • Mark

    Nice work. Code really works.
