How to send an email in PHP as soon as Googlebot crawls a page of your website
Many of us have wished for some kind of notification as soon as Google (Googlebot, to be specific) crawls our websites, haven't we?
I don't know of any ready-made tool for this, but what about writing a small PHP script that does it for us?
The idea is to include (with PHP's include function) a small PHP script in the webpage you want to be notified about (probably the homepage, such as index.php), so that it emails you whenever that page is accessed by Google's crawler, the great Googlebot.
Googlebot's identity can be confirmed (though not with 100% certainty) by checking its HTTP User-Agent header, followed by a reverse and then a forward DNS lookup of the visiting IP address.
For more information, see http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=80553
Note: This is not a foolproof solution.
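To make the first of those checks concrete, here is a minimal sketch of the User-Agent step on its own. The function name crawlerSuffixForUserAgent is hypothetical (it is not part of the script in this post), and the case-insensitive stripos() match and the nullable return type (PHP 7.1+) are assumptions of this sketch. It pairs a User-Agent token with the hostname suffix that the later DNS check should expect.

```php
<?php
// Hypothetical helper, for illustration only.
// Returns the hostname suffix expected for the crawler named in the
// User-Agent string, or null when no known crawler token is present.
function crawlerSuffixForUserAgent(string $userAgent, array $crawlers): ?string
{
    foreach ($crawlers as $token => $hostSuffix) {
        // The array KEY is what appears in the User-Agent header;
        // the VALUE is only compared later, against the resolved hostname.
        if (stripos($userAgent, $token) !== false) {
            return $hostSuffix;
        }
    }
    return null;
}

$crawlers = array(
    'Googlebot'    => '.googlebot.com',
    'Yahoo! Slurp' => '.crawl.yahoo.net',
);

$ua = 'Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)';
var_dump(crawlerSuffixForUserAgent($ua, $crawlers));
```

A User-Agent header is trivial to fake, so a match here only decides whether the more expensive DNS lookups are worth doing at all.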
<?php
// www.expertsguide.info
// Put this code in a file of its own and include that file from your other webpages.
// For other crawlers, add entries to $crawlers.
/* To run only for Google, comment out the $crawlers array and the foreach loop,
   and uncomment the two lines below:
//$crawler_name = 'Googlebot';
//$crawler_host_str = '.googlebot.com';
*/
$crawlers = array('Googlebot'    => '.googlebot.com',
                  'Yahoo! Slurp' => '.crawl.yahoo.net',
                  'MSNBot'       => '.search.msn.com');

foreach ($crawlers as $crawler_name => $crawler_host_str)
{
    if (isset($_SERVER['HTTP_USER_AGENT'])
        && (strpos($_SERVER['HTTP_USER_AGENT'], $crawler_name) !== false)
        && isset($_SERVER['REMOTE_ADDR']))
    {
        $ip = $_SERVER['REMOTE_ADDR'];               // assume Googlebot never comes with a fake IP
        $hostname = gethostbyaddr_timeout($ip);      // get the hostname (reverse DNS lookup)
        $hostip = gethostbyaddr_timeout($hostname);  // get the IP back (forward DNS lookup)

        // The hostname must END with the crawler's domain; a plain strpos() would
        // also accept a name like "x.googlebot.com.example.org". Still no regex burden.
        if ((substr($hostname, -strlen($crawler_host_str)) === $crawler_host_str)
            && (strcmp($ip, $hostip) == 0))
        {
            $to = "john.martin@example.com";                       // put your email here
            $subject = "$crawler_name just crawled your webpage";  // mail subject you want to see
            $message = "";
            $message .= " $crawler_name HTTP User Agent: " . $_SERVER['HTTP_USER_AGENT'] . PHP_EOL;
            $message .= " IP: " . $ip . PHP_EOL;
            $message .= " Host name (reverse DNS lookup): " . $hostname . PHP_EOL;
            $message .= " Host IP (forward DNS lookup): " . $hostip . PHP_EOL;
            // SCRIPT_NAME already starts with "/", so no extra slash is needed
            $message .= " Visited: " . $_SERVER['HTTP_HOST'] . $_SERVER['SCRIPT_NAME'] . PHP_EOL;
            mailit($to, $subject, $message);
        }
    }
}

///////////////////////////////////////////////////////////////////////////////////
function mailit($to, $subject, $message)
{
    $headers = '';
    // To send HTML mail, the Content-type header must be set; uncomment the two lines below
    //$headers = 'MIME-Version: 1.0' . "\r\n";
    //$headers .= 'Content-type: text/html; charset=iso-8859-1' . "\r\n";
    //$headers .= 'To: receiver@gmail.com' . "\r\n";
    //$headers .= 'Cc: somebody@gmail.com' . "\r\n";
    //$headers .= 'Bcc: somebody@yahoo.co.in' . "\r\n";
    $headers .= 'From: donotreply@example.com' . "\r\n";
    $message = "Email was sent from http://www.mydomain.com \r\n " . $message . "\r\n";
    if (!mail($to, $subject, $message, $headers))
    {
        //echo "Message delivery failed...";
    }
}

function gethostbyaddr_timeout($ip)
{
    // The -W option makes host wait for the given number of seconds.
    // If wait is less than one, the wait interval is set to one second.
    // When the -w option is used, host will effectively wait forever for a reply.
    // DON'T USE IT HERE.
    // escapeshellarg() keeps an unexpected value from being run as a shell command.
    $output = (string) shell_exec('host -W 1 ' . escapeshellarg($ip));
    // ereg() was removed in PHP 7; preg_match() does the same job.
    if (preg_match('/ pointer ([A-Za-z0-9.-]+)\./', $output, $regs))
    {
        return $regs[1]; // hostname
    }
    elseif (preg_match('/ address ([A-Za-z0-9.-]+)/', $output, $regs))
    {
        return $regs[1]; // IP address
    }
    return $ip;
}
?>
Keep checking this article for more updates.
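The script above shells out to the host command for its DNS lookups. As a variation, here is a minimal sketch of the same reverse-then-forward check using PHP's built-in gethostbyaddr() and gethostbynamel(). The names hostEndsWith and isVerifiedCrawler are hypothetical, and the two optional callables exist only so the logic can be tried without real DNS. Assumptions to keep in mind: gethostbynamel() returns IPv4 addresses only, and the built-ins take no timeout argument, so a slow resolver can delay the page in a way that host -W 1 does not.

```php
<?php
// True only when $hostname ends with $suffix (for example ".googlebot.com").
function hostEndsWith(string $hostname, string $suffix): bool
{
    $hostname = strtolower(rtrim($hostname, '.'));
    $suffix   = strtolower($suffix);
    return $suffix !== ''
        && strlen($hostname) > strlen($suffix)
        && substr($hostname, -strlen($suffix)) === $suffix;
}

// Reverse lookup, suffix check, then forward lookup back to the same IP.
// $reverse and $forward default to PHP's own resolver functions.
function isVerifiedCrawler(string $ip, string $suffix, ?callable $reverse = null, ?callable $forward = null): bool
{
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbynamel';

    $hostname = $reverse($ip);
    // gethostbyaddr() returns false on malformed input
    // and the unmodified IP when the lookup fails.
    if ($hostname === false || $hostname === $ip || !hostEndsWith($hostname, $suffix)) {
        return false;
    }

    $ips = $forward($hostname); // array of IPv4 addresses, or false on failure
    return is_array($ips) && in_array($ip, $ips, true);
}

// Offline try-out with stub resolvers (made-up data, no network needed).
$stubReverse = function ($ip) { return 'crawl-66-249-67-184.googlebot.com'; };
$stubForward = function ($host) { return array('66.249.67.184'); };
var_dump(isVerifiedCrawler('66.249.67.184', '.googlebot.com', $stubReverse, $stubForward));
```

On a live page you would call isVerifiedCrawler($_SERVER['REMOTE_ADDR'], '.googlebot.com') without the stub arguments. The suffix comparison matters because a hostname such as x.googlebot.com.example.org contains ".googlebot.com" without belonging to Google.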
Hello man, I was searching google and searching for an article to see and stumbled on your weblog. I am really glad I did, you posted some great info. Did a fast bookmark on this article and will be checking back every once in a while to see if you submit any more stuff. Cool stuff keep up the wonderful work.
Aw, this was a really quality post. In theory I'd like to write like this too, taking time and real effort to make a good article… but what can I say… I procrastinate a lot and never seem to get anything done.
Hey admin, very informative blog post! Please continue this awesome work.
Please tell me it worked, right? I don't want to submit it again if I do not have to! Either the blog glitched out or I am an idiot; the second option doesn't surprise me, lol. Thanks for a great blog!
Of course it works. You might need to change some variables, such as your email address. I can help you set them correctly, if you want.
Usually I do not post on blogs, but I would like to say that this article really forced me to do so! Thanks, really nice article.
Wow – now that’s perspective! I think we often react in agreement or disagreement because of our emotions, but hearing another side, passionately presented, really makes us think!
I love everything about this! Woo!
Hello, I was just checking if somebody could help me out with this. I view this blog a fair bit, but sometimes the background keeps messing up and I can't read the text. Please help me.
What problem are you facing? Can you please give me some more details, such as your browser and its version, your screen size, etc.?
Hi, I just saw this site; it is good.
Now that's exactly what I was looking for. If it really works, then you deserve applause. I'll try running it and give you feedback.
One more thing…Will it work for Yahoo and Bing crawlers too?
As a Newbie, I’m usually searching online for articles that can help me. Thank you.
Glad to see that this site works well on my Google phone , everything I want to do is functional. Thanks for keeping it up to date with the latest.
Your blog is very interesting. Thanks for the information.
Good work! Your post is an excellent example of why I keep coming back to read your excellent quality content that is forever updated. Thank you!
Great post. I will be bookmarking and sharing it with my social networks.
I’ve just subscribed to your RSS feed. I love your content.
That is some inspirational stuff. Never knew that opinions could be this varied. Thanks for all the enthusiasm to offer such helpful information here.
Loved it, thanks.
Thanks a lot for the information.
Wow… thanks a lot for writing this.
howdy, I view all your blog posts, keep them coming.
Wow, thanks for publishing this great article. I'm a real enthusiast and get pleasure from reading your website. Believe it or not, it's generally rather tough to find this information written with any actual sense of quality like it is here. Thanks.
You made some good points there. I did a search on the topic and found most people will agree with your blog.
gr8 resrch bro…
You should change the blog title, "Send mail when Googlebot crawls a webpage in PHP | Experts Guide", to something more suited to the blog posts you create. I liked the post nevertheless.
What do you suggest?
i want it
I really don't accept this article. Nevertheless, I did some research on Google and found out that you are correct and I was thinking about it the wrong way. Keep on producing good quality articles like this.
Hi thanks for this. If I put this at the beginning of my current index.php I get errors about session headers already having been sent. I understand what this means but am not sure where to relocate this code. Thanks!
This code doesn’t send any headers or any output. It should not give you any error.
Try putting this somewhere else (e.g. at the bottom of index.php or in some other page).
Hey, thanks admin. Silly me, I had placed an echo in there when I was testing, and my subsequent code tries to send headers.
That being said, I cannot get the script to work at all in terms of functionality. The hostname lookups work fine, and so does the mailit function, but I have had hits from at least these two without anything happening:
crawl-66-249-67-184.googlebot.com
b3091002.crawl.yahoo.net
I added 'Test'=>'.wp.shawcable.net' to the array definition and an echo before the mailit call, and nothing happens. No error; nothing is triggered.
Oops, my test didn't make sense: that's my host, not my user agent. I will do another test against:
Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
Ahhh yes, so your code is of course fine; however, your array definitions ought to be tweaked:
'.googlebot.com');
foreach($crawlers as $crawler_name)
{
if (strpos($test, $crawler_name) !== false )
{
echo "BOOM!";
}
}
?>
does indeed go boom
You should check the IP as well, as I did. Only then can you be sure that it was a genuine Googlebot. That is the reverse DNS lookup. See this link for more info: google.com/support/webmasters/bin/answer.py?hl=en&answer=80553
Nice work. Code really works.