It crawls the Web without malice, seeking out every possible bit of content. Its name is Googlebot, and sometimes it gets to see things on the Web that the rest of us don't.
Unless of course you pretend to be Googlebot.
Superficially spoofing Googlebot, Google's Web crawler, is not a difficult thing to do and was recently the subject of a very popular post on the Digg site. Since at least September of 2006, however, Google has made efforts to help webmasters protect themselves against spoofed Googlebots. That hasn't stopped people from trying to pass as Googlebot, if the popularity of the Digg post is any indication.
When Googlebot visits a site, the user agent is Googlebot. So to appear as Googlebot to a site, all you need to do is identify yourself, by way of the user agent, as Googlebot.
Doing so on Mozilla Firefox is a simple matter of using the User Agent Switcher extension, which allows Firefox users to be any user agent they choose. An even easier approach is to take advantage of BeTheBot, which enables users to see Web sites as Googlebot sees them.
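The same trick can be done without a browser, since the user agent is just an HTTP request header. The following is a minimal sketch in Python using only the standard library; the exact user agent string and the helper name `build_spoofed_request` are assumptions made for illustration and do not come from Google or the Digg post.

```python
import urllib.request

# Assumed example string; the article only says the user agent is "Googlebot".
GOOGLEBOT_UA = "Googlebot/2.1 (+http://www.google.com/bot.html)"


def build_spoofed_request(url, user_agent=GOOGLEBOT_UA):
    """Build (but do not send) an HTTP request that claims to be Googlebot.

    Hypothetical helper for illustration: it only sets the User-Agent header.
    """
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_spoofed_request("http://example.com/")
# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

Passing `req` to `urllib.request.urlopen` would send the request with that header. Nothing else about the request changes, which is why this kind of spoofing is only superficial.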
Though using the Googlebot user agent may trick some sites into thinking you're actually Google's all-seeing Web crawler, it could also end up getting your IP address banned from sites if you get caught.
Since at least September 2006, Google has provided webmasters with a definitive way to verify Googlebot. In addition to the user agent, there is at least one other key identifier for Googlebot: its IP address.
"Any interested Web site owner can tell if a visitor is the real Googlebot," a Google spokesperson told internetnews.com. "Anyone can set their browser to pretend to be any 'user agent' that they want. How a particular Web server decides to handle a particular browser or user agent is a choice that the Web site owner makes."
Google recommends that webmasters use DNS to verify, on a case-by-case basis, the identity of any visitor whose user agent claims to be "googlebot": a reverse DNS lookup on the visitor's IP address should show that the suspect crawler is in the googlebot.com domain.
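A webmaster could script that check roughly as follows. This is a sketch under stated assumptions, not Google's own code: the function names are hypothetical, and the forward-lookup confirmation is an extra precaution that goes beyond the reverse lookup described above. The lookups are injectable so the logic can be exercised without touching the network.

```python
import socket


def is_googlebot_host(hostname):
    """Return True if hostname is googlebot.com or a subdomain of it."""
    host = hostname.rstrip(".").lower()
    return host == "googlebot.com" or host.endswith(".googlebot.com")


def verify_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Check a visitor's IP address via reverse DNS (hypothetical helper).

    reverse_lookup maps an IP to a hostname; forward_lookup maps a hostname
    to a list of IPs. Both default to the standard socket calls.
    """
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda name: socket.gethostbyname_ex(name)[2])
    try:
        hostname = reverse_lookup(ip)
    except OSError:
        # No reverse DNS record: cannot be verified as Googlebot.
        return False
    if not is_googlebot_host(hostname):
        return False
    try:
        # Added assumption: confirm the hostname resolves back to the same IP,
        # since whoever controls an IP range also controls its reverse records.
        return ip in forward_lookup(hostname)
    except OSError:
        return False
```

Calling `verify_googlebot(visitor_ip)` with the address taken from the server's access log performs live DNS queries. Matching on the `.googlebot.com` suffix rather than a substring keeps names such as `googlebot.com.example.net` from passing the check.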