More Powerful Robots.txt Exclusion For Google

Posted by nullbit on November 7, 2005, 9:15 pm

SEO Book points to Dan Thies's finding of a useful but non-standard robots.txt feature supported by the Google spider:

[...] Google has introduced increased flexibility to the robots.txt file standard through the use of asterisks. Disallow patterns may include "*" to match any sequence of characters, and patterns may end in "$" to indicate the end of a name. To remove all files of a specific file type (for example, to include .jpg but not .gif images), you'd use the following robots.txt entry:

User-agent: Googlebot-Image
Disallow: /*.gif$

Although Google's explanation mentions the wildcard syntax only in the context of its image bot, the standard Googlebot seems to understand it as well. For example:

User-agent: Googlebot
Disallow: /*.php$

This rule would block every URL ending in the .php extension.
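To make the matching rule concrete, here is a minimal Python sketch of how a crawler might interpret these patterns. It assumes, based on the description quoted above, that "*" matches any run of characters, that a trailing "$" anchors the pattern to the end of the URL, and that patterns without "$" behave as ordinary robots.txt prefix matches against the path (plus any query string). The function names pattern_to_regex and is_disallowed are hypothetical; this is an illustration, not Google's actual implementation.

```python
import re


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a wildcard Disallow pattern into a compiled regex.

    Assumed semantics (hypothetical sketch, not Google's code):
    '*' matches any sequence of characters, including none; a trailing
    '$' anchors the match to the end of the URL. Otherwise the pattern
    is a prefix match, like an ordinary robots.txt rule.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' so that
    # every '*' in the original pattern becomes "any characters".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    # \Z is a strict end-of-string anchor (unlike '$', it ignores a trailing newline).
    return re.compile(body + (r"\Z" if anchored else ""))


def is_disallowed(pattern: str, path: str) -> bool:
    """Return True if `path` (path plus optional query) is blocked by `pattern`."""
    # re.match anchors at the start of the string, so unanchored
    # patterns act as prefixes.
    return pattern_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    examples = [
        ("/*.gif$", "/images/logo.gif"),
        ("/*.gif$", "/images/logo.jpg"),
        ("/*.php$", "/index.php"),
        ("/*.php$", "/index.php?id=1"),
        ("/*.php$", "/docs/readme.html"),
    ]
    for pat, path in examples:
        print(f"{pat:10} {path:20} blocked={is_disallowed(pat, path)}")
```

One consequence of the "$" anchor under these assumptions: /index.php?id=1 is not blocked by /*.php$, because the URL no longer ends in ".php". Dropping the "$" (Disallow: /*.php) would turn the rule into "contains .php anywhere after the first slash" and would catch such URLs too.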
