More Powerful Robots.txt Exclusion For Google
Posted by nullbit on November 7, 2005, 9:15 pm
SEO Book points to Dan Thies's finding of a useful but non-standard robots.txt feature supported by the Google spider:
[...] Google has introduced increased flexibility to the robots.txt file standard through the use of asterisks. Disallow patterns may include "*" to match any sequence of characters, and patterns may end in "$" to indicate the end of a name. To remove all files of a specific file type (for example, to include .jpg but not .gif images), you'd use the following robots.txt entry:
User-agent: Googlebot-Image
Disallow: /*.gif$
Although Google's explanation only mentions the wildcard syntax in the context of their image bot, the standard Googlebot seems to understand it as well. For example:
User-agent: Googlebot
Disallow: /*.php$
This would block Googlebot from every URL that ends with the .php extension.
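To make the matching rules concrete, here is a minimal Python sketch of how a crawler might evaluate such a pattern. It is not Google's implementation: the function names `robots_pattern_to_regex` and `is_disallowed` are made up for illustration, and the sketch assumes that a pattern is matched from the start of the URL path, that "*" matches any run of characters, and that only a trailing "$" is treated as an end anchor.

```python
import re


def robots_pattern_to_regex(pattern):
    """Compile a Disallow pattern using the "*" and "$" extensions.

    Hypothetical helper: assumes "*" means "any sequence of characters"
    and a trailing "$" means "the URL must end here".
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with ".*" for each "*".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    # \Z anchors at the very end of the string; without it the pattern
    # behaves like an ordinary robots.txt prefix match.
    return re.compile(body + (r"\Z" if anchored else ""))


def is_disallowed(path, pattern):
    """Return True if the path is matched by the Disallow pattern."""
    # re.match only matches at the start of the string, like a prefix rule.
    return robots_pattern_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    for path in ("/index.php", "/images/photo.gif", "/index.php?id=1"):
        print(path, is_disallowed(path, "/*.php$"))
```

Under these assumptions, /index.php is blocked by /*.php$ but /index.php?id=1 is not, because the "$" requires the URL to end right after ".php". Dropping the "$" (Disallow: /*.php) would catch both.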
