# /robots.txt file for http://webcrawler.com/
# mail webmaster@webcrawler.com for constructive criticism

User-agent: *
Disallow: /Internal/
Disallow: /Internal2/
Disallow: /Templates/
Disallow: /blast-fgpc/
Disallow: /fugu-bin/
Disallow: /fgpc-bin/
Disallow: /testfgpc-bin/
Disallow: /testfugu-bin/

#User-agent: webcrawler
#Disallow:

#User-agent: lycra
#Disallow: /

# The first two lines, starting with '#', are comments.
# The first paragraph specifies that the robot called 'webcrawler'
# has nothing disallowed: it may go anywhere.
# The second paragraph indicates that the robot called 'lycra' has all
# relative URLs starting with '/' disallowed. Because all relative URLs
# on a server start with '/', this means the entire site is closed off.
# The third paragraph indicates that all other robots should not visit
# URLs starting with /tmp or /log. Note the '*' is a special token,
# meaning "any other User-agent"; you cannot use wildcard patterns or
# regular expressions in either User-agent or Disallow lines.
#
# Two common errors:
# - Wildcards are _not_ supported: instead of 'Disallow: /tmp/*' just say
#   'Disallow: /tmp'.
# - You shouldn't put more than one path on a Disallow line (this may change
#   in a future version of the spec).
#
# What if I can't make a /robots.txt file?
# Sometimes you cannot make a /robots.txt file, because you don't
# administer the entire server. All is not lost: there is a new standard
# for using HTML META tags to keep robots out of your documents.
# The basic idea is that if you include a tag like:
#   <META NAME="ROBOTS" CONTENT="NOINDEX">
# in your HTML document, that document won't be indexed.
# If you do:
#   <META NAME="ROBOTS" CONTENT="NOFOLLOW">
# the links in that document will not be parsed by the robot.
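#
# For reference, the three-paragraph example that the notes above describe
# could be written out roughly as follows. This is only a sketch: it is
# kept commented out so it does not change the rules in this file, and it
# assumes the third paragraph uses 'User-agent: *' with one Disallow line
# per path, as the notes suggest.
#
#   User-agent: webcrawler
#   Disallow:
#
#   User-agent: lycra
#   Disallow: /
#
#   User-agent: *
#   Disallow: /tmp
#   Disallow: /log
#
# Each Disallow value is matched as a plain path prefix, so under this
# sketch 'Disallow: /tmp' would cover both /tmp/index.html and /tmp.html.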