Yahoo! crawls billions of pages from the Web and uses a large number of crawler systems to accomplish this task. Therefore, your web server could log requests from a number of different Yahoo! web crawlers. The various Yahoo! crawler systems are coordinated to limit the activity on any single web server.
NOTE: Yahoo! determines a single "web server" by IP address, so if your host is serving multiple IP addresses you could see higher levels of crawler activity on your server.
Exclusion Rules
If there are directories on your web server which you do not want crawled and displayed in the web search results, use the crawler (robot) exclusion rules as described in "How do I prevent my site or certain subdirectories from being crawled?" Creating an exclusion rule can reduce the number of pages the Yahoo! crawler systems read from your server.
Yahoo! recommends that you restrict total crawler activity on your server by disallowing unimportant content with robots.txt exclusion rules in your User-agent: Slurp section (see "How do I prevent my site or certain subdirectories from being crawled?").
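Before publishing a new exclusion rule, it can help to confirm that it blocks what you intend. The following is a minimal sketch using Python's standard urllib.robotparser module. The directory names and the www.example.com host are hypothetical, and the standard-library parser only approximates how Yahoo! Slurp itself interprets robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the directory names are examples only.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /cgi-bin/
Disallow: /archive/
"""


def slurp_may_fetch(robots_txt, url):
    """Return True if the rules in robots_txt allow the Slurp agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Slurp", url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/index.html",
        "http://www.example.com/cgi-bin/search",
        "http://www.example.com/archive/2007/",
    ):
        print(url, slurp_may_fetch(ROBOTS_TXT, url))
```

URLs under the disallowed directories should report False, and everything else should report True.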
Dynamic URL Rewrite
Sites using dynamic URLs can generate a very large number of different URLs for the same page content, which leads to excessive crawler activity. The Dynamic URLs control in Site Explorer allows you to define query string elements that Yahoo! Slurp can rewrite to avoid generating large numbers of duplicate URLs for a page. See the Dynamic URLs help page for more information.
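To illustrate why removing a query string element reduces duplicate URLs, here is a short sketch using Python's standard urllib.parse module. It is not how Site Explorer or Yahoo! Slurp is implemented. The "sessionid" parameter and the example URLs are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_params(url, ignored):
    """Return url with every query parameter named in `ignored` removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in ignored
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Three hypothetical URLs that all serve the same page content.
urls = [
    "http://www.example.com/item?id=42&sessionid=abc",
    "http://www.example.com/item?id=42&sessionid=xyz",
    "http://www.example.com/item?id=42",
]

# Once the session parameter is dropped, the three URLs collapse into one.
unique_urls = {strip_params(u, {"sessionid"}) for u in urls}
print(unique_urls)  # {'http://www.example.com/item?id=42'}
```

A crawler that treats these as one URL requests the page once instead of three times.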
Crawl-delay
Yahoo! has a Slurp-specific robots.txt extension that allows you to set a limit on our crawler request rate. This extension is set in the robots.txt file by adding a "Crawl-delay: x.x" rule, where "x.x" is a "delay value". This "delay value" increases the time between successive Yahoo! crawler activities, and lowers the access rate of Slurp to your server.
Setting the "delay value" in robots.txt to a high value, for example 5 or 10, sets a greater delay for Yahoo! web crawlers accessing your server. Yahoo! suggests that you start with small values (0.5–1), and increase the "delay value" only as needed to reach an acceptable and comfortable crawling rate for your server. Larger "delay values" add more latency between successive crawler requests and can result in less than optimal Yahoo! web search results for your web server.
NOTE: This decrease in crawling rate can affect the way new, updated, or modified content on your server is discovered and displayed in the web search results. If you do feel that a crawl delay is necessary, use small values (0.5–1) to avoid blocking Yahoo! Slurp discovery and refresh of your key content.
robots.txt Examples:
A rule for a delay value of 0.5 would look like:
User-agent: Slurp
Crawl-delay: 0.5
A robots.txt rule to set a higher crawl-delay of 5 for Yahoo! Slurp looks like:
User-agent: Slurp
Crawl-delay: 5
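To get a rough feel for how different delay values compare, the sketch below reads the Crawl-delay value from a Slurp section and estimates an upper bound on requests per day. It assumes the "delay value" is a number of seconds between successive requests, which this page does not state, so treat the figures as illustrative only. The function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def slurp_crawl_delay(robots_txt):
    """Return the Crawl-delay value in the Slurp section, or None if absent.

    Simplified: it tracks only the most recent User-agent line and does not
    handle several User-agent lines sharing one group of rules.
    """
    in_slurp_section = False
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            in_slurp_section = value.lower() == "slurp"
        elif field == "crawl-delay" and in_slurp_section:
            return float(value)
    return None


def max_requests_per_day(delay_seconds):
    """Upper bound on daily requests, ASSUMING the delay is in seconds."""
    return int(SECONDS_PER_DAY / delay_seconds)


if __name__ == "__main__":
    for text in ("User-agent: Slurp\nCrawl-delay: 0.5\n",
                 "User-agent: Slurp\nCrawl-delay: 5\n"):
        delay = slurp_crawl_delay(text)
        print(delay, max_requests_per_day(delay))
```

Under that assumption, a delay value of 0.5 allows at most 172800 requests per day, while a value of 5 allows at most 17280.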
If you continue to have issues with Yahoo! crawlers visiting your server too frequently or too rarely, please use the "Contact Customer Care" button below to open an inquiry form. Copy snippets from your most recent web server access log file that show the Yahoo! crawler activity, along with the URL of the affected host, into the feedback area. Yahoo! requires this information to process your request.
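One way to collect the relevant access log snippets is to keep only the lines that mention Slurp. The sketch below assumes a log format in which the user-agent string appears on each line. The sample entries, addresses, and timestamps are hypothetical. Matching on the word "slurp" anywhere in the line is a simplification that could also match unrelated requests.

```python
def slurp_lines(log_lines):
    """Return the log lines that mention Slurp (case-insensitive match)."""
    return [line for line in log_lines if "slurp" in line.lower()]


# Hypothetical access log entries, for illustration only.
SAMPLE_LOG = [
    '192.0.2.10 - - [10/Oct/2008:13:55:36 -0700] "GET /index.html HTTP/1.0" '
    '200 2326 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; '
    'http://help.yahoo.com/help/us/ysearch/slurp)"',
    '203.0.113.7 - - [10/Oct/2008:13:55:40 -0700] "GET /about.html HTTP/1.1" '
    '200 1045 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1)"',
    '192.0.2.11 - - [10/Oct/2008:13:55:41 -0700] "GET /robots.txt HTTP/1.0" '
    '200 68 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; '
    'http://help.yahoo.com/help/us/ysearch/slurp)"',
]

if __name__ == "__main__":
    for line in slurp_lines(SAMPLE_LOG):
        print(line)
```

To filter a real log, pass an open file object in place of SAMPLE_LOG.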