Because our index contains billions of webpages, we cannot make changes to the index manually; we rely on our automated web crawler systems to keep the search index up to date. To change the status of pages that have already been crawled and indexed, you must change the site content or the control documents that tell our crawler how those pages should be handled by the search engine. When a webpage changes, the change is reflected in our database the next time the page is crawled and indexed.
There are several ways to prevent our crawler from indexing your site or portions of your site:
- Create a robots.txt file on your website that disallows our crawler.
- Add a noindex meta tag to your page.
- Add an X-Robots-Tag: noindex directive to your HTTP response headers.
- Remove the original document from your website.
- Host the document in an access-restricted section of your website.
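For illustration, the first three methods might look like the following. The path `/private/` is a hypothetical example; `Slurp` is the user-agent token for the Yahoo! crawler:

```text
# robots.txt, served from the root of your site:
User-agent: Slurp
Disallow: /private/

# Or, inside an individual page's <head>:
#   <meta name="robots" content="noindex">

# Or, as an HTTP response header set by your web server:
#   X-Robots-Tag: noindex
```

The robots.txt approach blocks crawling of whole sections of a site, while the meta tag and HTTP header apply per page and also work for pages the crawler is still allowed to fetch.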
The Yahoo! Slurp crawler observes access restrictions according to your robots.txt rules and the Robots Exclusion Standard. Because the contents of robots.txt are subject to change, we re-fetch the file periodically. We do not crawl or index content from disallowed pages.
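As a sketch of how a crawler applies Robots Exclusion Standard rules, Python's standard-library parser can be used. The rules and URLs below are hypothetical examples, not Yahoo!'s actual implementation:

```python
# Parse a robots.txt ruleset and check which URLs a crawler may fetch,
# using the stdlib implementation of the Robots Exclusion Standard.
import urllib.robotparser

rules = """\
User-agent: Slurp
Disallow: /private/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# A URL under a disallowed path must not be crawled or indexed.
print(rp.can_fetch("Slurp", "http://example.com/private/page.html"))  # False
# Other paths remain crawlable.
print(rp.can_fetch("Slurp", "http://example.com/public/page.html"))   # True
```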
After you have changed your site content or robots.txt to stop your pages from being crawled, you might still see the pages listed in our database for some time. The changes take effect in the search index when the information is updated during our next refresh cycle. When a site adds disallow rules, previously indexed content remains in the search database until a normal refresh cycle completes. When we next update the page's content in the index, the disallowed page changes status to having no content and normally disappears from the web search index. However, even though the content of a URL is not available, the URL itself might remain in the web search index on the basis of information about that URL published on other webpages.
The links and text of pages from other websites are part of the public World Wide Web content that is crawled and indexed for web search. When content from other pages provides enough information about a URL, that URL might appear in web search results even though none of the content of that URL is included.
To remove or disallow content from being accessible through the cache, you can use the noarchive meta-tag or X-Robots-Tag.
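The noarchive directive might be expressed in either of the following forms; both are standard robots directives, shown here as illustrative fragments:

```html
<!-- In the page's <head>, to keep a cached copy from being shown: -->
<meta name="robots" content="noarchive">

<!-- Equivalent HTTP response header, set by the server rather than in HTML:
     X-Robots-Tag: noarchive -->
```

Unlike noindex, noarchive does not remove the page from search results; it only prevents the cached copy from being offered.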
For more information, please see our FAQ: How to Keep Your Page From Being Cached in Yahoo! Search Results.
Content can be removed from the web by having the webmaster delete the page from the website so that attempts to fetch the URL return a 404 error. This also removes the page from the Yahoo! Web Search cache. Pages that no longer exist are removed from web search results and from the cache after our Yahoo! Slurp web crawler refreshes its content and notices the 404 status.
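The refresh behaviour described above can be sketched as follows. The index structure, URLs, and fetch results are hypothetical; this only illustrates the rule that a 404 on re-crawl drops both the page and its cached copy:

```python
# A toy index mapping URLs to stored page text (standing in for both the
# search index entry and the cached copy).
index = {
    "http://example.com/kept.html": "page text",
    "http://example.com/removed.html": "stale text",
}

# Hypothetical HTTP status codes observed on the latest crawler refresh.
crawl_status = {
    "http://example.com/kept.html": 200,
    "http://example.com/removed.html": 404,
}

# On refresh, any URL that now returns 404 is dropped from the index.
for url, status in crawl_status.items():
    if status == 404:
        index.pop(url, None)

print(sorted(index))  # ['http://example.com/kept.html']
```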
What if the pages in question aren't yours?
If the page is not your content, please contact the site owner and ask them to follow the instructions above. Yahoo! does not have the means to validate each removal request.