Because our index contains billions of web pages, we cannot change it manually; we rely on our automated crawl systems to update the search index. If you want to change the status of pages that have been crawled and indexed, you must change the site content or the control documents that tell our crawler how the search engine should handle those pages. Changes to a web page are reflected in our database after the page is next crawled and indexed and the index-update cycle is complete.
More about: How Yahoo! Search indexes web pages.
There are several ways to prevent our crawler from indexing your site or portions of your site:
Yahoo! Site Explorer can also be used to delete URLs or complete paths. See: How can I delete my URLs from the Yahoo! index?
More about: Yahoo! Slurp: Yahoo's Web-indexing Robot and Indexing FAQs.
The Yahoo! Slurp crawler observes access restrictions per robots.txt rules and the Robots Exclusion Standard. Since the contents of robots.txt are subject to change, we occasionally re-fetch the file. We do not crawl or index content from disallowed pages.
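As an illustration of how disallow rules are read, here is a minimal sketch that checks a robots.txt file against the Slurp user agent with Python's standard urllib.robotparser module. The robots.txt content, the /private/ and /tmp/ paths, the example.com URLs, and the is_allowed helper are all hypothetical, and this parser only approximates how a crawler such as Slurp interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are for illustration only.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /private/

User-agent: *
Disallow: /tmp/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file as a list of lines, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/private/page.html", "/public/index.html"):
        url = "http://www.example.com" + path
        print(path, is_allowed(ROBOTS_TXT, "Slurp", url))
```

Running a check like this before publishing a new robots.txt can help confirm that the rules block only the paths you intend.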
After you have changed the site content or robots.txt to stop your pages from being crawled, you might still see the pages listed in our databases for some time. The changes take effect in our search index when the information is updated in our next refresh cycle. When a site adds disallow rules, previously indexed content remains in the search database through a normal database refresh cycle. When we update the page content in the index, a disallowed page changes status to having no content and normally disappears from the web search index.
However, even when the content of a URL is not available, the URL itself might be included in the web search index on the basis of information about that URL published on other web pages. The links and text of pages from other web sites are part of the public World Wide Web content that is crawled and indexed for web search. When content from other pages provides enough information about a URL, that URL might appear in web search results even though none of the content of that URL is included.
To keep content from being accessible through the cache, you can use the noarchive meta tag or the X-Robots-Tag HTTP header.
For more information please see our FAQ: How do I keep my page from being cached in Yahoo! Search?
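The two noarchive options can be sketched as follows. This is a minimal example, assuming a Python WSGI application serves the page; the PAGE markup and the app function are hypothetical. It shows the directive both as a robots meta tag in the HTML and as an X-Robots-Tag response header. In practice either one is normally enough, and the header form is the option for non-HTML files that cannot carry a meta tag.

```python
# Hypothetical page; the robots meta tag in <head> carries the noarchive directive.
PAGE = """<!DOCTYPE html>
<html>
<head>
  <meta name="robots" content="noarchive">
  <title>Example page</title>
</head>
<body>This page asks search engines not to show a cached copy.</body>
</html>
"""


def app(environ, start_response):
    """Minimal WSGI app that also sends noarchive as an HTTP response header."""
    body = PAGE.encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # Header form of the same directive.
        ("X-Robots-Tag", "noarchive"),
    ]
    start_response("200 OK", headers)
    return [body]


if __name__ == "__main__":
    from wsgiref.simple_server import make_server

    # Serve locally on an arbitrary port for manual inspection.
    with make_server("localhost", 8000, app) as server:
        server.serve_forever()
```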
To remove content from the web, the webmaster can remove the page from the web site so that requests for the URL return a 404 error. After our web crawler, Slurp, refreshes the content and notices the 404 status, the page is removed from web search results and from the Yahoo! Web Search cache.
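Most web servers return a 404 automatically once the file is deleted. For dynamically generated pages, the application has to send that status itself. The following is a rough sketch, assuming a Python WSGI application; the REMOVED_PATHS set, its entries, and the app function are hypothetical names used only for illustration.

```python
# Hypothetical set of paths that have been taken down from the site.
REMOVED_PATHS = {"/old-press-release.html", "/private/report.html"}


def app(environ, start_response):
    """Minimal WSGI app that answers 404 for removed pages and 200 otherwise."""
    path = environ.get("PATH_INFO", "/")
    if path in REMOVED_PATHS:
        body = b"Not Found"
        status = "404 Not Found"
    else:
        body = b"Page content"
        status = "200 OK"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]
```

The important detail is the status line: a page that shows a "not found" message but still answers with 200 OK does not signal to a crawler that the content is gone.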
What if the pages in question aren't yours?
If the page is not your content, please contact the site owner and ask them to follow the above instructions. Yahoo! does not have the means to validate each removal request.
Also see: How can I have an offensive site/URL removed from your database?