Ask Yahoo!7 Search Help  
 

How do I have my web site or web pages removed from the Yahoo!7 Search Index?

Because our index contains billions of web pages, we cannot make manual changes to it; we rely on our automated crawl systems to update the search index. If you want to change the status of pages that have already been crawled and indexed, you must change the site content or the control documents that tell our crawler how the search engine should handle those pages. Changes to a web page are reflected in our database once the page has next been crawled and indexed and the index-update cycle is complete.

There are several ways to prevent our crawler from indexing your site or portions of your site:

  • Create a "robots.txt" file on your web site that disallows our crawler
  • Add a "noindex" meta tag to your documents
  • Remove the original document from your web site
  • Host the document on an access-restricted section of your web site
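
As an illustration of the "noindex" option, the sketch below scans a page for a robots meta tag and reports the directives it finds. It assumes the common form <meta name="robots" content="noindex">; the sample page and the helper names (RobotsMetaParser, robots_directives) are hypothetical and are not part of any Yahoo!7 tool.

```python
from html.parser import HTMLParser

# Hypothetical page that opts out of indexing with a robots meta tag.
SAMPLE_PAGE = """<html><head>
<meta name="robots" content="noindex">
<title>Private notes</title>
</head><body>Not for search results.</body></html>"""


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser already lowercases tag and attribute names.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").strip().lower() != "robots":
            return
        for token in attributes.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html_text):
    """Return the set of robots meta directives found in html_text."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    print("noindex" in robots_directives(SAMPLE_PAGE))
```

A check like this can be run over your own pages before publishing to confirm that the tag is actually present in the served HTML.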

More about: Yahoo!7 Slurp: Yahoo's Web-indexing Robot and Indexing FAQs.

The Yahoo!7 Slurp crawler observes access restrictions according to the /robots.txt rules and the Robots Exclusion Standard. Because the contents of /robots.txt are subject to change, we occasionally re-fetch /robots.txt. We do not crawl or index any of the content from "disallowed" pages.
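
To see how a disallow rule is interpreted, here is a minimal sketch that uses Python's standard urllib.robotparser module to evaluate a sample /robots.txt. The file contents, the example.com URLs, and the "Slurp" user-agent token are assumptions made for illustration; the parser follows the Robots Exclusion Standard in general and is not the crawler's own implementation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps the Slurp crawler out of /private/.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://www.example.com/private/report.html",
                "http://www.example.com/index.html"):
        print(url, is_allowed(ROBOTS_TXT, "Slurp", url))
```

Testing rules this way before uploading the file helps catch a Disallow line that blocks more, or less, than intended.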

After you have changed the site content or control documents to stop your pages from being crawled, you might still see the pages listed in our databases for some time. The changes take effect in our search database when the information is updated in our next refresh cycle. When a site adds disallow rules, previously indexed content remains in the search database through a normal database refresh cycle. When we update the page content in the index, a "disallowed" page changes status to having "no content" and normally disappears from the web search index.

However, even though the content of a URL is not available, the URL itself might be included in the web search index on the basis of information about that URL published on other web pages. The links and text of pages from other web sites are part of the public World Wide Web content that is crawled and indexed for web search. When content from other pages provides enough information about a URL, that URL might appear in web search results even though none of its own content is included.

To remove content from the cache, or to block it from being accessible there, you can use the NOARCHIVE meta tag. Note that the change takes effect on Yahoo!7 the next time the search engine crawls the page containing the NOARCHIVE tag (typically at least once per month).
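
For reference, a robots meta tag carrying NOARCHIVE usually takes the form <meta name="robots" content="noarchive"> and is placed in the page's head section. The short sketch below builds such a tag from a list of directives; the helper name robots_meta_tag is hypothetical, and the tag form shown is the commonly used one rather than something specified in this article.

```python
from html import escape


def robots_meta_tag(*directives):
    """Build a robots meta tag from directives such as "noarchive"."""
    cleaned = [d.strip().lower() for d in directives if d.strip()]
    if not cleaned:
        raise ValueError("at least one directive is required")
    content = escape(", ".join(cleaned), quote=True)
    return f'<meta name="robots" content="{content}">'


if __name__ == "__main__":
    # Tag to paste into the <head> of a page that should not be cached.
    print(robots_meta_tag("NOARCHIVE"))
```

Several directives can be combined in one tag, for example robots_meta_tag("noindex", "noarchive").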

For more information please see our FAQ: How do I keep my page from being cached in Yahoo!7 Search?

A webmaster can remove content from the web by making the page return a 404 error. Removing the page from the web site so that attempts to read the URL return a 404 error also removes the page from the Yahoo!7 Web Search cache. Pages that no longer exist are removed from web search results and from the cache after our web crawler, Slurp, refreshes the content and notices the 404 status.
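
Before waiting for the next crawl, it can be worth confirming that a removed URL really answers with 404 rather than, say, a 200 "page not found" page. The sketch below is one way to check this with Python's standard library. The handler, the REMOVED_PATHS set, and the local demo server are hypothetical stand-ins for a real site, and fetch_status handles plain http URLs only.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit

# Hypothetical set of paths that have been taken down.
REMOVED_PATHS = {"/old-page.html"}


class SiteHandler(BaseHTTPRequestHandler):
    """Toy site: removed paths answer 404, everything else answers 200."""

    def do_GET(self):
        if self.path in REMOVED_PATHS:
            self.send_error(404)
            return
        body = b"<html><body>Still here</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_status(url):
    """Return the HTTP status code for a GET request to an http:// URL."""
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=5)
    try:
        conn.request("GET", parts.path or "/")
        return conn.getresponse().status
    finally:
        conn.close()


def check_local_demo():
    """Serve the toy site on a free local port and probe two URLs."""
    server = HTTPServer(("127.0.0.1", 0), SiteHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_port}"
    try:
        return fetch_status(base + "/old-page.html"), fetch_status(base + "/index.html")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(check_local_demo())
```

Against a real site, only fetch_status is needed; the local server exists so the example can run without touching the network.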

What if the pages in question aren't yours?
If the page is not your content, you must contact the site owner and ask them to follow the above instructions.

Because Yahoo!7 does not have the means to verify the validity and authority of each request to remove a site, Yahoo!7 does not normally remove sites or pages from the search index manually.

Copyright © 2009 Yahoo!7 Pty Limited. All rights reserved.