Yahoo! Mobile Web Crawler

Last Updated: 31 May 2010

Why are you crawling my site?

YahooSeeker/M1A1-R2D2 is a Mobile Web crawling robot. It collects documents from the Mobile Web to build a searchable index for searching on handheld devices such as mobile phones and PDAs.
As part of the crawling effort, Yahoo!'s mobile web crawler honors robots.txt so that it does not crawl or index pages whose content you do not want included in Yahoo! Mobile Search Technology. If robots.txt disallows a page, Yahoo! will not read or use the contents of that page. The URL of a protected page may still be included in Yahoo! Mobile Search Technology as a "thin" document with no text content. Links and reference text from other public web pages provide identifiable information about a URL and may be indexed as part of mobile web search coverage.

How do I prevent my site or certain subdirectories from being crawled?

Robots.txt

The YahooSeeker/M1A1-R2D2 crawler obeys the 1994 Robots Exclusion Standard (RES).

YahooSeeker/M1A1-R2D2 will obey the first entry in the robots.txt file with a User-agent containing "YahooSeeker/M1A1-R2D2". If there is no such record, it will obey the first entry with a User-agent of "*".
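
The record selection rule above can be sketched in a few lines of Python. This is only an illustration, not the crawler's actual code: the function names are hypothetical, records are assumed to be separated by blank lines as in the 1994 standard, and the match on the User-agent value is assumed to be a case-insensitive substring test.

```python
CRAWLER = "YahooSeeker/M1A1-R2D2"


def parse_records(robots_txt):
    """Split robots.txt text into (user_agents, rules) records."""
    records, agents, rules = [], [], []
    for raw in robots_txt.splitlines():
        if not raw.strip():
            # A blank line ends the current record.
            if agents:
                records.append((agents, rules))
            agents, rules = [], []
            continue
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue  # comment-only or malformed line: skip it
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            agents.append(value)
        else:
            rules.append((field, value))
    if agents:
        records.append((agents, rules))
    return records


def select_record(robots_txt, crawler=CRAWLER):
    """Return the rules of the record the crawler would obey, or None."""
    records = parse_records(robots_txt)
    # First choice: the first record whose User-agent contains the crawler name.
    for agents, rules in records:
        if any(crawler.lower() in agent.lower() for agent in agents):
            return rules
    # Fallback: the first record with a User-agent of "*".
    for agents, rules in records:
        if "*" in agents:
            return rules
    return None
```

Passing the text of your own robots.txt to select_record shows which record, if any, applies to the crawler under these assumptions.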

Disallowed Documents

Disallowed documents, including "/" (the home page of the site), are not indexed, nor are links in those documents followed. YahooSeeker/M1A1-R2D2 does read the home page of each site and uses it internally, but if the home page is disallowed it is neither indexed nor followed.

Robots.txt examples:

  1. Yahoo! Mobile Web Crawler will not index anything from the site
    User-agent: YahooSeeker/M1A1-R2D2
    Disallow: /
  2. Yahoo! Mobile Web Crawler will not index anything in the <hostname>/cgi-bin/ path
    User-agent: YahooSeeker/M1A1-R2D2
    Disallow: /cgi-bin/
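
Under the 1994 standard, a Disallow value is a path prefix: any URL path that starts with the value is excluded, and an empty value excludes nothing. A minimal Python sketch of that check, using a hypothetical helper name, shows how the two examples above behave:

```python
def is_disallowed(path, disallow_values):
    """Return True if `path` starts with any non-empty Disallow value."""
    return any(value and path.startswith(value) for value in disallow_values)


# Example 1: "Disallow: /" covers every path on the site.
site_blocked = is_disallowed("/index.html", ["/"])

# Example 2: "Disallow: /cgi-bin/" covers only paths under /cgi-bin/.
cgi_blocked = is_disallowed("/cgi-bin/search", ["/cgi-bin/"])
other_blocked = is_disallowed("/about.html", ["/cgi-bin/"])
```

Here site_blocked and cgi_blocked are True, while other_blocked is False.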

How can I reduce the number of requests you make on my web site?

There is a YahooSeeker/M1A1-R2D2 specific extension to robots.txt that lets you reduce our crawler's request rate.

You can add a "Crawl-delay: xx" instruction, where "xx" is the delay between successive crawler accesses. If the crawler rate is a problem for your server, you can raise the delay to 5, 20, or whatever value is comfortable for your server.

Setting a crawl-delay of 20 for YahooSeeker/M1A1-R2D2 would look something like:

User-agent: YahooSeeker/M1A1-R2D2
Crawl-delay: 20
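
To confirm that the instruction is written in a form a parser can read, a small Python check along these lines may help. It is a sketch only: the function name is hypothetical, it looks at a single record, and it returns the bare number because this page does not state the unit of the delay.

```python
def crawl_delay(record_text):
    """Return the Crawl-delay value found in one robots.txt record, or None."""
    for raw in record_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "crawl-delay":
            try:
                return float(value)
            except ValueError:
                return None  # present but not a number
    return None


record = "User-agent: YahooSeeker/M1A1-R2D2\nCrawl-delay: 20\n"
delay = crawl_delay(record)
```

With the record shown above, delay is 20.0. A result of None means the instruction is missing or its value is not numeric.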

If you have continuing issues regarding the frequency of access, use the "Contact Customer Care" option below to open a support form. On the support form:

  • Select the subject: "Crawler politeness"
  • Copy your most recent weblog entries that list the Yahoo! crawler (YahooSeeker/M1A1-R2D2), together with the URL of the affected host, into the "Comments" box. We will need this information to process your request.
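
One way to collect the log entries for the support form is to filter your access log for the crawler's name. The Python sketch below assumes your server writes the User-agent string into each log line and that lines are in chronological order. The function name and the default limit of 20 lines are illustrative choices, not requirements from Yahoo!.

```python
def crawler_log_lines(log_lines, marker="YahooSeeker/M1A1-R2D2", limit=20):
    """Return up to `limit` of the most recent lines that mention the crawler."""
    hits = [line.rstrip("\n") for line in log_lines if marker in line]
    return hits[-limit:] if limit > 0 else []
```

For example, opening your log file (its path depends on your server setup) and passing the file object to crawler_log_lines returns the lines to paste into the "Comments" box.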

How do I prevent you from indexing certain pages?

YahooSeeker/M1A1-R2D2 obeys the "noindex" Meta tag. If you place:


<META NAME="robots" CONTENT="noindex">

in the head of your web document, YahooSeeker/M1A1-R2D2 will retrieve the document, but it will not index the document or place it in the search engine's database.

How do I allow you to index pages, but not place them in your cache?

YahooSeeker/M1A1-R2D2 obeys the "noarchive" Meta tag. If you place:


<META NAME="robots" CONTENT="noarchive"> 

in the head of your web document, YahooSeeker/M1A1-R2D2 will retrieve the document, but it will not cache or archive the document for use in the PageCache system.
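
To check which robots Meta directives a page actually carries, you can scan its HTML with Python's standard html.parser module. This is a rough sketch with hypothetical class and function names. It assumes directives are comma-separated in the CONTENT attribute and compares them case-insensitively.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html):
    """Return the set of robots Meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives
```

If "noindex" or "noarchive" is missing from the returned set, the corresponding tag is absent or malformed on that page.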

Why am I seeing repeated download requests?

In general, YahooSeeker/M1A1-R2D2 should only download one copy of each file from your site during a given crawl cycle. Occasionally the crawler is stopped and restarted, and it recrawls pages it has recently retrieved. Recrawls should happen infrequently, and should not be any cause for alarm.

YahooSeeker/M1A1-R2D2 will re-read /robots.txt fairly often so that any changes to the robots exclusion rules will be applied promptly.

Why can't I find my mobile web pages in your search engine?

Most likely, our crawlers have not yet discovered your domain. To submit your site to Yahoo!, please go to the Yahoo! Mobile Site Submit page.

Also check that your robots.txt file is not blocking the crawler through a record for

User-agent: YahooSeeker/M1A1-R2D2 or User-agent: *

YahooSeeker/M1A1-R2D2 will crawl and index your site eventually, subject to conditions.

Note: To contact Yahoo! Mobile Search about YahooSeeker/M1A1-R2D2, please select the "Contact Customer Care" option below to open a support form.

Copyright © 2013 Yahoo! UK Limited. All rights reserved.
