- Why are you crawling my site?
- How do I prevent my site or certain subdirectories from being crawled?
- How can I reduce the number of requests you make on my web site?
- How do I prevent you from indexing certain pages?
- How do I allow you to index pages, but not place them in your cache?
- I'm seeing repeated download requests, why is this?
- Why can't I find my web pages in your search engine?
YahooSeeker/M1A1-R2D2 is a Mobile Web crawling robot. The YahooSeeker/M1A1-R2D2 crawler collects documents from the Mobile Web to build a search index for handheld devices such as mobile phones, PDAs, and others.
As part of the crawling effort, Yahoo!'s mobile web crawler honors the robots.txt standard so that we do not crawl or index content that you do not want included in Yahoo! Mobile Search Technology. If robots.txt disallows a page, Yahoo! will not read or use the contents of that page. The URL of a protected page may still be included in Yahoo! Mobile Search Technology as a "thin" document with no text content. Links and reference text from other public web pages provide identifiable information about a URL and may be indexed as part of mobile web search coverage.
YahooSeeker/M1A1-R2D2 will obey the first entry in the robots.txt file with a User-agent containing "YahooSeeker/M1A1-R2D2". If there is no such record, it will obey the first entry with a User-agent of "*".
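As an illustration of the selection rule above, here is a minimal sketch of a robots.txt file with two records. The paths /private/ and /cgi-bin/ are hypothetical and serve only to tell the records apart.

```text
# Record for all other robots (hypothetical path)
User-agent: *
Disallow: /private/

# Record that names the Yahoo! mobile crawler (hypothetical path)
User-agent: YahooSeeker/M1A1-R2D2
Disallow: /cgi-bin/
```

Under the rule as stated, YahooSeeker/M1A1-R2D2 would obey only the second record, so /private/ would not be blocked for it unless that line is repeated in the crawler's own record.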
Disallowed documents, including "/" (the home page of the site), are not indexed, nor are links in those documents followed. YahooSeeker/M1A1-R2D2 does read the home page of each site and uses it internally, but if the home page is disallowed it is neither indexed nor followed.
- If "/" is disallowed, Yahoo! Mobile Web Crawler will not index anything from the site
- If "/cgi-bin/" is disallowed, Yahoo! Mobile Web Crawler will not index anything in the <hostname>/cgi-bin/ path
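These two cases can be sketched as robots.txt records. This is an assumed form based on standard robots.txt syntax, and the two records are alternatives, not meant to be combined in one file. To keep the crawler away from the entire site:

```text
User-agent: YahooSeeker/M1A1-R2D2
Disallow: /
```

To keep the crawler out of the /cgi-bin/ path only:

```text
User-agent: YahooSeeker/M1A1-R2D2
Disallow: /cgi-bin/
```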
There is a YahooSeeker/M1A1-R2D2-specific extension to robots.txt that allows you to lower our crawler's request rate.
You can add a "Crawl-delay: xx" instruction, where "xx" is a delay value between successive crawler accesses. If the crawler rate is a problem for your server, you can raise the delay to 5, 20, or any value that is comfortable for your server.
Setting a crawl-delay of 20 for YahooSeeker/M1A1-R2D2 would look something like:
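A minimal sketch of such a record, assuming the directive is placed in the record that names the crawler and follows the "Crawl-delay: xx" form described above:

```text
User-agent: YahooSeeker/M1A1-R2D2
Crawl-delay: 20
```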
If you have continuing issues regarding the frequency of access, use the "Contact Customer Care" option below to open a support form. On the support form:
- Select the subject: "Crawler politeness"
- Copy your most recent web server log entries that list the Yahoo! crawler (YahooSeeker/M1A1-R2D2) and the URL of the affected host into the "Comments" box. We will need this information to process your request.
YahooSeeker/M1A1-R2D2 obeys the "noindex" Meta tag. If you place:
<META NAME="robots" CONTENT="noindex">
in the head of your web document, YahooSeeker/M1A1-R2D2 will retrieve the document, but it will not index the document or place it in the search engine's database.
YahooSeeker/M1A1-R2D2 obeys the "noarchive" Meta tag. If you place:
<META NAME="robots" CONTENT="noarchive">
in the head of your web document, YahooSeeker/M1A1-R2D2 will retrieve the document, but it will not cache or archive the document for use in the PageCache system.
In general, YahooSeeker/M1A1-R2D2 should only download one copy of each file from your site during a given crawl cycle. Occasionally the crawler is stopped and restarted, and it recrawls pages it has recently retrieved. Recrawls should happen infrequently, and should not be any cause for alarm.
YahooSeeker/M1A1-R2D2 will re-read /robots.txt fairly often so that any changes to the robots exclusion rules will be applied promptly.
If you cannot find your web pages in our search engine, our crawlers may not yet have discovered your domain. To submit your site to Yahoo!, please go to the Yahoo! Mobile Site Submit page.
Have you checked that your robots.txt file is not blocking "User-agent: YahooSeeker/M1A1-R2D2" or "User-agent: *"?
YahooSeeker/M1A1-R2D2 will crawl and index your site eventually, subject to conditions.
Note: To contact Yahoo! Mobile Search about YahooSeeker/M1A1-R2D2, please select the "Contact Customer Care" option below to open a support form.