- Why are you crawling my site?
- How do I prevent my site or certain subdirectories from being crawled?
- How can I reduce the number of requests you make on my web site?
- How do I prevent Yahoo! from indexing certain pages?
- How do I allow you to index pages, but not place them in your cache?
- I'm seeing repeated download requests, why is this?
- Why can't I find my mobile web pages in your search engine?
Why are you crawling my site?
YahooSeeker/M1A1-R2D2 is a Mobile Web crawling robot. It collects documents from the Mobile Web to build a searchable index for handheld devices such as mobile phones and PDAs. As part of the crawling effort, Yahoo!'s mobile web crawler follows the robots.txt standard so that we do not crawl or index pages whose content you do not want included in Yahoo! Mobile Search Technology. If robots.txt disallows crawling a page, Yahoo! will not read or use the contents of that page. The URL of a protected page may still be included in Yahoo! Mobile Search Technology as a "thin" document with no text content. Links and reference text from other public web pages provide identifying information about a URL and may be indexed as part of mobile web search coverage.
How do I prevent my site or certain subdirectories from being crawled?
Robots.txt
The YahooSeeker/M1A1-R2D2 crawler obeys the 1994 Robots Exclusion Standard (RES). It will obey the first entry in the robots.txt file with a User-agent containing "YahooSeeker/M1A1-R2D2". If there is no such record, it will obey the first entry with a User-agent of "*".
Disallowed Documents
Disallowed documents, including slash (the home page of the site), are not indexed, nor are links in those documents followed. YahooSeeker/M1A1-R2D2 does read the home page at each site and uses it internally, but if the home page is disallowed, it is neither indexed nor followed.
Robots.txt examples:
1) Yahoo! Mobile Web Crawler will not crawl anything from the site
User-agent: YahooSeeker/M1A1-R2D2
Disallow: /
2) Yahoo! Mobile Web Crawler will not crawl anything in the /cgi-bin/ path
User-agent: YahooSeeker/M1A1-R2D2
Disallow: /cgi-bin/
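To check how a robots.txt file would apply to the crawler, the record selection rule described above (first record naming the crawler, otherwise the first "*" record) can be sketched in Python. This is only an illustration, not Yahoo!'s implementation: the function names are hypothetical, and case-insensitive matching and plain prefix matching on paths are assumptions.

```python
CRAWLER = "YahooSeeker/M1A1-R2D2"

def parse_records(robots_txt):
    """Split robots.txt text into records of (user_agents, rules)."""
    records, agents, rules = [], [], []
    for raw in robots_txt.splitlines() + [""]:  # trailing "" flushes the last record
        stripped = raw.strip()
        if stripped.startswith("#"):
            continue  # comment-only lines do not end a record
        line = stripped.split("#", 1)[0].strip()
        if not line:  # a blank line ends the current record
            if agents:
                records.append((agents, rules))
            agents, rules = [], []
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            agents.append(value)
        elif agents:
            rules.append((field, value))
    return records

def select_rules(robots_txt, crawler=CRAWLER):
    """Return the rules of the first record naming the crawler, else the first '*' record."""
    records = parse_records(robots_txt)
    for agents, rules in records:
        # Assumption: the comparison is case-insensitive.
        if any(crawler.lower() in agent.lower() for agent in agents):
            return rules
    for agents, rules in records:
        if "*" in agents:
            return rules
    return []

def is_allowed(path, robots_txt, crawler=CRAWLER):
    """True if no Disallow prefix in the selected record matches the path."""
    for field, value in select_rules(robots_txt, crawler):
        if field == "disallow" and value and path.startswith(value):
            return False
    return True
```

For example, is_allowed("/cgi-bin/search", text) returns False when text is the second example above, while is_allowed("/index.html", text) returns True.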
How can I reduce the number of requests you make on my web site?
There is a YahooSeeker/M1A1-R2D2-specific extension to robots.txt that lets you limit our crawler's request rate. You can add a "Crawl-delay: xx" instruction, where "xx" is the delay between successive crawler accesses. If the crawl rate is a problem for your server, you can raise the delay to 5, 10, or whatever value is comfortable for your server.
Setting a crawl-delay of 10 for YahooSeeker/M1A1-R2D2 would look something like:
User-agent: YahooSeeker/M1A1-R2D2
Crawl-delay: 10
If you have continuing issues regarding the frequency of access, use the "Contact Customer Care" option below to open a support form. On the support form:
- Select the subject: "Crawler politeness"
- Copy your most recent web log entries that list the Yahoo! crawler (YahooSeeker/M1A1-R2D2) and the URL of the affected host into the "Comments" box.
We will need this information to process your request.
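Before opening a support request, it can help to pull the crawler's entries out of your access log and see how far apart its requests actually are. The Python sketch below assumes an Apache-style combined log format, with a timestamp in square brackets and the user-agent string somewhere on each line. The function names are hypothetical, and your server's log layout may differ.

```python
import re
from datetime import datetime

CRAWLER = "YahooSeeker/M1A1-R2D2"
# Assumption: the first [...] group on a line is the request timestamp.
TIMESTAMP = re.compile(r"\[([^\]]+)\]")

def crawler_entries(log_lines, crawler=CRAWLER):
    """Keep only the log lines that mention the crawler's user-agent."""
    return [line for line in log_lines if crawler in line]

def request_gaps(entries):
    """Seconds between successive requests, in chronological order."""
    times = []
    for line in entries:
        match = TIMESTAMP.search(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z"))
    times.sort()
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
```

The list returned by crawler_entries is the material to paste into the "Comments" box, and the gaps give a rough picture of the request rate you are seeing.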
How do I prevent Yahoo! from indexing certain pages?
YahooSeeker/M1A1-R2D2 obeys the "noindex" Meta tag. If you place:
<META NAME="robots" CONTENT="noindex">
in the head of your web document, YahooSeeker/M1A1-R2D2 will retrieve the document, but it will not index the document or place it in the search engine's database.
How do I allow you to index pages, but not place them in your cache?
YahooSeeker/M1A1-R2D2 obeys the "noarchive" Meta tag. If you place:
<META NAME="robots" CONTENT="noarchive">
in the head of your web document, YahooSeeker/M1A1-R2D2 will retrieve the document, but it will not cache or archive the document for use in the PageCache system.
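To confirm that a page really carries the "noindex" or "noarchive" Meta tag, you can scan its HTML for robots directives. The following is a minimal Python sketch using the standard library's html.parser. The class and function names are hypothetical, and it does not check that the tag sits inside the head of the document.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...> matches too.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        for directive in attributes.get("content", "").split(","):
            if directive.strip():
                self.directives.add(directive.strip().lower())

def robots_directives(html):
    """Return the set of robots Meta directives present in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives
```

A page is marked correctly if "noindex" or "noarchive" appears in the set returned by robots_directives for that page's source.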
I'm seeing repeated download requests, why is this?
In general, YahooSeeker/M1A1-R2D2 should only download one copy of each file from your site during a given crawl cycle. Occasionally the crawler is stopped and restarted, and it recrawls pages it has recently retrieved. Recrawls should happen infrequently, and should not be any cause for alarm. YahooSeeker/M1A1-R2D2 will re-read /robots.txt fairly often so that any changes to the robots exclusion rules will be applied promptly.
Why can't I find my mobile web pages in your search engine?
The most likely reason is that our crawlers have not yet discovered your domain. To submit your site to Yahoo!, please go to the Yahoo! Mobile Site Submit page. Also check that your robots.txt file is not blocking "User-agent: YahooSeeker/M1A1-R2D2" or "User-agent: *".
YahooSeeker/M1A1-R2D2 will crawl and index your site eventually, subject to conditions.
Note: To contact Yahoo! Mobile Search about YahooSeeker/M1A1-R2D2, please select the "Contact Customer Care" option below to open a support form.