Robots.txt: Creating, Editing, Need, Use and Carefulness Needed

Robots.txt is a very small text file containing only a few characters and phrases, but its effect is crucial. A wrong robots.txt file can totally spoil your blog; in fact, it instructs the search engines how to crawl your blog. Using a robots.txt file you can allow or disallow crawling of your entire site, of a specific section or item, or by a specific search engine. Before adding a robots.txt to your blog you should be extremely careful, because a wrong robots.txt may spoil your blog's traffic totally. If you don't edit the robots.txt file of your blog, it will use the default robots.txt file, which is fine, and the search engines will crawl your entire blog. Please note carefully that robots.txt is used only to block the search engines from crawling something in your blog.
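For reference, if your blog is hosted on Blogger (an assumption here), the default robots.txt served by the platform looks roughly like this sketch; the yourblog.blogspot.com address is a placeholder for your own blog's address -

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /

Sitemap: https://yourblog.blogspot.com/sitemap.xml

This default blocks only the search and label result pages and lets everything else be crawled.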
[Image: Robots.txt, Creating, Editing, Need, Use and Carefulness needed]
Before starting, we must learn about the following -

User-agent: This is a declaration of which search engine(s) the rules that follow apply to. Here you have to use the technical names of the search engine crawlers, such as 'Googlebot' for the Google search engine, 'Bingbot' for the Bing search engine, and '*' (an asterisk) for all of them. Understand it with examples: (i) User-agent: Googlebot (ii) User-agent: Bingbot (iii) User-agent: *. If you don't want to single out any particular search engine, simply use '*' (the asterisk).
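As a minimal sketch of how a User-agent line pairs with the rules written below it (the /private/ path is purely hypothetical) -

User-agent: Bingbot
Disallow: /private/

This group applies only to Bing's crawler; every other crawler ignores it and remains free to crawl /private/.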
Disallow: This is the instruction telling the search engines to neither crawl nor index the matching URLs. It is used below User-agent:. For example, if you write Disallow: /search, it means you are disallowing crawling of every URL that begins with /search (on Blogger, these are the search and label result pages), whereas Disallow: / disallows crawling of your entire blog.
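To make the difference concrete, compare these two sketches: file (a) blocks only the URLs beginning with /search for all crawlers, while file (b) blocks the entire blog -
(a)
User-agent: *
Disallow: /search
(b)
User-agent: *
Disallow: /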
Allow: This is the instruction telling the search engines that they may crawl the matching URLs. If you write Allow: /, the search engines are allowed to crawl every page of your blog. Allow is most useful for making an exception to a broader Disallow rule.
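A minimal sketch of Allow used as an exception (both paths are hypothetical placeholders) -

User-agent: *
Disallow: /private/
Allow: /private/public-note.html

Everything under /private/ is blocked except the single file named on the Allow line.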
How to use robots.txt to block search engines from crawling

To block crawling of a particular blog post - Disallow: /"paste the post's URL path here".html

To block crawling of a particular blog page - Disallow: /p/"paste the page's URL path here".html (on Blogger, static pages live under /p/)

Use only the path portion of the URL, i.e. the part after your domain name, not the full address; a concrete sketch follows below.
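As an illustration with entirely made-up addresses (substitute your own paths), a post and a page on a Blogger-style blog would be blocked like this -

User-agent: *
Disallow: /2020/05/my-sample-post.html
Disallow: /p/my-sample-page.html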
If you want everything (your entire contents) to be crawled and indexed by all search engines, your robots.txt file must be like one of the two files below. The effect of both is the same because both address all user agents: one uses 'Disallow' with nothing after it, so nothing is blocked, and the other uses 'Allow: /', so everything is allowed -
(1)
User-agent: *
Disallow:
OR
(2)
User-agent: *
Allow: /
If you don't want anything (your entire contents) to be crawled and indexed by any search engine, your robots.txt file must be like this.
User-agent: *
Disallow: /
If you don't want a specific post or page to be crawled and indexed, your robots.txt file must be like this (replace the placeholder with that post's or page's URL path).
User-agent: *
Disallow: /URL here/
If you have a folder containing 5 files and you want only one file to be crawled while the other 4 must not be crawled or indexed, your robots.txt file must be like this.
User-agent: Googlebot
Disallow: /folder/
Allow: /folder/file.html
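The same exception pattern works for all crawlers, not only Googlebot; a sketch with a hypothetical folder name -

User-agent: *
Disallow: /photos/
Allow: /photos/cover.html

Google resolves the apparent conflict by the most specific (longest) matching rule, so /photos/cover.html stays crawlable while the rest of the folder is blocked.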
Special request - Please let us know through a comment how you found our effort. Do not hesitate to give an adverse opinion; give your opinion impartially. Please subscribe to our blog and like our Facebook page. Subscribe to our YouTube channel "Start with Wikigreen". Please share it with your friends and loved ones. Thanks a lot for visiting our page.
2 comments:
Brother, I received your email. Thank you very much. Your Hindi is very good. You write very well, in fact one could say in a very weighty style. Your blogs are in English; please create a tech blog like this in Hindi too. It would be a help for Hindi-speaking people like us.
Thank you very much Kahakashan didi.