

robots.txt and Google search results

sitemap,google-search,robots.txt
Google is indexing all the pages of my site. I only want the main page indexed, not /nl/, /en/ and /fr/. How can I prevent this in my robots.txt? I used Disallow: /nl/ Disallow: /fr/ Disallow: /en/ But what about my sitemap at the URL www.eikhoeve.be/nl/sitemap.xml how...
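A minimal robots.txt sketch for this kind of setup (domain and sitemap path taken from the question). The Sitemap line can still be listed even though /nl/ is disallowed, since the directive just points crawlers at the file:

```
User-agent: *
Disallow: /nl/
Disallow: /fr/
Disallow: /en/

Sitemap: http://www.eikhoeve.be/nl/sitemap.xml
```

Keep in mind that Disallow only blocks crawling; pages already known to Google may remain in the index until removed or marked noindex.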

How to prevent search engines from indexing a span of text?

html,web-crawler,robots.txt,googlebot,noindex
From the information I have found so far, <noindex> is supposed to achieve this, hiding a single section of a page from search engine spiders. But it also seems this tag is not obeyed by many search engines - so if that is the case, what markup...
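For context: <noindex> is not part of HTML, and Google does not support per-section noindex markup; the supported robots directives are page-wide. A sketch of the standard page-level tag, placed in the page's <head>:

```
<meta name="robots" content="noindex">
```

Google does recognize a data-nosnippet attribute (e.g. <span data-nosnippet>…</span>) that keeps a section of text out of search result snippets, though it does not stop the page itself from being indexed.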

Use `robots.txt` in a multilingual site

seo,sitemap,robots.txt
Having to manage a multilingual site, where users are redirected to a local version of site, like myBurger.com/en // for users from US, UK, etc.. myBurger.com/fr // for users from France, Swiss, etc... How should be organized the robots.txt file in pair with the sitemap? myBurger.com/robots.txt // with - Sitemap:...

Noindex or disallow in robots.txt (Symfony)

indexing,symfony-1.4,robots.txt
I'm working with Symfony 1.4 and I want to stop Google from indexing my website. What's the best code to use? robots: noindex, nofollow robots: disallow ...

robots.txt: Site still not showing up in Google

robots.txt
I have the following robots.txt User-Agent: * Disallow: User-Agent: Googlebot Allow: / to disallow all bots except Google's. I made this change last week, and when I search for my domain name in Google I still get "A description for this result is not available because of this site's robots.txt"....
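Note that an empty Disallow: permits everything, so the file above actually allows all bots, not just Google's. A sketch of the intended setup, blocking everyone by default and exempting Googlebot (the more specific group applies to it):

```
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow:
```

The "description is not available" message typically means a URL is indexed but blocked from crawling; it can persist until Google refetches the robots.txt and recrawls the page.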

How to attach Sitecore context for a controller action mapped to the route robots.txt?

sitecore,robots.txt,sitecore-mvc,sitecore8
In Sitecore I'm trying to set up a way for our client to modify the robots.txt file from the content tree. I am attempting to set up an MVC controller action that is mapped to the route "robots.txt" and will return the file contents. My controller looks like this: public class...

What does /*.php$ mean in robots.txt?

robots.txt
I came across a site that uses the following in its robots.txt file: User-agent: * Disallow: /*.php$ So what does it do? Will it prevent web crawlers from crawling the following URLs? https://example.com/index.php https://example.com/index.php?page=Events&action=Upcoming Will it block subdomains too? https://subdomain.example.com/index.php ...
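For crawlers that support wildcards (Googlebot, Bingbot, and others), * matches any sequence of characters and $ anchors the match to the end of the URL, so a sketch of what the rule catches:

```
User-agent: *
Disallow: /*.php$

# Blocked:  https://example.com/index.php                      (URL ends in .php)
# Allowed:  https://example.com/index.php?page=Events          (ends in the query string, not .php)
```

Subdomains are unaffected either way: robots.txt rules apply only to the host they are served from, so https://subdomain.example.com/ is governed by its own robots.txt.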

Robots.txt file in MVC.NET 4

asp.net,asp.net-mvc-4,seo,robots.txt
I have read an article about hiding some URLs from robots in an ASP.NET MVC project. In the article, the author says we should add an action to some of our controllers, like this. In this example he adds the action to the Home Controller: #region -- Robots() Method --...

Wordpress - customized pages with blocks - prohibit google seo index of blocks

wordpress,seo,woocommerce,robots.txt,google-sitemap
I'm using Wordpress and WooCommerce for my online shop. With the theme I'm using you can customize the product-category pages by adding "blocks". So if I want to have a text on the top of a product category page I simply create a block page, let's say it's called "category-info"....

Random IP Address Accessing my Website [closed]

http,robots.txt
I have been hosting a Flask website and a random IP address tried to access my server: 70.188.129.128 - - [08/Feb/2015 21:55:46] "GET / HTTP/1.1" 200 - 70.188.129.128 - - [08/Feb/2015 21:55:46] "GET /apple-touch-icon-120x120-precomposed.png HTTP/1.1" 404 - 70.188.129.128 - - [08/Feb/2015 21:55:46] "GET /apple-touch-icon-120x120.png HTTP/1.1" 404 - 70.188.129.128 - -...

Google still indexing unique URLs

indexing,robots.txt,google-webmaster-tools
I have a robots.txt file set up as such User-agent: * Disallow: /* for a site that is entirely unique-URL based, sort of like https://jsfiddle.net/: when you save a new fiddle it gets a unique URL. I want all of my unique URLs to be invisible to Google....
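Disallow: /* blocks crawling but does not guarantee deindexing: URLs Google discovers elsewhere (e.g. external links) can stay in the index without a description. To actually keep pages out of the index, the usual approach is the opposite — leave them crawlable and serve a noindex directive on each one, since Google must be able to fetch the page to see it:

```
<!-- Served on every unique URL; requires the URL NOT to be blocked in robots.txt -->
<meta name="robots" content="noindex">
```

The same directive can be sent as an X-Robots-Tag HTTP header if editing the HTML is inconvenient.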

Different domains, different languages, same content, 1 robots.txt

robots.txt
I'm in this situation. The domains www.example.com and www.example.it point at the same content in different languages, e.g. www.example.com/audi-car.html and www.example.it/audi-auto.html, and I have only one robots.txt in the root of the domains. My question is: how can I set my robots.txt to disallow all bots from crawling www.example.it while allowing www.example.com, and the reverse?...
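robots.txt itself cannot distinguish hosts, but when both domains share one document root, the server can return a different file per host. A sketch with Apache mod_rewrite (the filenames robots-com.txt and robots-it.txt are hypothetical):

```
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.example\.it$ [NC]
RewriteRule ^robots\.txt$ robots-it.txt [L]
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^robots\.txt$ robots-com.txt [L]
```

Each file then carries the rules for its own domain, since crawlers request robots.txt per host.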

Is the beginning of a path enough in robots.txt?

robots.txt
I have the following files on my server: /file /file.html /file/bob.html I want to exclude them all from being indexed. Is the following robots.txt enough? User-Agent: * Disallow: /file Or even just: User-Agent: * Disallow: /f Note: I understand that Google's bots would accept /file as disallowing them from all...
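Yes — a Disallow value is a path prefix, so the first form already covers all three files:

```
User-Agent: *
Disallow: /file

# Matches /file, /file.html and /file/bob.html (all begin with /file).
# Disallow: /f would also match them, but additionally blocks /foo,
# /favicon.ico and anything else starting with /f.
```

Prefix matching is part of the original robots.txt convention, so this works across crawlers, not just Google's.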

How to block a spider if it disobeys the rules of robots.txt

php,robots.txt
Is there any way to block crawler/spider bots if they're not obeying the rules written in the robots.txt file? If yes, where can I find more info about it? I would prefer an .htaccess rule; if not, then PHP....
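Since robots.txt is purely advisory, enforcement has to happen server-side. A minimal .htaccess sketch denying requests by User-Agent string ("BadBot" is a placeholder for the offending bot's reported name):

```
RewriteEngine On
# Return 403 Forbidden to any client whose User-Agent contains "BadBot"
RewriteCond %{HTTP_USER_AGENT} BadBot [NC]
RewriteRule .* - [F,L]
```

Bots that spoof their User-Agent need IP-based blocking (e.g. Deny/Require directives or a firewall rule) instead.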

robots.txt allow all except few sub-directories

seo,search-engine,cpanel,robots.txt,shared-hosting
I want my site to be indexed in search engines except few sub-directories. Following are my robots.txt settings: robots.txt in the root directory User-agent: * Allow: / Separate robots.txt in the sub-directory (to be excluded) User-agent: * Disallow: / Is it the correct way or the root directory rule will...
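Crawlers only fetch the robots.txt at the site root; per-subdirectory robots.txt files are ignored. The usual sketch is a single root file listing the excluded paths (directory names here are placeholders):

```
User-agent: *
Disallow: /private/
Disallow: /tmp/
```

An explicit Allow: / is unnecessary — anything not disallowed is crawlable by default.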

How do I tell Google not to crawl a domain completely

seo,opencart,robots.txt,multistore
I have a site in OpenCart, say abc.com, and I have opened a multi-store with it, xyz.com, and I have found that Google has started crawling xyz.com too, which I don't want. Both domains point to the same directory, so I suppose there can only be...

Prevent Google indexing an AngularJS route

javascript,html,angularjs,seo,robots.txt
Usually, if I didn't want Google to crawl a page, I would add the page to my robots.txt file like so: User-agent: * Disallow: /my-page To prevent Google indexing that page, I would remove the page from my sitemap.xml add the following meta tag to the <head> of the page:...
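The page-level tag referred to above, for completeness. Note the interaction with robots.txt: if the route is also disallowed there, Google cannot fetch the page to see the tag, so the route must remain crawlable for noindex to take effect:

```
<meta name="robots" content="noindex">
```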

Couple of questions about robots and content blocking

php,seo,bots,robots.txt
I'm configuring the robots.txt file and can't really work out which directories I should block from robots. Of course, I've read some information on the internet, but there's still a gap between what I want to know and what I've found so far. So it would be nice...

What is a 'disallowed entry' when nmap scans through the Robots.txt file?

robots.txt,nmap
I have been using nmap to scan an IP address, and one part of the output is: | http-robots.txt: 1 disallowed entry What does this mean? | http-robots.txt: 1 disallowed entry ...
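It means nmap's http-robots.txt script fetched the target's robots.txt and found one Disallow rule in it. A hypothetical file that would produce that exact output (the path is a placeholder):

```
User-agent: *
Disallow: /admin/
```

Security scanners report this because disallowed paths often point at areas the site owner considers sensitive, even though robots.txt provides no access control.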

robots.txt URL patterns with @@

robots.txt
I want to disallow /book-search and currently there is a rule in robots.txt like below: Disallow: /@@book-search* When I try the Webmaster Tools robots.txt tester, it says /book-search is still allowed. Is it because of @@? What is the meaning of @@?...
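@@ has no special meaning in robots.txt — it is matched literally, so Disallow: /@@book-search* only blocks URLs that actually begin with /@@book-search (the @@ prefix is likely a CMS convention, e.g. Plone addresses views as /@@name). To block the plain path as well, a sketch:

```
User-agent: *
Disallow: /@@book-search
Disallow: /book-search
```

The trailing * in the original rule is redundant, since Disallow values are prefix matches anyway.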

What is the “unique” keyword in robots.txt?

robots.txt
I have the following code in robots.txt to allow crawling by everyone for now User-agent: * Disallow: Before I changed this, the layout of the file was as below. I've been looking for details about "unique" and I can't find anything. Has anyone seen this before, and what is "unique" doing...

Disallow specific folders in robots.txt with wildcards

seo,search-engine,robots.txt,google-crawlers
Can I hide specific folders from crawlers with wildcards, like: User-agent: * Disallow: /system/ Disallow: /v* I want to hide all folders starting with the character "v". Will it work this way?...
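Major crawlers (Googlebot, Bingbot) do support * in Disallow paths, so /v* works for them. But because Disallow values are prefix matches anyway, the wildcard is unnecessary, and dropping it also covers crawlers without wildcard support:

```
User-agent: *
Disallow: /system/
Disallow: /v
```

Note this blocks every path starting with /v, including files like /video.mp4, not only folders.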

Wordpress - Robots.txt allows admin login?

wordpress,seo,robots.txt
First, I've searched for the robots.txt in WordPress, but no one told me where this file is. So, I read that the robots.txt in WordPress is virtual. OK, no problem. But where do I find it to edit it? My WordPress is allowing /author/admin and I don't want this. In the dashboard, the...
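WordPress generates the virtual robots.txt on the fly, so there is no file to open; a physical robots.txt uploaded to the site's root directory takes precedence over it. A sketch blocking the author page in question:

```
User-agent: *
Disallow: /author/admin
```

SEO plugins (Yoast, for example) also expose an editor for this file, and a noindex meta tag on author archives is often the more reliable option for keeping them out of results.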

How to disallow a list of URLs from being crawled by the Google crawler using robots.txt

url,robots.txt,googlebot
I have a couple of pages and URLs which I do not want to be crawled by the Google crawler. I know it can be done via robots.txt. I searched Google and found how the whole thing should be arranged in robots.txt to disallow the crawler, but I am not sure...
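A sketch of disallowing a handful of specific pages for Googlebot only (the paths are placeholders for the actual URLs):

```
User-agent: Googlebot
Disallow: /private-page.html
Disallow: /drafts/
```

Using User-agent: * instead would apply the same rules to all well-behaved crawlers, and the file can be checked against individual URLs with the robots.txt tester in Google Search Console.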

Where to put robots.txt for a CodeIgniter site

codeigniter,robots.txt
Where do I place the robots.txt file in CodeIgniter? I don't know which folder to put it in. User-agent: * Disallow: / ...

How to customize DNN robots.txt to allow a module specific sitemap to be crawled by search engines?

seo,dotnetnuke,robots.txt,googlebot
I am using the EasyDNN News module for the blog, news articles, etc. on our DNN website. The core DNN sitemap does not include the articles generated by this module, but the module creates its own sitemap. For example: domain.com/blog/mid/1005/ctl/sitemap When I try to submit this sitemap to Google, it...
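If Search Console rejects the module's sitemap URL directly, one workaround is to reference it from robots.txt, which Google also reads for sitemap discovery (the URL below is the module path from the question):

```
Sitemap: http://domain.com/blog/mid/1005/ctl/sitemap
```

This only helps if the URL returns a well-formed XML sitemap; Google will still reject it if the module serves it with the wrong content type or markup.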

Is it fine with respect to SEO if we add the same product to 5 different categories?

magento,seo,robots.txt
I followed this procedure: System > Configuration > Catalog > SEO > Use Categories Path for Product URLs: No; Use Canonical Link Meta Tag For Products: Yes. On our site we added the same product to 5 different categories. We have some 1000 products. If I follow the...