

How to prevent search engines from indexing a span of text?

html,web-crawler,robots.txt,googlebot,noindex
From the information I have been able to find so far, <noindex> is supposed to achieve this, making a single section of a page hidden from search engine spiders. But it also seems this is not obeyed by many crawlers - so if that is the case, what markup...
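
There is no standard markup for this. The <noindex> element is non-standard (Yandex honors it; Googlebot does not), and the nearest Google-side equivalent is specific to the Google Search Appliance, not public web search. A hedged sketch of both:

    <!-- Non-standard tag: recognized by Yandex, ignored by Googlebot -->
    <noindex>This text should stay out of the index.</noindex>

    <!-- Google Search Appliance only, not public Google Search -->
    <!--googleoff: index-->
    This text is excluded from GSA indexing.
    <!--googleon: index-->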

Prevent Google indexing an AngularJS route

javascript,html,angularjs,seo,robots.txt
Usually, if I didn't want Google to crawl a page, I would add the page to my robots.txt file like so: User-agent: * Disallow: /my-page To prevent Google indexing that page, I would remove the page from my sitemap.xml and add the following meta tag to the <head> of the page:...
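
For reference, a minimal sketch of the two mechanisms the question mixes. Note that they conflict for the same URL: a robots.txt Disallow stops Googlebot from fetching the page, so it never sees a noindex meta tag placed there.

    # robots.txt - blocks crawling of the route
    User-agent: *
    Disallow: /my-page

    <!-- in the page's <head> - blocks indexing, but requires crawling to be allowed -->
    <meta name="robots" content="noindex, nofollow">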

How do I tell Google to not crawl a domain completely

seo,opencart,robots.txt,multistore
I have a site in OpenCart, say abc.com, and I have opened a multi-store with it, xyz.com, and I have found that Google has started crawling xyz.com too, which I don't want. Both domains are pointing to the same directory, so I suppose there can only be...
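
Since both domains share one directory, they also share one robots.txt. One workaround is serving a different file per host, e.g. with Apache mod_rewrite (a hedged sketch; the file name robots_xyz.txt is hypothetical):

    # .htaccess in the shared docroot
    RewriteEngine On
    RewriteCond %{HTTP_HOST} ^(www\.)?xyz\.com$ [NC]
    RewriteRule ^robots\.txt$ robots_xyz.txt [L]

    # robots_xyz.txt - blocks everything on xyz.com only
    User-agent: *
    Disallow: /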

Where to put robots.txt for a CodeIgniter site

codeigniter,robots.txt
Where do I place the robots.txt file in CodeIgniter? I don't know which folder to put it in. User-agent: * Disallow: / ...
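
robots.txt is not CodeIgniter-specific: crawlers always request it from the site root, so it belongs in the public web root next to index.php, not inside application/ or system/. A sketch of the expected layout:

    public_html/          <- web root
        index.php         <- CodeIgniter front controller
        robots.txt        <- place the file here
        application/
        system/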

Robots.txt and Google search results

sitemap,google-search,robots.txt
Google's search results index all the sections of my site. I only want the main page indexed, not /nl/, /en/ and /fr/. How can I prevent this in my robots.txt? I used Disallow: /nl/ Disallow: /fr/ Disallow: /en/ But what about my sitemap with the URL www.eikhoeve.be/nl/sitemap.xml how...
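
A minimal sketch of the disallow rules the question describes. One catch: a sitemap should not list URLs that robots.txt blocks, so a sitemap living under /nl/ contradicts a Disallow: /nl/ rule.

    User-agent: *
    Disallow: /nl/
    Disallow: /fr/
    Disallow: /en/

    # Sitemap directives take absolute URLs; this one would need to live
    # outside the disallowed folders and list only non-disallowed pages
    Sitemap: http://www.eikhoeve.be/sitemap.xml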

Robots.txt file in MVC.NET 4

asp.net,asp.net-mvc-4,seo,robots.txt
I have read an article about blocking robots from some URLs in my ASP.NET MVC project. In the article, the author says we should add an action to some of the controllers, like this. In this example he adds the action to the Home controller: #region -- Robots() Method --...

Is it fine with respect to SEO if we add the same product to 5 different categories?

magento,seo,robots.txt
I followed this procedure: System > Configuration > Catalog > SEO > Use Categories Path for Product URLs: No; Use Canonical Link Meta Tag For Products: Yes. On our site we added the same product to 5 different categories. We have some 1000 products. If I follow the...
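
For context, the "Use Canonical Link Meta Tag For Products: Yes" setting makes Magento emit a tag like the sketch below on each product page, telling Google which of the duplicate category URLs is authoritative (the URL here is hypothetical):

    <link rel="canonical" href="http://example.com/some-product.html" />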

What does /*.php$ mean in robots.txt?

robots.txt
I came across a site that uses the following in its robots.txt file: User-agent: * Disallow: /*.php$ So what does it do? Will it prevent web crawlers from crawling the following URLs? https://example.com/index.php https://example.com/index.php?page=Events&action=Upcoming Will it block subdomains too? https://subdomain.example.com/index.php ...
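
A sketch of how Googlebot interprets the rule; '*' and '$' are Google/Bing extensions, not part of the original robots.txt standard:

    User-agent: *
    Disallow: /*.php$   # '*' matches any characters, '$' anchors the end of the URL

    # Blocked:     https://example.com/index.php (ends in .php)
    # Not blocked: https://example.com/index.php?page=Events&action=Upcoming
    #              (the matched string includes the query string, which does not end in .php)
    # Subdomains:  unaffected - each host serves and obeys its own robots.txt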

Google still indexing unique URLs

indexing,robots.txt,google-webmaster-tools
I have a robots.txt file set up as such: User-agent: * Disallow: /* for a site that is all unique-URL based. Sort of like https://jsfiddle.net/: when you save a new fiddle it gives it a unique URL. I want all of my unique URLs to be invisible to Google....
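
Worth noting: Disallow only blocks crawling. Google can still index a blocked URL (title only, no description) if other pages link to it. Keeping URLs out of the index entirely requires letting Googlebot crawl them and see a noindex, roughly:

    # robots.txt - allow crawling so the meta tag below is visible
    User-agent: *
    Disallow:

    <!-- on every unique-URL page -->
    <meta name="robots" content="noindex">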

Noindex or disallow in robots.txt in Symfony

indexing,symfony-1.4,robots.txt
I'm working with Symfony 1.4 and I want to stop Google indexing my website. What's the best code to use? robots: no-index,nofollow robots: disallow ...
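
In symfony 1.4 this is normally configured per application in view.yml rather than in robots.txt; a minimal sketch, assuming the default app layout:

    # apps/frontend/config/view.yml
    default:
      metas:
        robots: noindex, nofollow

Note the value is "noindex" (one word); "no-index" is not a recognized robots directive.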

How to block a spider if it's disobeying the rules of robots.txt

php,robots.txt
Is there any way to block crawler/spider search bots if they're not obeying the rules written in the robots.txt file? If yes, where can I find more info about it? I would prefer some .htaccess rule; if not, then PHP....
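
A hedged .htaccess sketch using Apache 2.2-style access control, assuming the bot at least sends a recognizable User-Agent ("BadBot" is a placeholder). Bots that fake their User-Agent would need IP-based blocking instead.

    SetEnvIfNoCase User-Agent "BadBot" block_me
    Order Allow,Deny
    Allow from all
    Deny from env=block_me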

Disallow specific folders in robots.txt with wildcards

seo,search-engine,robots.txt,google-crawlers
Can I hide specific folders from crawlers with wildcards, like: User-agent: * Disallow: /system/ Disallow: /v* I want to hide all folders starting with the "v" character. Will it work this way?...
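
For what it's worth, robots.txt rules are prefix matches, so the trailing '*' is redundant for crawlers that support wildcards, and merely literal for those that don't:

    User-agent: *
    Disallow: /system/
    Disallow: /v        # already matches /v1/, /videos/, /vault.html, etc.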

Is the beginning of a path enough in robots.txt?

robots.txt
I have the following files on my server: /file /file.html /file/bob.html I want to exclude them all from being indexed. Is the following robots.txt enough? User-Agent: * Disallow: /file Or even just: User-Agent: * Disallow: /f Note: I understand that Google's bots would accept /file as disallowing them from all...
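
A sketch of how prefix matching plays out under the original standard, where every Disallow value is a plain path prefix:

    User-agent: *
    Disallow: /file
    # Blocked: /file, /file.html, /file/bob.html
    # Also blocked as a side effect: /filename.txt, /files/anything.html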

What is the “unique” keyword in robots.txt?

robots.txt
I have the following code in robots.txt to allow crawling from everyone for now: User-agent: * Disallow: Before I changed this, the layout of the file was as below. I've been looking for details about "unique" and I can't find any. Has anyone seen this before, and what is "unique" doing...

WordPress - customized pages with blocks - prevent Google from indexing the blocks

wordpress,seo,woocommerce,robots.txt,google-sitemap
I'm using WordPress and WooCommerce for my online shop. With the theme I'm using you can customize the product-category pages by adding "blocks". So if I want to have a text at the top of a product category page, I simply create a block page, let's say it's called "category-info"....
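
If the block pages are published under their own slugs, a hedged sketch using the slug from the question (a robots meta noindex on the block pages would be more reliable, since disallowed URLs can still be indexed from links):

    User-agent: *
    Disallow: /category-info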

robots.txt allow all except few sub-directories

seo,search-engine,cpanel,robots.txt,shared-hosting
I want my site to be indexed in search engines except for a few sub-directories. The following are my robots.txt settings: robots.txt in the root directory: User-agent: * Allow: / Separate robots.txt in the sub-directory (to be excluded): User-agent: * Disallow: / Is this the correct way, or will the root directory rule...
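
Key point: crawlers only ever fetch /robots.txt from the domain root; a robots.txt inside a sub-directory is ignored. All rules belong in the root file (the directory names below are placeholders):

    # /robots.txt - the only one crawlers read
    User-agent: *
    Disallow: /private-dir/
    Disallow: /other-dir/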

How to attach Sitecore context for a controller action mapped to the route robots.txt?

sitecore,robots.txt,sitecore-mvc,sitecore8
In Sitecore I'm trying to set up a way for our client to modify the robots.txt file from the content tree. I am attempting to set up a MVC controller action that is mappled to route "robots.txt" and will return the file contents. My controller looks like this: public class...

Random IP Address Accessing my Website [closed]

http,robots.txt
I have been hosting a Flask website and a random IP address tried to access my server: 70.188.129.128 - - [08/Feb/2015 21:55:46] "GET / HTTP/1.1" 200 - 70.188.129.128 - - [08/Feb/2015 21:55:46] "GET /apple-touch-icon-120x120-precomposed.png HTTP/1.1" 404 - 70.188.129.128 - - [08/Feb/2015 21:55:46] "GET /apple-touch-icon-120x120.png HTTP/1.1" 404 - 70.188.129.128 - -...

How to disallow a list of URLs from being crawled by the Google crawler using robots.txt

url,robots.txt,googlebot
I have a couple of pages and URLs which I do not want to be crawled by the Google crawler. I know it can be done via robots.txt. I searched Google and found how things need to be arranged in robots.txt to disallow the crawler, but I am not sure...
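
A minimal sketch of the usual arrangement, with placeholder paths standing in for the actual pages:

    User-agent: Googlebot
    Disallow: /private-page.html
    Disallow: /old-section/
    Disallow: /checkout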

Different domains, different languages, same content, 1 robots.txt

robots.txt
I'm in this situation: Domains: www.example.com www.example.it that point at the same content in different languages. E.g.: www.example.com/audi-car.html www.example.it/audi-auto.html and I have only one robots.txt in the root of the domains. My question is: how can I set my robots.txt to disallow crawling of www.example.it for bots coming from www.example.com, and the reverse?...
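
robots.txt has no notion of which domain a bot "comes from"; each host's file is fetched and applied independently. With one shared docroot, a per-host rewrite can serve different files, e.g. on Apache (a hedged sketch; the file names are hypothetical):

    RewriteEngine On
    RewriteCond %{HTTP_HOST} ^www\.example\.it$ [NC]
    RewriteRule ^robots\.txt$ robots_it.txt [L]
    RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
    RewriteRule ^robots\.txt$ robots_com.txt [L]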

Use `robots.txt` in a multilingual site

seo,sitemap,robots.txt
I have to manage a multilingual site, where users are redirected to a local version of the site, like myBurger.com/en // for users from the US, UK, etc. myBurger.com/fr // for users from France, Switzerland, etc. How should the robots.txt file be organized, in pair with the sitemap? myBurger.com/robots.txt // with - Sitemap:...
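
Sitemap: directives take absolute URLs and may appear multiple times, so a single root robots.txt can reference every language sitemap; a sketch, assuming sitemaps exist at these paths:

    # myBurger.com/robots.txt
    User-agent: *
    Disallow:

    Sitemap: http://myBurger.com/en/sitemap.xml
    Sitemap: http://myBurger.com/fr/sitemap.xml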

Couple of questions about robots and content blocking

php,seo,bots,robots.txt
I'm configuring the robots.txt file for robots, and can't really understand which dirs I should block from them. Of course, I've read some info on the internet, but there's still a gap between what I want to know and what I've found so far. So, it would be nice...
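
As a starting point, the usual candidates are infrastructure paths with no search value; a hedged sketch with placeholder directory names:

    User-agent: *
    Disallow: /admin/
    Disallow: /cgi-bin/
    Disallow: /tmp/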

robots.txt URL patterns with @@

robots.txt
I want to disallow /book-search and currently there is a rule in robots.txt like below: Disallow: /@@book-search* When I try it with the Webmaster Tools robots.txt tester, it says /book-search is still allowed. Is it because of the @@? What is the meaning of @@?...
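
@@ has no special meaning in robots.txt; rules are literal prefix matches, so the existing rule only blocks paths that literally begin with /@@book-search (the @@ prefix is how some CMSes, e.g. Plone, address browser views). Covering both forms:

    User-agent: *
    Disallow: /@@book-search   # the literal view path
    Disallow: /book-search     # the path the tester was checking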

robots.txt: Site still not showing up in Google

robots.txt
I have the following robots.txt: User-Agent: * Disallow: User-Agent: Googlebot Allow: / to disallow all bots except Google's. I made this change last week, and when I search for my domain name in Google I still get "A description for this result is not available because of this site's robots.txt"....
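
Two things stand out. First, the file as written blocks nothing: an empty Disallow: under User-Agent: * permits all bots. Second, robots.txt only controls crawling, not indexing, so the "no description" result can linger from an earlier blocking file until Google recrawls. The usual "Google only" form is:

    User-agent: Googlebot
    Disallow:

    User-agent: *
    Disallow: /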

What is a 'disallowed entry' when nmap scans through the Robots.txt file?

robots.txt,nmap
I have been using nmap to scan an IP address, and one part of the output is: | http-robots.txt: 1 disallowed entry What does this mean? | http-robots.txt: 1 disallowed entry ...
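
It means nmap's http-robots.txt script fetched /robots.txt and counted one Disallow line in it. A file like this (the path is hypothetical) would produce exactly that output:

    User-agent: *
    Disallow: /secret/   # <- the "1 disallowed entry"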

How to customize DNN robots.txt to allow a module specific sitemap to be crawled by search engines?

seo,dotnetnuke,robots.txt,googlebot
I am using the EasyDNN News module for the blog, news articles, etc. on our DNN website. The core DNN sitemap does not include the articles generated by this module, but the module creates its own sitemap. For example: domain.com/blog/mid/1005/ctl/sitemap When I try to submit this sitemap to Google, it...
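
One common workaround is referencing the module's sitemap from robots.txt, since Sitemap: directives accept any URL on the host; a hedged sketch using the URL from the question:

    User-agent: *
    Disallow:

    Sitemap: http://domain.com/blog/mid/1005/ctl/sitemap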

WordPress - robots.txt allows admin login?

wordpress,seo,robots.txt
First, I searched for the robots.txt for WordPress, but nothing told me where this file is. Then I read that the robots.txt in WordPress is virtual. OK, no problem. But where do I find it to edit? My WordPress is allowing /author/admin and I don't want this. In the dashboard, the...
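
If the goal is just keeping /author/admin away from crawlers, a physical robots.txt placed in the site root overrides WordPress's virtual one; a minimal sketch:

    User-agent: *
    Disallow: /author/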