

How to customize DNN robots.txt to allow a module specific sitemap to be crawled by search engines?

Question:

Tag: seo,dotnetnuke,robots.txt,googlebot

I am using the EasyDNN News module for the blog, news articles, etc. on our DNN website. The core DNN sitemap does not include the articles generated by this module, but the module creates its own sitemap.

For example: domain.com/blog/mid/1005/ctl/sitemap

When I try to submit this sitemap to Google, it says my robots.txt file is blocking it.

Looking at the robots.txt file that ships with DNN, I noticed the following lines under the Slurp and Googlebot user-agents:

Disallow: /*/ctl/       # Slurp permits *
Disallow: /*/ctl/       # Googlebot permits *

I'd like to submit the module's sitemap, but I'd like to know why /ctl is disallowed for these user-agents, and what the impact would be if I simply removed these lines from the file, specifically as it pertains to Google crawling the site.

As an added reference, I have read the article below about avoiding a duplicate-content penalty by disallowing specific URLs that contain /ctl, such as login, register, terms, etc. I'm wondering if this is why DNN disallowed any URL containing /ctl.

http://www.codeproject.com/Articles/18151/DotNetNuke-Search-Engine-Optimization-Part-Remov


Answer:

The proper way to do this would be to use the DNN Sitemap provider, something that is pretty darn easy to do as a module developer.

I don't have a blog post/tutorial on it, but I do have sample code, which can be found at

http://dnnsimplearticle.codeplex.com/SourceControl/latest#cs/Providers/Sitemap/Sitemap.cs

This will allow custom modules to add their own information to the DNN Sitemap.
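For reference, here is a minimal sketch of what such a provider can look like, following the pattern in that sample. The DotNetNuke.Services.Sitemap types (SitemapProvider, SitemapUrl, SitemapChangeFrequency) are the core API; the ArticleController lookup is a hypothetical stand-in for your module's own data access:

using System.Collections.Generic;
using DotNetNuke.Entities.Portals;
using DotNetNuke.Services.Sitemap;

public class ArticleSitemapProvider : SitemapProvider
{
    // Called by the core sitemap handler when DNN builds /SiteMap.aspx.
    public override List<SitemapUrl> GetUrls(int portalId, PortalSettings ps, string version)
    {
        var urls = new List<SitemapUrl>();

        // Hypothetical data access -- replace with your module's own controller.
        foreach (var article in ArticleController.GetArticles(portalId))
        {
            urls.Add(new SitemapUrl
            {
                Url = article.Url,                      // fully qualified article URL
                LastModified = article.LastModifiedDate,
                ChangeFrequency = SitemapChangeFrequency.Daily,
                Priority = 0.5F
            });
        }

        return urls;
    }
}

The provider also needs to be registered, typically via an entry in the sitemap providers section of web.config (which the module's .dnn manifest can add on install). Once registered, the module's URLs appear in the standard DNN sitemap, so there is nothing extra to submit to Google.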

The reason /ctl is disallowed is that the normal way to load the Login/Registration/Profile controls is through a URL like site?ctl=login, and that is typically not something people want to have indexed.

The other option is to just edit the robots.txt file.
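If you go that route, you don't necessarily have to remove the rule outright. Googlebot honors Allow directives, and Google documents that the longest (most specific) matching rule wins, so a sketch like the following should let the module's sitemap through while keeping the login/register/profile URLs blocked (assuming the /ctl/sitemap URL pattern from the question):

User-agent: Googlebot
Allow: /*/ctl/sitemap   # let Googlebot fetch module sitemaps
Disallow: /*/ctl/       # keep other ctl URLs (login, register, etc.) blocked

You can check the result with the robots.txt tester in Google Webmaster Tools before resubmitting the sitemap.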


Related:


SEO and user-friendly URLs for multi-language website


url,seo,multilingual,usability
Let's say I have a website that has two languages, one using Latin and the other Cyrillic transcriptions in URLs. For example: example.com/link example.com/ссылка My question is which is more user- and SEO-friendly: if I leave them as is, or if I add the language prefix, so they'd...

Dotnetnuke migration from SQL 2005 to SQL 2012


sql-server-2005,sql-server-2012,dotnetnuke,database-migration
I'm trying to upgrade my DNN v6 from SQL Server 2005 to SQL Server 2012. My problem is that after modifying the web.config to match the new appSettings, my website automatically runs the install wizard. My IIS was running on a 2003 server and is now on a 2012...

How can I get better google indexing results?


seo,google-search,pagerank
I have just launched a new domain, www.nextlevelsmf.com and it's not showing for some keywords I would like it to. Can anyone give me some advice to help it rank better please? I'd like it to show on the first 2 pages for: Managed SMF hosting SMF host/hosting Managed SMF...

Heading order in HTML5


html5,seo,semantic-markup
This is a webpage example of my site: <html> <title> article header </title> <body> <header> <h1> name of website</h1></header> <section> <h2> name of section</h2> <article> <h3>article header</h3> </article> </section> </body> </html> I want to know if this order is correct? Or does it maybe have a bad effect on SEO?...

Robots.txt file in MVC.NET 4


asp.net,asp.net-mvc-4,seo,robots.txt
I have read an article about hiding some URLs from robots in my ASP.NET MVC project. In his article the author said that we should add an action to some of the controllers, like this. In this example he adds the action to the Home Controller: #region -- Robots() Method --...

SPDY on shared host & SEO Sematics


seo,semantics,shared-hosting,spdy,http2
NodeSPDY on shared host I got webspace hosted by uberspace and want to use NodeSPDY, but there is a load balancer in between which cuts off the TLS connection. On uberspace one can request a port to be opened. With this port it is possible to request resources directly by...

How to avoid the multiple path to same file in php using htaccess?


php,html,apache,.htaccess,seo
My URL is www.abc.com/cbn/either/index.php and I want it to be accessible only via that URL. When I change the path of the index.php file, i.e. www.abc.com/cbn/index.php, I can still access the index.php file, which is bad from an SEO point of view because now Google will index two URLs of the...

Removing the number of first page in Yii2 Pagination from the URL


.htaccess,pagination,seo,yii2
For SEO purposes I need to remove the first page number from the URL. I.e. I have the following: example.com/pages/view/1 and example.com/pages/view; the two URLs point to the same content of the view action. I want to make the pagination free of the 1 in the URL, i.e. first Page link...

Multiple modals with galleries vs. a single dynamic one


javascript,dom,seo,image-gallery,bootstrap-modal
Let's say we have a long list of posts on a single page. Each of those posts has a hidden div with multiple img tags inside it. When a user clicks on a post, the images inside the hidden div should be showcased in a modal gallery. Which approach is...

My website Images not indexed by Google, Yahoo and Bing [closed]


php,codeigniter,seo
I'm using the CodeIgniter framework. Why have search engines not indexed my website's images? My website has been up since 2013. My website is: www.shadyab.com. It is like the Groupon website (offering daily deals at restaurants, retailers and service providers). An image URL: http://www.shadyab.com/assests/images/upload/kaktoos4.jpg What should I do to tell search engines...

sitemap-tax-post_tag.xml not found - webmaster tools


wordpress,seo
I'm a newbie in webmaster tools. I get 3 errors in webmaster tools: 1.2: We encountered an error while trying to access your Sitemap. Please ensure your Sitemap follows our guidelines and can be accessed at the location you provided and then resubmit. *General HTTP error: 404 not found Sitemap:...

Block “cloner” servers rendering content from our server


apache,seo,clone,cracking
I have a website (freeofficefinder.com) that is being cloned (see here: thelawyerserviceratings.org). There are actually over 25 websites currently cloning our website. Obviously this is hurting our SEO ranking greatly due to "duplicate content". Is there something that I could add to the Apache config file...

Slidershow jquery and convert to css


jquery,css,html5,seo,slider
I downloaded a script for a slideshow and it works without problems, but after implementing this slideshow I have problems with SEO validity in HTML5, because the code uses <div u=""> or <img u="">, and the validator keeps telling me that I can't use this combination of div with the attribute "u"....

fullPage.js: Make all slides and sections visible in search engine results


jquery,seo,web-crawler,single-page-application,fullpage.js
I'm using fullpage.js jQuery plugin for a Single page application. I'm using mostly default settings and the plugin works like a charm. When I got to the SEO though I couldn't properly make Google crawl my website on a "per slide" basis. All my slides are loaded at the page...

Disallow specific folders in robots.txt with wildcards


seo,search-engine,robots.txt,google-crawlers
Can I hide specific folders from crawlers with wildcards, like: User-agent: * Disallow: /system/ Disallow: /v* I want to hide all folders that start with the character "v". Will it work this way?...

Best JSON-LD practices: using multiple