web-crawler,sitemap,meta-tags,google-webmaster-tools,noindex, How to get Google to re-index a page after removing noindex metatag?


How to get Google to re-index a page after removing noindex metatag?

Question:

Tag: web-crawler,sitemap,meta-tags,google-webmaster-tools,noindex

By accident, I had put <meta name="robots" content="noindex"> into lots of pages on my domain. I have now removed this meta tag, but how can I get these pages re-indexed by Google? Any tips?
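To make sure the tag is really gone everywhere before asking Google to re-crawl, a quick scripted check helps (a rough sketch, assuming Python with the requests and beautifulsoup4 packages; the URL list is a placeholder):

    import requests
    from bs4 import BeautifulSoup

    # Placeholder list of pages that previously carried the noindex tag.
    urls = [
        "http://www.example.com/",
        "http://www.example.com/aboutus.html",
    ]

    for url in urls:
        html = requests.get(url, timeout=10).text
        soup = BeautifulSoup(html, "html.parser")
        robots = soup.find("meta", attrs={"name": "robots"})
        content = robots.get("content", "") if robots else ""
        # Flag any page where a noindex directive is still present.
        print(url, "still noindex!" if "noindex" in content.lower() else "ok")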

I have tried re-submitting my sitemap.xml in Webmaster Tools, but I'm not sure whether that works.
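As far as I understand, the sitemaps.org protocol also allows submitting a sitemap with a plain HTTP ping, so re-submission can be scripted as well (a rough sketch, assuming the Python requests library; the sitemap URL is a placeholder):

    import requests

    # Placeholder sitemap URL; the ping mechanism follows the sitemaps.org protocol.
    sitemap_url = "http://www.example.com/sitemap.xml"
    resp = requests.get("http://www.google.com/ping",
                        params={"sitemap": sitemap_url}, timeout=10)
    print(resp.status_code)  # 200 indicates the ping was received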

Also, if Google does re-index the pages, how long will I have to wait?


Answer:

Google usually crawls your pages fairly quickly. Inclusion in the index is a bit slower, and getting a reasonable search rank takes time.

Look at your web server logs to confirm that Googlebot has crawled your pages. You can search for the exact page in Google and it will usually come up, but starting to show up for relevant search terms takes time.
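As a rough sketch of that log check (assuming Python and a combined-format access log at a placeholder path), you can count Googlebot hits per requested URL:

    from collections import Counter

    hits = Counter()
    # Placeholder log path; adjust to wherever your server writes its access log.
    with open("/var/log/nginx/access.log") as log:
        for line in log:
            if "Googlebot" not in line:
                continue
            # In the combined log format the request line is the first quoted field,
            # e.g. "GET /dilution_calculator.html HTTP/1.1".
            try:
                path = line.split('"')[1].split()[1]
            except IndexError:
                continue
            hits[path] += 1

    for path, count in hits.most_common(20):
        print(count, path)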

For example, if you search for 'dilution calculator' it is unlikely that my page will come up, but if you search for 'bugaco.com dilution calculator' the page on the site is the first hit: Dilution Calculator - Bugaco.

Hope this helps.


Related:


Scrapy collect data from first element and post's title


python,web-scraping,web-crawler,scrapy,scrapy-spider
I need Scrapy to collect data from this tag and retrieve all three parts in one piece. The output would be something like: Tonka double shock boys bike - $10 (Denver). <span class="postingtitletext">Tonka double shock boys bike - <span class="price">$10</span><small> (Denver)</small></span> Second is to collect data from first span tag....

My Java program reaches 80% cpu usage after 20-30 min


java,database,web-crawler,cpu
I have a java program that crawls for some data on some sites and inserts it into the database. The program keeps doing this: get the HTML, extract the relevant data with some splits, insert it into the database. For the first 5-10 min it runs perfectly and very fast...

Get all links from page on Wikipedia


python,python-2.7,web-crawler
I am making a Python web-crawler program to play The Wiki game. If you're unfamiliar with this game: Start from some article on Wikipedia Pick a goal article Try to get to the goal article from the start article just by clicking wiki/ links My process for doing this is:...

Web Crawler - TooManyRedirects: Exceeded 30 redirects. (python)


python,web-crawler
I've tried to follow one of the youtube tutorial however I've met some issue. Anyone able to help? I'm new to python, I understand that there is one or two similar question, however, I read and don't understand. Can someone help me out? Thanks import requests from bs4 import BeautifulSoup...

Why scrapy not giving all the results and the rules part is also not working?


python,xpath,web-scraping,web-crawler,scrapy
This script is only providing me with the first result or the .extract()[0] if I change 0 to 1 then next item. Why it is not iterating the whole xpath again? The rule part is also not working. I know the problem is in the response.xpath. How to deal with...

Apache Nutch REST api


api,rest,web-crawler,nutch
I'm trying to launch a crawl via the rest api. A crawl starts with injecting urls. Using a chrome developer tool "Advanced Rest Client" I'm trying to build this POST payload up but the response I get is a 400 Bad Request. POST - http://localhost:8081/job/create Payload { "crawl-id":"crawl-01", "type":"INJECT", "config-id":"default",...

Heritrix not finding CSS files in conditional comment blocks


java,web-crawler,heritrix
The Problem/evidence Heritrix is not detecting the presence of files in conditional comments that open & close in one string, such as this: <!--[if (gt IE 8)|!(IE)]><!--> <link rel="stylesheet" href="/css/mod.css" /> <!--<![endif]--> However standard conditional blocks like this work fine: <!--[if lte IE 9]> <script src="/js/ltei9.js"></script> <![endif]--> I've identified the...

Authorization issue with cron crawler inserting data into Google spreadsheet using Google API in Ruby


ruby,cron,google-api,web-crawler,google-api-client
My project is to crawl the certain web data and put them into my Google spreadsheet every morning 9:00. And it has to get the authorization to read & write something. That's why the code below is located at the top. # Google API CLIENT_ID = blah blah CLIENT_SECRET =...

want to keep running my single ruby crawler that dont need html and nothing


ruby-on-rails,ruby,web-crawler
first of all, I'm a newbie. I just made a single ruby file, which crawls something on the certain web and put data into my google spreadsheet. But I want my crawler to do its job every morning 9:00 AM. Then what do I need? Maybe a gem and server?...

SgmlLinkExtractor in scrapy


web-crawler,scrapy,rules,extractor
i need some enlightenment about SgmlLinkExtractor in scrapy. For the link: example.com/YYYY/MM/DD/title i would write: Rule(SgmlLinkExtractor(allow=[r'\d{4}/\d{2}/\d{2}/\w+']), callback='parse_example')] For the link: example.com/news/economic/title should i write: r'\news\category\w+'or r'\news\w+/\w+' ? (category changes but the url contains always news) For the link: example.com/article/title should i write: r'\article\w+' ? (the url contains always article)...

focused crawler by modifying nutch


web-crawler,nutch
I want to create a focused crawler using nutch. Is there any way to modify nutch so as to make crawling faster? Can we use the metadata in nutch to train a classifier that would reduce the number of urls nutch has to crawl for a given topic??

fullPage.js: Make all slides and sections visible in search engine results


jquery,seo,web-crawler,single-page-application,fullpage.js
I'm using fullpage.js jQuery plugin for a Single page application. I'm using mostly default settings and the plugin works like a charm. When I got to the SEO though I couldn't properly make Google crawl my website on a "per slide" basis. All my slides are loaded at the page...

how to check whether a program using requests module is dead or not


python,web-crawler,downloading
I am trying to using python download a batch of files, and I use requests module with stream turned on, in other words, I retrieve each file in 200K blocks. However, sometimes, the downloading may stop as it just gets stuck (no response) and there is no error. I guess...

How to crawl links on all pages of a web site with Scrapy


website,web-crawler,scrapy,extract
I'm learning about scrapy and I'm trying to extract all links that contains: "http://lattes.cnpq.br/andasequenceofnumbers" , example: http://lattes.cnpq.br/0281123427918302 But I don't know what is the page on the web site that contains these information. For example this web site: http://www.ppgcc.ufv.br/ The links that I want are on this page: http://www.ppgcc.ufv.br/?page_id=697 What...

Unable to click in CasperJS


javascript,web-crawler,phantomjs,casperjs
I want to crawl the HTML data. And, I tried a headless browser in CasperJS. But I can't click. - The following is the code I tried in CasperJS. var casper = require('casper').create(); var mouse = require('mouse').create(casper); casper.start('http://sts.kma.go.kr/jsp/home/contents/climateData/smart/smartStatisticsSearch.do', function() { this.echo('START'); }); casper.then(function() { this.capture("1.png"); this.mouse.click('li[class="item1"]'); casper.wait(5000, function() { this.capture("2.png"); }); });...

Python: Transform a unicode variable into a string variable


python,unicode,casting,web-crawler,unicode-string
I used a web crawler to get some data. I stored the data in a variable price. The type of price is: <class 'bs4.element.NavigableString'> The type of each element of price is: <type 'unicode'> Basically the price contains some white space and line feeds followed by: $520. I want to...

How to evidence a particular page on Google SERP?


seo,sitemap,googlebot,google-sitemap
I noticed that some results of searching on Google are not a single url but a single url with a two-column list of what I call 'important links' of this website. For example: If you open Google and search for "amazon.it", without the double quote, you got this: As you...

Workload balancing between akka actors


multithreading,scala,web-crawler,akka,actor
I have 2 akka actors used for crawling links, i.e. find all links in page X, then find all links in all pages linked from X, etc... I want them to progress more or less at the same pace, but more often than not one of them becomes starved and...

PHP web crawler, check URL for path


php,url,path,web-crawler,bots
I'm writing a simple web crawler to grab some links from a site. I need to check the returned links to make sure I selectively collect what I want. For example, here's a few links returned from http://www.polygon.com/ [0] http://www.polygon.com/2015/5/15/8613113/destiny-queens-wrath-bounties-ether-key-guide#comments [1] http://www.polygon.com/videos [2] http://www.polygon.com/2015/5/15/8613113/destiny-queens-wrath-bounties-ether-key-guide [3] http://www.polygon.com/features so link 0 and...

How can I get the value of a Monad without System.IO.Unsafe? [duplicate]


haskell,web-crawler,monads
This question already has an answer here: How to get normal value from IO action in Haskell 2 answers I just started learning Haskell and got my first project working today. Its a small program that uses Network.HTTP.Conduit and Graphics.Rendering.Chart (haskell-chart) to plot the amount of google search results...

Google snippet tree structure


sitemap,rich-snippets,google-rich-snippets
How can I get a snippet like in the picture below in google page results? I submitted a sitemap.xml in google webmaster tools 3 months ago, but there is no change until now? Do you know how I should proceed to get this Result? Or the name (keyword) of this...

New to Python, what am I doing wrong and not seeing tag (links) returned with BS4


python,beautifulsoup,web-crawler,bs4
I'm new to python and learning it. Basically I am trying to pull all the links from my e-commerce store products that is stored in the html below. I'm getting no results returned though and I can't seem to figure out why not. <h3 class="two-lines-name"> <a title="APPLE IPOD IPOD A1199...

Regex for URL to sites


regex,sitemap,regex-negation,regex-greedy,regedit
I have two URLs with the patterns: 1.http://localhost:9001/f/ 2.http://localhost:9001/flight/ I have a site filter which redirects to the respective sites if the regex matches. I tried the following regex patterns for the 2 URLs above: http?://localhost[^/]/f[^flight]/.* http?://localhost[^/]/flight/.* Both URLS are getting redirected to the first site, as both URLs are...

How to increse number of sitemapindex


xml,seo,sitemap,googlebot
I'm interested whether I can have many sitemapindex like this: <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> <sitemap> <loc>https://domain.com/sitemap/destinatieTag.xml</loc> <lastmod>2015-02-01T05:00:34+02:00</lastmod> </sitemap> </sitemapindex> I mean one sitemapindex that refers to another sitemapindex, or what is the maximum limit for a sitemap? For example, if destinatieTag.xml is another <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> <sitemap>...

Check if element exists in fetched URL [closed]


javascript,jquery,python,web-crawler,window.open
I have a page with, say, 30 URLS, I need to click on each and check if an element exists. Currently, this means: $('area').each(function(){ $(this).attr('target','_blank'); var _href = $(this).attr("href"); var appID = (window.location.href).split('?')[1]; $(this).attr("href", _href + '?' + appID); $(this).trigger('click'); }); Which opens 30 new tabs, and I manually go...

Presta Sitemap Bundle Install Error


php,xml,symfony2,composer-php,sitemap
I have followed the presta sitemap bundle documentation and I am still having issues. I have this line in my composer.json file: "presta/sitemap-bundle": "dev-master" But I get this error: A typo in the package name The package is not available in a stable-enough version according to your minimum-stability setting On...

Howto use scrapy to crawl a website which hides the url as href=“javascript:;” in the next button


javascript,python,pagination,web-crawler,scrapy
I am learning python and scrapy lately. I googled and searched around for a few days, but I don't seem to find any instruction on how to crawl multiple pages on a website with hidden urls - <a href="javascript:;". Basically each page contains 20 listings, each time you click on...

how to download image in Goutte


php,web-crawler,guzzle,goutte
I want to download an image in this page. The image source is http://i2.pixiv.net/c/600x600/img-master/img/2015/01/19/12/17/13/48258889_p0_master1200.jpg. I try to download it using this: $client = new Goutte\Client (); $client->getClient->get($img_url, array('save_to' => $img_url_save_name)); But I failed, then I realized that if I directly access http://i2.pixiv.net/c/600x600/img-master/img/2015/01/19/12/17/13/48258889_p0_master1200.jpg, I am denied by the CDN nginx server. I have to access...

Scrapy middleware setup


python,web-scraping,web-crawler,scrapy
I am trying to access public proxy using scrapy to get some data. I get the following error when i try to run the code: ImportError: Error loading object 'craiglist.middlewares.ProxyMiddleware': No module named middlewares I've created middlewares.py file with following code: import base64 # Start your middleware class class ProxyMiddleware(object):...

Python 3.3 TypeError: can't use a string pattern on a bytes-like object in re.findall()


python-3.x,web-crawler
I am trying to learn how to automatically fetch urls from a page. In the following code I am trying to get the title of the webpage: import urllib.request import re url = "http://www.google.com" regex = '<title>(,+?)</title>' pattern = re.compile(regex) with urllib.request.urlopen(url) as response: html = response.read() title = re.findall(pattern,...

Scrapy CrawlSpider not following links


python,web-scraping,web-crawler,scrapy,scrapy-spider
I am trying to crawl some attributes from all(#123) detail pages given on this category page - http://stinkybklyn.com/shop/cheese/ but scrapy is not able to follow link pattern I set, I checked on scrapy documentation and some tutorials as well but No Luck! Below is the code: import scrapy from scrapy.contrib.linkextractors...

TYPO3 Extbase build own Sitemap


typo3,sitemap,extbase
There are a lot of sitemap Generators for TYPO3 in the TER available. But none of them can handle Sites created by Extbase, which are not shown in the TYPO3 page tree. Edit Thanks to biesior, detailed informations: Unconventional I switch in TypoScript the GET Parameter for my Extbase extension[globalVar...

XML Sitemap Parsing Error With PHP


php,xml,parsing,sitemap
I'm using php to create a sitemap xml file for google submission but I'm getting an error in my code, which is: <?php $get_posts_sql = "SELECT * FROM posts ORDER BY added DESC"; $get_posts_res = mysqli_query($con, $get_posts_sql); while($post = mysqli_fetch_assoc($get_posts_res)){ $post_id = $post["id"]; $post_title = $post["title"]; $post_added = $post["added"]; $post_date...

Selenium pdf automatic download not working


python,selenium,selenium-webdriver,web-scraping,web-crawler
I am new to selenium and I am writing a scraper to download pdf files automatically from a given site. Below is my code: from selenium import webdriver fp = webdriver.FirefoxProfile() fp.set_preference("browser.download.folderList",2); fp.set_preference("browser.download.manager.showWhenStarting",False) fp.set_preference("browser.download.dir", "/home/jill/Downloads/Dinamalar") fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/pdf") browser = webdriver.Firefox(firefox_profile=fp)...

How to keep a web crawler running?


javascript,node.js,web-crawler
I want to write my own web crawler in JS. I am thinking of using a node.js solution such as https://www.npmjs.com/package/js-crawler The objective is to have a "crawl" every 10 minutes - so every 10 minutes I want my crawler to fetch data from a website. I understand that I...

What should be the name of the sitemap file for Google SEO?


seo,sitemap,google-search
I created a sitemap for my website that contains the below code: <?xml version="1.0" encoding="UTF-8"?> <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"> <url> <loc>http://www.example.com/</loc> </url> <url> <loc>http://www.example.com/aboutus.html</loc> </url> <url>...

Ruby - WebCrawler how to visit the links of the found links?


ruby,url,hyperlink,web-crawler,net-http
I am trying to make a WebCrawler which finds links from a homepage and visits the found links again and again... Now I have written code with a parser which shows me the found links and prints statistics of some tags of this homepage, but I don't get it...

Scrapy follow link and collect email


python,web-scraping,web-crawler,scrapy
i need help with saving email with Scrapy. The row in .csv file where emails are supposed to be collected is blank. Any help is very appreciated. Here is the code: # -*- coding: utf-8 -*- import scrapy # item class included here class DmozItem(scrapy.Item): # define the fields for...

Crawling & parsing results of querying google-like search engine


java,parsing,web-crawler,jsoup
I have to write a parser in Java (my first HTML parser, by the way). For now I'm using the jsoup library and I think it is a very good solution for my problem. The main goal is to get some information from Google Scholar (h-index, number of publications, years of scientific career). I...

google sitemap for tx_news records with dd_googlesitemap_dmf (or alternative)


typo3,sitemap,typo3-6.2.x
I try to let typo3 generate a sitemap for all the news records. For that I tried the dd_googlesitemap_dmf extention. The dd_googlesitemap works (it creates a sitemap for all the typo3 pages - but not for extensions). I filled in the basic infos into the configuration and called the url...

How to iterate over many websites and parse text using web crawler


python,web-crawler,sentiment-analysis
I am trying to parse text and run a sentiment analysis over the text from multiple websites. I have successfully been able to strip just one website at a time and generate a sentiment score using the TextBlob library, but I am trying to replicate this over many websites, any...

Web Scraper for dynamic forms in python


python,web-scraping,web-crawler,mechanize
I am trying to fill the form of this website http://www.marutisuzuki.com/Maruti-Price.aspx. It consists of three drop down lists. One is Model of the car, Second is the state and third is city. The first two are static and the third, city is generated dynamically depending upon the value of state,...

Python: urllib2 get nothing which does exist


python,web-scraping,web-crawler,urllib2
I'm trying to crawl my college website and I set cookie, add headers then: homepage=opener.open("website") content = homepage.read() print content I can get the source code sometimes but sometime just nothing. I can't figure it out what happened. Is my code wrong? Or the web matters? Does one geturl() can...

Heritrix single-site scrape, including required off-site assets


java,web-crawler,heritrix
I believe I need help compiling Heritrix decide rules, although I'm open to other Heritrix suggestions: https://webarchive.jira.com/wiki/display/Heritrix/Configuring+Crawl+Scope+Using+DecideRules I need to scrape an entire copy of a website (in the crawler-beans.cxml seed list), but not scrape any external (off-site) pages. Any external resources needed to render the current website should be downloaded,...

Scrapy not entering parse method


python,selenium,web-scraping,web-crawler,scrapy
I don't understand why this code is not entering the parse method. It is pretty similar to the basic spider examples from the doc: http://doc.scrapy.org/en/latest/topics/spiders.html And I'm pretty sure this worked earlier in the day... Not sure if I modified something or not.. from selenium.webdriver.support.wait import WebDriverWait from selenium.webdriver.common.by import...

cannot create google sitemap for Magento 2nd storefront


xml,magento,sitemap
Wondering if anyone can shed some light on this problem... I have my magento install at showcarsign.com and i'm also running a 2nd storefront with the install at showcarboards.com. I can generate a sitemap for showcarsign.com no problem (showcarsign.com/sitemap/sitemap.xml). But when I go to create the sitemap for showcarboards.com I...

T_STRING error in my php code [duplicate]


php,web-crawler
This question already has an answer here: PHP Parse/Syntax Errors; and How to solve them? 10 answers I have this PHP script that is supposed to crawl the End Clothing website for product IDs. When I run it, it gives me this error: Parse error: syntax error, unexpected 'i' (T_STRING), expecting...

Scrapy delay request


python,web-crawler,scrapy
Every time I run my code my IP gets banned. I need help to delay each request for 10 seconds. I've tried to place DOWNLOAD_DELAY in the code but it gives no results. Any help is appreciated. # item class included here class DmozItem(scrapy.Item): # define the fields for your item...

Distinguishing between HTML and non-HTML pages in Scrapy


python,html,web-crawler,scrapy,scrapy-spider
I am building a Spider in Scrapy that follows all the links it can find, and sends the url to a pipeline. At the moment, this is my code: from scrapy import Spider from scrapy.http import Request from scrapy.http import TextResponse from scrapy.selector import Selector from scrapyTest.items import TestItem import...

Scrapy returning a null output when extracting an element from a table using xpath


python,xpath,web-scraping,web-crawler,scrapy
I have been trying to scrape this website that has details of oil wells in Colorado https://cogcc.state.co.us/cogis/FacilityDetail.asp?facid=12307555&type=WELL Scrapy scrapes the website, and returns the URL when I scrape it, but when I need to extract an element inside a table using it's XPath (County of the oil well), all i...