delete spiders from scrapinghub

Question:

Tag: delete,web-crawler,scrapy,scrapy-spider,scrapinghub

I am a new user of Scrapinghub. I have already searched Google and read the Scrapinghub docs, but I could not find any information about removing spiders from a project. Is it possible, and if so, how? I do not want to replace a spider; I want to delete/remove it from the Scrapinghub spider list.


Answer:

You just need to remove the spider from your project and deploy the project again, using either shub deploy or scrapyd-deploy.
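
As a rough illustration of the answer, the two steps (delete the spider module locally, then redeploy) might look like the sketch below. The package name myproject, the file name old_spider.py and both helper names are hypothetical; the sketch assumes the default Scrapy layout with spiders under <package>/spiders/ and does not call any Scrapinghub API itself.

```python
from pathlib import Path


def remove_spider_file(project_dir, spider_filename, package="myproject"):
    """Delete one spider module from a local Scrapy project.

    Assumes the default layout <project_dir>/<package>/spiders/.
    The package and file names are hypothetical placeholders.
    Returns the path that was deleted.
    """
    spider_path = Path(project_dir) / package / "spiders" / spider_filename
    if not spider_path.is_file():
        raise FileNotFoundError(f"No spider module at {spider_path}")
    spider_path.unlink()
    return spider_path


def redeploy_command(use_scrapyd=False):
    """Return the deploy command named in the answer as an argument list."""
    return ["scrapyd-deploy"] if use_scrapyd else ["shub", "deploy"]
```

For example, calling remove_spider_file(".", "old_spider.py") and then subprocess.run(redeploy_command(), check=True) from the directory that contains scrapy.cfg would delete the module and push the project again.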


Related:


postLink() in cakePHP 3.x


cakephp,hyperlink,delete
I just created the CRUD operations in CakePHP 3.x. I am deleting the records using the postLink() function. $this->Form->postLink("<i class='fa fa-remove'></i>", ['action' => 'delete', $role->id], ['escape' => false],['title' => 'Delete', 'class' => 'users'])]); It doesn't set the class for my delete icon. I need to set the class name for this delete...

Ruby - WebCrawler how to visit the links of the found links?


ruby,url,hyperlink,web-crawler,net-http
I am trying to make a WebCrawler which finds links on a homepage and visits the found links again and again. Now I have written code with a parser which shows me the found links and prints statistics for some tags of this homepage, but I don't get it...

How to crawl links on all pages of a web site with Scrapy


website,web-crawler,scrapy,extract
I'm learning about Scrapy and I'm trying to extract all links that contain: "http://lattes.cnpq.br/andasequenceofnumbers", for example: http://lattes.cnpq.br/0281123427918302 But I don't know which page on the web site contains this information. For example, this web site: http://www.ppgcc.ufv.br/ The links that I want are on this page: http://www.ppgcc.ufv.br/?page_id=697 What...

Delete item “Object required” excel VBA


excel,vba,excel-vba,dynamic,delete
I understand what the problem is, but i don't have a clue on how to solve it... So what I am doing is that I click on a button (addSceneButton) in my worksheet("costing") and it is going to insert a copy of another sheet("Scene Template") just above of the...

Python: Transform a unicode variable into a string variable


python,unicode,casting,web-crawler,unicode-string
I used a web crawler to get some data. I stored the data in a variable price. The type of price is: <class 'bs4.element.NavigableString'> The type of each element of price is: <type 'unicode'> Basically the price contains some white space and line feeds followed by: $520. I want to...

how to download image in Goutte


php,web-crawler,guzzle,goutte
I want to download an image on this page. The image source is http://i2.pixiv.net/c/600x600/img-master/img/2015/01/19/12/17/13/48258889_p0_master1200.jpg. I tried to download it using this: $client = new Goutte\Client (); $client->getClient->get($img_url, array('save_to' => $img_url_save_name)); But it failed, and then I realized that if I directly access http://i2.pixiv.net/c/600x600/img-master/img/2015/01/19/12/17/13/48258889_p0_master1200.jpg, I am denied by the CDN nginx server. I have to access...

Python 3.3 TypeError: can't use a string pattern on a bytes-like object in re.findall()


python-3.x,web-crawler
I am trying to learn how to automatically fetch urls from a page. In the following code I am trying to get the title of the webpage: import urllib.request import re url = "http://www.google.com" regex = '<title>(,+?)</title>' pattern = re.compile(regex) with urllib.request.urlopen(url) as response: html = response.read() title = re.findall(pattern,...

C - deleting n-ary tree nodes


c,tree,delete,n-ary-tree
I have implemented in C an m,n,k-game with AI. The game works fine but when I have to free the decision tree it always throws an "Access violation reading location" exception. This is the implementation of the decision tree structure: typedef struct decision_tree_s { unsigned short **board; status_t status; struct...

SQL Oracle | How to delete records from a table when they match another table?


sql,oracle,delete
How would I delete records from a table where they match a delete table? As in, I have a table of record keys that say what needs to be deleted from my main table. How would I write a delete to say "delete anything from my main table where this...

delete an entry from a table A if that entry is already present in a table B


sql,oracle,oracle11g,delete
My question is somewhat similar to this one SQL Delete Rows Based on Another Table except for the fact two entries match if 4 columns match! So, I have: Table1: Field | Type | Null | Key | Default | Extra f1 | int(32) unsigned | NO | PRI |...

How to automatically delete file older than 24 hours of a .pdf format in a specific folder with PHP


php,delete,directory,file-manipulation
I keep getting many PDF files in my directory root/files/pdfs. I want a PHP script to automatically delete only the .pdf files from the pdfs folder that are older than 24 hours (86400 seconds). What permissions will the .php file require? Where should I put the file? Should I have to...

jqGrid onDelete event handler


events,jqgrid,delete,handler,extend
I'm using a jqGrid with datatype: 'local'. The data of the grid is being set dynamically via addRowData. I don't use the asynchronous ajax stuff such as url + datatype: json because the grid has to display only client state. Now I want to use the jqGrid delete row functionality...

Deleting dynamic char** in C++


c++,arrays,dynamic,delete,char
Disclosure: Hi, I'm trying to solve a challenge with strict time and memory limits. I would normally use vectors and strings, but here I need the fastest and smallest solution (with vectors it actually ran above the time limit), so I turned to dynamic arrays of char*. The relevant parts...

Unable to click in CasperJS


javascript,web-crawler,phantomjs,casperjs
I want to crawl the HTML data, so I tried a headless browser with CasperJS, but I am not able to click. The following is the code I tried in CasperJS. var casper = require('casper').create(); var mouse = require('mouse').create(casper); casper.start('http://sts.kma.go.kr/jsp/home/contents/climateData/smart/smartStatisticsSearch.do', function() { this.echo('START'); }); casper.then(function() { this.capture("1.png"); this.mouse.click('li[class="item1"]'); casper.wait(5000, function() { this.capture("2.png"); }); });...

node remove a directory after gzipping it


node.js,delete,gzip
I got the following code from SO to gzip a directory: fstream.Reader({'path':'mydir','type':'Directory'}).pipe(tar.Pack()).pipe(zlib.Gzip()).pipe(fstream.Writer({'path': 'mygz.tar.gz'})); And to delete a directory: rm_rf('mydir',function(error){}); I need to put them together, so that I can gzip a dir and delete the original directory. To do this, I need to find a way to listen to the...

Linux/shell - Remove all (sub)subfolders from a directory except one


linux,delete,find,folder,rm
I've inherited a structure like the below, a result of years of spaghetti code... gallery ├── 1 │   ├── deleteme1 │   ├── deleteme2 │   ├── deleteme3 │   └── full │   ├── file1 │   ├── file2 │   └── file3 ├── 2 │   ├── deleteme1 │   ├── deleteme2 │   ├── deleteme3 │  ...

Set Wordpress post status to 'Draft' from front end similar to get_delete_post_link


wordpress,post,delete,frontend,status
I am using the code below to allow a logged-in user to delete their own posts from the front end. Is there a way to do the same thing but setting the post to 'draft' rather than deleting it completely? <?php if ($post->post_author == $current_user->ID) { ?> <p><a...

PHP login system help needed for deletion


php,login,delete,user
Help with user deletion: Hello I am creating a user creation system for a project of mine, I am still very new to PHP, my issue is getting the user from the MySQL database and then deleting it, I will show you my code below: <?php require_once("config/db.php"); if ($login->isUserLoggedIn() ==...

Destroy an object with variables (free memory)


c++,object,memory,delete,free
I am trying to create an event/date organizer in C++. The overview is like a calendar showing one month, and every day in this calendar is an object (type: EventCell). The class EventCell stores the events for its day in a vector (name: eventData with type: "EventInfo": class for storing...

How can I get the value of a Monad without System.IO.Unsafe? [duplicate]


haskell,web-crawler,monads
This question already has an answer here: How to get normal value from IO action in Haskell 2 answers I just started learning Haskell and got my first project working today. It's a small program that uses Network.HTTP.Conduit and Graphics.Rendering.Chart (haskell-chart) to plot the amount of Google search results...

Selenium pdf automatic download not working


python,selenium,selenium-webdriver,web-scraping,web-crawler
I am new to selenium and I am writing a scraper to download pdf files automatically from a given site. Below is my code: from selenium import webdriver fp = webdriver.FirefoxProfile() fp.set_preference("browser.download.folderList",2); fp.set_preference("browser.download.manager.showWhenStarting",False) fp.set_preference("browser.download.dir", "/home/jill/Downloads/Dinamalar") fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/pdf") browser = webdriver.Firefox(firefox_profile=fp)...

Heritrix single-site scrape, including required off-site assets


java,web-crawler,heritrix
I believe I need help compiling Heritrix decide rules, although I'm open to other Heritrix suggestions: https://webarchive.jira.com/wiki/display/Heritrix/Configuring+Crawl+Scope+Using+DecideRules I need to scrape an entire copy of a website (in the crawler-beans.cxml seed list), but not scrape any external (off-site) pages. Any external resources needed to render the current website should be downloaded,...

Check if element exists in fetched URL [closed]


javascript,jquery,python,web-crawler,window.open
I have a page with, say, 30 URLS, I need to click on each and check if an element exists. Currently, this means: $('area').each(function(){ $(this).attr('target','_blank'); var _href = $(this).attr("href"); var appID = (window.location.href).split('?')[1]; $(this).attr("href", _href + '?' + appID); $(this).trigger('click'); }); Which opens 30 new tabs, and I manually go...

$_ROW 'id' delete user with Get doesn't work


php,delete
I have the following code to load users into a table in register.php, from which I want to delete users. Only the second script, deleteUsers.php, doesn't work. Can anybody help me out? register.php <?php $edit = 'edit:'; $account = 'Username:'; $account = '<font size="4">'.$account.'</font>'; $password1 = 'Password:'; $password1 = '<font...

trying to delete automatically added and sorted answers from a VBA textbox into excel cells


excel,vba,delete,duplicates,add
I have code in which, if an answer is provided based on certain criteria, it adds itself to a list. As I have been troubleshooting the rest of the program, I have accrued a lot of answers that have been added. If I select the cells it...

SgmlLinkExtractor in scrapy


web-crawler,scrapy,rules,extractor
I need some enlightenment about SgmlLinkExtractor in Scrapy. For the link: example.com/YYYY/MM/DD/title I would write: Rule(SgmlLinkExtractor(allow=[r'\d{4}/\d{2}/\d{2}/\w+']), callback='parse_example')] For the link: example.com/news/economic/title should I write: r'\news\category\w+' or r'\news\w+/\w+' ? (the category changes but the URL always contains news) For the link: example.com/article/title should I write: r'\article\w+' ? (the URL always contains article)...

Removing Alert When Using DeleteFile API


vb.net,vba,api,delete
I'm writing a VBA application which involves looping through a large number of directories recursively. I am using the FindFirstFile API to achieve this, as it offers a substantial performance boost over the FileSystemObject. In order to remove the FSO from my code entirely, I need a routine to delete...

How to delete all files by a specific name? CryptoWall trojan [closed]


windows,delete,trojan
How can I delete all files with this name: HELP_DECRYPT.url? There is a solution on the internet, but it doesn't work. That solution is this, in cmd: del /s HELP_DECRYPT.url

Scrapy CrawlSpider not following links


python,web-scraping,web-crawler,scrapy,scrapy-spider
I am trying to crawl some attributes from all(#123) detail pages given on this category page - http://stinkybklyn.com/shop/cheese/ but scrapy is not able to follow link pattern I set, I checked on scrapy documentation and some tutorials as well but No Luck! Below is the code: import scrapy from scrapy.contrib.linkextractors...

Where do I delete objects? I'm out of scope


c++,object,delete,stack
From what I have gathered it's imperative to delete anything that has been allocated with new. Having said that I feel I'm out of scope in my program to be able to access & delete those objects. GameStateStack.h #include <iostream> class node { public: std::string gameState; node * nextGameState; };...

WA_DeleteOnClose delete all members?


c++,delete,heap,qt5,destructor
I'm having trouble with Qt5's WA_DeleteOnClose attribute. This is the situation: I have a class M that extends QMainWindow, and in this class I use a heap-allocated array. I read that with WA_DeleteOnClose, when the window M is closed and the destructor is called, every member with M as parent is...

Get all links from page on Wikipedia


python,python-2.7,web-crawler
I am making a Python web-crawler program to play The Wiki game. If you're unfamiliar with this game: Start from some article on Wikipedia Pick a goal article Try to get to the goal article from the start article just by clicking wiki/ links My process for doing this is:...

find and remove all closed files that are not modified in some-time


linux,delete,find,filesystems
I'm building a script in Linux that will remove files from the disk that aren't currently in use by the OS. I want to use the find command so I can execute rm for all the files I find that are not open. So far I have tried this command without...

Adding modify and delete to insert trigger


sql-server,triggers,insert,delete,dml
Good morning everyone, I've been tasked with pushing records from one table (T1) to another (T2). I have the insert portion complete as follows: CREATE TRIGGER [dbo].[CP_to_TW2] ON [dbo].[TEST_PROJ] FOR INSERT AS BEGIN INSERT INTO dbo.TEST_TW (PROJECT_ID,PROJECT_DESC,PROJECT_MANAGER) SELECT PROJ_ID,PROJ_ID+PROJ_NAME,PROJECT_MANAGER FROM inserted END TEST_PROJ is T1 and TEST_TW is T2. The...

How to iterate over many websites and parse text using web crawler


python,web-crawler,sentiment-analysis
I am trying to parse text and run a sentiment analysis over the text from multiple websites. I have successfully been able to strip just one website at a time and generate a sentiment score using the TextBlob library, but I am trying to replicate this over many websites, any...

Web Crawler - TooManyRedirects: Exceeded 30 redirects. (python)


python,web-crawler
I've tried to follow a YouTube tutorial, but I've run into an issue. Is anyone able to help? I'm new to Python. I understand that there are one or two similar questions; however, I read them and don't understand. Can someone help me out? Thanks import requests from bs4 import BeautifulSoup...

Distinguishing between HTML and non-HTML pages in Scrapy


python,html,web-crawler,scrapy,scrapy-spider
I am building a Spider in Scrapy that follows all the links it can find, and sends the url to a pipeline. At the moment, this is my code: from scrapy import Spider from scrapy.http import Request from scrapy.http import TextResponse from scrapy.selector import Selector from scrapyTest.items import TestItem import...

incompatible types when assigning to type 'char[50]' from type 'char *'


c,list,hyperlink,crash,delete
I am trying to create a link list. I have a function that deletes (delete function) stuff from my link list. But it seems to crash when I try to compare strings. It works up until the last random printf statement. Here is the code: #include <stdio.h> #include <stdlib.h> #include...

Web Scraper for dynamic forms in python


python,web-scraping,web-crawler,mechanize
I am trying to fill the form of this website http://www.marutisuzuki.com/Maruti-Price.aspx. It consists of three drop down lists. One is Model of the car, Second is the state and third is city. The first two are static and the third, city is generated dynamically depending upon the value of state,...

Workload balancing between akka actors


multithreading,scala,web-crawler,akka,actor
I have 2 akka actors used for crawling links, i.e. find all links in page X, then find all links in all pages linked from X, etc... I want them to progress more or less at the same pace, but more often than not one of them becomes starved and...

Postgres SQL: how to delete rows of Table1 where category = x (but Category is defined in Table 2)?


database,postgresql,join,delete
I have a Postgres Database. I am trying to delete rows in Table 1, based on a condition expressed in Table 2. Table 1: id, object_id, time, action_type Table 2: object_id, object_name, object_category I would like to delete all rows in Table 1, where object_category = x. Thanks!...

Heritrix not finding CSS files in conditional comment blocks


java,web-crawler,heritrix
The Problem/evidence Heritrix is not detecting the presence of files in conditional comments that open & close in one string, such as this: <!--[if (gt IE 8)|!(IE)]><!--> <link rel="stylesheet" href="/css/mod.css" /> <!--<![endif]--> However standard conditional blocks like this work fine: <!--[if lte IE 9]> <script src="/js/ltei9.js"></script> <![endif]--> I've identified the...

T_STRING error in my php code [duplicate]


php,web-crawler
This question already has an answer here: PHP Parse/Syntax Errors; and How to solve them? 10 answers I have this PHP that is supposed to crawl the End Clothing website for product IDs. When I run it, it gives me this error: Parse error: syntax error, unexpected 'i' (T_STRING), expecting...

Scrapy collect data from first element and post's title


python,web-scraping,web-crawler,scrapy,scrapy-spider
I need Scrapy to collect data from this tag and retrieve all three parts in one piece. The output would be something like: Tonka double shock boys bike - $10 (Denver). <span class="postingtitletext">Tonka double shock boys bike - <span class="price">$10</span><small> (Denver)</small></span> Second is to collect data from first span tag....

LinkedList iterator remove


java,linked-list,delete,iterator
I have a question about the LinkedList iterator. If I'm using the next, previous and remove methods, for example: name.add("Alvin") name.add("Keven") name.add("Jack") ListIterator<String> iterator = name.listIterator(); //|AKJ iterator.next(); // A|KJ iterator.next(); // AK|J iterator.add("Nina") // AKN|J iterator.next(); // AKNJ| iterator.remove(); // AKN| In the next and then remove method we...

Can I delete an item of a queryset in python but without deleting that item on the database?


python,delete,django-queryset
I am creating an app in django, and I have the next problem: I get a queryset using the next command line: queryset = Persons.objects.all() Assume the resulting list is the next one: ['x', 'y', 'z'] And I want to remove an element x of that list, so that the...

Howto use scrapy to crawl a website which hides the url as href=“javascript:;” in the next button


javascript,python,pagination,web-crawler,scrapy
I am learning python and scrapy lately. I googled and searched around for a few days, but I don't seem to find any instruction on how to crawl multiple pages on a website with hidden urls - <a href="javascript:;". Basically each page contains 20 listings, each time you click on...

Scrapy not entering parse method


python,selenium,web-scraping,web-crawler,scrapy
I don't understand why this code is not entering the parse method. It is pretty similar to the basic spider examples from the doc: http://doc.scrapy.org/en/latest/topics/spiders.html And I'm pretty sure this worked earlier in the day... Not sure if I modified something or not.. from selenium.webdriver.support.wait import WebDriverWait from selenium.webdriver.common.by import...

Why scrapy not giving all the results and the rules part is also not working?


python,xpath,web-scraping,web-crawler,scrapy
This script is only providing me with the first result or the .extract()[0] if I change 0 to 1 then next item. Why it is not iterating the whole xpath again? The rule part is also not working. I know the problem is in the response.xpath. How to deal with...

Apache Nutch REST api


api,rest,web-crawler,nutch
I'm trying to launch a crawl via the rest api. A crawl starts with injecting urls. Using a chrome developer tool "Advanced Rest Client" I'm trying to build this POST payload up but the response I get is a 400 Bad Request. POST - http://localhost:8081/job/create Payload { "crawl-id":"crawl-01", "type":"INJECT", "config-id":"default",...