Scraping Javascript webpage (script error occurred)

javascript,html,vb.net,web,scrape
I am scraping a dynamic, JavaScript-based webpage. I have written code that first loads the webpage in the program: Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load WebBrowser1.Navigate("http://www.changiairport.com/flight-info/flight-status/passenger-departures") End Sub However, each time I run the program, script...

Node can't scrape certain pages

node.js,request,scrape
I don't know if this has something to do with ColdFusion pages, but I can't scrape these .cfm pages. In the command line, in a directory with request installed, run: node> var request = require('request'); node> var url = 'http://linguistlist.org/callconf/browse-conf-action.cfm?confid=173395'; node> request(url, function (err, res, body) { if (err)...

BeautifulSoup - Select String Based on Dictionary Key

python,html,beautifulsoup,scrape
I am using BeautifulSoup to scrape an HTML page and want to select a string based on a key, not an element tag. In this case I want to use "fmt_headline" as the key to grab "Founder and CEO at SolarThermoChemical LLC". <div id="srp_main_" class=""> <code id="voltron_srp_main-content" style="display:none;">...

iMacros TAG to Find TXT and Click Nearby (previous) Link

javascript,dom,web-scraping,scrape,imacros
Below is example code from the tag management section of the WordPress backend. I'm trying to write an iMacros script to find a tag and delete it. However, the tag text doesn't belong to any HTML tag. <div class="tagchecklist"> <span> <a id="post_tag-check-num-0" class="ntdelbutton" tabindex="0">X</a> &nbsp;Orange </span> <span> <a id="post_tag-check-num-1" class="ntdelbutton" tabindex="0">X</a> &nbsp;Apple </span>...

Unable to access value with PHP simple HTML DOM

php,html,arrays,scrape
I'm trying to fetch some HTML values with PHP Simple HTML DOM and store them in a PHP array. Inside the HTML page I want to parse/fetch the following: <li id="1" data-name="Jason" class="result-names"> <li id="2" data-name="John" class="result-names"> <li id="3" data-name="Elco" class="result-names"> <li id="5" data-name="Dana" class="result-names"> I am able to...

Python web scraping for javascript generated content

javascript,python,web-scraping,scrape
I am trying to use Python 3 to return the BibTeX citation generated by http://www.doi2bib.org/. The URLs are predictable, so the script can work out the URL without having to interact with the web page. I have tried using selenium, bs4, etc. but can't get the text inside the box. url...

simple_html_dom: trying to find height in google search

php,web-scraping,simple-html-dom,scrape
Can anyone explain what is wrong with the code and how I can get the height value? I am trying to get the height of celebrities. Any suggestions? Thanks. My code (updated with the cURL user agent setting as advised): $url='https://www.google.com/webhp?ie=UTF-8#q=ailee+height'; //Set CURL user agent $ch = curl_init(); curl_setopt($ch,...

Scrapy crawler not processing XHR Request

python,web-scraping,xmlhttprequest,scrapy,scrape
My spider is only crawling the first 10 pages, so I am assuming it is not reaching the "load more" button through the Request. I am scraping this website: http://www.t3.com/reviews. My spider code: import scrapy from scrapy.conf import settings from scrapy.http import Request from scrapy.selector import Selector from reviews.items import...

Got the right node with Nokogiri, but need to search further

ruby-on-rails,ruby,nokogiri,scrape
I am using this. doc = Nokogiri::HTML(open(url)) pic = doc.search "[text()*='hiRes']" to get this script node: <script type="text/javascript"> var data = { 'colorImages': { 'initial': [{"hiRes":"http://ecx.images-joes.com/images /I/71MBTEP1W9L._UL1500_.jpg","thumb":"http://ecx.images-joes.com/images /I/41xE2XADIvL._US40_.jpg","large":"http://ecx.images-joes.com/images /I/41xE2XADIvL.jpg","main":{"http://ecx.images-joes.com/images /I/71MBTEP1W9L._UX395_.jpg":[395,260],"http://ecx.images-joes.com/images...

With Xpath, how do you select these elements but not those?

xpath,lxml,scrape
With a general XPath (or with specific functions of lxml in python), how do you select a set of elements that have a set of tags like this? <div class="cl1 a"> <div class="cl1 b"> but not <div class="cl1"> ...
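
One possible reading of the question above is "keep every div whose class list contains cl1 plus at least one other token". In XPath 1.0 that could plausibly be written as `//div[contains(concat(' ', normalize-space(@class), ' '), ' cl1 ') and normalize-space(@class) != 'cl1']`. The sketch below applies the same rule with Python's standard-library `xml.etree.ElementTree` rather than lxml, because ElementTree's XPath subset does not include those functions. The wrapping `<root>` element, the text contents, and the helper name `select_cl1_with_extra` are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Assumed sample markup: the question only shows the three opening div tags.
SAMPLE = (
    '<root>'
    '<div class="cl1 a">first</div>'
    '<div class="cl1 b">second</div>'
    '<div class="cl1">third</div>'
    '</root>'
)


def select_cl1_with_extra(root):
    """Return divs whose class list has 'cl1' and at least one other token."""
    selected = []
    for div in root.iter("div"):
        # Split the class attribute on whitespace into individual tokens.
        tokens = div.get("class", "").split()
        if "cl1" in tokens and len(tokens) > 1:
            selected.append(div)
    return selected


root = ET.fromstring(SAMPLE)
matches = [div.text for div in select_cl1_with_extra(root)]
print(matches)
```

With lxml, the XPath expression from the lead-in could be passed to the `xpath()` method of a parsed tree instead of filtering in Python.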

How to use GoogleScraper package to scrape link from different search engines in Python

python,web-scraping,scrape
I want to scrape links from different search engines for my search query in Python. For example, query: "who is Sachin Tendulkar". Output: links from Google search and Bing search. After digging through many links I found the GoogleScraper package: https://pypi.python.org/pypi/GoogleScraper/0.1.37 But I didn't...

BeautifulSoup webscrape, isolate specific tag with random html class

python,eclipse,beautifulsoup,scrape
New to web scraping here. I've managed to successfully scrape a website; however, I've encountered one problem. Within the article class there is usually only one 'p' tag, but sometimes an article class will randomly contain two or three 'p' tags with some irrelevant text. The tag I...

Error trying to image scrape

ruby,image,mechanize,scrape
I'm trying to make a ruby program which will automatically download the most recent Penny-Arcade. Here's the code I have: require 'mechanize' agent = Mechanize.new date_string = Date.today.to_s page = agent.get('http://www.penny-arcade.com/comic/') puts page art_link = page.at('div#comicFrame > a > img')['src'] File.open(date_string, 'wb') do |fo| fo.write open(art_link).read end And the output...

R Rvest for() and Error server error: (503) Service Unavailable

r,loops,error-handling,scrape,rvest
I'm new to web scraping, but I am excited to be using rvest in R. I tried to use it to scrape particular data about companies. I have created a for loop (171 URLs), and when I run it, it stops on the 6th or 7th URL with the error: Error in parse.response(r, parser,...

preg_match file url from jwplayer

php,html,web-scraping,scrape
I used Simple HTML DOM Parser to get the HTML from a page. Now I want to scrape the file URL from the <script></script> tags. This is what I got: <script type="text/javascript"> jwplayer("ContainerFlashPlayer").setup({ 'autostart': 'true', 'primary': 'html5', 'flashplayer': '/images/embed/player.5.10.swf', 'file':'/zxdfgdfr44444/afrah/Basem_elkerbelay/selawat/guivvahpasjp.mp3', 'duration': '356.64975', 'image': '/images/flashimg.png', 'volume': '75', 'height': '240', 'width': '330', 'controlbar':...
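
A minimal sketch of the regular-expression idea behind this question, written in Python rather than PHP. It looks for a quoted `file` key followed by a quoted value and captures the value. The script text is shortened from the excerpt and closed with `});` so that it forms a complete string, which is an assumption because the original is truncated. The names `FILE_RE` and `extract_file_url` are hypothetical.

```python
import re

# Shortened copy of the script from the excerpt; the closing "});" is assumed.
SCRIPT = (
    "jwplayer(\"ContainerFlashPlayer\").setup({ 'autostart': 'true', "
    "'primary': 'html5', 'flashplayer': '/images/embed/player.5.10.swf', "
    "'file':'/zxdfgdfr44444/afrah/Basem_elkerbelay/selawat/guivvahpasjp.mp3', "
    "'duration': '356.64975' });"
)

# A quoted key named file, a colon, then a quoted value captured in group 1.
FILE_RE = re.compile(r"""['"]file['"]\s*:\s*['"]([^'"]+)['"]""")


def extract_file_url(text):
    """Return the value of the 'file' setting, or None if it is absent."""
    match = FILE_RE.search(text)
    return match.group(1) if match else None


file_url = extract_file_url(SCRIPT)
print(file_url)
```

Requiring a quote directly before `file` keeps the pattern from matching inside other keys. A regex like this is brittle if the player configuration changes shape, so treat it as a starting point rather than a robust parser.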

Achievements rescrape fail. Error #3403 Achievement hasn't been Registered

php,facebook,facebook-php-sdk,scrape,achievements
Let me start by saying that this is an established app with 51 established achievements that were all working for the past couple of years until a few days ago. I believe I created this mess by making some small changes to the page scraped by the Facebook achievements system....

Mechanize submit result is not the correct page

ruby,mechanize,scrape
I was trying to scrape booking.com as an exercise to learn Mechanize, but I can't get past an issue. I am trying to get a hotel's prices through Mechanize using the following code: hotel_name = "Hilton New York" date = Date.today day_after_date = date + 1 agent = Mechanize.new homepage...

Converting a str to a float (python34)

python,string,scrape
There is a part in my Python script where I receive this error: TypeError: unsupported operand type(s) for +: 'float' and 'str' Code: for proj in data['daily_projections']: proj['nba_player_id'] = float(proj['nba_player_id']) print(proj['fanduel_fp'] + ' ' + proj['nba_player_id']) This is what I currently have, and it is not working properly. 'proj['fanduel_fp']' is the...
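
The error message suggests that a float is being joined to a string with `+`. A minimal sketch of one way around it is to let string formatting do the conversion instead of concatenating. The sample `data` dictionary and the helper name `format_projection` are assumptions, since the real data is not shown in the excerpt.

```python
# Hypothetical sample data; the question does not show the real structure.
data = {
    "daily_projections": [
        {"nba_player_id": "201939", "fanduel_fp": 42.5},
    ]
}


def format_projection(proj):
    """Build one output line without adding a float to a str."""
    player_id = float(proj["nba_player_id"])
    # str.format converts both values to text, so no '+' between types.
    return "{} {}".format(proj["fanduel_fp"], player_id)


lines = [format_projection(proj) for proj in data["daily_projections"]]
for line in lines:
    print(line)
```

Wrapping each value in `str()` before concatenating would work as well; formatting simply keeps the conversion in one place.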

Using Ruby and Twitter can I gather ALL of a user's timeline?

ruby,twitter,scrape,timeline,tweets
I'm trying to retrieve a user's timeline. The API says you can get at most 3,200 tweets. I only seem to know how to get 20 using this code: def gather_tweets_from(user) tweets = [] file = File.open("tweets_from.txt","w") client.user_timeline(user).each { |tweet| file.puts tweet.text } end gather_tweets_from(user) Please help me out,...

Can SPARQL handle blank results for specific cells?

web-scraping,sparql,scrape,dbpedia
I am writing a SPARQL query and can't figure out how to allow blank results for specific columns. My current request is: select * where { ?game a dbpedia-owl:Game ; dbpprop:name ?name ; dbpedia-owl:publisher ?publisher . } Some Games have an owl for publisher while others do not. The above...

Python Selenium - 'Unable to locate element' after made visible

python,selenium,selenium-webdriver,web-scraping,scrape
I need your help. I'm trying to scrape some data from tripadvisor using Selenium in Python 2.7. However, I'm getting stuck at one point. After browsing to the correct page, I'm trying to filter the hotels on certain prices. To do this, you do a mouse over or click on...

R, Xpath, Scrape

r,xpath,scrape
I want to scrape a website using XPath references and R. I am new to this, but based on what I have learned so far, I wrote the following code: A <- "http://www.strompreis.elcom.admin.ch/ShowCat.aspx?placeNumber=5661&OpID=2&Period=2015" doc <- htmlParse(A) A <- xpathApply(A,path="//tr[1]/td/span",fun=xmlAttrs) However, I got the following error: Error in UseMethod("xpathApply") : no applicable method for...

Python/Scrapy: Scraping Nasdaq's data? [closed]

javascript,python,selenium,scrapy,scrape
I am comfortable scraping most sites with Scrapy, however I have never tried getting dynamic content from javascript and I am running into a lot of arguments in regard to how to start learning. I am attempting to scrape revenue data from the table at: http://www.nasdaq.com/symbol/scmp/revenue-eps I have heard a...

Scrapy: how can I get the content of pages whose response.status=302?

web-scraping,scrapy,scrape,scrapy-spider
I get the following log when crawling: DEBUG: Crawled (302) <GET http://fuyuanxincun.fang.com/xiangqing/> (referer: http://esf.hz.fang.com/housing/151__1_0_0_0_2_0_0/) DEBUG: Scraped from <302 http://fuyuanxincun.fang.com/xiangqing/> But it actually returns nothing. How can I deal with these responses with status=302? Any help would be much appreciated!...

How can I get data from a specific class of a html tag using beautifulsoup?

python,beautifulsoup,scrape
I want to get data (name, city, and address) located in a div tag from an HTML file like this: <div class="mainInfoWrapper"> <h4 itemprop="name">name</h4> <div> <a href="/Wiki/Province/Tehran"></a> city <a href="/Wiki/City/Tehran"></a> Address </div> </div> I don't know how to get the data I want from that specific tag. Obviously I'm using Python...

Python Beautiful Soup Scraping Exact Content From Charts

python,table,data,beautifulsoup,scrape
In Python, using Beautiful Soup, I want to be able to grab specific text/numbers from a sortable table online. http://www.nfl.com/stats/categorystats?archive=false&conference=null&role=OPP&offensiveStatisticCategory=null&defensiveStatisticCategory=INTERCEPTIONS&season=2014&seasonType=REG&tabSeq=2&qualified=false&Submit=Go I have attempted this about a million times and can't figure it out. This is the best I could do: from bs4 import BeautifulSoup import urllib2 import requests import pymongo...