

Parsing HTML with Python using Beautiful Soup

python,html,beautifulsoup,screen-scraping
I'm trying to parse the poorly structured website of a restaurant and print out just the menu headers, like "Bento Box", "Bara Chirashi set", etc. I'm using the Python library Beautiful Soup, but I'm having trouble getting the proper output: import requests from bs4 import BeautifulSoup url = ('http://www.sushitaro.com/menu-lunch.html')...
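The general approach is to inspect the page, find which tag the headers live in, and pull the text of every match. A minimal sketch with Beautiful Soup, using stand-in markup since the real structure of the linked page isn't shown here:

```python
from bs4 import BeautifulSoup

# Stand-in for the restaurant page; on a poorly structured site the menu
# headers are often bare <b> or <strong> tags rather than real headings.
html = """
<div class="menu">
  <b>Bento Box</b><p>assorted items</p>
  <b>Bara Chirashi set</b><p>sashimi over rice</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
# Inspect the live page (browser dev tools) to pick the right tag/class,
# then collect just the text of every match.
headers = [tag.get_text(strip=True) for tag in soup.find_all("b")]
print(headers)
```

The same pattern applies after `requests.get(url).text` replaces the inline string.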

Override style of webpage in webview - android

android,webview,screen-scraping
I am trying to override an existing style for a webpage in a webview. The following webViewClient has no effect on the webpage. public class WebClient extends WebViewClient { @Override public boolean shouldOverrideUrlLoading(WebView view, String url) { return super.shouldOverrideUrlLoading(view, url); } @Override public void onPageFinished(WebView view, String url) { view.loadUrl("javascript:document.getElementsByClassName('main-container').style.paddingTop...

Extracting links with scrapy that have a specific css class

python,web-scraping,scrapy,screen-scraping,scrapy-spider
Conceptually simple question/idea. Using Scrapy, how do I use a LinkExtractor that only follows links with a given CSS class? Seems trivial and like it should already be built in, but I don't see it. Is it? It looks like I can use an XPath, but I'd prefer using...

Selenium and PhantomJS take 30 seconds to open each link

c#,performance,selenium,phantomjs,screen-scraping
I am trying to open a website and grab some data using Selenium with PhantomJS, but it takes a lot of time to open the website (about 30 seconds). And every time I open another link I have to wait 30+ seconds. What is wrong with my code? static void...

Extract URLs from Google search result page

html,go,screen-scraping
I'm trying to grab all the URLs off a Google search page, and there are two ways I think I could do it, but I don't really have any idea how to do either. First, I could simply scrape them from the .r tags and get the href attribute...

Scraping multiple tables out of a webpage in R

r,table,data,screen-scraping
I am trying to pull mutual fund data into R. My code works for a single table, but when there are multiple tables in a webpage it doesn't work. Link - https://in.finance.yahoo.com/q/pm?s=115748.BO My Code url <- "https://in.finance.yahoo.com/q/pm?s=115748.BO" library(XML) perftable <- readHTMLTable(url, header = T, which = 1, stringsAsFactors =...
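The key point is that `which = 1` deliberately returns only the first table; calling `readHTMLTable(url)` without `which` returns a list of every table, and you then index into that list (e.g. `which = 2` for the second). The same idea, illustrated in Python with pandas against stand-in HTML (the two small tables below are placeholders for the Yahoo Finance page):

```python
import io

import pandas as pd

# read_html returns a LIST of DataFrames, one per <table> on the page --
# the direct analogue of R's readHTMLTable(url) without `which`.
html = """
<table><tr><th>Period</th><th>Return</th></tr><tr><td>1y</td><td>8.1</td></tr></table>
<table><tr><th>Rank</th><th>Fund</th></tr><tr><td>1</td><td>ABC</td></tr></table>
"""

tables = pd.read_html(io.StringIO(html))
print(len(tables))   # how many tables were found
perf = tables[0]     # first table  (R: which = 1)
rank = tables[1]     # second table (R: which = 2)
```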

Python, beautifulsoup scraping specific or exact numbers from a stat table

python,table,statistics,beautifulsoup,screen-scraping
On a player stat page, how can I make my anchor point the year "2014" and grab specific numbers in the 2014 column (scrape the numbers to the right of 2014)? The code below is skipping the "Passing" table (with all of the career passing stats) and trying to grab stats...
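One way to anchor on a year is to find the cell containing "2014" and then walk its right-hand siblings. A sketch with Beautiful Soup, using an invented stat table since the real page's markup isn't shown:

```python
from bs4 import BeautifulSoup

# Stand-in for the stat table: year in the first column, stats to the right.
html = """
<table>
  <tr><th>Year</th><th>Yds</th><th>TD</th></tr>
  <tr><td>2013</td><td>3912</td><td>27</td></tr>
  <tr><td>2014</td><td>4109</td><td>33</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
# Anchor on the cell whose text is exactly "2014", then collect every
# following <td> in the same row -- the numbers to its right.
year_cell = soup.find("td", string="2014")
stats = [td.get_text(strip=True) for td in year_cell.find_next_siblings("td")]
print(stats)
```

On the real page you may need to scope the search to the correct table first (e.g. by its `id`) so a "2014" in another table isn't matched.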

Multiple page scraping [closed]

php,screen-scraping
I'm looking for a method to make the following script scrape multiple pages listed in an array and write the selected content into a text or Excel document. Any ideas? Is this possible? And another question would be why the script works on localhost but not when placed on the...

What are the best practices to screen scrape thousands of pages in ruby? [closed]

ruby,screen-scraping
I am building a Ruby script that screen-scrapes a lot of items, collecting their product URLs (over 200k items). Now I have to access each item's page and copy some data. What are the best practices for opening over 200k pages faster (in terms of code and server)? Beyond the...
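At this scale the usual practice is a bounded pool of concurrent workers (plus rate limiting, retries, and caching on a real site). A minimal sketch in Python; the same shape applies in Ruby with threads or a library like Typhoeus, and `fetch_page` is a hypothetical stand-in for the real HTTP call:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the real HTTP request, so the sketch runs
# without network access.
def fetch_page(url):
    return f"<html>data for {url}</html>"

urls = [f"http://example.com/item/{i}" for i in range(10)]

# A bounded worker pool keeps throughput high without hammering the server;
# tune max_workers to what the target site tolerates.
with ThreadPoolExecutor(max_workers=4) as pool:
    pages = list(pool.map(fetch_page, urls))

print(len(pages))
```

For 200k pages you would also persist progress (so a crash doesn't restart from zero) and respect robots.txt and crawl-delay.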

How to take data from variable and put it into another

python,web-scraping,beautifulsoup,screen-scraping
I'm having a little bit of an issue: I would like to take this data, for item in g_data: print item.contents[1].find_all("a", {"class":"a-link-normal s-access-detail-page a-text-normal"})[0]["href"] print item.contents[1].find_all("a", {"class":"a-link-normal s-access-detail-page a-text-normal"})[1]["href"] print item.contents[1].find_all("a", {"class":"a-link-normal s-access-detail-page a-text-normal"})[2]["href"] print item.contents[1].find_all("a", {"class":"a-link-normal s-access-detail-page a-text-normal"})[3]["href"] and use the...
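Rather than repeating the same `find_all(...)[0]`, `[1]`, `[2]` lookups, run the search once and collect every `href` into a list variable. A sketch with Beautiful Soup, using invented sample markup with the same class names:

```python
from bs4 import BeautifulSoup

# Two sample links carrying the classes from the question.
html = """
<div>
  <a class="a-link-normal s-access-detail-page a-text-normal" href="/p/1">x</a>
  <a class="a-link-normal s-access-detail-page a-text-normal" href="/p/2">y</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
# select() with chained classes matches elements carrying ALL of them;
# one comprehension replaces the repeated indexed prints.
links = soup.select("a.a-link-normal.s-access-detail-page.a-text-normal")
hrefs = [a["href"] for a in links]
print(hrefs)
```

`hrefs` is now a normal list you can pass around, append to across items, or write to a file.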

Find input submit element by multiple class values in Splinter?

python,dom,web-scraping,screen-scraping,splinter
I would like to find such an element: <input type="submit" value="login" class="button button-line navy" onclick="..."> I'm using this method, but it finds nothing: browser.find_by_css('.button .button-line .navy').first().click()...
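The selector is the problem: `.button .button-line .navy` (with spaces) means three *nested* elements, while `.button.button-line.navy` (no spaces) matches a single element carrying all three classes. Splinter's `find_by_css` takes standard CSS selectors (and note `first` is a property, not a method), so `browser.find_by_css('.button.button-line.navy').first.click()` should work. The selector semantics can be verified without a live browser using Beautiful Soup, which shares them:

```python
from bs4 import BeautifulSoup

# The element from the question.
html = '<input type="submit" value="login" class="button button-line navy">'
soup = BeautifulSoup(html, "html.parser")

# With spaces: descendant selector -- looks for .navy inside .button-line
# inside .button, so nothing matches.
assert soup.select(".button .button-line .navy") == []

# Without spaces: one element with all three classes -- matches.
matches = soup.select(".button.button-line.navy")
print(matches[0]["value"])
```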

Scraping data through paginated table using python

python,beautifulsoup,screen-scraping
I am scraping data from Google Finance's historical page for a stock (http://www.google.com/finance/historical?q=NSE%3ASIEMENS&ei=PLfUVIDTDuSRiQKhwYGQBQ). I can scrape the 30 rows on the current page. The issue I am facing is that I am unable to scrape the rest of the data in the table (rows 31-241). How do I go to...
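Paginated tables like this are usually driven by an offset in the query string, so the fix is to loop over the offsets and scrape each page URL in turn. A sketch that just builds the page URLs; the `start`/`num` parameter names are assumptions, so check the pagination links on the live page (browser dev tools) for the actual names:

```python
# 241 rows, 30 per page -> offsets 0, 30, 60, ... 240.
base = "http://www.google.com/finance/historical?q=NSE%3ASIEMENS"
rows_total = 241
rows_per_page = 30

# Hypothetical pagination parameters -- verify against the site's own
# "next page" links before relying on them.
page_urls = [
    f"{base}&start={offset}&num={rows_per_page}"
    for offset in range(0, rows_total, rows_per_page)
]
print(len(page_urls))
```

Each URL would then be fetched and parsed with Beautiful Soup exactly as the first page was, appending the rows into one result list.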

Scraping Data from Silverlight Control within Browser

silverlight,excel-vba,screen-scraping
I have been using Excel VBA and WPF applications to scrape data from various websites, and all has gone well. But now I have run into something I can't get past. The website is displaying its data within a Silverlight control: <OBJECT width="100%" height=400 id=rnSilverlightGrid data="data:application/x-oleobject;base64,QfXq3+...blah blah thousands of characters...AAAA=="...

data scraping ASPX page with curl [closed]

php,asp.net,perl,curl,screen-scraping
I'm looking for a scraping script (PHP, Perl, or bash, using curl or wget) to post a request to the web form on the search page below, which uses ASPX, and get the result containing the ebook description. Since I'm not familiar with .NET, I'm having difficulty getting it to work. Here is the website...

Scraping only when a change is detected?

javascript,phantomjs,screen-scraping,casperjs
My bank has a really simple login system. Using CasperJS I have been able to pull my latest account balance and my last transaction. There are many ways to scrape data off the Internet; I just used CasperJS to test out its capabilities. I checked with the bank and they say-...
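A common way to scrape "only when something changed" is to hash the relevant text from each fetch and compare against the previous digest, doing the expensive parsing only when the hashes differ. A language-agnostic sketch in Python (CasperJS could do the same with its own hashing); the balance strings stand in for the scraped page content:

```python
import hashlib

# Digest of the text we care about; comparing digests is cheaper than
# diffing pages and avoids storing the full previous content.
def digest(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

previous = digest("Balance: $1,204.52")   # saved from the last run

latest = digest("Balance: $1,204.52")     # this run: identical content
changed = latest != previous
print(changed)                            # nothing changed, skip the scrape

latest = digest("Balance: $980.17")       # a later run: content differs
print(latest != previous)                 # change detected, re-scrape
```

In practice the previous digest would be persisted to disk between runs, and the hashed text should exclude parts of the page that change on every load (timestamps, session tokens).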