

Mechanize search unable to find CSS selector (it's definitely present)

ruby,css-selectors,nokogiri,mechanize
I have a long CSS selector that works perfectly fine when actually used in CSS, jQuery etc. But this very same selector will not work on a Mechanize::Page object - it simply returns an empty array. The selector targets a paragraph and in my other case a header1. I also...

Selenium interpret javascript on mac?

selenium,web-crawler,mechanize
I'm trying to make a web crawler that clicks on ads (yes, I know). It's very sophisticated, but I realise that Google Ads aren't shown when JavaScript is disabled. Today I use Mechanize, and it doesn't "accept" JavaScript. I heard Selenium uses another system to crawl the net. The only...

Fix Character encoding of webpage using python Mechanize

python,mechanize
I am trying to submit a form on this page using Mechanize.
    br.open("http://mspc.bii.a-star.edu.sg/tankp/run_depth.html")
    # selecting form to fill
    br.select_form(nr = 0)
    # input for the form
    br['pdb_id'] = '1atp'
    req = br.submit()
This however gives the following error:
    mechanize._form.ParseError: expected name token at '<! INPUT PDB FILE>\n\t'
I figure this is because...

crawling through pagination mechanize python

python,mechanize
I am using mechanize & python to crawl a website and get data. So far I am able to submit the form and get the content from that page. But I am unable to trigger a click on the "Next Page" link and get data. My code is as follows: import re import mechanize...

ruby how to close a mechanize connection

ruby-on-rails,ruby,mechanize,open-uri
I have a problem with too many connections on Mechanize, and I wonder how to close a connection, since I want to build a scraper with a proxy. I did find agent.shutdown, but for some reason I can't get it to work. Any help? 10.times { minion = Mechanize.new {...

Hidden HTML elements using Mechanize Python

python,html,web-scraping,mechanize,hidden
So I'm writing a Python script that checks Blackboard (school interface site) for updates. But the HTML I receive back from my script is not completely the same as the HTML when viewed in my browser. I'm unsure if this is a cookie issue or what I'm missing. USERNAME =...

Writing loop over multiple pages with BeautifulSoup

python,loops,beautifulsoup,mechanize,bs4
I'm attempting to scrape several pages of results from the county search tool here: http://www2.tceq.texas.gov/oce/waci/index.cfm?fuseaction=home.main But I can't seem to figure out how to iterate over more than just the first page. import csv from mechanize import Browser from bs4 import BeautifulSoup url = 'http://www2.tceq.texas.gov/oce/waci/index.cfm?fuseaction=home.main' br = Browser() br.set_handle_robots(False) br.open(url)...

Why am I getting an unsupportedSchemeError

ruby,mechanize
I am writing a program that uses Mechanize to scrape a student's grades and classes from edline.net using a student account and return the data I need. However, after logging in, from the homepage I have to access a link (called 'Private Reports') which will then dynamically return a page...

Submitting a Form using Mechanize (PubChem)

python,mechanize
I am trying to write a chemical property scraper for PubChem. I am pretty new to mechanize, and just programming in general, so I got stuck on how to submit the form for this website: https://pubchem.ncbi.nlm.nih.gov/. br.submit() is producing an error (it just says httperror_seek_wrapper), and I am unsure on...

Map two Nokogiri objects

ruby,nokogiri,mechanize
A quick question:
    <table>
      <tr>
        <th>foo</th>
        <td><p>bar</p></td>
      </tr>
    </table>
    details = doc.css('table > tr > th')
    details2 = doc.css('table > tr > td > p')
    details = details.map { |n| { name: n.text }}
    details2 = details2.map { |n| { value: n.text }}
How can I merge those Nokogiri objects...
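One possible way to pair the two lists, sketched with plain arrays of hashes standing in for the mapped Nokogiri results. The method name `merge_pairs` is made up for illustration, and the sketch assumes both lists have the same length and order:

```ruby
# Hypothetical helper: pairs each { name: ... } hash with the { value: ... }
# hash at the same index and merges the two into a single hash.
def merge_pairs(names, values)
  names.zip(values).map { |name, value| name.merge(value) }
end

# Stand-ins for the results of the two `map` calls in the question.
details  = [{ name: 'foo' }]
details2 = [{ value: 'bar' }]

p merge_pairs(details, details2)
```

If the second list were shorter than the first, `zip` would pad it with `nil` and `merge` would raise a TypeError, so the lengths should be checked before relying on this.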

Messy Python install? (OS X)

python,osx,python-2.7,mechanize,python-3.4
I am a complete beginner to both Python and the OS X Terminal, and I have attempted to install some packages for both Python 2.7.3 and Python 3.4. I can't get mechanize to work with either Python 2 or Python 3 after install. I get: >>> from mechanize import *...

href does not want to get printed although I followed the path

html,ruby,xpath,nokogiri,mechanize
I want to enter a link in a webpage. This is its location in the inspect element: As you can see, to reach the link in the < body >, I have to pass through: 1) < div class = "container" > 2) < div id = "result content" >...

BeautifulSoup parse 'findAll' run error

python,beautifulsoup,mechanize
I am trying to parse a site using mechanize and BeautifulSoup and not having any luck. I know I can access the site table because I can read and print the entire page...user agent not posted here. html = page.read() soup = BeautifulSoup(html) table = soup.find("table", id="table-hover") for row in table.findAll('tr')[1:]:...

Web Scraper for dynamic forms in python

python,web-scraping,web-crawler,mechanize
I am trying to fill the form of this website http://www.marutisuzuki.com/Maruti-Price.aspx. It consists of three drop down lists. One is Model of the car, Second is the state and third is city. The first two are static and the third, city is generated dynamically depending upon the value of state,...

Mechanize select from dropdown

python,mechanize
I want mechanize to check whether the currently selected dropdown value equals the default value and, if so, choose another value from the list instead. The html of the dropdown is as follows: <td class="label">List</td> <td> <select name="list" id="list" onchange="list()"> <option>---</option> <option value='1'>1</option> <option value='2'>2</option> ---other options--- My...

Error logging into instagram with python

python,beautifulsoup,mechanize
I am trying to log into my Instagram account via a Python script using argparse. It seems to connect but it prints out "This page could not be loaded. If you have cookies disabled in your browser, or you are browsing in Private Mode, please try enabling cookies or turning off Private...

How can I log into a simple web access login using Python?

python,web,browser,login,mechanize
I'm trying to create a little Python script that'll log into a web access authentication page for me automatically for the purposes of convenience (the login appears each time the computer is disconnected from the network). My attempt so far has been to use the module mechanize, but running this...

Ruby Mechanize form input field text

ruby,csv,automation,web-scraping,mechanize
Resolved - the "abc = list.scan(/[([^)]+)]/).last.first" line was correct but also included the quotes, which the website search form did not accept. Corrected it to abc = list.scan(/\"([^)]+)\"/).join. Thanks for all the help. I have to automate a search using a list of 100 keywords that is in a csv...
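A small sketch of what the corrected `scan` call from the post does. The input format is an assumption (the post does not show the CSV contents), and both the sample string and the `extract_keyword` name are invented for illustration:

```ruby
# Hypothetical helper wrapping the corrected expression from the post.
# scan returns an array of capture groups; join flattens it into one string.
def extract_keyword(list)
  list.scan(/\"([^)]+)\"/).join
end

# Assumed input: a keyword wrapped in double quotes inside brackets.
puts extract_keyword('["widget"]')
```

Because `[^)]+` is greedy and excludes only `)`, a line holding several quoted strings would be captured as one span running from the first quote to the last, so this pattern suits lines with a single quoted keyword.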

HTTP Error 999: Request denied

python,web-scraping,beautifulsoup,linkedin,mechanize
I am trying to scrape some web pages from LinkedIn using BeautifulSoup and I keep getting the error "HTTP Error 999: Request denied". Is there a way to avoid this error? If you look at my code, I have tried Mechanize and urllib2 and both are giving me the same...

Recovering from HTTPError in Mechanize

python,mechanize,http-error
I am writing a function for some existing python code that will be passed a Mechanize browser object as a parameter. I fill in some details in a form in the browser, and use response = browser.submit() to move the browser to a new page, and collect some information from...

Python mechanize is not handling form exception

python,exception,mechanize
I am writing a web scraper using Python and mechanize. The scraper looks for the "Next" button and loops until it comes to the last page, which does not have a "Next" button. That raises a FormNotFoundError exception, which stops the loop. When I try to catch the exception, I...

How is Ruby Mechanize fast after first get request?

ruby,web-scraping,mechanize
I recently programmed a scraper with Ruby's Mechanize gem for the first time. It had to hit the server (some 'xyz.com/a/number') where the number will be generated by the script. Like 'xyz.com/a/2' and 'xyz.com/a/3'. It turned out that the first request took a lot of time -- around 1.5s on...

Mechanize: Unable to redirect to final destination

ruby,mechanize,mechanize-ruby
I am using mechanize to try to log in to a site. However, when logging in with mechanize, I can't seem to get to the final destination. Below is the example. Mechanize goes to (1) https://app.abc.com/users/login If login is successful, the webpage will momentarily post to the page link below (2) https://paid.abc.com/Login.aspx?CID=123&XRALID=321&LanguageID=1&ExecuteLogin=1 From there automatically...

Python/Mechanize doesn't recognize input form

python,python-2.7,mechanize
Newbie here. I'm trying to use mechanize to input text into a search box on a website. For some reason, it seems like the search box doesn't count as a form. The "form" looks like this: <th align="left" scope="col"> <input type="text" name="searchbox" id="searchboxid" size="40" class="search_box ac_input" autocomplete="off"> I get this error...

can't access form from website using python mechanize

python,forms,mechanize,mechanize-python
I am trying to access the forms on a particular website. Here is the HTML code of the form. <form name="calendarForm" method="post" action="/ibook/loginSelection.do"><div><input name="org.apache.struts.taglib.html.TOKEN" value="489be5fa2613d2f762b6389c3dd5ea3f" type="hidden"></div> <table border="0" cellpadding="0" cellspacing="0"> <tbody><tr><td>Please select one of the following services: </td></tr><tr><td> <select name="apptDetails.apptType"...

how do I build a hash when using serialization?

ruby-on-rails,ruby-on-rails-4,mechanize
how do I build a hash when using serialization? { amount => [{year => total_cost}], amount => [{year => total_cost}], amount => [{year => total_cost}] } ...

Loop over all the tags and extract specific information via Mechanize/Nokogiri

html,ruby,nokogiri,mechanize
I know the basic things of accessing a website and so (I just started learning yesterday), however I want to extract now. I checked out many tutorials of Mechanize/Nokogiri but each of them had a different way of doing things which made me confused. I want a direct bold way...

Scraping successive pages until the last page using Nokogiri and Mechanize

ruby,web-scraping,nokogiri,mechanize
I am trying to scrape multiple pages from a website. I want to scrape a page, then click on next, get that page, and repeat until I hit the end. I wrote this so far: page = agent.submit(form, form.buttons.first) #submitting a form while lien = page.link_with(:text=>'Next') # while I have...

Mechanize submit result is not the correct page

ruby,mechanize,scrape
I was trying to scrape booking.com as an exercise to learn Mechanize, but I can't get past an issue. I am trying to get a hotel's prices through Mechanize using the following code: hotel_name = "Hilton New York" date = Date.today day_after_date = date + 1 agent = Mechanize.new homepage...

How to get option values from second select list?

ruby,web-scraping,mechanize
I have a problem with Mechanize/Ruby. I can't get the second select list's options after selecting an option in the first one. If I understood correctly from Google, there is some Ajax magic there. At the moment I have something like this: require 'rubygems' require 'mechanize' require 'nokogiri' HOME_URL = 'http://www.parkers.co.uk/'...

How can I login this page and read it?

python,html,google-app-engine,login,mechanize
I know there are a lot of questions about this matter, and I have tried most of them. My goal is to get the article from this page and use it in GAE. If I try to log in, it redirects to a long URL; after I log in there it redirects...

Error trying to image scrape

ruby,image,mechanize,scrape
I'm trying to make a Ruby program which will automatically download the most recent Penny-Arcade. Here's the code I have:
    require 'mechanize'
    agent = Mechanize.new
    date_string = Date.today.to_s
    page = agent.get('http://www.penny-arcade.com/comic/')
    puts page
    art_link = page.at('div#comicFrame > a > img')['src']
    File.open(date_string, 'wb') do |fo|
      fo.write open(art_link).read
    end
And the output...