Ruby Mechanize form input field text

Question:

Tag: ruby,csv,automation,web-scraping,mechanize

Resolved: the line abc = list.scan(/\[([^\)]+)\]/).last.first was correct, but the captured value still included the surrounding quotes, which the website's search form did not accept. I corrected it to abc = list.scan(/\"([^\)]+)\"/).join.

Thanks for all the help.

I have to automate a search using a list of 100 keywords that is in a csv file.

With Mechanize, I can submit the search using this example (http://mechanize.rubyforge.org/GUIDE_rdoc.html):

agent = Mechanize.new
page = agent.get('http://google.com/')
google_form = page.form('f')
google_form.q = 'ruby mechanize'
page = agent.submit(google_form)
pp page


However, when I loop through the csv file, it returns an error (in this example, the first csv entry is 'ruby mechanize'):

# I have already imported the csv list; now it loops through the array "raw_list"

raw_list.each do |list|
  abc = list.scan(/\[([^\)]+)\]/).last.first

  # I tested a "puts abc", which returned "ruby mechanize", so I don't understand why the rest of this doesn't work

  agent = Mechanize.new
  page = agent.get('http://google.com/')
  google_form = page.form('f')
  google_form.q = abc

  # even though abc = "ruby mechanize", an error occurs

  page = agent.submit(google_form)
  pp page
end


The form doesn't seem to accept the variable abc, but it works if I type in 'ruby mechanize' manually, even though the two look identical.

The error that appears is:

C:filename: in `block (2 levels) in <top (required)>': undefined method `text' for nil:NilClass (NoMethodError)
from C:/RailsInstaller/Ruby2.0.0/lib/ruby/gems/2.0.0/gems/mechanize-2.7.3/lib/mechanize.rb:442:in `get'
from C:/Users/victor/RubymineProjects/untitled/scraper.rb:23:in `block in <top (required)>'
from C:/Users/victor/RubymineProjects/untitled/scraper.rb:19:in `each'
from C:/Users/victor/RubymineProjects/untitled/scraper.rb:19:in `<top (required)>'
from -e:1:in `load'
from -e:1:in `<main>'


Any help would be appreciated.

Answer:

Your stack trace shows the exception being raised inside Mechanize (line 442 of mechanize.rb, in get), reached from line 23 of scraper.rb, inside the each loop that starts on line 19.

I tried your sample out in IRB and it seems to work fine:

2.2.2 :001 > require 'mechanize'
=> true
2.2.2 :002 > agent = Mechanize.new
=> #<Mechanize:...
2.2.2 :003 > page = agent.get('http://google.com/')
=> #<Mechanize::Page
...
2.2.2 :004 > google_form = page.form('f')
=> #<Mechanize::Form
...
2.2.2 :005 > google_form.q
=> ""
2.2.2 :006 > abc = "ruby mechanize"
=> "ruby mechanize"
2.2.2 :007 > google_form.q = abc
=> "ruby mechanize"
2.2.2 :008 > page = agent.submit(google_form)
=> #<Mechanize::Page
...


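String#scan returns an empty array when nothing matches, so .last returns nil and calling .first on that nil raises a NoMethodError. Your error is most likely happening here: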

abc = list.scan(/\[([^\)]+)\]/).last.first
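
To see why the chained call can fail, here is a minimal sketch of how scan, last, and first behave when the pattern finds nothing. The pattern (a bracket-matching regex) and the sample strings are assumptions for illustration, not the asker's actual data, and last_capture is a hypothetical helper name.

```ruby
# Assumed pattern: captures whatever sits between "[" and "]".
BRACKETED = /\[([^\)]+)\]/

# Hypothetical helper: returns the last capture, or nil when nothing matches,
# instead of raising NoMethodError.
def last_capture(text)
  match = text.scan(BRACKETED).last # nil when scan returned an empty array
  match && match.first
end

p 'ruby mechanize'.scan(BRACKETED)      # an empty array, not nil
p 'ruby mechanize'.scan(BRACKETED).last # nil, so calling .first on it would raise
p last_capture('ruby mechanize')        # nil, without an exception
p last_capture('["ruby mechanize"]')    # the capture, with its quotes still attached
```

Guarding against nil this way keeps the loop running on rows that do not match, at the cost of having to handle a nil keyword afterwards.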


http://ruby-doc.org/stdlib-2.2.0/libdoc/strscan/rdoc/StringScanner.html

You can replace that with:

abc = list.scan(/\[([^\)]+)\]/).join


You'll always get a string, although it may be an empty one ("").
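
A short sketch of what join does to the result of scan, including one side effect worth knowing about: when the pattern matches more than once, all captures are concatenated. The quote-matching pattern, the sample strings, and the helper name joined_captures are assumptions made for this example.

```ruby
# Assumed pattern: captures the text between a pair of double quotes.
QUOTED = /"([^"]+)"/

# Hypothetical helper: every capture found by scan, concatenated into one string.
def joined_captures(text)
  text.scan(QUOTED).join
end

p joined_captures('ruby mechanize')        # no quotes, so no match: empty string
p joined_captures('["ruby mechanize"]')    # one match: the text without its quotes
p joined_captures('["ruby", "mechanize"]') # two matches are glued together, no separator
```

If a row can contain several quoted values, picking one capture explicitly may be safer than joining them all.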

http://ruby-doc.org/core-2.2.0/Array.html#method-i-join
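
An alternative that avoids the regex altogether is to let Ruby's standard csv library parse the rows, since it removes the field quotes itself. This is only a sketch under assumptions: the keywords are taken to sit in the first column, and keywords_from_csv and the sample text are hypothetical names and data, not the asker's file.

```ruby
require 'csv'

# Hypothetical helper: returns one clean keyword per CSV row (first column),
# skipping blank rows. Assumes the keyword is in the first column.
def keywords_from_csv(csv_text)
  CSV.parse(csv_text)
     .map { |row| row.first.to_s.strip }
     .reject(&:empty?)
end

sample = "\"ruby mechanize\"\nnokogiri\n\n"

keywords_from_csv(sample).each do |keyword|
  puts keyword # quotes are already gone; this is the value to hand to the form
end
```

Each keyword returned this way could then be assigned to google_form.q inside the loop, as in the question's code.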
