## Question:

I'm trying to make a web crawler that finds the links on a homepage and then visits the found links, again and again. So far I have written code with a parser that shows me the links it found and prints statistics for some tags on this homepage, but I don't understand how to visit the new links in a loop and print their statistics too.

```ruby
require 'net/http'

@visit = {}

@src = Net::HTTP.start(@url.host, @url.port) do |http|
  http.get(@url.path)
end
@content = @src.body

# ... (more code, not shown)

def govisit
  if @content =~ @commentTag
    # ... (not shown)
  end

  cnt = @content.scan(@aTag)
  # ... (not shown)

  puts "Links on this site: "
  # ... (not shown)

  if @visit.size >= 500
    exit 0
  end

  printStatistics
end
```


## Answer:

First of all, you need a function that accepts a link and returns the page body. Then parse all the links out of that body and keep a list of them. Check that list against the links you have already visited, remove the visited ones from the new links, and call the same function again on each remaining link, repeating the whole process.
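
A minimal sketch of the "parse all the links out of the body" and "remove the visited ones" steps, assuming a regular expression over `<a href="...">` is good enough for a rough crawler (an HTML parser such as Nokogiri would be more robust). `extract_links` is a hypothetical helper name, not from the original answer; it resolves relative hrefs against the page URL with `URI.join`, keeps only http(s) links and drops `#fragment` parts:

```ruby
require 'uri'
require 'set'

# Hypothetical helper: return the absolute http(s) links found in an HTML string.
# The regex only looks at href attributes of <a> tags; it is not a full HTML parser.
def extract_links(html, base_url)
  hrefs = html.scan(/<a\s[^>]*href\s*=\s*["']([^"']+)["']/i).flatten
  hrefs.filter_map do |href|
    uri = URI.join(base_url, href.strip)
    next unless %w[http https].include?(uri.scheme)

    uri.fragment = nil # "page#section" is the same page as "page"
    uri.to_s
  rescue URI::Error
    nil # skip hrefs that are not valid URIs
  end.uniq
end

html = '<a href="/about">About</a> <a href="http://example.com/">Home</a> ' \
       '<a href="mailto:me@example.com">Mail</a>'
visited = Set["http://example.com/"]

# Keep only the links that have not been visited yet.
new_links = extract_links(html, "http://example.com/index.html").reject { |link| visited.include?(link) }
p new_links # => ["http://example.com/about"]
```

Using a `Set` for the visited links keeps the "did I already visit this?" check fast even after hundreds of pages.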

To stop the crawler at some point, you need to build a condition into the while loop (for example, a maximum number of visited pages).
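
The loop itself can be sketched as below, assuming the page download and link parsing are passed in as a block that returns the links found at a given URL, so the loop can be tried without network access. `crawl` and `max_pages` are hypothetical names; the default of 500 mirrors the `@visit.size >= 500` check in the question. A `Set` tracks visited URLs and an array serves as the queue of links still to visit:

```ruby
require 'set'

# Hypothetical crawl loop: the block receives a URL and must return the links
# found on that page (e.g. by downloading it and scanning the body).
def crawl(start_url, max_pages: 500)
  visited = Set.new
  queue = [start_url]

  # The while condition is what eventually stops the crawler.
  while !queue.empty? && visited.size < max_pages
    url = queue.shift
    next if visited.include?(url) # the same link can be queued twice

    visited << url
    links = yield(url)
    queue.concat(links.reject { |link| visited.include?(link) })
  end

  visited.to_a
end

# A tiny fake "site" so the loop can run offline.
site = {
  "http://example.com/"  => ["http://example.com/a", "http://example.com/b"],
  "http://example.com/a" => ["http://example.com/", "http://example.com/b"],
  "http://example.com/b" => ["http://example.com/c"],
  "http://example.com/c" => []
}

p crawl("http://example.com/") { |url| site.fetch(url, []) }
# => ["http://example.com/", "http://example.com/a", "http://example.com/b", "http://example.com/c"]
```

In a real crawler the block would fetch the page (for example with `Net::HTTP`) and return the links parsed from its body. Taking links from the front of the array gives a breadth-first crawl; the statistics for each page could be printed inside the block.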

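```ruby
require 'net/http'
require 'uri'

@visited_links = []

# Accepts a link and returns the page body.
# (The method name is a placeholder; the original name was lost.)
def fetch_body(url)
  uri = URI(url)
  src = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.get(uri.request_uri)
  end
  src.body
end

# ... (the crawl loop of the original answer is not shown; it contained:)
# check if the content does not have the same link
```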


# Related:

## Ruby access words in string

ruby
I don't understand the best way to access a certain word by its number in a string. I tried using [] to access a word, but instead it returns a letter. puts s # => I went for a walk puts s[3] # => w ...

## Get the actual value of a boolean attribute

ruby,page-object-gem,rspec3,rspec-expectations
I have the span: <span disabled="disabled">Edit Member</span> When I try to get the value of the disabled attribute: page.in_iframe(:id => 'MembersAreaFrame') do |frame| expect(page.span_element(:xpath => "//span[text()='Edit Member']", :frame => frame).attribute('disabled')).to eq("disabled") end I get: expected: "disabled" got: "true" How do I get the value of specified attribute instead of a...

## Keep leading zeroes when converting string to integer

ruby
For no particular reason, I am trying to add a #reverse method to the Integer class: class Integer def reverse self.to_s.reverse.to_i end end puts 1337.reverse # => 7331 puts 1000.reverse # => 1 This works fine except for numbers ending in a 0, as shown when 1000.reverse returns 1 rather...

## Heroku rake db:migrate failing - uninitialized constant

ruby-on-rails,ruby,heroku
My app is working fine locally and my push to Heroku was successful. But, when I run heroku run rake db:migrate, I get the following error: NameError: uninitialized constant AddWeightToExercises Here is the failed migration: class AddWeightToExercise < ActiveRecord::Migration def change add_column :exercises, :weight, :float end end edit: Thanks for...

## Why am I not able to read HTML content from a website into a file?

java,url
I have made a Java program in which I can use any website to read its HTML content using the Scanner class and varargs. I am not able to get the output while I am using the Scanner class and varargs. Below is the code. import java.io.FileWriter; import java.io.IOException; import java.io.InputStreamReader; import...

## RSpec test for rake task

I have created a custom rake task that deletes all items that are >= 7 days old. I am trying to write a RSpec test for this new task but it seems like my task isn't really running in the test. I have tested the task manually and it works...

## Rails Association Guidance [on hold]

ruby-on-rails,ruby,ruby-on-rails-4,ruby-on-rails-3.2
I am new to rails 4. I have gone through lots of tutorials and trying to solve below scenario. But still no success. Can anybody point me in the right direction. How to handle associations for below scenario. Scenario: 1. Patient can have many surgeries. 2. Surgery has two types...

## Loop until I get the correct user

ruby,redis
I have users stored in Redis and want to be able to call only certain subsets from a set. If I don't get the correct user back, I want to put it back in the set and then try again until I get one of the desired users @redis =...

ruby-on-rails,ruby,rest,activerecord,one-to-many
I'm creating a rails application that is a backend for a mobile application. The backend is implemented with a RESTful web API. Currently I am trying to add gamification to the platform through the use of badges that can be earned by the user. Right now the badges are tied...

## What is Rack::Utils.multipart_part_limit within Rails and what function does it perform?

ruby-on-rails,ruby,rack,multipart
Rack::Utils.multipart_part_limit is set to 128 by default. What purpose does the value have and what effect does it have within the Rails system?...

## Stack level too deep because of recursion

I have a model named Tweet. The columns of the Tweet model are: -id -content -user_id -picture -group -original_tweet_id Every tweet can have one or multiple retweets. The relation happens with the help of original_tweet_id. All the tweets have original_tweet_id nil , whilst the retweets contain the id of the...

## What are fragment URLs and why use them?

php,url,hash,fragment-identifier
I am new to PHP development. Today I came across the interesting topic of URL fragments, specifically the '#' part of URLs. I searched about it, and it says it's like www.example.com/foo.html#bar. But I don't understand why this "#bar" is needed, and how to read it with PHP?...

## Seeding fails validation for nested tables (validates_presence_of)

ruby-on-rails,ruby,validation,ruby-on-rails-4,associations
An Organization model has a 1:many association with a User model. I have the following validation in my User model file: belongs_to :organization validates_presence_of :organization_id, :unless => 'usertype==1' If usertype is 1, it means the user will have no organization associated to it. For a different usertype the presence of...

## Ruby boolean logic: some amount of variables are true

ruby
Let's say I have 3 variables: a, b, c. How can I check that zero or only one of them is true?...

## Ruby- get a xml node value

ruby,xml
Can someone help me extract the node value for the element "Name"? Type 1: I am able to extract the "name" value for the XML below by using the code below <Element> <Details> <ID>20367</ID> <Name>Ram</Name> <Name>Sam</Name> </Details> </Element> doc = Nokogiri::XML(response.body) values = doc.xpath('//Name').map{ |node| node.text}.join ',' puts values...

## String#scan not capturing all occurrences

ruby,regex
I'm facing very strange behaviour with the return value of Ruby's String#scan method. I have the code below and I can't find out why "scan" doesn't return 2 elements. str = "10011011001" regexp = "0110" p str.scan(/(#{regexp})/) ==> [["0110"]] String "str" clearly contains 2 occurrences of the pattern "0110". I want to fetch...

## Using Ruby Pathname to access relative directory

ruby,path,pathname
Given I have a relative path pointing to a directory how can I use it with Ruby's Pathname or File library to get the directory itself? p = Pathname.new('dir/') p.dirname => . p.directory? => false I have tried './dir/', 'dir/', 'dir'. What I want is p.dirname to return 'dir'. I...

## Allowing some enabled and disabled option on collection_select

ruby-on-rails,ruby
I am trying to populate a dropdown box on a view that has all the states. This works just fine: <%= f.collection_select :state_id, @states, :id, :name %> Now, I need to make the following: Some states are going to be disabled for choosing, but they still have to appear on...

## Htaccess rewrite URL with virtual directory and 2 variables

regex,apache,.htaccess,url,rewriting

## Can't map a range of dates in Ruby/Rails

ruby-on-rails,ruby
I'm trying to map a range of dates and pass them to my view as an array, as follows: from, to = Date.parse("2014-01-01"), Date.yesterday date_range = (from..to) @mapped_dates = date_range.map {|date| date.strftime("%b %e")} I reference them in some JS in my view as follows: dateLabels = <%= raw @mapped_dates.to_json %>;...

## Map with accumulator on an array

ruby,inject
I'm looking to create a method for Enumerable that does map and inject at the same time. For example, calling it map_with_accumulator, [1,2,3,4].map_with_accumulator(:+) # => [1, 3, 6, 10] or for strings ['a','b','c','d'].map_with_accumulator {|acc,el| acc + '_' + el} # => ['a','a_b','a_b_c','a_b_c_d'] I fail to get a solution working. I...

## How to handle backslash “\” escape characters in %q strings and heredocs

ruby
Ruby Newbie here. I do not understand why Ruby looks inside %q and escapes the \. I am using Ruby to generate Latex code. I need to generate \\\hline which is used in Latex for table making. I found \\\hline as input generated \hline even though the string was inside...

## Ruby: How to copy the multidimensional array in new array?

ruby-on-rails,arrays,ruby,multidimensional-array
seating_arrangement [ [:first, :second, :none], [:first, :none, :second], [:second, :second, :first], ] I need to copy this array into new array. I tried to do it by following code: class Simulator @@current_state def initialize(seating_arrangement) @@current_state = seating_arrangement.dup end But whenever I am making any changes to seating_arrangement current_state changes automatically....

## Call method to generate arguments in ruby works in 1.8.7 but not 1.9.3

ruby-on-rails,ruby,ruby-1.9.3
This is something that I had working in ruby 1.8.7, but no longer works in 1.9.3, and I am not sure what changes make this fail. Previously, I had something like this myFunction(submitArgs()) where submitArgs was a helper method that could be called with some options def submitArgs(args={}) #Some logic/manipulations...

## Rails shared controller actions

ruby-on-rails,ruby,ruby-on-rails-4
I am having trouble building a controller concern. I would like the concern to extend the classes available actions. Given I have the controller 'SamplesController' class SamplesController < ApplicationController include Searchable perform_search_on(Sample, handle: [ClothingType, Company, Collection, Color]) end I include the module 'Searchable' module Searchable extend ActiveSupport::Concern module ClassMethods def...

## is there an equivalent of the ruby any method in javascript?

javascript,arrays,ruby,iteration
Is there an equivalent of ruby's any method for arrays but in javascript? I'm looking for something like this: arr = ['foo','bar','fizz', 'buzz'] arr.any? { |w| w.include? 'z' } #=> true I can get a similar effect with javascript's forEach method but it requires iterating through the entire array rather...

## Rails basic auth not working properly

ruby-on-rails,ruby,authentication
I am building a small API that uses basic authentication. What I have done, is that a user can generate a username and password, that could be used to authenticate to the API. However I have discovered that it is not working 100% as intended. It appears that a request...

## Django: html without CSS and the right text

python,html,css,django,url
First of all, this website that I'm trying to build is my first, so take it easy. Thanks. Anyway, I have my home page, home.html, that extends from base.html, and joke.html, that also extends base.html. The home page works just fine, but not the joke page. Here are some parts...

## How to get return value from a forked / spawned process in Ruby?

ruby,process,output,fork,spawn
My simple test program: pid = Process.spawn("sleep 10;date") How can I place the output (eg stdout) of the "date" command in a variable when it is available? I don't want to use a file for the data exchange....

## Iterating over EncryptedDataBagItem in Chef Recipe

ruby,json,chef,devops
I would like to decrypt a chef data bag item (named passwords) and store all of its attributes in a temporary JSON file which is read (and then deleted) by a node.js app. Is there a way to iterate over attributes of a data bag ITEM and get their values?...

## What is the semantic HTML tag to display for URLs that are not links?

html,html5,url,tags,semantics
I have a search engine plugin that outputs results in a basic structure: <a href="page url"> <div class="searchResult"> <h3>Page title</h3> <p>Page url</p> <p>Page <meta> description</p> </div> </a> The plugin is designed to integrate into an existing website so it has no default styling. However, websites are supposed to be usable...

## Ruby gsub group parameters do not work when preceded by escaped slashes

ruby,regex
I am trying to perform a trivial substitution that, in any other language I have come across, works as per the documentation. However, my substitution fails for some reason. The documentation examples list: "hello".gsub(/[aeiou]/, '*') #=> "h*ll*" "hello".gsub(/([aeiou])/, '<\1>') #=> "h<e>ll<o>" "hello".gsub(/./) {|s| s.ord.to_s + ' '} #=> "104 101...

## Rails less url path change

ruby-on-rails,ruby,url,path,less
Developing a Rails application with the less-rails gem I found something unusual : // app/assets/common/css/desktop/typo.less @font-face{ font-family:'SomeFont'; src:url("fonts/db92e416-da16-4ae2-a4c9-378dc24b7952.eot?#iefix"); // ... } The requested font is app/assets/common/css/fonts/db92e416-da16-4ae2-a4c9-378dc24b7952.eot This font is compiled with less and the results is : @font-face { font-family: 'SomeFont'; src: url("desktop/fonts/db92e416-da16-4ae2-a4c9-378dc24b7952.eot?#iefix"); //... } Do you know why is...

## rails - NameError (undefined local variable or method while using has_many :through

ruby-on-rails,ruby,ruby-on-rails-4
My rails app gives following error: NameError (undefined local variable or method 'fac_allocs' for #): app/models/room.rb:4:in '' app/models/room.rb:1:in '' app/controllers/rooms_controller.rb:3:in 'index' room.rb file class Room < ActiveRecord::Base has_many :bookings has_many :fac_allocs has_many :facs, :through => fac_allocs end ...

## In Ruby how to put multiple lines in one guard clause?

ruby-on-rails,ruby
I have the following line of code : if params[:"available_#{district.id}"] == 'true' @deliverycharge = @product.deliverycharges.create!(districtrate_id: district.id) delivery_custom_price(district) end Rubocop highlight it and asks me to use a guard clause for it. How can I do it? EDIT : Rubocop highlighted the first line and gave this message Use a guard...