Is there a way of iterating through a specific XML tag in Ruby?

ruby-on-rails,ruby,xml,nokogiri,rexml
Is it possible to iterate over a specific XML tag in Ruby? In my case I want to iterate over the desc tag in the following XML code: <desc> <id>2408</id> <who name="Joe Silva">[email protected]</who> <when>Today</when> <thetext>Hello World</thetext> </desc> <desc> <id>2409</id> <who name="Joe Silva2">[email protected]</who> <when>Future</when> <thetext>Hello World Again</thetext> </desc> So far, here is...

Parsing with Nokogiri - can't iterate over rows

ruby,parsing,nokogiri
For some reason this code is not working: url = "http://www.ontariocourts.ca/decisions_index/2015.htm" doc = Nokogiri::HTML(open(url)) doc.css("table.judtbl tr").each do |i| title = i.at_css(".title p").content citation = i.at_css(".citation p").content p title p citation end I have been going nuts trying to figure out why. Please help me someone!! Why can't this iterate over...

Loop over all the tags and extract specific information via Mechanize/Nokogiri

html,ruby,nokogiri,mechanize
I know the basic things of accessing a website and so (I just started learning yesterday), however I want to extract now. I checked out many tutorials of Mechanize/Nokogiri but each of them had a different way of doing things which made me confused. I want a direct bold way...

Parsing HTML with Nokogiri not all tags are present

ruby,parsing,nokogiri
There is this dictionary: Russian dictionary In ruby I am trying to get the url of the next page - ">>" which is <a href="m.exe?a=110&sc=4&recno=3506774&dict=&l1=1&l2=2">>></a> When inspecting this element in browser, it is there and it is present. However, using link = "http://www.multitran.ru/c/m.exe?a=110&sc=4&recno=3506179&dict=&l1=1&l2=2" page = Nokogiri::HTML(open(link)) puts "#{page}" The link...

How to decode a string UTF-8 in ruby?

ruby,character-encoding,nokogiri
I'm parsing an xml file in ruby (file.rb) but my output doesn't work properly even if I encode the string in UTF-8 or "ISO-8859-1". Any clue or can I set my encoding? gist require 'test/unit' require 'nokogiri' class MyTest < Test::Unit::TestCase def test_sentence doc = Nokogiri::Slop <<-EOXML <?xml version='1.0' encoding='utf-8'?>...

How to get text element from a Nokogiri::XML::NodeSet?

ruby,xml,parsing,nokogiri
I am parsing some XML structure item which looks as follows: <customfield id="customfield_10004" key="com.atlassian.jira.plugin.system.customfieldtypes:float"> <customfieldname>Yada yada</customfieldname> <customfieldvalues> <customfieldvalue>8.0</customfieldvalue> </customfieldvalues> </customfield> in the following manner: puts item.xpath(".//customfield[@id='customfield_10004']").css('customfieldvalue') This returns <customfieldvalue>8.0</customfieldvalue> of...

Nokogiri Tag with id and a normal field

ruby-on-rails,ruby,xml,nokogiri
I am trying to create a line in an xml file that looks like this <Type id="Standard">Economy 3-5 Business Days</Type>. So far I have only been able to make it look like <Type id="Standard" value="Economy 3-5 Business Days"/>. Maybe I missed it in the nokogiri docs, but I couldn't find...

Error installing nokogiri in ubuntu 14.0.4 (Ruby 1.8.7)

ruby,ubuntu,gem,install,nokogiri
I'm trying to install the bundle (bundle install) on Ubuntu 14.0.4 with Ruby 1.8.7 installed. It fails to install the bundle and displays the error: An error occurred while installing nokogiri (1.4.7), and Bundler cannot continue. Make sure that `gem install nokogiri -v '1.4.7'` succeeds before bundling. So now I tried to...

Nokogiri - find the value inside a javascript array

javascript,ruby,xml,nokogiri
I'm trying to scrape something using Nokogiri. I want to get a value inside a JavaScript array, like the value of 'b' in this code. <script> var foo = [bar, [a, b, c , d], value, some value, . . ] </script> I got the script block by using doc.search("script")[18].content. How...

Nokogiri, replace XML node contents

ruby,xml,xml-parsing,nokogiri
I'm using RUBY (NOT rails) to read an XML and then update a single node (if found) with a new value. I'm scanning my XML files looking for the node.. and if it's found, I've a big ????? this URL http://www.nokogiri.org/tutorials/modifying_an_html_xml_document.html is not obvious to me on how to change...

Getting specific element in Url using Nokogiri

ruby,web-scraping,nokogiri
I have this kind of html structure : <table class="list"> <tbody> <tr> <td> </td> <td> <a href="club.do?codeClub=01670001&millesime=2015"></a> </td> </tr> </tbody> </table> I want to get the link contained in the second <td> of each <tr> contained in the table that has the class list. Then actually in each Url I...

How to get html text with line break by Nokogiri

ruby,nokogiri
There is some HTML text like this: html = '<div class="foo"><span class="bar">text<br>with line break</span></div>' doc = Nokogiri::HTML(html) And I want to get the text text<br>with line break. Currently I'm using doc.css("span").to_html.match(/<span .+?>(.*)<\/span>/){ $1 } Is there a simpler way to do it? If possible I want to avoid using a regular expression....

How to specify multiple columns by xpath

ruby,xpath,nokogiri
I want to get several table cells from HTML like this: html = <<EOF <table> <tr> <td>1</td> <td>2</td> <td>3</td> </tr> <tr> <td>4</td> <td>5</td> <td>6</td> </tr> </table> EOF I want to get two values from it like: noko = Nokogiri::HTML(html) noko.xpath("//tr[1]/td[2]").text #=> "2" noko.xpath("//tr[1]/td[3]").text #=> "3" What I expect from this...

Adding a XML Element to a Nokogiri::XML::Builder document

ruby,xml,nokogiri
How can I add a Nokogiri::XML::Element to an XML document that is being created with Nokogiri::XML::Builder? My current solution is to serialize the element and use the << method to have the Builder reinterpret it. orig_doc = Nokogiri::XML('<root xmlns="foobar"><a>test</a></root>') node = orig_doc.at('/*/*[1]') puts Nokogiri::XML::Builder.new do |doc| doc.another { # FIXME:...

Can't install Nokogiri on CentOS with higher Ruby Version than required

ruby,centos,nokogiri,gitlab
Below is what I tried on my CentOS6.5 server. I have already updated Git and Ruby, but still can't get it to install. The first section is where I tried to install it as a sudo user. The second section is where I tried to install it as root. Can...

How can I use Nokogiri to get value inside element with certain id?

ruby,xml,parsing,nokogiri
I have an element structure consisting of item nodes that I parse like this with Nokogiri: @xml.css('item').each do |item| # do something end Now the item has a part which looks like this (every item has this element with the id below): <customfield id="customfield_10004" key="com.atlassian.jira.plugin.system.customfieldtypes:float"> <customfieldname>Yada yada</customfieldname> <customfieldvalues> <customfieldvalue>8.0</customfieldvalue> </customfieldvalues>...

Got the right node with Nokogiri, but need to search further

ruby-on-rails,ruby,nokogiri,scrape
I am using this. doc = Nokogiri::HTML(open(url)) pic = doc.search "[text()*='hiRes']" to get this script node: <script type="text/javascript"> var data = { 'colorImages': { 'initial': [{"hiRes":"http://ecx.images-joes.com/images /I/71MBTEP1W9L._UL1500_.jpg","thumb":"http://ecx.images-joes.com/images /I/41xE2XADIvL._US40_.jpg","large":"http://ecx.images-joes.com/images /I/41xE2XADIvL.jpg","main":{"http://ecx.images-joes.com/images /I/71MBTEP1W9L._UX395_.jpg":[395,260],"http://ecx.images-joes.com/images...

Can I get Nokogiri to scrape text from span in Ruby?

html,ruby,web-scraping,nokogiri,curb
I'm trying to scrape info from a website using nokogiri and curb, but I can't seem to find the right name/title to find out where to scrape (I'm trying to scrape the api key, which is at the bottom of the html code as "xxxxxxx") or even how to, please...

How do you parse a HTML table representing time?

ruby,parsing,nokogiri
I am attempting to parse this HTML table representing a year's worth of temperature data, provided by an Australian government website. This table is set up in an unusual way: the columns are months, and the rows are days of the month (so the first row's cells are JAN 1,...

Controlling Vagrant plugin dependencies via Ansible

ruby,vagrant,ubuntu-12.04,nokogiri,ansible
I have an Ansible playbook that installs Vagrant, and then instructs Vagrant to install a specific plugin. Vagrant has trouble installing a gem it needs, and says: An error occurred while installing nokogiri (1.6.6.2), and Bundler cannot continue. Make sure that `sudo gem install nokogiri -v '1.6.6.2'` succeeds before bundling....

How to extract article content from a website/blog

ruby-on-rails,ruby,web-scraping,nokogiri
I'm trying to write a generic function for extracting article text from blog posts and websites. A few simplified examples I'd like to be able to process: Random website: ... <div class="readAreaBox" id="readAreaBox"> <h1 itemprop="headline">title</h1> <div class="chapter_update_time">time</div> <div class="p" id="chapterContent">article text</div> </div> ... Wordpress: <div id="main" class="site-main"> <div id="primary" class="site-content"...

Parsing HTML document

ruby,nokogiri
I am trying to parse the following HTML using Ruby and Nokogiri: <div class="vevent"> <table width="750"><tr> <td width="25"> </td> <td valign="top" width="200"> <font size="2" face="sans-serif"> <font color="black"><b>June 30, 2015</b></font> <br> <span class="dtstart"><span class="value-title" title="2015-06-30"></span></span><br><span class="summary"><font color="#92161" size="3"><b>Band...

Nokogiri not parsing XML in ruby - xmlns issue?

ruby,xml,web-services,nokogiri
Given the following ruby code : require 'nokogiri' xml = "<?xml version='1.0' encoding='UTF-8'?> <ProgramList xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xmlns:xsd='http://www.w3.org/2001/XMLSchema' xmlns='http://publisher.webservices.affili.net/'> <TotalRecords>145</TotalRecords> <Programs> <ProgramSummary> <ProgramID>6540</ProgramID> <Title>Matalan</Title> <Limitations>A bit of text </Limitations>...

Nokogiri Scraping In Rails

ruby-on-rails,ruby,nokogiri
So I have this code in my index action, would love to move it to a model, just a little confused on how to do it. Original Code def index urls = %w[http://cltampa.com/blogs/potlikker http://cltampa.com/blogs/artbreaker http://cltampa.com/blogs/politicalanimals http://cltampa.com/blogs/earbuds http://cltampa.com/blogs/dailyloaf http://cltampa.com/blogs/bedpost] @final_images = [] @final_urls = [] urls.each do |url| blog = Nokogiri::HTML(open(url))...

Scraping multiple table row siblings with Nokogiri

ruby,nokogiri
I’m trying to parse a table with the following markup. <table> <tr class="athlete"> <td colspan="2" class="name">Alex</td> </tr> <tr class="run"> <td>5.00</td> <td>10.00</td> </tr> <tr class="run"> <td>5.20</td> <td>10.50</td> </tr> <tr class="end"></tr> <tr class="athlete"> <td colspan="2" class="name">John</td> </tr> <tr class="run"> <td>5.00</td> <td>10.00</td>...

net/http automatically redirects webpage to another language

ruby,web-scraping,nokogiri,net-http,open-uri
I'm trying to scrape the data of webpage "https://www.zomato.com/grande-lisboa/fu-hao-massamá" using open-uri. But, the website is automatically redirecting it to "https://www.zomato.com/pt/grande-lisboa/fu-hao-massamá". I don't want the spanish version. I want the english one. How do I tell ruby to stop doing that. Please help...

Unable to install Nokogiri with dependencies on Redhat Linux

ruby-on-rails,ruby,nokogiri,libxml2,libxslt
I am using Ruby 1.9.3 and therefore have to use an older version of Nokogiri. I need to install Nokogiri v1.5.10. Initially I got an error that libxml2 is missing. After installing libxml2 I got the following error: ERROR: Error installing nokogiri-1.5.10.gem: ERROR: Failed to build gem native extension. /opt/ruby-1.9.3/bin/ruby...

Scraping table data with Nokogiri using style attribute Ruby

html,ruby,parsing,nokogiri
I want to scrape the text value of the tds in this webpage ex: 0.2197 British Pound <table border=1 cellpadding=5 cellspacing=0 style="font-weight: normal; font-size: 10.5;"><tr><td width=50>1791</td><td>0.2195 British Pound</td></tr><tr><td width=50>1792</td><td>0.2239 British Pound</td></tr><tr><td width=50>1793</td><td>0.2218 British Pound</td></tr><tr><td width=50>1794</td><td>0.2106...

Mechanize search unable to find CSS selector (it's definitely present)

ruby,css-selectors,nokogiri,mechanize
I have a long CSS selector that works perfectly fine when actually used in CSS, jQuery etc. But this very same selector will not work on a Mechanize::Page object - it simply returns an empty array. The selector targets a paragraph and in my other case a header1. I also...

Error installing Nokogiri on bundle install but already installed

ruby-on-rails,ruby,nokogiri,bundler
I'm having issues with bundling my Gemfile. I have Nokogiri installed already yet when I run bundle install it fails to load Nokogiri. Installing Nokogiri: gem install nokogiri Building native extensions. This could take a while... Successfully installed nokogiri-1.6.6.2 Parsing documentation for nokogiri-1.6.6.2 Done installing documentation for nokogiri after 2...

Rails nokogiri parse XML file

ruby-on-rails,ruby,xml,xpath,nokogiri
I'm a little bit confused: I could not find good examples on the web of parsing XML with Nokogiri. An example of my data: <?xml version="1.0" encoding="UTF-8"?> <root> <rows SessionGUID="6448680D1"> <row> <AnalogueCode>0451103079</AnalogueCode> <AnalogueCodeAsIs>0451103079</AnalogueCodeAsIs> <AnalogueManufacturerName>BOSCH</AnalogueManufacturerName> <AnalogueWeight>0.000</AnalogueWeight> <CodeAsIs>OC90</CodeAsIs>...

How to scrape data from a website using Nokogiri

ruby,nokogiri
When I try to scrape the table data from the following link, it displays nothing. I wrote the following code, but it returns nothing. I want the table data, i.e. last update, weather, and temperature, from the link I have given. Please help me. url = "http://w1.weather.gov/xml/current_obs/KM89.xml" docs = Nokogiri::HTML(open(url))...

Map two Nokogiri objects

ruby,nokogiri,mechanize
A quick question: <table> <tr> <th>foo</th> <td><p>bar</p></td> </tr> </table> details = doc.css('table > tr > th') details2 = doc.css('table > tr > td > p') details = details.map { |n| { name: n.text }} details2 = details2.map { |n| { value: n.text }} How can I merge those Nokogiri objects...
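A minimal sketch of one way to pair the two result lists, assuming the <th> and <td> cells line up one-to-one in document order. The arrays below are stand-ins for what the two map calls in the question produce, and merge_pairs is a hypothetical helper name, not part of Nokogiri.

```ruby
# Stand-ins for the arrays built by the two `map` calls in the question.
details  = [{ name: 'foo' }]
details2 = [{ value: 'bar' }]

# Hypothetical helper: pair entries by position and merge each pair into one hash.
def merge_pairs(names, values)
  names.zip(values).map { |name, value| name.merge(value || {}) }
end

merged = merge_pairs(details, details2)
p merged
```

If a row has no matching <td>, zip pads the pair with nil, and the `|| {}` guard leaves that entry as a name-only hash instead of raising.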

Xpath - How to navigate to a value (Ruby Nokogiri)

ruby,xml,xpath,nokogiri
If I want to grab a currency's rate, say "USD", at a certain time, say "2015-02-09", how would I go about doing this? I tried the following: /gesmes:Envelope/def:Cube/def:Cube[@time="2014-11-19"]/def:Cube[@currency="USD"]/@rate Though I suppose due to a lack of understanding this is wrong, well at least, I know it is wrong because Nokogiri does...

Parse an HTML table with Nokogiri in Ruby

html,ruby,web-scraping,nokogiri
I have an HTML table that looks like the following: <table id="TTdata" border="0" cellspacing="0" cellpadding="3" align="center"> <tbody> <tr class="TTdata_ltblue"> <td class="ctr"><b>#</b></td> <td class="ctr"><b><a href="http://www.baseballprospectus.com/sortable/index.php?cid=1819124&amp;newsort1column=YEAR">YEAR</a><img src="/images/up.gif"></b></td> <td class="ctr" title="Player's name."><b><a...

How to make a parser for a web crawler maintainable

ruby,web-crawler,nokogiri
I wrote a Ruby web-crawler that retrieves data from a third-party website. I am using Nokogiri to extract information based on a specific CSS div and specific fields (accessing children and elements of the nodes I extract). From time to time, the structure of the third-party website changes which breaks...

Nokogiri and creating a hash from two items - Error retrieves all as string

ruby,hash,nokogiri
I am trying to retrieve two elements both with nine responses from an xml file with Nokogiri. Retrieval itself isn't hard: number = @doc.xpath('//Race/@RaceNumber').text # => "123456789" But it selects it as one string. And this is the other reduced for brevity: name = @doc.css('NameRaceFull') # => [#<Nokogiri::XML::Element:0x1775154 name="NameRaceFull" attributes=[#<Nokogiri::XML::Attr:0x1774448...

href does not want to get printed although I followed the path

html,ruby,xpath,nokogiri,mechanize
I want to enter a link in a webpage. This is its location in the inspect element: As You can see, to reach the link in the < body >, I have to pass through: 1) < div class = "container" > 2) < div id = "result content >...

Ruby: Extract and operate on partially extracted Nokogiri objects

ruby,xpath,nokogiri
require 'nokogiri' xml = DATA.read xml_nokogiri = Nokogiri::XML.parse xml widgets = xml_nokogiri.xpath("//Widget") dates = widgets.map { |widget| widget.xpath("//DateAdded").text } puts dates __END__ <Widgets> <Widget> <Price>42</Price> <DateAdded>04/22/1989</DateAdded> </Widget> <Widget> <Price>29</Price> <DateAdded>02/05/2015</DateAdded> </Widget> </Widgets> Notes: This is a contrived example I cooked up as it's very inconvenient to post the actual code...

Ruby + Nokogiri - How to filter by date stored within XML element?

ruby,nokogiri
Given the following XML sample. <Widgets> <Widget> <Price>29</Price> <DateAdded>02/05/2015</DateAdded> </Widget> </Widgets> I'm trying to find all widgets added in the last 7 days. I tried the following: widgets.xpath("//Widget[DateAdded[text()>\"#{7.days.ago}\"]]") and got no love. Tried to be clever and did: widgets.xpath("//Widget[(DateAdded[DateTime.parse(text())>\"#{7.days.ago}\"]]") to no avail (not surprisingly, because it was a long shot!)....
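XPath 1.0 has no date type, and its `>` operator converts both operands to numbers, so a text such as 02/05/2015 cannot be compared chronologically inside the query. A possible alternative is to select the nodes first and do the date comparison in Ruby. The sketch below shows only the comparison step, assuming the dates are month/day/year; added_within? is a hypothetical helper, and the array stands in for the text of each <DateAdded> element.

```ruby
require 'date'

# Hypothetical helper: true when `date_text` (assumed MM/DD/YYYY) falls within
# the last `days` days, counting back from `today`.
def added_within?(date_text, days:, today: Date.today)
  added = Date.strptime(date_text.strip, '%m/%d/%Y')
  added >= today - days && added <= today
end

# Stand-in for the text of each <DateAdded> element.
date_texts = ['04/22/1989', '02/05/2015']
reference  = Date.new(2015, 2, 10) # fixed "today" so the example is repeatable

recent = date_texts.select { |text| added_within?(text, days: 7, today: reference) }
p recent
```

Plain Ruby has no 7.days.ago (that comes from ActiveSupport), so the sketch subtracts an integer number of days from a Date instead.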

Nokogiri help XPath not working

ruby,xpath,nokogiri
I am trying to pull one tiny bit of text from a very large HTML doc. However, no matter what method of stripping the HTML I use to get to the text I want, it still pulls all the HTML. The part of the HTML I am trying to pull is below. All I...

Get parent node where child = x

ruby,arrays,nokogiri
I want to return a 'parent' node from an XML source when I find a matching child. If I search for 'ddd': @doc.xpath('//item[contains(., "ddd")]') I want to return 'Section 2' I can't find documentation on 'where' type code for Nokogiri. Is this even possible?? <entry> <match> <field>Section 1</field> <child> <item>aaa</item>...

Can't install Nokogiri 1.4.3 gem

ruby-on-rails,rubygems,rvm,nokogiri
Can't install nokogiri 1.4.3 gem. Nokogiri 1.6.6.2 installs without a problem. Using latest RVM on Ubuntu. [email protected]:~$ gem install nokogiri -v '1.4.3' Fetching: nokogiri-1.4.3.gem (100%) Building native extensions. This could take a while... ERROR: Error installing nokogiri: ERROR: Failed to build gem native extension. /home/pm/.rvm/rubies/ruby-2.2.1/bin/ruby -r ./siteconf20150519-7580-2yzgsg.rb extconf.rb *** extconf.rb...

Scraping successive pages until the last page using Nokogiri and Mechanize

ruby,web-scraping,nokogiri,mechanize
I am trying to scrape multiple pages from a website. I want to scrape a page, then click on next, get that page, and repeat until I hit the end. I wrote this so far: page = agent.submit(form, form.buttons.first) #submitting a form while lien = page.link_with(:text=>'Next') # while I have...

Recursive/nested yield

ruby,recursion,nokogiri,yield
I have the following couple of methods that edit a deeply nested XML structure using Nokogiri. I'd like to remove some of the boilerplate when drilling down into the structure, so I want to refactor these methods. Here are the methods def create_acl(acl_name, addresses) connection.rpc.edit_config do |x| # `x` is...

How to apply two templates and group the result by size and used template with xslt

xml,xslt,xslt-1.0,nokogiri
I am trying to parse an XML and produce an HTML which could be used for printing. The content of the elements in the XML are presented like cards and have information for the frontside as well as the backside. Eight cards would fit on a page. To make life...

'require': cannot load such file — 'nokogiri\nokogiri' (LoadError) when running `rails server`

ruby-on-rails,ruby,nokogiri
I'm running a clean install of Ruby 2.2.1 on Windows 8.1 with DevKit. After the installation I run: gem install rails rails new testapp cd testapp rails server leaving everything else at default. The process fails at the last line when instead of running the server, I get the error...

How to reject specific HTML tags by using css or xpath selector

javascript,css,ruby,xpath,nokogiri
I want to remove style and script tags and their contents by using a css or xpath selector. This is an example HTML: <html> <head> <title>test</title> <style> // style </style> <script> /* some script */ </script> </head> <body> <p>text</p> <script> /* some script */ </script> <div>foo</div> </body> </html> I...

How to use “doc” tag in Nokogiri to build an XML document

ruby,nokogiri
I have a problem: I must build an XML document with a <doc> tag. I can use any custom tag except "doc". I need to use "doc". How can I fix this?...

Nokogiri on Windows with Ruby 2.1

ruby-on-rails,ruby,windows,nokogiri,ruby-2.1
When I try to run my application I get a nokogiri error (full trace below). I understand that nokogiri doesn't support Windows on ruby 2.2, but I'm using 2.1.5 so it seems like it shouldn't be a problem. The gem installs perfectly when I do gem install -v 1.6 so...

Find Value by position on plain HTML in ruby

ruby-on-rails,ruby,ruby-on-rails-3,ruby-on-rails-3.1,nokogiri
My HTML file does not have any classes. I am trying to get the number from the plain HTML <html> <head></head> <body> PO Number : [4587958] </body> </html> I am able to find out the PO Number test by using require 'rubygems' require 'nokogiri' PAGE_URL = "a.html" page =...

Add prefix to XML root node

ruby,xml,nokogiri,xml-builder
I am using Nokogiri to generate XML. I would like to add a namespace prefix to the XML root node only, but the problem is that applying the prefix to the first element applies it to all child elements. This is my expected result: <?xml version="1.0" encoding="UTF-8"?> <req:Request xmlns:req="http://www.google.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"...

How can I add an element?

ruby,nokogiri
I'm doing this: targets = @xml.xpath("./target") if targets.empty? targets << Nokogiri::XML::Node.new('target', @xml) end However the @xml is still without my target. What can I do in order to update the original @xml?...

Ruby, Nokogiri: how do i ensure UTF8 throughout nokogiri parsing, erb template, and encoding HTML file

html,ruby,parsing,utf-8,nokogiri
I finally managed to parse parts of a website: get '/' do url = '<website>' data = Nokogiri::HTML(open(url)) @rows = data.css("td[valign=top] table tr") erb :muster end Now I am trying to extract a certain line in my view. Therefore I put in my HTML code: <%= @rows[2] %> And...

XML parsing using Ruby for provided URL

ruby,xml-parsing,nokogiri,libxml-ruby
I am trying to use Nokogiri to parse my XML, which I am getting from a URL, but I am not able to create an array of it so that it would be accessible all over the project. My XML: <component name="Hero"> <topic name="i1"> <subtopic name=""> <links> <link Dur=""...

Search by class in Nokogiri nodeset

ruby,xpath,nokogiri
I got the name of a CSS class from a Nokogiri node. Now I want to find all the nodes that also have the same class attached. I don't know which HTML tag the element that I'm looking for has, and how deep it is. All I know is what...

Net/HTTPS not getting all the content

ruby,web-crawler,nokogiri,net-http,mechanize-ruby
I need to login into Jenkins through a crawler to collect some data, but Net/HTTPS gets an incomplete page in comparison to Jenkins' source, here are both sources: Net/HTTPS' HTML <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"> <html> <head> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> <meta http-equiv="refresh" content="1;url=/login?from=%2F"> <script> window.location.replace('/login?from=%2F'); </script>...

using variable in xpath on ruby with nokogiri

ruby-on-rails,ruby,xpath,nokogiri,open-uri
require 'nokogiri' require 'open-uri' 1.upto(10) do |x| url = TOPSECRET page = Nokogiri::HTML(open(url)) title = page.xpath('//span[@class="tit"][#{x}]').inner_html puts "#{x}, #{title}" end The error occurs at [#{x}]. How can I fix this?...
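A likely cause here is Ruby's quoting rules rather than Nokogiri: a single-quoted string does not interpolate #{x}, so the XPath engine receives the literal characters. The sketch below builds the same expression three ways so the difference is visible. nth_title_xpath is a hypothetical helper name, and the class name "tit" is taken from the question.

```ruby
x = 3

# Single quotes: Ruby leaves the interpolation marker untouched, so the
# XPath engine would see the literal characters '#', '{', 'x', '}'.
single_quoted = '//span[@class="tit"][#{x}]'

# Double quotes: the marker is replaced by the value of x before the query runs.
double_quoted = "//span[@class=\"tit\"][#{x}]"

# %() still interpolates and avoids escaping the inner double quotes.
percent_form = %(//span[@class="tit"][#{x}])

# Hypothetical helper wrapping the interpolating form.
def nth_title_xpath(n)
  %(//span[@class="tit"][#{Integer(n)}])
end

puts single_quoted
puts double_quoted
puts percent_form
puts nth_title_xpath(5)
```

Note that in XPath `//span[@class="tit"][3]` matches spans that are the third such child of their own parent. If the intent is the third match in the whole document, the parenthesized form `(//span[@class="tit"])[3]` expresses that.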

How can I copy nodes from one xml file to another, using Nokogiri?

ruby,xml,nokogiri
I am trying to do the following: I have the following xml_1 file, which I generated. <document> <TITLE>Computer Parts</TITLE> <header> <ITEM>Motherboard</ITEM> <MANUFACTURER>ASUS</MANUFACTURER> <MODEL>P3B-F</MODEL> <COST> 123.00</COST> </header> <part1> <ITEM>Video Card</ITEM> <MANUFACTURER>ATI</MANUFACTURER> <MODEL>All-in-Wonder Pro</MODEL> <COST> 160.00</COST> </part1> ..... <part5>...

Searching a list of similar elements for a given value

xpath,nokogiri
I have some XML like this: <Team ...some attributes...> <Name>My Team</Name> <Player uID="player1"> <Name>Name</Name> <Position>Goalkeeper</Position> <Stat Type="first_name">Name</Stat> <Stat Type="last_name">Last</Stat> <Stat Type="birth_date">bday</Stat> <Stat Type="birth_place">bplace</Stat> <Stat Type="weight">84</Stat> <Stat Type="height">183</Stat> <Stat Type="jersey_num">1</Stat>...

How would I make this work with Nokogiri XML builder

ruby,xml,nokogiri
Starting with a specific line, it seems to cause errors in the lines below it. Sorry if the question seems vague; I can't articulate what I am doing well on this one. builder = Nokogiri::XML::Builder.new do |xml| xml.send('document-id' => '{some-fake-id}', 'type' => 'documentType', 'iso-code' => 'BP', 'training' => 'false', 'send-type'...

Wait for selector to present

ruby,nokogiri,net-http
When doing web scraping with Nokogiri I occasionally get the following error message undefined method `at_css' for nil:NilClass (NoMethodError) I know that the selected element is present at some time, but the site is sometimes a bit slow to respond, and I guess this is the reason why I'm getting...

Ruby character encoding issue with scraped HTML

ruby,character-encoding,nokogiri
I'm having a character encoding issue with a Ruby script that does some HTML scraping and parsing with the Nokogiri gem. At one point in the script, I call join("\n") on an array of strings that have been pulled from some HTML, which causes this error: ./script.rb:333:in `join': incompatible character...
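This kind of error can be reproduced without any scraping: joining strings that carry different encoding labels and contain non-ASCII characters fails. The sketch below transcodes everything to UTF-8 before joining. It assumes each string's encoding label is correct; join_as_utf8 is a hypothetical helper, and the sample strings are made up for illustration.

```ruby
# Hypothetical helper: transcode every string to UTF-8 so they can be joined.
def join_as_utf8(strings, separator = "\n")
  strings.map { |s| s.encode(Encoding::UTF_8) }.join(separator)
end

utf8_text   = "caf\u00e9"                                        # labeled UTF-8
latin1_text = "caf\xE9".dup.force_encoding(Encoding::ISO_8859_1) # same word, Latin-1 bytes

begin
  # Mixed encodings with non-ASCII content cannot be joined directly.
  [utf8_text, latin1_text].join("\n")
rescue Encoding::CompatibilityError => e
  puts "plain join failed: #{e.message}"
end

puts join_as_utf8([utf8_text, latin1_text])
```

If a string's bytes are already UTF-8 but carry the wrong label (for example ASCII-8BIT), encode is the wrong tool; relabeling with force_encoding would be the thing to try in that case.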

how to save StringIO (pdf) data into file

ruby,nokogiri
I want to save a PDF file located on an external remote server with Ruby. The PDF file arrives as a StringIO. I tried saving the data with File.write but it is not working. I received the error below. ArgumentError: string contains null byte How do I save it?...
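One common way to hit "string contains null byte" is passing the binary data where Ruby expects a file name, for instance by swapping the arguments to File.write. The excerpt is truncated, so that may not be the exact cause here. The sketch below writes a StringIO to disk in binary mode, path first and data second. save_binary is a hypothetical helper, and the sample bytes stand in for a real download.

```ruby
require 'stringio'
require 'tmpdir'

# Hypothetical helper: write everything in an IO-like object to `path` in binary mode.
def save_binary(io, path)
  io.rewind                    # start from the beginning in case it was already read
  File.binwrite(path, io.read) # first argument is the path, second is the data
end

# Stand-in for the downloaded PDF; real bytes would come from the remote server.
pdf_io = StringIO.new("%PDF-1.4\n\x00\x01 binary payload".b)

Dir.mktmpdir do |dir|
  path = File.join(dir, 'download.pdf')
  bytes = save_binary(pdf_io, path)
  puts "wrote #{bytes} bytes to #{File.basename(path)}"
end
```

Binary mode matters mostly on Windows, where text mode would translate line endings and corrupt the file.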

How to parse and view properly with nokogiri?

ruby-on-rails,ruby,nokogiri
I am having problems viewing my XML data correctly. Here is my code in the controller and view. def new doc = Nokogiri::XML(open('sample3.xml')) @link = doc.xpath('//match').map do |t| { 'home' => t.at('home').attr('name') } end @odds = doc.xpath('//odds/type[@name="1x2"]/bookmaker[@id="781"]').map do |t| { '1' => t.at('odd[@name="1"]').attr('value'), '2' => t.at('odd[@name="2"]').attr('value'), 'X' => t.at('odd[@name="X"]').attr('value') }...

How to get images from a saved html page

html,ruby,html-parsing,nokogiri
I have a huge number of saved HTML pages on my PC. I parsed the HTML pages and got the image src. I need to store the images from every HTML page in a specific structure in a separate directory. I tried Net::HTTP.get but I am getting a...

How can I make all XML tags lowercase in Nokogiri?

ruby,xml,nokogiri
I'm parsing some XML that I get from various feeds. Apparently some of the XML has an occasional tag that is all upper case. I'd like to normalize the XML to be all lower case tags to make searching, etc. easier. What I want to do is something like: parsed...

Ruby - Find Tag by ID

ruby,web-scraping,nokogiri
I'm using mechanize and nokogiri. I'm trying to find this tag. When I inspect the HTML it looks like this. <table class="matchupBox" id="MLB_5_block "> When I print it out in my console it looks like this #<Nokogiri::XML::Element:0x2cc1a1c name="table" attributes=[ #<Nokogiri::XML::Attr:0x2cc1940 name="class" value="matchupBox">, #<Nokogiri::XML::Attr:0x2cc192c name="id" value="MLB_5_block\r\n ">] I am using this...

Ignore tag using html-proofer with jekyll

ruby,continuous-integration,nokogiri,jekyll,jekyll-extensions
I have a site hosted with GitHub Pages built using Jekyll. One of the plugins I have installed is html-proofer. This was working fine until I switched my images to use Picturefill. By using Picturefill, I am using the currently invalid <picture> tag. This causes html-proofer to fail when I...

Scraping with Nokogiri::HTML - Can't get text from XPATH

html,ruby,parsing,xpath,nokogiri
I'm trying to scrape html with Nokogiri. This is the html source: <span id="J_WlAreaInfo" class="wl-areacon"> <span id="J-From">山东济南</span> 至 <span id="J-To"> <span id="J_WlAddressInfo" class="wl-addressinfo" title="全国"> 全国 <s></s> </span> </span> </span> I need to get the following text: 山东济南 Checked shortest XPATH with firebug: //*[@id="J-From"] Here is my ruby code: doc =...

Nokogiri parsing missing element create issue

ruby-on-rails,ruby,ruby-on-rails-3,ruby-on-rails-4,nokogiri
I have a plain HTML document with no CSS, and I need to pass some of its content to an Excel sheet. I tried Nokogiri, but it works on a CSS basis. Has anybody tried this? <html> <head></head> <body> ***NOTE*** <br> Items <br> <br> Invoice Number : [78945824] PO Number...

Using Sidekiq with Nokogiri for scraping

ruby-on-rails,nokogiri,delayed-job,sidekiq
I'm using Rails with Nokogiri. I have some heavy scraping tasks that I would like to execute in the background with Sidekiq. The problem is, I followed the three steps mentioned on sidekiq.org but nothing happened. What am I missing? What follows is one of my scrapes without using Sidekiq,...

How to convert partial XML to hash in Ruby

ruby,xml,xml-parsing,nokogiri
I have a string which has plain text and extra spaces and carriage returns then XML-like tags followed by XML tags: String = "hi there. <SET-TOPIC> INITIATE </SET-TOPIC> <SETPROFILE> <KEY>name</KEY> <VALUE>Joe</VALUE> </SETPROFILE> <SETPROFILE> <KEY>email</KEY> <VALUE>[email protected]</VALUE> </SETPROFILE> <GET-RELATIONS> <COLLECTION>goals</COLLECTION> <VALUE>walk upstairs</VALUE> </GET-RELATIONS> So what do you think? Is it...

Saving to a database using a Nokogiri (json?) rake task

ruby-on-rails,ruby,json,nokogiri
RoR noob here! I have a rake task doing what I want to do. I am just stuck on how to get the results saved to my language table. I want the results from this rake task to populate the values of the language field on my language table. I'm...

How to add the HTML contents of one node to another with Nokogiri

ruby,nokogiri
As the title says, I was wondering how to add the contents of one node to another. For example, assume there is a node: <li> <a>I'm a link</a> <p>I'm a <b>paragraph</b></p> </li> And another node I want to add the contents of to the above: <p> <a>Link1</a> <a>Link2</a> <a>Link3</a>...