Efficient way to make thousands of curl requests

Question:

Tag: perl,curl,cookies

I am using curl to make thousands of requests. In my code I set a cookie to a specific value and then read the value returned on the page. Here is my Perl code:

#!/usr/bin/perl
my $site = "http://SITENAME/?id=";
my $cookie_name = "cookienum123";
print "#\t\tValue\n";
for my $i ('1'..'10000') {
    my $output = `curl -s -H "Cookie: $cookie_name=$i" -L $site$i | grep -Eo "[0-9]+"`;
    print "$i\t\t$output\n";
}

So for each value from 1 to 10000, I set cookienum123 to that value and read in the whole response from the page. Then I use grep to extract just the number. The code I have now works fine, but I am wondering if there is a faster or more efficient way to do this.

Please note this does not have to be done as a Perl script (I can also use Windows batch file, Unix shell script, etc).

Edit Jan 18: Added bounty with the note "The desired answer should include a way in Perl to run through several thousand curl requests simultaneously but it needs to be run faster than the rate it is currently running at. It has to write the output to a single file in the end but the order does not matter." Some of the below comments mention fork but I am not sure how to apply it to my code. I am very new to Perl as this is my first program in it.


Answer:

What you have here is an embarrassingly parallel problem. These are great for parallelising, because there's no inter-thread dependency or communication needed.

There are two main ways of doing this in Perl: threading or forking. I would generally suggest thread-based parallel processing for the kind of thing you're doing. This is a matter of choice, but I think it's better suited to collating information.

#!/usr/bin/perl

use strict;
use warnings;

use threads;
use Thread::Queue;

my $numthreads = 20;

my $site        = "http://SITENAME/?id=";
my $cookie_name = "cookienum123";

my $fetch_q   = Thread::Queue->new();
my $collate_q = Thread::Queue->new();


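#fetch sub sits in a loop, takes items off 'fetch_q' and runs curl.
#test with defined(): dequeue() only returns undef once the queue is ended and empty.
sub fetch {
    while ( defined( my $target = $fetch_q->dequeue() ) ) {
        my $output =
            `curl -s -H "Cookie: $cookie_name=$target" -L "$site$target" | grep -Eo "[0-9]+"`;
        chomp $output;

        #keep the id with the value, as the original script did.
        $collate_q->enqueue("$target\t\t$output\n");
    }
}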

#one instance of collate, which exists to serialise the output from fetch. 
#writing files concurrently can get very messy and build in race conditions. 
sub collate {
    open( my $output_fh, ">", "results.txt" ) or die $!;
    print {$output_fh} "#\t\tValue\n";

    #defined() matters here: an empty result is false and would end the loop early.
    while ( defined( my $result = $collate_q->dequeue() ) ) {
        print {$output_fh} $result;
    }
    close($output_fh);
}


## main bit:

#start worker threads
my @workers = map { threads->create( \&fetch ) } 1 .. $numthreads;

#collates results. 
my $collater = threads->create( \&collate );

$fetch_q->enqueue( '1' .. '10000' );
$fetch_q->end();

foreach my $thr (@workers) {
    $thr->join();
}

#end collate_q here, because we know all the fetchers are 
#joined - so no more results will be generated. 
#queue will then generate 'undef' when it's empty, and the thread will exit. 
$collate_q->end;

#join will block until thread has exited, e.g. all results in the queue
#have been 'processed'. 
$collater->join;

This spawns 20 worker threads that run in parallel, while a single collator thread writes each result to the file as it arrives. As an alternative, you could do something similar with Parallel::ForkManager, but for data-oriented tasks I personally prefer threading.
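
For comparison, here is a rough sketch of what the Parallel::ForkManager route might look like. It is not a drop-in replacement: it assumes the module is installed from CPAN in a version that supports passing data back through finish(), `fetch_all` and `curl_fetch` are names made up for illustration, and it forks one child per id, which is simple but not the cheapest option.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Parallel::ForkManager;    # CPAN module, not part of core Perl

my $site        = "http://SITENAME/?id=";
my $cookie_name = "cookienum123";

# Same curl/grep pipeline as the threaded version, for a single id.
sub curl_fetch {
    my ($id) = @_;
    my $output =
        `curl -s -H "Cookie: $cookie_name=$id" -L "$site$id" | grep -Eo "[0-9]+"`;
    chomp $output;
    return $output;
}

# Calls $fetch->($id) for every id in @$ids, running at most $max_procs
# children at once, and returns "id\t\tvalue" strings in completion order.
sub fetch_all {
    my ( $ids, $max_procs, $fetch ) = @_;
    my @lines;

    my $pm = Parallel::ForkManager->new($max_procs);

    # Runs in the parent each time a child exits; the sixth argument is
    # the reference the child handed to finish().
    $pm->run_on_finish(
        sub {
            my ( $pid, $exit_code, $ident, $signal, $core_dump, $data ) = @_;
            push @lines, $$data if defined $data;
        }
    );

    for my $id (@$ids) {
        $pm->start and next;                   # parent: go on to the next id
        my $value = $fetch->($id);             # child: do the slow work
        $pm->finish( 0, \"$id\t\t$value" );    # child: send result back, exit
    }
    $pm->wait_all_children;

    return @lines;
}

# Only touches the network when given a first and last id on the command line.
if ( @ARGV == 2 ) {
    my ( $first, $last ) = @ARGV;
    print "#\t\tValue\n";
    print "$_\n" for fetch_all( [ $first .. $last ], 20, \&curl_fetch );
}
```

Run it as `perl script.pl 1 10000 > results.txt`. Only the parent prints, so there is still a single writer, and the results come out in completion order, which the question says is acceptable.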

You can use the 'collate' sub to post-process the data in whatever way you need, such as sorting or counting it.
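
As an example of that kind of post-processing, here is a sketch of a collate variant that holds the results in memory and sorts them by id before writing. It assumes each queued result is a line of the form "id, two tabs, value, newline", which means fetch has to enqueue the id along with the value. `sort_results` and `collate_sorted` are hypothetical names.

```perl
use strict;
use warnings;

# Sort "id\t\tvalue\n" lines numerically by their leading id.
sub sort_results {
    my @lines = @_;
    return map  { $_->[1] }
           sort { $a->[0] <=> $b->[0] }
           map  { [ ( split /\t+/, $_, 2 )[0], $_ ] } @lines;
}

# Drain the queue first, then write everything in one go.
# $queue is anything with a dequeue() method that returns undef when
# finished, such as a Thread::Queue that has been end()ed.
sub collate_sorted {
    my ( $queue, $path ) = @_;

    my @results;
    while ( defined( my $result = $queue->dequeue() ) ) {
        push @results, $result;
    }

    open( my $output_fh, ">", $path ) or die "Cannot open $path: $!";
    print {$output_fh} "#\t\tValue\n";
    print {$output_fh} sort_results(@results);
    close($output_fh) or die "Cannot close $path: $!";

    return scalar @results;    # number of result lines written
}
```

In the threaded script this could be started with `threads->create( \&collate_sorted, $collate_q, "results.txt" )`. The trade-off is that nothing reaches the file until every fetch has finished.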

I would also point out that shelling out to curl and grep for every request isn't ideal. I've left them as they are, but I would suggest looking at LWP and letting Perl handle the text processing, because it's pretty good at it.
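
To illustrate that last point, here is a minimal sketch of the same fetch done with LWP::UserAgent and a regex instead of curl and grep. It assumes libwww-perl is installed, and `extract_numbers` and `fetch_value` are made-up helper names. One caveat worth checking before relying on it: as far as I know, LWP drops a hand-set Cookie header when it follows a redirect, so if the site redirects you may need a cookie jar (HTTP::Cookies) instead.

```perl
use strict;
use warnings;

use LWP::UserAgent;    # from the libwww-perl distribution on CPAN

my $site        = "http://SITENAME/?id=";
my $cookie_name = "cookienum123";

# keep_alive lets the agent reuse connections between requests.
my $ua = LWP::UserAgent->new( keep_alive => 1, timeout => 30 );

# Perl equivalent of `grep -Eo "[0-9]+"`: every run of digits in the text.
sub extract_numbers {
    my ($text) = @_;
    return $text =~ /([0-9]+)/g;
}

# Fetch one id with the cookie set and return the numbers found on the
# page, one per line; returns an empty string if the request fails.
sub fetch_value {
    my ($id) = @_;
    my $response = $ua->get( "$site$id", Cookie => "$cookie_name=$id" );
    return "" unless $response->is_success;
    return join "\n", extract_numbers( $response->decoded_content );
}

# Ids given on the command line are fetched one after another.
for my $id (@ARGV) {
    print "$id\t\t", fetch_value($id), "\n";
}
```

Inside the threaded script, `fetch_value($target)` could stand in for the backtick line in fetch, which saves starting a shell, curl and grep for every request.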

