

What is the best way to use HTTP Keep-Alive in Python 2.7 [duplicate]

python,http,urllib2,keep-alive
This question already has an answer here: Python urllib2 with keep alive 7 answers There are several options to use Keep-Alive. I'm trying to get urllib2 to use Keep-Alive, but it isn't officially supported in Python 2.7. I heard pycurl had the function. That's why I'm here to...

What's the best chunk size for a urllib2.urlopen read?

python,urllib2
I'm using this piece of code to download mp3 podcasts. req = urllib2.urlopen(item) CHUNK = 16 * 1024 with open(local_file, 'wb') as fp: while True: chunk = req.read(CHUNK) if not chunk: break fp.write(chunk) Which works perfectly - but I am wondering what is the optimal chunk size for best download...
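A chunk size in the 16–64 KiB range is usually a good default; larger chunks mainly trade memory for fewer read calls. The loop from the question can be sketched and exercised offline, with `io.BytesIO` standing in for the `urlopen()` response (`download` is a hypothetical helper name):

```python
import io

CHUNK = 16 * 1024  # 16 KiB; a common, reasonable default

def download(resp, fp, chunk_size=CHUNK):
    """Copy a file-like response to fp in fixed-size chunks; return bytes copied."""
    total = 0
    while True:
        chunk = resp.read(chunk_size)
        if not chunk:
            break
        fp.write(chunk)
        total += len(chunk)
    return total

# io.BytesIO stands in for the urlopen() response here
src = io.BytesIO(b"x" * 40000)
dst = io.BytesIO()
print(download(src, dst))  # 40000
```

For real downloads the bottleneck is almost always the network, not the chunk size, so anything between 8 KiB and 1 MiB behaves similarly.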

Get File Upload Time From Server

python,urllib2,urllib
Is there a way, using urllib2 or something else, to check the time a file was uploaded to a URL? Or even the time the file on the server side was last modified? At the moment I'm manually using urllib2.urlopen() to read data from a url address. The arguments for...
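A minimal sketch: the last-modified time, when the server chooses to send it, arrives in the `Last-Modified` response header, which the stdlib can parse. The sample header value below is for illustration only; a real server may omit the header entirely, so check for `None`:

```python
from email.utils import parsedate_to_datetime

# With urllib2 (Python 2): last_mod = resp.info().getheader('Last-Modified')
# With urllib.request (Python 3): last_mod = resp.headers.get('Last-Modified')

last_mod = 'Wed, 21 Oct 2015 07:28:00 GMT'  # sample value for illustration
dt = parsedate_to_datetime(last_mod)
print(dt.year, dt.month, dt.day)  # 2015 10 21
```

Note this is the modification time the server reports, not necessarily the upload time; for static files they often coincide.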

Cannot Write Web Crawler in Python

python,web-crawler,beautifulsoup,urllib2
I'm having an issue writing a basic web crawler. I'd like to write about 500 pages of raw html to files. The problem is my search is either too broad or too narrow. It either goes too deep, and never gets past the first loop, or doesn't go deep enough,...

Display Specific Fields (Through Parse) from JSON response in Python

python,json,urllib2
I am looking to display specific fields from a JSON Response returned from the following URL in a printed list in the cli: http://www.cvedetails.com/json-feed.php?numrows=5&vendor_id=26&product_id=0&version_id=0&hasexp=1&opec=1&opov=1&opcsrf=1&opfileinc=1&opgpriv=0&opsqli=1&opxss=0&opdirt=0&opmemc=0&ophttprs=0&opbyp=0&opginf=0&opdos=0&orderby=2&cvssscoremin=0 I am able to output the JSON response using requests library as such: import urllib, json url =...

Can't send HTTPS request through proxy using urllib2

python,https,openssl,urllib2
I'm trying to create a Python script that sends a HTTPS request through a proxy (Burp, to be exact), but it keeps failing with ssl.CertificateError: hostname 'example.com:443' doesn't match u'example.com' Here's an abbreviated version of my code: proxy = urllib2.ProxyHandler({'https': '127.0.0.1:8080'}) opener = urllib2.build_opener(proxy) opener.addheaders = [ ("Host", "example.com"), ......
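A minimal sketch of routing through a local proxy, written against Python 3's `urllib.request` (the handler API is the same as urllib2's). The opener construction itself is shown offline; note that an intercepting proxy like Burp presents a self-signed certificate, so on Python 2.7.9+ the hostname/certificate check typically also has to be relaxed, e.g. with `ssl._create_unverified_context()`:

```python
import urllib.request  # urllib2 in Python 2; same handler classes

proxy = urllib.request.ProxyHandler({'https': 'http://127.0.0.1:8080'})
opener = urllib.request.build_opener(proxy)

# The handler simply records the scheme -> proxy mapping:
print(proxy.proxies)  # {'https': 'http://127.0.0.1:8080'}

# opener.open('https://example.com/') would now route via the proxy.
```

Manually adding a `Host` header with a port (as in the question's `addheaders`) is one known way to trigger the `hostname 'example.com:443' doesn't match` mismatch; letting urllib set `Host` itself avoids it.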

Python 3 urllib produces TypeError: POST data should be bytes or an iterable of bytes. It cannot be of type str

python,python-2.7,python-3.x,urllib2,urllib
I am trying to convert working Python 2.7 code into Python 3 code and I am receiving a type error from the urllib request module. I used the inbuilt 2to3 Python tool to convert the below working urllib and urllib2 Python 2.7 code: import urllib2 import urllib url = "https://www.customdomain.com"...
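The error means exactly what it says: in Python 3 the `data` argument to `urlopen()` must be bytes, while `urlencode()` returns a `str`. A short sketch of the usual fix (the payload dict here is a placeholder, not the question's actual form data):

```python
from urllib.parse import urlencode  # urllib.urlencode in Python 2

values = {'q': 'test'}  # placeholder payload
data = urlencode(values).encode('utf-8')  # encode str -> bytes for urlopen()

print(type(data).__name__)  # bytes
# urllib.request.urlopen("https://www.customdomain.com", data=data) would now work
```

2to3 does not add the `.encode()` call because it only rewrites names and syntax, not the str/bytes semantics.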

Send an image with flask by using urllib2

python,flask,urllib2
I'm new to Flask and I want to send an image to a client that was previously received from a external server with urllib2. Here is my example code for sending the google logo to the client: import urllib2 from flask import Flask, send_file app = Flask(__name__) @app.route('/getTestImage') def getTestImage():...

Is there a better way to retrieve webpage sizes with Python?

python,urllib2,python-requests,urllib
I'd like a sanity check on this Python script. My goal is to input a list of urls and get a byte size, giving me an indicator if the url is good or bad. import urllib2 import shutil urls = (LIST OF URLS) def getUrl(urls): for url in urls: file_name...

Kivy app (on Android) crashes when attempting to use the Google Directions API

urllib2,kivy,urlrequest,buildozer
I'm new to Kivy (and relatively new to Python) and I am having a problem getting UrlRequest to work. In particular, I am trying to use the Google Directions API in an app for Android. First of all, the code works (completely) when I run the main.py file through Python....

Overriding HTTP errors with urllib2

python,http,beautifulsoup,urllib2
I have this code, but it is not working. I want to use urllib2 to iterate through a list of urls. Upon opening each url, BeautifulSoup locates a class and extracts that text. The program stalls if there is an invalid url in the list. If there is any error,...
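A sketch of the usual pattern: wrap each `urlopen` in try/except and skip URLs that fail, so one bad entry can't stall the run. `fetch_all` and its `open_fn` parameter are hypothetical names; `open_fn` stands in for `urllib2.urlopen`, and a fake opener is used below so the control flow can be shown offline:

```python
from urllib.error import URLError, HTTPError  # urllib2.URLError/HTTPError in Python 2

def fetch_all(open_fn, urls):
    """Open each URL with open_fn, skipping any that raise URLError/HTTPError."""
    results = []
    for url in urls:
        try:
            results.append((url, open_fn(url)))
        except (HTTPError, URLError):
            continue  # skip invalid URLs instead of stalling the whole run
    return results

# A fake opener to illustrate the control flow:
def fake_open(url):
    if 'bad' in url:
        raise URLError('unreachable')
    return 'ok'

print(fetch_all(fake_open, ['http://good', 'http://bad', 'http://good2']))
```

Since `HTTPError` is a subclass of `URLError`, catching `URLError` alone also covers HTTP status errors.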

Proper way to fix a url without http://

python,url,urllib2,urllib
I'm trying to open a list of URLs of this format, using urllib2: google.com facebook.com youtube.com yahoo.com baidu.com Using this method: urllib2.urlopen(url) And getting this error: File "fetcher.py", line 98, in fetch_urls_and_save response = urllib2.urlopen(url) File "urllib2.py", line 154, in urlopen return opener.open(url, data, timeout) File "urllib2.py", line 423,...
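urlopen rejects URLs with no scheme, so the common fix is to prepend one when it is missing. A minimal sketch (`ensure_scheme` is a hypothetical helper name):

```python
from urllib.parse import urlparse  # urlparse module in Python 2

def ensure_scheme(url, default='http'):
    """Prepend a scheme when the URL lacks one, so urlopen() accepts it."""
    if not urlparse(url).scheme:
        url = '%s://%s' % (default, url)
    return url

print(ensure_scheme('google.com'))         # http://google.com
print(ensure_scheme('https://yahoo.com'))  # https://yahoo.com
```

Note the scheme check is heuristic: a bare `host:port` string would be parsed as having a scheme, so normalize such inputs separately if they can occur.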

Python: urllib2 gets nothing from a page that does exist

python,web-scraping,web-crawler,urllib2
I'm trying to crawl my college website, and I set a cookie and add headers, then: homepage=opener.open("website") content = homepage.read() print content I can get the source code sometimes, but sometimes just nothing. I can't figure out what happened. Is my code wrong? Or is it the website? Does one geturl() can...

Python dictionary with the same key repeated, for use in a URL, so it must be ordered

python,urllib2
I have a problem trying to create a dictionary, ordering it and joining it for parsing with urllib2. This is my code: values = {'STR':'1', 'STR':'123', 'STR':'3456', 'BAT':'95'} ary_ordered_names = [] ary_ordered_names.append('STR') ary_ordered_names.append('STR') ary_ordered_names.append('STR') ary_ordered_names.append('BAT') queryString = "&".join( [ item+'='+urllib.pathname2url(values[item]) for item in ary_ordered_names ] ) print queryString url =...
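A dict cannot hold duplicate keys (later `'STR'` entries silently overwrite earlier ones), so the usual fix is a sequence of key/value pairs, which `urlencode` accepts and processes in order. A minimal sketch with the question's values:

```python
from urllib.parse import urlencode  # urllib.urlencode in Python 2

# A dict silently collapses duplicate keys; a list of pairs keeps them all, in order:
pairs = [('STR', '1'), ('STR', '123'), ('STR', '3456'), ('BAT', '95')]
query = urlencode(pairs)
print(query)  # STR=1&STR=123&STR=3456&BAT=95
```

This also removes the need for the separate `ary_ordered_names` bookkeeping list.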

Strange Output from Python urllib2

python,html,python-2.7,urllib2,urllib
I would like to read the source code of a webpage using urllib2; however, I'm seeing strange output that I've not seen before. Here's the code (Python 2.7, Linux): import urllib2 open_url = urllib2.urlopen("http://www.elegantthemes.com/gallery/") site_html = open_url.read() site_html[50:] Which gives the output: '\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03\xe5\\ms\xdb\xb6\xb2\xfel\xcf\xe4?\xc0<S[\x9a\x8a\xa4^\xe28u,\xa5\x8e\x93\xf4\xa4\x93&\x99:9\xbdw\x9a\x8e\x07"' Does anyone know why it's showing...
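The leading `\x1f\x8b` bytes are the gzip magic number: the server returned a gzip-compressed body, which has to be decompressed before it reads as HTML. A minimal offline sketch, using `gzip.compress` to fabricate a stand-in for the raw response bytes:

```python
import gzip

body = gzip.compress(b'<html>hello</html>')  # stand-in for open_url.read()
print(body[:2])  # b'\x1f\x8b'  -- the gzip magic number

html = gzip.decompress(body)
print(html)  # b'<html>hello</html>'
```

On Python 2, the equivalent is `gzip.GzipFile(fileobj=StringIO(body)).read()`, since `gzip.decompress` was added in Python 3.2.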

urllib2 request returns a different page about 1 in 5 times

python,request,urllib2,user-agent
import urllib2 req = urllib2.Request('http://www.amazon.com/Sweet-Virgin-Organic-Coconut-13-5oz/dp/B00Q5CIL4Y', headers={ 'User-Agent': 'Mozilla/5.0' }) html = urllib2.urlopen(req).read() print len(html) That's the smallest example I can make. If you run that then ~1 in 5 times the length of the response will be 5769, and the other times it will be a normal usable response. Whats...

Simple script running very slow

python,urllib2
I wrote this simple script to check whether or not a set of Bitcoin addresses have had transactions. However I think it's running very slowly because it's processing 2 per second more or less. The file has over 60k addresses so... this is gonna take forever! Is that ok? import...

Incorrect output when downloading .html files

python,urllib2
I simply wish to download .html files in python. Code: import urllib2 hdr = {'User-Agent': 'Mozilla/5.0'} urls=['http://www.nydailynews.com/sports/soccer-fans-stampede-south-african-stadium-nigeria-north-korea-world-cup-warmup-article-1.179211'] path='C:/Users/sony/Desktop/Python' for i,site in enumerate(urls): print (site) req = urllib2.Request(site, headers=hdr) page = urllib2.build_opener(urllib2.HTTPCookieProcessor).open(req) page_content = page.read() with open(path+'/'+str(i)+'.html', 'w') as fid: fid.write(page_content) But this...
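One likely culprit on Windows is opening the output file in text mode (`'w'`): newline translation can corrupt the downloaded bytes. A minimal offline sketch of writing the page content in binary mode, with a literal byte string standing in for `page.read()`:

```python
import os
import tempfile

content = b'<html>\r\n<body>news article</body>\r\n</html>'  # stand-in for page.read()

# Binary mode ('wb') writes \r\n verbatim; text mode on Windows would translate it:
path = os.path.join(tempfile.mkdtemp(), '0.html')
with open(path, 'wb') as fid:
    fid.write(content)

with open(path, 'rb') as fid:
    print(fid.read() == content)  # True
```

The question's code already uses `'w'`; switching that one mode flag to `'wb'` is the usual fix.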

Python percent encoding only certain characters in a URL

python,python-3.x,urllib2,urllib,percent-encoding
I have to percent encode only # character if it appears in a given url. I know that we can encode a URL using urllib.quote. It takes a safe keyword to set a particular character to be safe for the URL. I am looking for something like: a = 'http://localhost:8001/3.0/lists/list_1.localhost.org/roster/owner#iammdkdkf'...
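`quote()` escapes everything *not* listed in `safe`, so encoding only `#` means listing every other reserved character as safe. A minimal sketch with the question's URL (the simpler `url.replace('#', '%23')` works too, if `#` is genuinely the only character to touch):

```python
from urllib.parse import quote  # urllib.quote in Python 2

url = 'http://localhost:8001/3.0/lists/list_1.localhost.org/roster/owner#iammdkdkf'

# Letters, digits, '_', '.', '-', '~' are always safe; the rest must be listed,
# leaving only '#' to be percent-encoded:
encoded = quote(url, safe="/:?=&@+$,;!*'()")
print(encoded)
```

The result keeps `http://localhost:8001/...` intact and turns the fragment marker into `%23`.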

Urllib2 and JSON Object Error

python,django,python-3.x,urllib2,django-1.7
I am following a tutorial to add 'Search Function' in my project. However the tutorial is based on Python2x & Django 1.7 while I am using Python 3.4 & Django 1.7. The search code uses bing_search along with urllib2. However urllib2 is not supported in Python 3 and the same...

Coupling a SQL database through urllib2 to a Python application

python,sql,urllib2
At the moment I am creating a Python-based application with the program Ren'Py. Now I have to couple the game with a SQL database. An admin on the program's forum recommended using urllib to do this. http://lemmasoft.renai.us/forums/viewtopic.php?f=8&t=29954 This is my thread. Now, I've managed to successfully add the urllib...

Python urllib2 request error

python,urllib2
Python 2.7.3 (default, Mar 13 2014, 11:03:55) [GCC 4.7.2] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import urllib2 >>> req = urllib2.Request("http:///wp-login.php") >>> website='kseek.com.my' >>> req = urllib2.Request("http://"+website+"/wp-login.php") >>> req.add_header('User-agent', 'Mozilla 5.10') >>> req.add_header('Referer', 'http://'+website) >>> data = urllib2.urlopen(req, timeout=6).read() Traceback (most recent call last):...

Why does urllib2's .getcode() method crash on 404's?

python,urllib2
In the beginner Python course I took on Lynda it said to use .getcode() to get the http code from a url and that that can be used as a test before reading the data: webUrl = urllib2.urlopen('http://www.wired.com/tag/magazine-23-05/page/4') print(str(webUrl.getcode())) if (webURL.getcode() == 200): data = webURL.read() else: print 'error' However,...
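The catch is that `urlopen()` raises `HTTPError` for 4xx/5xx responses, so on a 404 the code never reaches `getcode()`. The exception object itself carries the status code. A minimal offline sketch, constructing the error directly rather than hitting the network:

```python
from urllib.error import HTTPError, URLError  # urllib2.HTTPError/URLError in Python 2

# urlopen() raises HTTPError before getcode() can run; catch it and read .code:
try:
    raise HTTPError('http://example.com/missing', 404, 'Not Found', None, None)
except HTTPError as err:
    print(err.code)  # 404

print(issubclass(HTTPError, URLError))  # True
```

(Separately, the question's code mixes `webUrl` and `webURL`, which would raise a `NameError` even on a 200 response.)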

Python 2.7.10 Trying to print text from website using Beautiful Soup 4

python,python-2.7,beautifulsoup,urllib2
I want my output to be like: count:0 - Bournemouth and Watford to go head-to-head for Abdisalam Ibrahim Olympiacos midfielder Abdisalam Ibrahim is a target for Premier League new-boys Bournemouth and Watford.The former Manchester City man is keen to leave Greece this summer, and his potential availability has alerted Eddie...

urllib.request.urlopen(url): how to use this function with an IP address?

network-programming,urllib2,urllib,urlopen,urllib3
I'm working on Python3 with testing page load times so I created a local apache server for compare but the problem is I use urllib.request.urlopen(url) function which doesn't allow me to use my own ip address. Is there anything that helps me to get page with only ip address. Here's...

Can't get Python to download webpage source code: “browser version not supported”

python,html,browser,urllib2,source
So I'm trying to write a program that would download the source-code of a webpage in Python 2.7. The code looks like this: import urllib2 url = "https://scrap.tf/stranges/47" req = urllib2.Request(url, headers={'User-Agent' : "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/534.30 (KHTML, like Gecko) Ubuntu/11.04 Chromium/12.0.742.112 Chrome/12.0.742.112 Safari/534.30"}) con = urllib2.urlopen(req) data =...

Use python to access a site with PKI security

python,python-2.7,ssl,urllib2,pki
I have a site that has PKI security enabled. Each client uses either a card reader to load their certificate, or the certificate is installed in the IE certificate store on their box. So my questions are: How can I use either the card-reader certificate or the certificate stored...

Extract the URL of stored html file

python,urllib2,bs4
I have stored some HTML files and renamed them. Is there some way I can extract the URL of the HTML file in Python? EDIT: I wish to find the URL of the .html file and not the links present in it. I am looking for a generalised approach...

Website treating me as mobile when scraping HTML in Python

python-2.7,beautifulsoup,urllib2
I am attempting to scrape data off of a website using a combination of urllib2 and beautifulsoup. At the moment, here is my code: site2='http://football.fantasysports.yahoo.com/archive/nfl/2008/619811/draftresults' players=[] teams=[] response=urllib2.urlopen(site2) html=response.read() soup=BeautifulSoup(html) playername = soup.find_all('a', class_="name") teamname = soup.find_all('td', class_="last") My problem is, that when I view the source code in Chrome,...

I cannot get the whole URL when the server redirects me, using urllib2.urlopen(url).geturl()

python,urllib2
For example, I only get 'http://www.stackoverflow.com' when the whole URL is 'http://www.stackoverflow.com?key=value&key1=value1'.

How to get a hidden input's value using Python?

python,python-2.7,urllib2,findall
How can I get an input's value from an HTML page like <input type="hidden" name="captId" value="AqXpRsh3s9QHfxUb6r4b7uOWqMT" ng-model="captId">? I have the input name [ name="captId" ] and need its value. import re , urllib , urllib2 a = urllib2.urlopen('http://www.example.com/','').read() Thanks. Update 1: I installed BeautifulSoup and used it, but there are some errors. Code: import...
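If BeautifulSoup is giving trouble, the stdlib's HTML parser can do this particular job on its own. A minimal sketch (`HiddenInputFinder` is a hypothetical class name; the sample HTML is the snippet from the question):

```python
from html.parser import HTMLParser  # the HTMLParser module in Python 2

class HiddenInputFinder(HTMLParser):
    """Collect the value attribute of <input> tags, keyed by their name."""
    def __init__(self):
        HTMLParser.__init__(self)
        self.values = {}

    def handle_starttag(self, tag, attrs):
        if tag == 'input':
            d = dict(attrs)
            if 'name' in d:
                self.values[d['name']] = d.get('value')

html = '<input type="hidden" name="captId" value="AqXpRsh3s9QHfxUb6r4b7uOWqMT" ng-model="captId">'
parser = HiddenInputFinder()
parser.feed(html)
print(parser.values['captId'])  # AqXpRsh3s9QHfxUb6r4b7uOWqMT
```

This avoids regex-on-HTML entirely; the parser handles attribute order and quoting for you.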

Python http get - cannot replicate a curl request with headers

python,curl,http-headers,urllib2,http-get
I have the following curl command: curl -H "User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0" -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" -H "Connection: keep-alive" -X GET http://example.com/en/number/111555000 Unfortunately I was not able to replicate it... I tried with: url = http://example.com/en/number/111555000 headers = {'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0',...
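Each `-H` flag maps to one entry in the `headers` dict of a `Request`. A minimal offline sketch of building the equivalent request (shown with Python 3's `urllib.request`; `urllib2.Request` takes the same arguments):

```python
from urllib.request import Request  # urllib2.Request in Python 2

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Connection': 'keep-alive',
}
req = Request('http://example.com/en/number/111555000', headers=headers)

# urllib normalizes header names to Capitalized-with-dashes form:
print(req.get_header('User-agent') == headers['User-Agent'])  # True
# urllib.request.urlopen(req) would then send the GET
```

Also note the URL must include the scheme as a string (`'http://...'`), which the question's `url = http://...` line omits.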

Extract News article content from stored .html pages

python,urllib2,bs4
I am reading text from a html files and doing some analysis. These .html files are news articles. Code: html = open(filepath,'r').read() raw = nltk.clean_html(html) raw.unidecode(item.decode('utf8')) Now I just want the article content and not the rest of the text like advertisements, headings etc. How can I do so relatively...

gevent / requests hangs while making lots of head requests

python,urllib2,python-requests,gevent,grequests
I need to make 100k head requests, and I'm using gevent on top of requests. My code runs for a while, but then eventually hangs. I'm not sure why it's hanging, or whether it's hanging inside requests or gevent. I'm using the timeout argument inside both requests and gevent. Please...

urllib2 request randomly stops working without code changes

python,urllib2
I'm querying the Mixpanel API pretty consistently, but every so often, the request does not go through and I am given this error: urllib2.URLError: <urlopen error [Errno 8] nodename nor servname provided, or not known> I did some searching and there might be some caching issues, so I tried this...

Celery/RabbitMQ unacked messages blocking queue?

python,rabbitmq,celery,urllib2
I have invoked a task that fetches some information remotely with urllib2 a few thousand times. The tasks are scheduled with a random eta (within a week) so they all don't hit the server at the same time. Sometimes I get a 404, sometimes not. I am handling the error...

No JSON object could be decoded after retrieving JSON content using POST without Gzip encoding

python,json,python-2.7,urllib2,jira-zephyr
Using Python 2.7.8 we get "ValueError: No JSON object could be decoded", if running this script: from urllib2 import urlopen, Request from json import dumps, loads, load values = dumps({ "issueId": 10600, "versionId": "10000", "cycleId": "16", "projectId": 10000 }) headers = {"Content-Type": "application/json"} request = Request("http://private-anon-491d363a1-getzephyr.apiary-mock.com/jira_server/rest/zapi/latest/execution", data=values, headers=headers) print request.get_method()...
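One common cause of this error is a gzip-encoded response body being fed straight to the JSON decoder. A minimal offline sketch of checking the encoding and decompressing first; the payload and header value here are fabricated stand-ins for `response.read()` and `response.info().get('Content-Encoding')`:

```python
import gzip
import json

raw = gzip.compress(json.dumps({'execution': {'id': 16}}).encode('utf-8'))  # stand-in body
content_encoding = 'gzip'  # stand-in for the Content-Encoding response header

if content_encoding == 'gzip':
    raw = gzip.decompress(raw)  # gzip.GzipFile(fileobj=StringIO(raw)).read() on Python 2

data = json.loads(raw.decode('utf-8'))
print(data['execution']['id'])  # 16
```

If the body is not JSON at all (e.g. an HTML error page), printing the first few hundred bytes of `raw` before decoding usually reveals the real problem.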

Uploading a zip file directly to AWS S3 using Python urllib2

python,amazon-s3,zip,urllib2,python-unicode
I am trying to upload a zip file directly to S3 using a Python script, but running into some Unicode Decode Errors. What I do is generate a Pre-Signed S3 Link and then upload data to it. I know the link works fine because the upload works when I use...

Connect python application to internet through proxy IP?

python,sockets,ip,urllib2,proxy-server
At my workplace we access the internet through a proxy server, whose IP is entered in Preferences > Network > Connection > Manual Proxy configuration. I want to learn how to do that setup in Python for my internet-based application.

How to get a response for a streaming url on google app engine (python)

python,google-app-engine,urllib2,urlfetch
I am trying to verify whether an online radio URL is delivering music and whether the URL was redirected or not (this happens if for some reason the requested URL is wrong or not active). I found some advice here: Fetching url in python with google app engine. However, for...

urllib2 / requests does not display iframe of the webpage

python,iframe,beautifulsoup,urllib2,python-requests
I'm trying to scrape some book data from www.amazon.in http://www.amazon.in/Life-What-Make-Preeti-Shenoy/dp/9380349300/ref=sr_1_6?s=books&ie=UTF8&qid=1424652069&sr=1-6 I need the summary of that book, which is located in an iframe, but the problem is that when I try to use 'requests' to open that URL, it does not contain the iframe. For example, when I do...