Python 3.3 TypeError: can't use a string pattern on a bytes-like object in re.findall()

Question:

Tag: python-3.x,web-crawler

I am trying to learn how to automatically fetch URLs from a page. In the following code I am trying to get the title of the web page:

import urllib.request
import re

url = "http://www.google.com"
regex = '<title>(.+?)</title>'
pattern  = re.compile(regex)

with urllib.request.urlopen(url) as response:
   html = response.read()

title = re.findall(pattern, html)
print(title)

And I get this unexpected error:

Traceback (most recent call last):
  File "C:\Users\Abhishek\Desktop\Crawler.py", line 11, in <module>
    title = re.findall(pattern, html)
  File "C:\Python33\lib\re.py", line 201, in findall
    return _compile(pattern, flags).findall(string)
TypeError: can't use a string pattern on a bytes-like object

What am I doing wrong?

Thanks!


Answer:

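response.read() returns a bytes object, but your pattern is a str, and the re module will not match a str pattern against bytes. Convert html (a bytes-like object) into a string using .decode, e.g. html = response.read().decode('utf-8').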

See Convert bytes to a Python String
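
As a rough sketch of the two ways around this error (decode the bytes, or use a bytes pattern instead), the snippet below runs against a hard-coded sample rather than a live page. The names find_titles, TITLE_RE and sample are made up for illustration, and the UTF-8 default and errors="replace" are assumptions, not something the page guarantees.

```python
import re

# str pattern: only works on str input. "." matches any character;
# re.DOTALL lets it span newlines inside the <title> element.
TITLE_RE = re.compile(r"<title>(.+?)</title>", re.IGNORECASE | re.DOTALL)


def find_titles(raw, encoding="utf-8"):
    """Decode raw bytes to str, then apply the str pattern.

    The utf-8 default is an assumption; errors="replace" keeps a badly
    encoded page from raising UnicodeDecodeError.
    """
    text = raw.decode(encoding, errors="replace")
    return TITLE_RE.findall(text)


# Hypothetical stand-in for what response.read() returns: a bytes object.
sample = b"<html><head><title>Example Page</title></head></html>"

# Option 1: decode first, then match with the str pattern.
titles = find_titles(sample)
print(titles)

# Option 2: keep the bytes and use a bytes pattern (note the b prefix).
# The matches are then bytes as well.
bytes_titles = re.findall(rb"<title>(.+?)</title>", sample)
print(bytes_titles)
```

With a real request you would pass response.read() to find_titles. If the server declares a charset, response.headers.get_content_charset() returns it (or None), which is usually a safer choice than assuming UTF-8.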

