Cache a HTTP GET REQUEST in Python Sockets


Tag: python,sockets,http,server

I'm making a proxy server using sockets. When the requested file is not in my current directory (the cache), I make an HTTP GET request to the origin server (which is the www) and cache the response for later.

The problem with my code is that every time I get a resource from the www I cache it, but the content of the cached file is always "Moved permanently".

So this is what happens: the user requests "" by entering "localhost:8080/" into the browser, and the browser displays the page correctly. When the user enters "localhost:8080/" a second time, the browser displays a page saying that it has moved permanently.

Here is the code of the method that does the HTTP GET request and the caching:

    def find_on_www(self, conn, requested_file):
        # Create a socket on the proxy server
        print 'Creating socket on proxy server'
        c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

        host_name = requested_file.replace("www.","",1)
        print 'Host Name: ', host_name

        # Connect the socket to port 80 of the host
        c.connect((host_name, 80))
        print 'Socket connected to port 80 of the host'

        # Create a temporary file on this socket and ask port 80
        # for the file requested by the client
        file_object = c.makefile('r', 0)
        file_object.write("GET " + "http://" + requested_file + " HTTP/1.0\n\n")

        # Read the response into buffer
        buff = file_object.readlines()

        # Create a new file in the cache for the requested file.
        # Also send the response in the buffer to client socket
        # and the corresponding file in the cache
        temp_file = open("./" + requested_file, "wb")
        for i in range(0, len(buff)):
            temp_file.write(buff[i])
            conn.send(buff[i])

And here is the rest of my code, if anyone is interested:

import socket       # Socket programming
import signal       # To shut down server on ctrl+c
import time         # Current time
import os           # To get the last-modified
import mimetypes    # To guess the type of requested file
import sys          # To exit the program
from threading import Thread

def generate_header_lines(code, modified, length, mimetype):
        """ Generates the header lines for the response message """
        h = ''

        if code == 200:
            # Append status code
            h = 'HTTP/1.1 200 OK\n'
            # Append the date
            h += 'Date: ' + time.strftime("%a, %d %b %Y %H:%M:%S", time.localtime()) + '\n'
            # Append the name of the server
            h += 'Server: Proxy-Server-Thomas\n'
            # Append the date of the last modification to the file
            h += 'Last-Modified: ' + modified + '\n'

        elif code == 404:
            # Append the status code
            h = 'HTTP/1.1 404 Not Found\n'
            # Append the date
            h += 'Date: ' + time.strftime("%a, %d %b %Y %H:%M:%S", time.localtime()) + '\n'
            # Append the name of the web server
            h += 'Server: Web-Server-Thomas\n'

        # Append the length of the content
        h += 'Content-Length: ' + str(length) + '\n'
        # Append the type of the content
        h += 'Content-Type: ' + mimetype + '\n'
        # Append the connection closed - let the client know we close the connection
        h += 'Connection: close\n\n'

        return h

def get_mime_type(requested_file):
    # Get the file's mimetype and encoding
    try:
        (mimetype, encoding) = mimetypes.guess_type(requested_file, True)
        if not mimetype:
            print "Mimetype found: text/html"
            return 'text/html'
        else:
            print "Mimetype found: ", mimetype
            return mimetype

    except TypeError:
        print "Mimetype found: text/html"
        return 'text/html'

class WebServer:
    def __init__(self):
        """ Sets up the host, port and socket of the server """
        self.host = ''      # Host for the server
        self.port = 8000    # Port for the server

        # Create socket
        self.socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

    def start_server(self):
        """ Starts the server """
        # Bind the socket to the host and port
        self.socket.bind((self.host, self.port))

        print "Connection started on ", self.port

        # Start the main loop of the server - start handling clients
        self.main_loop()

    def shutdown(self):
        """ Shuts down the server """
        try:
            self.socket.close()
        except Exception as e:
            print "Something went wrong closing the socket: ", e

    def main_loop(self):
        """Main loop of the server"""
        while True:
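            # Start listening (the backlog of 5 is an assumed value)
            self.socket.listen(5)

            # Wait for a client to connect
            client_socket, client_address = self.socket.accept()

            # Wait for a request from the client
            data = client_socket.recv(1024)

            # Handle the request from the client in its own thread
            t = Thread(target=self.handle_request, args=(client_socket, data))
            t.start()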
            # # Handle the request from the client
            # self.handle_request(client_socket, data)

    def handle_request(self, conn, data):
        """ Handles a request from the client """
        # Decode the data
        string = bytes.decode(data)

        # Split the request
        requested_file = string.split(' ')
        # Get the method that is requested
        request_method = requested_file[0]

        if request_method == 'GET':
            # Get the part of the request that contains the name
            requested_file = requested_file[1]
            # Get the name of the file from the request
            requested_file = requested_file[1:]

            print "Searching for: ", requested_file

            try:
                # Open the file
                file_handler = open(requested_file, 'rb')
                # Get the content of the file
                response_content = file_handler.read()
                # Close the handler
                file_handler.close()

                # Get information about the file from the OS
                file_info = os.stat(requested_file)
                # Extract the last modified time from the information
                time_modified = time.ctime(file_info[8])
                # Get the time modified in seconds
                modified_seconds = os.path.getctime(requested_file)

                print "Current time: ", time.time()
                print "Modified: ", time_modified

                if (float(time.time()) - float(modified_seconds)) > 120:  # more than 2 minutes
                    print "Time outdated!"
                    #self.find_on_www(conn, requested_file)

                # Get the file's mimetype and encoding
                mimetype = get_mime_type(requested_file)

                print "Mimetype = ", mimetype

                # Create the correct header lines
                response_headers = generate_header_lines(200, time_modified, len(response_content), mimetype)

                # Create the response to the request
                server_response = response_headers.encode() + response_content

                # Send the response back to the client
                conn.send(server_response)

                # Close the connection
                conn.close()

            except IOError:  # Couldn't find the file in the cache - Go find file on www
                print "Error: " + requested_file + " not found in cache!"
                self.find_on_www(conn, requested_file)

    def find_on_www(self, conn, requested_file):
        try:
            # Create a socket on the proxy server
            print 'Creating socket on proxy server'
            c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

            host_name = requested_file.replace("www.","",1)
            print 'Host Name: ', host_name

            # Connect the socket to port 80 of the host
            c.connect((host_name, 80))
            print 'Socket connected to port 80 of the host'

            # Create a temporary file on this socket and ask port 80
            # for the file requested by the client
            file_object = c.makefile('r', 0)
            file_object.write("GET " + "http://" + requested_file + " HTTP/1.0\n\n")

            # Read the response into buffer
            buff = file_object.readlines()

            # Create a new file in the cache for the requested file.
            # Also send the response in the buffer to client socket
            # and the corresponding file in the cache
            temp_file = open("./" + requested_file, "wb")
            for i in range(0, len(buff)):
                temp_file.write(buff[i])
                conn.send(buff[i])

        except Exception as e:
            # Generate a body for the file - so we don't have an empty page
            response_content = "<html><body><p>Error 404: File not found</p></body></html>"

            # Generate the correct header lines
            response_headers = generate_header_lines(404, '', len(response_content), 'text/html')

            # Create the response to the request
            server_response = response_headers.encode() + response_content

            # Send the response back to the client
            conn.send(server_response)

            # Close the connection
            conn.close()

def shutdown_server(sig, dummy):
    """ Shuts down the server """

    # Shutdown the server
    s.shutdown()

    # Exit the program
    sys.exit()

# Shut down on ctrl+c
signal.signal(signal.SIGINT, shutdown_server)

# Create a web server
s = WebServer()
# Start the server
s.start_server()

The problem with your code is that when you request a page that returns a status code of 301 (Moved Permanently), that status line ends up in what you send and cache. When you view a page that is not stored in your cache, you copy the response to the proxy server's GET request straight to the client. This tells the client to make another GET request, which it makes directly, bypassing your proxy server.

The second time you request the page through the proxy server, it retrieves the previous response from the cache. This file contains the headers from the previous response, which correctly include the redirect status code; however, you then prepend your own status line of 200 OK to the returned message. As the client reads this status code first, it does not realise that you want it to make another request to find the page that has been redirected. Therefore it just shows the page that tells you the page has moved.
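To make the second request concrete, here is a minimal sketch of what the proxy ends up sending on a cache hit. It is written for Python 3 (the question's code is Python 2), and the cached bytes and the `www.example.com` URL are made-up stand-ins for whatever the origin server actually returned.

```python
# Hypothetical raw response from the origin server, exactly as it was
# written to the cache file (status line and headers included).
cached_file = (
    b"HTTP/1.0 301 Moved Permanently\r\n"
    b"Location: http://www.example.com/\r\n"
    b"\r\n"
    b"<html><body>Moved permanently</body></html>"
)

# On a cache hit the proxy builds its own header block (as
# generate_header_lines does for code 200) and appends the whole file.
proxy_headers = (
    "HTTP/1.1 200 OK\n"
    "Content-Length: " + str(len(cached_file)) + "\n"
    "Content-Type: text/html\n"
    "Connection: close\n\n"
).encode()
second_response = proxy_headers + cached_file

# The client only acts on the first status line. Everything after the first
# blank line is treated as the body, including the origin's 301 headers.
status_line = second_response.split(b"\n", 1)[0]
body = second_response.split(b"\n\n", 1)[1]

print(status_line)
print(body[:30])
```

The origin's redirect is still in the message, but only as part of the body, so the browser renders it as page content instead of following it.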

What you need to do is parse the headers that are returned by the web server when the proxy server has to fetch the actual page from the internet. Then, depending on these, serve the correct headers back to the client.
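One way this parsing could look is sketched below, again in Python 3 and not as a drop-in replacement for the question's code. The helper names `split_response` and `should_cache` are hypothetical, and caching only 200 responses is an assumed policy; a fuller proxy might instead follow the `Location` header itself.

```python
def split_response(raw):
    """Split a raw HTTP response into (status_code, headers, body)."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        # Tolerate servers that separate headers with bare newlines
        head, sep, body = raw.partition(b"\n\n")
    lines = head.decode("iso-8859-1").splitlines()

    # The status line looks like: HTTP/1.0 301 Moved Permanently
    status_code = int(lines[0].split(" ", 2)[1])

    # Collect the remaining "Name: value" lines, keyed by lower-cased name
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


def should_cache(status_code):
    """Assumed policy: only successful responses are stored as pages."""
    return status_code == 200


# Example with a made-up redirect response
raw = (
    b"HTTP/1.0 301 Moved Permanently\r\n"
    b"Location: http://www.example.com/\r\n"
    b"\r\n"
    b"<html>moved</html>"
)
code, headers, body = split_response(raw)
if should_cache(code):
    print("store the body in the cache and serve it with 200 OK")
else:
    print("forward status", code, "to the client; redirect target:", headers.get("location"))
```

With the status code separated from the body, the proxy can store only the body of a 200 response in the cache, and pass a 301 on with its original status line and `Location` header rather than wrapping it in a 200.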

