Logarithmic multi-sequence plot with equal bar widths


Tag: numpy,matplotlib,histogram

I have something like

import matplotlib.pyplot as plt
import numpy as np

a=[0.05, 0.1, 0.2, 1, 2, 3]
plt.hist((a*2, a*3), bins=[0, 0.1, 1, 10])
# Matplotlib 3.3 renamed the linthreshx keyword to linthresh; older releases need linthreshx=0.1.
plt.gca().set_xscale("symlog", linthresh=0.1)

which gives me the following plot: log histogram

As one can see, the bar widths are not equal. In the linear part (from 0 to 0.1) everything is fine, but beyond that the bar widths are still computed in linear scale while the axis is logarithmic, which gives uneven widths for the bars and for the gaps between them (the ticks are not in the middle of the bars).

Is there any way to correct this?


Inspired by I came up with the following solution:

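import matplotlib.pyplot as plt
import numpy as np

d=[0.05, 0.1, 0.2, 1, 2, 3]

def LogHistPlot(data, bins):
    colors=("b", "r", "g")
    # Assumption: the definition of width did not survive in the post;
    # here the datasets share 80% of each unit-wide bin slot.
    width = 0.8 / len(data)
    for i, values in enumerate(data):
        # Count the samples per bin with numpy instead of plt.hist.
        heights = np.histogram(values, bins)[0]
        # Bin k starts at integer position k on a linear axis; shift each dataset by one bar width.
        left = np.arange(len(heights)) + i*width
        # Assumption: the garbled call is plt.bar, with bars aligned to their left edge.
        plt.bar(left, heights, width, color=colors[i], label=i, align="edge")
    # Label the integer positions with the bin edges.
    plt.xticks(range(len(bins)), bins)

LogHistPlot((d*2, d*3, d*4), [0, 0.1, 1, 10])
plt.show()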

Which produces this plot: Correct logarithmic histogram with multiple datasets

The basic idea is to drop the plt.hist function, compute the histogram with numpy, and plot it with plt.bar. Then you can simply use a linear x-axis, which makes the bar width calculation trivial. Lastly, the ticks are relabeled with the bin edges, which gives the appearance of a logarithmic scale. And you don't even have to deal with the symlog linear/logarithmic botchery anymore.
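The same idea can also be written against an Axes object, with the layout computed separately from the drawing so the bar positions can be inspected. This is only a sketch: the helper name grouped_bar_layout, the total_width default of 0.8, and the left-edge alignment are assumptions for illustration, not part of the original answer.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove for interactive use
import matplotlib.pyplot as plt


def grouped_bar_layout(datasets, bins, total_width=0.8):
    """Return [(left_edges, heights), ...] per dataset and the shared bar width.

    Hypothetical helper: bin k occupies the unit slot starting at x = k on a
    linear axis, so every bar has the same width no matter how wide the bin is.
    """
    width = total_width / len(datasets)
    layout = []
    for i, values in enumerate(datasets):
        heights, _ = np.histogram(values, bins)
        left = np.arange(len(heights)) + i * width
        layout.append((left, heights))
    return layout, width


d = [0.05, 0.1, 0.2, 1, 2, 3]
bins = [0, 0.1, 1, 10]
layout, width = grouped_bar_layout((d * 2, d * 3, d * 4), bins)

fig, ax = plt.subplots()
for i, (left, heights) in enumerate(layout):
    ax.bar(left, heights, width, align="edge", label=f"dataset {i}")

# Relabel the integer positions with the real bin edges.
ax.set_xticks(range(len(bins)))
ax.set_xticklabels([str(edge) for edge in bins])
ax.legend()
```

Because the x-axis stays linear, the bars are equally wide by construction; only the tick labels carry the logarithmic bin edges. Call fig.savefig(...) or switch to an interactive backend to view the result.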


Finding indices of elements in vector

I have a vector orig, which is a p-dimensional vector. Now, I sampled c elements from this vector (with replacement); let's call the result sampled_vec. So basically, sampled_vec has elements from orig. Now, I want to find out the indices of these elements (in sampled_vec) from orig. Probably, an example would...

Better image normalization with numpy

I already achieved the goal described in the title but I was wondering if there was a more efficient (or generally better) way to do it. First of all let me introduce the problem. I have a set of images of different sizes but with a width/height ratio less than...

Data Analysis and Scatter Plot different file and different column

i have a lot of files and i want to open, read data1.txt and data2.txt file and then data1.txt file 22. column "x_coordinate" and data2.txt file 23. column "y_coordinate" scatter plot. how can i ? with open('data1.txt') as f: with open('data2.txt') as f2: data1 = f.readlines() data2 = f2.readlines() f1.xArr=[]...

Python np.delete issue

A = np.array([[1,2,3],[3,4,5],[5,6,7]]) X = np.array([[0, 1, 0]]) for i in xrange(np.shape(X)[0]): for j in xrange(np.shape(X)[1]): if X[i,j] == 0.0: A = np.delete(A, (j), axis=0) I am trying to delete j from A if in X there is 0 at index j. I get IndexError: index 2 is out of...

Identifying the nearest grid point

I have three arrays lat=[15,15.25,15.75,16,....30] long=[91,91.25,91.75,92....102] data= array([[ 0. , 0. , 0. , ..., 0. , 0. , 0. ], [ 0. , 0. , 0. , ..., 0. , 0. , 0. ], [ 0. , 0. , 0. , ..., 0. , 0. , 0. ], ...,...

Optional parameter to theano function

I have a function f in theano which takes two parameters, one of them optional. When I call the function with the optional parameter being None the check inside f fails. This script reproduces the error: import theano import theano.tensor as T import numpy as np # function setup def...

Is it possible to specify the order of levels in Pandas factorize method?

I am using pandas to factorize an array consisting of two types of strings. I want to make sure that one of the strings "XYZ" is always coded as a 0 and the other string "ABC" is always coded as 1. Is it possible to do this? I looked up...

Numpy and dot products of multiple vector pairs: how can it be done?

I want to get dot product of N vector pairs (a_vec[i, :], b_vec[i, :]). a_vec has shape [N, 3], bvec has the same shape (N 3D vectors). I know that it can be easily done in cycle via function. But cannot it be done somehow simpler and faster?...

Sending live video frame over network in python opencv

I'm trying to send live video frames that I capture with my camera to a server and process them. I'm using OpenCV for image processing and Python as the language. Here is my code: import cv2 import numpy as np import socket import sys import pickle cap=cv2.VideoCapture(0) clientsocket=socket.socket(socket.AF_INET,socket.SOCK_STREAM) clientsocket.connect(('localhost',8089))...

How to set first column to a constant value of an empty np.zeros numPy matrix?

I'm working on setting some boundary conditions for a water table model, and I am able to set the entire first row to a constant value, but not the entire first column. I am using np.zeros((11,1001)) to make an empty matrix. Does anyone know why I am successful at defining...

Calculating distances between unique Python array regions?

I have a raster with a set of unique ID patches/regions which I've converted into a two-dimensional Python numpy array. I would like to calculate pairwise Euclidean distances between all regions to obtain the minimum distance separating the nearest edges of each raster patch. As the array was originally a...

Parse multicolumn string using python

I'm trying to extract data from the text output of a cheminformatics program called NWChem, I've already extraced the part of the output that I'm interested in(the vibrational modes), here is the string that I have extracted: s = ''' 1 2 3 4 5 6 P.Frequency -0.00 0.00 0.00...

Insert a numpy array into another without having to worry about length

When doing: import numpy A = numpy.array([1,2,3,4,5,6,7,8,9,10]) B = numpy.array([1,2,3,4,5,6]) A[7:7+len(B)] = B # A[7:7+len(B)] has in fact length 3 ! we get this typical error: ValueError: could not broadcast input array from shape (6) into shape (3) This is 100% normal because A[7:7+len(B)] has length 3, and not a...

Find Maximum of 3D np.array along Axis = 0

I have a 3D numpy array that looks like this: X = [[[10 1] [ 2 10] [-5 3]] [[-1 10] [ 0 2] [ 3 10]] [[ 0 3] [10 3] [ 1 2]] [[ 0 2] [ 0 0] [10 0]]] At first I want the maximum along...

Why my mask failed in Python?

My code: #!/usr/bin/python import numpy as np names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe']) data = np.random.randn(7, 4) + 0.8 print (data) mask2= ((names != 'Joe') == 7.0) d2 = data[mask2] print (d2) d3 = data[names != 'Joe'] = 7.0 print (d3) Actually,my intention was to get the...

Correlate a single time series with a large number of time series

I have a large number (M) of time series, each with N time points, stored in an MxN matrix. Then I also have a separate time series with N time points that I would like to correlate with all the time series in the matrix. An easy solution is to...

Advanced indexing for sympy?

With numpy, I am able to select an arbitrary set of items from an array with a list of integers: >>> import numpy as np >>> a = np.array([1,2,3]) >>> a[[0,2]] array([1, 3]) The same does not seem to work with sympy matrices, as the code: >>> import sympy as...

How to show minor tick labels on log-scale with Matplotlib

Does anyone know how to show the labels of the minor ticks on a logarithmic scale with Python/Matplotlib? Thanks!...

Plotting two different arrays of different lengths

I have two arrays. One is the raw signal of length (1000, ) and the other one is the smooth signal of length (100,). I want to visually represent how the smooth signal represents the raw signal. Since these arrays are of different length, I am not able to plot...

What is the best method to extract highly correlated variables within the given threshold

I have one data frame and pairwise correlation were calculated >>> df1 = pd.read_csv("/home/zebrafish/Desktop/stack.csv") >>> df1.corr() GA PN PC MBP GR AP GA 1.000000 0.070541 0.259937 -0.452661 0.115722 0.268014 PN 0.070541 1.000000 0.512536 0.447831 -0.042238 0.263601 PC 0.259937 0.512536 1.000000 0.331354 -0.254312 0.958877 MBP -0.452661 0.447831 0.331354 1.000000 -0.467683 0.229870...

Python: matplotlib - probability mass function as histogram

I want to draw a histogram and a line plot at the same graph. However, to do that I need to have my histogram as a probability mass function, so I want to have on the y-axis a probability values. However, I don't know how to do that, because using...

Plotting non-numeric x-axis away from the y-axis

I am using matplotlib to graph a curve with a non-numeric x-axis. I would like there to be some space between the y-axis and the start of the plot. This code implements a subplot with a gap (on the left) & a subplot with a gap using the set_xlim method...

Removing repeated sub-lists from a list

I have a list as follows: l = [['A', 'C', 'D'], ['B', 'E'], ['A', 'C', 'D'], ['A', 'C', 'D'], ['B', 'E'], ['F']] The result should be: [['A', 'C', 'D'], ['B', 'E'], ['F']] The order of elements is also not important. I tried as: print list(set(l)) Does numpy has better way...

Rebin data and update imshow plot

I have a large data set I want to be able to "zoom" in on. What I really want is for the data to be rebinned based on the selection and then update the data in the graph. So the graph will show different limits but maintain the same resolution....

Index 3D array by 2D array

I have a 3D color image im (shape 512 512 3), and a 2D array mask(512 512). I want to annotate this color image by the mask: im = im[mask>threshold] + im[mask<threshold] * 0.2 + (255,0,0) * [mask<threshold]. How do I write this in Python efficiently?...

represent an index inside a list as x,y in python

I have a list which contains 1000 integers. The 1000 integers represent 20X50 elements of dimensional array which I read from a file into the list. I need to walk through the list with an indicator in order to find close elements to each other. I want that my indicator...

What's the fastest way to compare datetime in pandas?

I have two big csv files with different number of rows which I am importing as follows: tdata = pd.read_csv(tfilepath, sep=',', parse_dates=['date_1']) print(tdata.iloc[:, [0,3]]) TBA date_1 0 0 2010-01-04 1 9 2010-01-05 2 0 2010-01-06 3 8 2010-01-07 4 0 2010-01-08 5 0 2010-01-09 pdata = pd.read_csv(pfilepath, sep=',', parse_dates=['date_2']) print(pdata.iloc[:,...

Matplotlib heatmap: Image rotated when heatmap plot over it

I am trying to plot a heatmap on top of an image. What I did: import matplotlib.pyplot as plt import numpy as np import numpy.random import urllib #downloading an example image urllib.urlretrieve("", "/tmp/cards.png") #reading and plotting the image im = plt.imread('/tmp/cards.png') implot = plt.imshow(im) #generating random data for the histogram...

How do I make each histogram bin show me the frequency of each action/event/item?

I want to plot a histogram showing the frequencies of various actions at different intervals. I want to bin the occurrence of actions into 10 minute intervals. binwidth = 10*60 #10 minutes times = array([ 1.43431325e+09, 1.43431325e+09, 1.43431329e+09, 1.43431330e+09, 1.43431333e+09, 1.43431334e+09, 1.43431345e+09, 1.43431346e+09, 1.43431346e+09, 1.43431346e+09, 1.43431349e+09, 1.43431350e+09, 1.43431350e+09, 1.43431351e+09, 1.43431354e+09,...

Need workaround to treat float values as tuples when updating “list” of float values

I am finding errors with the last line of the for loop where I am trying to update the curve value to a list containing the curve value iterations. I get errors like "can only concatenate tuple (not "float) to tuple" and "tuple object has no attribute 'append'". Does anyone...

How can I change the color of a grouped bar plot in Pandas?

I have this plot that you'll agree is not very pretty. Other plots I made so far had some color and grouping to them out of the box. I tried manually setting the color, but it stays black. What am I doing wrong? Ideally it'd also cluster the same tags...

How to fit datasets so that they share some (but not all) parameter values

Say I want to fit two arrays x_data_one and y_data_one with an exponential function. In order to do that I might use the following code (in which x_data_one and y_data_one are given dummy definitions): import numpy as np from scipy.optimize import curve_fit def power_law(x, a, b, c): return a *...

Make a heatmap with a specified discrete color mapping with matplotlib in python

I would like to make a heatmap for a matrix of data such that all positions that are 1 will be red, all positions that are 2 will be white, and etc. with an arbitrary specification. Ideally this should handle the case where all of the values are the same,...

Inconsistency between gaussian_kde and density integral sum

Can one explain why after estimation of kernel density d = gaussian_kde(g[:,1]) And calculation of integral sum of it: x = np.linspace(0, g[:,1].max(), 1500) integral = np.trapz(d(x), x) I got resulting integral sum completely different to 1: print integral Out: 0.55618 ...

multiple iteration of the same list

I have one list of data as follows: from shapely.geometry import box data = [box(1,2,3,4), box(4,5,6,7), box(1,2,3,4)] sublists = [A,B,C] The list 'data' has following sub-lists: A = box(1,2,3,4) B = box(4,5,6,7) C = box(1,2,3,4) I have to check if sub-lists intersect. If intersect they should put in one tuple;...

Factorial of a matrix elementwise with Numpy

I'd like to know how to calculate the factorial of a matrix elementwise. For example, import numpy as np mat = np.array([[1,2,3],[2,3,4]]) np.the_function_i_want(mat) would give a matrix mat2 such that mat2[i,j] = mat[i,j]!. I've tried something like np.fromfunction(lambda i,j: np.math.factorial(mat[i,j])) but it passes the entire matrix as argument for np.math.factorial....

How to surround curves with annotation in matplotlib?

I have a python code that produces the following figures. I would like to do an annotation with ellipses to surround the curves as the figure mentions. N.B. The figure is produced using MATLAB and I cannot do it in python-matplotlib. Thanks....

Will numpy.roots() ever return n different floats when a polynomial only has

I think the title says it all, but just to be specific, say I have some list of numbers named "coeffs". Assuming the polynomial with said coefficients has exactly k unique roots, will the following code ever set number_of_unique_roots to be a number greater than k? import numpy as np...

Linear programming with scipy.optimize.linprog

I've just checked a simple linear programming problem with scipy.optimize.linprog: 1*x[1] + 2*x[2] -> max 1*x[1] + 0*x[2] <= 5 0*x[1] + 1*x[2] <= 5 1*x[1] + 0*x[2] >= 1 0*x[1] + 1*x[2] >= 1 1*x[1] + 1*x[2] <= 6 And got a very strange result. I expected that x[1]...

Matplotlib figure not updating on data change

I'm implementing an image viewer using matplotlib. The idea is that changes being made to the image (such as filter application) will update automatically. I create a Figure to show the inital image and have added a button using pyQt to update the data. The data does change, I have...

Object-oriented access to fill_between shaded region in matplotlib

I'm trying to get access to the shaded region of a matplotlib plot, so that I can remove it without doing plt.cla() [since cla() clears the whole axis, including the axis labels too]. If I were plotting a line, I could do: import matplotlib.pyplot as plt ax = plt.gca() ax.plot(x,y) ax.set_xlabel('My...

manipulating top and bottom margins in pyplot horizontal stacked bar chart (barh)

I'm trying to plot a horizontal stacked bar chart but get annoyingly big margins on top and bottom. I would like to get rid of that or control the size. Here is an example code and fig: from random import random Y = ['A', 'B', 'C', 'D', 'E','F','G','H','I','J', 'K'] y_pos...

Array stacking/ concatenation error in python

I am trying to concatenate two arrays: a and b, where a.shape (1460,10) b.shape (1460,) I tried using hstack and concatenate as: np.hstack((a,b)) c=np.concatenate(a,b,0) I am stuck with the error ValueError: all the input arrays must have same number of dimensions Please guide me for concatenation and generating array c...

Read CSV and plot colored line graph

I am trying to plot a graph with colored markers before and after threshold value. If I am using for loop for reading the parsing the input file with time H:M I can plot and color only two points. But for all the points I cannot plot. Input akdj 12:00...

Memory Issue for Array Conversion

If we convert a large array containing 0 and 1 as boolean to another array containing 0 and 1 as float, the size of array would be almost 10 times larger. What is the best way (if any) to handle this issue in python (Numpy) if we need this conversion?

Read One Input File and plot multiple

I am trying to read one input file of below format. Where Col[1] is x axis and Col[2] is y axis and col[3] is some name. I need to plot multiple line graphs for separate names of col[3]. Eg: Name sd with x,y values will have one line graph and...

Matplotlib: Plot the result of an SQL query

from sqlalchemy import create_engine import _mssql from matplotlib import pyplot as plt engine = create_engine('mssql+pymssql://**:****@') connection = engine.connect() result = connection.execute('SELECT Campaign_id, SUM(Count) AS Total_Count FROM Impressions GROUP BY Campaign_id') for row in result: print row connection.close() The above code generates an array: (54ca686d0189607081dbda85', 4174469) (551c21150189601fb08b6b64', 182) (552391ee0189601fb08b6b73', 237304) (5469f3ec0189606b1b25bcc0',...