I build a scipy sparse matrix S with sklearn.preprocessing.OneHotEncoder(). The matrix S has 10^6 rows and 500 columns. I also have a numpy array A with 10^6 values as follows: A = [1,1,2,2,2,3,4,5,6,6,7,8,8,8,...] I want to do a group by on the sparse matrix S following the groups written in...

I am trying to replicate some of the functionality of Matlab's interp2. I know somewhat similar questions have been asked before, but none apply to my specific case. I have a distance map (available at this Google drive location): https://drive.google.com/open?id=0B6acq_amk5e3X0Q5UG1ya1VhSlE&authuser=0 Values are normalized in the range 0-1. Size is 200...

I am trying to figure out why the following code returns different values for the sample's kurtosis: import pandas import scipy e = pandas.DataFrame([1, 2, 3, 4, 5, 4, 3, 2, 1]) print "pandas.rolling_kurt:\n", pandas.rolling_kurt(e, window=9) print "\nscipy.stats.kurtosis:", scipy.stats.kurtosis(e) The output I am getting: pandas.rolling_kurt: 0 0 NaN 1 NaN...

I need to generate a sparse random matrix in Python with all values in the range [-1,1] with uniform distribution. What is the most efficient way to do this? I have a basic sparse random matrix: from scipy import sparse from numpy.random import RandomState p = sparse.rand(10, 10, 0.1, random_state=RandomState(1))...
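One possible approach, sketched under the assumption that a CSR result is acceptable: sparse.rand already draws its non-zero values uniformly from [0, 1), so an affine rescaling applied only to the stored values (p.data) maps them to [-1, 1) without creating new non-zeros.

```python
from numpy.random import RandomState
from scipy import sparse

# Same call as in the question; the stored non-zeros are uniform on [0, 1)
p = sparse.rand(10, 10, density=0.1, format="csr", random_state=RandomState(1))
# Rescale only the stored non-zeros: [0, 1) -> [-1, 1)
p.data = 2 * p.data - 1
print(p.data.min(), p.data.max())
```

The upper end 1 is excluded, which is usually immaterial for a continuous distribution.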

I am trying to find the root y of a function called f using Python. Here is my code: def f(y): w,p1,p2,p3,p4,p5,p6,p7 = y[:8] t1 = w - 0.500371726*(p1**0.92894164) - (-0.998515304)*((1-p1)**1.1376649) t2 = w - 8.095873128*(p2**0.92894164) - (-0.998515304)*((1-p2)**1.1376649) t3 = w - 220.2054377*(p3**0.92894164) - (-0.998515304)*((1-p3)**1.1376649) t4 = w - 12.52760758*(p4**0.92894164)...

I have a raster with a set of unique ID patches/regions which I've converted into a two-dimensional Python numpy array. I would like to calculate pairwise Euclidean distances between all regions to obtain the minimum distance separating the nearest edges of each raster patch. As the array was originally a...

I am trying to find the root y of a function called f using Python. Here is my code: def f(y): w,p1,p2,p3,p4,p5,p6 = y[:7] t1 = w - 0.99006633*(p1**0.5) - (-1.010067)*((1-p1)) t2 = w - 22.7235687*(p2**0.5) - (-1.010067)*((1-p2)) t3 = w - 9.71323491*(p3**0.5) - (-1.010067)*((1-p3)) t4 = w - 2.43852877*(p4**0.5)...

I have a project where I'm sampling analog data and attempting to analyze it with matplotlib. Currently, my analog data source is a potentiometer hooked up to a microcontroller, but that's not really relevant to the issue. Here's my code: arrayFront = RunningMean(array(dataFront), 15) arrayRear = RunningMean(array(dataRear), 15) x = linspace(0,...

I'm trying to install Scikit-Learn on a 64-bit Red Hat Enterprise 6.6 server on which I don't have root privileges. I've done a fresh installation of Python 2.7.9, Numpy 1.9.2, Scipy 0.15.1, and Scikit-Learn 0.16.1. The Atlas BLAS installation on the server is 3.8.4. I can install scikit-learn, but when...

At the moment, I am trying to solve a system of coupled ODEs with complex terms. I am using scipy.integrate.ode. I have successfully solved a previous problem involving a coupled ODE system with only real terms; in that case I used odeint, which is not suitable for the problem I...
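A sketch of one alternative, not the asker's actual system: scipy.integrate.solve_ivp accepts a complex-valued initial state with its explicit Runge-Kutta methods (the default RK45 included), so complex terms can be kept as they are. The toy equation dy/dt = i*y is a hypothetical stand-in whose exact solution is y(t) = exp(i*t). With scipy.integrate.ode, the corresponding choice is the 'zvode' integrator.

```python
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, y):
    # Hypothetical complex linear system: dy/dt = 1j * y
    return 1j * y

# A complex initial condition makes solve_ivp integrate in complex arithmetic
y0 = np.array([1.0 + 0.0j])
sol = solve_ivp(rhs, (0.0, 1.0), y0, rtol=1e-10, atol=1e-12)
y_end = sol.y[0, -1]
print(y_end, np.exp(1j))  # numerical vs exact value at t = 1
```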

I followed the tutorial here in order to implement logistic regression using Theano. The aforementioned tutorial uses SciPy's fmin_cg optimisation procedure. Among the important arguments to the aforementioned function are: f, the objective/cost function to be minimised; x0, a user-supplied initial guess of the parameters; fprime, a function which...

My input file is in ijv/coo/triplet format with string column names, e.g.:

Apple,Google,1
Apple,Banana,5
Microsoft,Orange,2

This should result in the following 2x3 matrix: [[1,5,0], [0,0,2]]. I can read it manually by putting the column names into dictionaries and creating a scipy sparse coo_matrix with that dict mapping to IDs. I would like...
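One way to avoid the hand-written dictionaries, sketched with pandas as an assumed extra dependency: pd.factorize assigns integer ids in order of first appearance, which reproduces the column order of the expected matrix, and coo_matrix then takes the (value, (row, col)) triplets directly. The triplets below are the three example lines.

```python
import pandas as pd
from scipy.sparse import coo_matrix

# Triplets from the example input: (row name, column name, value)
triplets = [("Apple", "Google", 1), ("Apple", "Banana", 5), ("Microsoft", "Orange", 2)]
row_names, col_names, values = zip(*triplets)

# factorize maps each distinct string to an integer id, in order of first appearance
row_ids, row_labels = pd.factorize(list(row_names))
col_ids, col_labels = pd.factorize(list(col_names))

m = coo_matrix((values, (row_ids, col_ids)), shape=(len(row_labels), len(col_labels)))
print(m.toarray())
```

For a real file, the three columns could first be read with pd.read_csv(path, header=None); duplicate (row, column) pairs are summed when the COO matrix is converted to CSR or dense form.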

I've been trying to apply scipy.stats.mstats.zscore to a dataset that is intentionally organized into a nested list, and it gives: TypeError: unsupported operand type(s) for /: 'list' and 'long' which probably suggests that scipy.stats doesn't work with nested lists. What can I do about it? Does a for loop affect the nature...

I'm just working through the example for numpy.save (http://docs.scipy.org/doc/numpy/reference/generated/numpy.save.html): from tempfile import TemporaryFile outfile = TemporaryFile() x = np.arange(10) np.save(outfile, x) After this command (highlighted), why can't I find an output file called "outfile" in the current directory? Sorry if this sounds stupid. outfile.seek(0) # Only needed here...
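A short sketch of the likely explanation: tempfile.TemporaryFile creates an anonymous file in the system's temporary directory (on Unix its directory entry is either never created or removed immediately), so "outfile" is only a Python variable, not a file name. Passing a file name string to np.save writes a visible file instead; the name "outfile.npy" below is a hypothetical choice.

```python
import os

import numpy as np

x = np.arange(10)
# A string path (not a file object) makes np.save create a real file in the CWD
np.save("outfile.npy", x)
print(os.path.exists("outfile.npy"))

y = np.load("outfile.npy")  # read it back
```

If the name lacks the .npy extension, np.save appends it.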

I have two numpy square matrices called M1 and M2 as: M1 = np.matrix('0 1 2 3; 4 5 6 7; 8 9 10 11; 12 13 14 15') M2 = np.matrix('100 200; 300 400') I would like to group 2x2 elements of M1 assigning those elements to the corresponding...

What is the best way to find the number of points (rows) that are within a distance of a given point in this pandas dataframe:

    x   y
0   2   9
1   8   7
2   1  10
3   9   2
4   8   4
5   1   1
6   2   3
7  10...
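A minimal vectorised sketch, using only the first seven rows shown above and assuming Euclidean distance: compute all distances to the query point at once and count those within the radius. The helper name count_within is hypothetical. For many queries against a large frame, scipy.spatial.cKDTree.query_ball_point is a common alternative.

```python
import numpy as np
import pandas as pd

# First seven rows of the question's data frame
df = pd.DataFrame({"x": [2, 8, 1, 9, 8, 1, 2],
                   "y": [9, 7, 10, 2, 4, 1, 3]})

def count_within(df, px, py, radius):
    # Euclidean distance from (px, py) to every row, computed without a Python loop
    dist = np.hypot(df["x"] - px, df["y"] - py)
    return int((dist <= radius).sum())

print(count_within(df, 2, 9, 3))  # the query point itself is included
```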

I'm trying to interpolate my set of data (the first column is the time, the third column is the actual data): import numpy as np import matplotlib.pyplot as plt from scipy.interpolate import interp1d data = np.genfromtxt("data.csv", delimiter=" ") x = data[:, 0] y = data[:, 2] xx = np.linspace(x.min(), x.max(), 1000) y_smooth...

I am trying to plot a smooth curve using the x,y coordinates above. However, the graph I get goes outside the range of my data. Here is a snippet of my code: import numpy as np import matplotlib.pyplot as plt from scipy.interpolate import spline ylist = [0.36758563074352546, 0.27634194831013914,...

Given that the fitting function is of type: I intend to fit such a function to the experimental data (x,y=f(x)) that I have. But I have some doubts: How do I define my fitting function when there's a summation involved? Once the function is defined, i.e. def func(..) return ... is...

If I want to interpolate the data below: from scipy.interpolate import RectBivariateSpline, interp2d import numpy as np x1 = np.linspace(0,5,10) y1 = np.linspace(0,20,20) xx, yy = np.meshgrid(x1, y1) z = np.sin(xx**2+yy**2) with interp2d this works: f = interp2d(x1, y1, z, kind='cubic') however if I use RectBivariateSpline with the same x1,...
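A likely cause, sketched below: np.meshgrid with its default 'xy' indexing returns z with shape (len(y1), len(x1)), whereas RectBivariateSpline expects z[i, j] = f(x1[i], y1[j]), i.e. shape (len(x1), len(y1)). Transposing z (or building the grid with indexing="ij") makes the shapes agree. Note that interp2d has since been removed from SciPy, so RectBivariateSpline is the safer choice.

```python
import numpy as np
from scipy.interpolate import RectBivariateSpline

x1 = np.linspace(0, 5, 10)
y1 = np.linspace(0, 20, 20)
xx, yy = np.meshgrid(x1, y1)   # 'xy' indexing: shape (len(y1), len(x1)) = (20, 10)
z = np.sin(xx**2 + yy**2)

# RectBivariateSpline wants shape (len(x1), len(y1)), hence the transpose
f = RectBivariateSpline(x1, y1, z.T)   # cubic in both directions, s=0 (interpolating)
print(f(x1, y1).shape)
```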

I'm writing code to evaluate the mean of functions it is passed, but where the functional form is not known beforehand. I have code below that does work, using scipy.integrate.quad, but it is rather slow. I was wondering does anybody know of a faster way? import numpy as np from...

I am trying to read a Fortran file with headers as integers and then the actual data as 32-bit floats. Using numpy's fromfile('mydatafile', dtype=np.float32) reads in the whole file as float32, but I need the headers to be int32 for my output file. Using scipy's FortranFile it...

I wish to normalize each row of a sparse scipy matrix, obtained from a networkx directed graph. import networkx as nx import numpy as np G=nx.random_geometric_graph(10,0.3) M=nx.to_scipy_sparse_matrix(G, nodelist=G.nodes()) from __future__ import division print(M[3]) (0, 1) 1 (0, 5) 1 print(M[3].multiply(1/M[3].sum())) (0, 1) 0.5 (0, 5) 0.5 this is ok, I...
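A sketch of a vectorised alternative to normalising row by row, using a small hypothetical matrix in place of the networkx graph: left-multiplying by a diagonal matrix of inverse row sums scales every row at once. Rows with no entries are left as zeros instead of dividing by zero, which is an assumption about the desired behaviour.

```python
import numpy as np
from scipy import sparse

# Small hypothetical adjacency matrix (row 2 has no out-edges)
M = sparse.csr_matrix(np.array([[0, 1, 0, 1],
                                [1, 0, 0, 0],
                                [0, 0, 0, 0],
                                [1, 1, 1, 1]], dtype=float))

row_sums = np.asarray(M.sum(axis=1)).ravel()
inv = np.zeros_like(row_sums)
nonzero = row_sums != 0
inv[nonzero] = 1.0 / row_sums[nonzero]   # empty rows keep a factor of 0

# diag(inv) @ M multiplies row i of M by inv[i]
M_norm = sparse.diags(inv) @ M
print(M_norm.toarray())
```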

The Maxwell-Boltzmann distribution is given by . The scipy.stats.maxwell distribution uses loc and scale parameters to define this distribution. How are the parameters in the two definitions connected? I would also appreciate it if someone could tell me in general how to determine the relation between parameters in scipy.stats and their usual...
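A sketch of the mapping, assuming the usual speed form f(v) = sqrt(2/pi) (m/kT)^(3/2) v^2 exp(-m v^2 / (2kT)): scipy.stats.maxwell's standard pdf is sqrt(2/pi) x^2 exp(-x^2/2), and every scipy.stats continuous distribution applies loc and scale as pdf((v - loc)/scale)/scale. Matching the two expressions gives loc = 0 and scale = sqrt(kT/m). The same recipe (standardise, then compare terms) is the general way to relate scipy parameters to textbook ones. The value of kT/m below is hypothetical.

```python
import numpy as np
from scipy import stats

kT_over_m = 2.0                 # hypothetical value of k_B*T/m (units of speed^2)
a = np.sqrt(kT_over_m)          # candidate scale parameter

v = np.linspace(0.0, 6.0, 50)
# Physics form of the Maxwell-Boltzmann speed distribution
mb = np.sqrt(2 / np.pi) * (1 / kT_over_m) ** 1.5 * v**2 * np.exp(-v**2 / (2 * kT_over_m))
# scipy form: pdf(v; loc, scale) = pdf((v - loc) / scale) / scale
sp = stats.maxwell.pdf(v, loc=0, scale=a)
print(np.max(np.abs(mb - sp)))
```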

This is a very simple question. For SciPy sparse matrices like coo_matrix, how does one access individual elements? To give an analogy with the Eigen linear algebra library: one can access element (i,j) using coeffRef as follows: myMatrix.coeffRef(i,j) ...

I am trying to compute the logarithm of a matrix with logm (scipy.linalg). I wrote this code in Python: from scipy.linalg import logm, expm from Bio import SeqIO import numpy as np from numpy.linalg import svd from numpy import eye np.set_printoptions(linewidth=10000) my_file = open("matrice/mammifere_muscle.list.imv") #read two lines of...

Suppose I have an array of shape (m,n,3), where m and n refer to the y and x coordinates of a point, and the 3 numbers at each point refer to a three-dimensional vector. (A similar situation is an image with height m and width n, and 3 refers...

My basic problem is that I want to install scipy on a Windows machine for Python 3 and use PyCharm as my development environment. The suggestion from the Scipy Documentation as well as several StackOverflow posts (Installing NumPy and SciPy on 64-bit Windows (with Pip), Trouble installing SciPy on windows,...

I apologize in advance if this is poorly worded. If I have stdDev = 1 and mean = 0, scipy.stats.norm.cdf(-1, loc = 0, scale = 1) will give me the probability that a normally distributed random variable will be <= -1, and that is 0.15865525393145707. Given 0.15865..., how do I...
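A minimal sketch: the inverse of a cumulative distribution function is what scipy.stats calls the percent point function, ppf, and it takes the same loc and scale arguments as cdf.

```python
from scipy.stats import norm

p = norm.cdf(-1, loc=0, scale=1)   # about 0.15866, as in the question
x = norm.ppf(p, loc=0, scale=1)    # inverse CDF (percent point function)
print(x)
```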

I have one data frame, and the pairwise correlations were calculated:

>>> df1 = pd.read_csv("/home/zebrafish/Desktop/stack.csv")
>>> df1.corr()
           GA        PN        PC       MBP        GR        AP
GA   1.000000  0.070541  0.259937 -0.452661  0.115722  0.268014
PN   0.070541  1.000000  0.512536  0.447831 -0.042238  0.263601
PC   0.259937  0.512536  1.000000  0.331354 -0.254312  0.958877
MBP -0.452661  0.447831  0.331354  1.000000 -0.467683  0.229870...

Suppose I have an MxNx3 array A, where the first two indexes refer to the coordinates of a point, and the last index (the number '3') refers to the three components of a vector. e.g. A[4,7,:] = [1,2,3] means that the vector at point (7,4) is (1,2,3). Now I need to...

I am trying to use a Sobel filter on an image of a wall, but it doesn't work. My code is: im=scipy.misc.imread('IMG_1479bis.JPG') im = im.astype('int32') dx=ndimage.sobel(im,1) dy=ndimage.sobel(im,0) mag=np.hypot(dx,dy) mag*=255.0/np.max(mag) cv2.imshow('sobel.jpg', mag) I really don't understand where my mistake is. Any help would be appreciated! Thanks in advance!...
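Two plausible culprits, sketched below under assumptions: imread on a colour photo returns a (rows, cols, 3) array, so the filters run over a 3-D array rather than a single intensity image; and cv2.imshow multiplies floating-point images by 255, so a float image scaled to [0, 255] displays as mostly white (mag.astype(np.uint8) avoids that). The synthetic image and the plain channel mean used as grayscale are hypothetical choices (a weighted luminance is also common). scipy.misc.imread is no longer available in recent SciPy releases.

```python
import numpy as np
from scipy import ndimage

# Hypothetical RGB image: dark left half, bright right half (one vertical edge)
rgb = np.zeros((8, 8, 3), dtype=np.uint8)
rgb[:, 4:, :] = 255

gray = rgb.astype(float).mean(axis=2)   # collapse the colour channels first
dx = ndimage.sobel(gray, axis=1)        # derivative along columns (horizontal)
dy = ndimage.sobel(gray, axis=0)        # derivative along rows (vertical)
mag = np.hypot(dx, dy)
mag *= 255.0 / mag.max()                # scale to [0, 255]
print(mag.astype(np.uint8))             # uint8 is what cv2.imshow displays as-is
```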

I have some t-values and degrees of freedom and want to find the p-values from them (it's two-tailed). In the real world I would use a t-test table in the back of a statistics textbook; however, I am using the stdtr or stats.t.sf function in Python. Both of them work fine...
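A minimal sketch of the two-tailed computation: t.sf(|t|, df) is the area in one tail, so doubling it gives the two-tailed p-value. The same number can be obtained with scipy.special.stdtr as 2 * stdtr(df, -abs(t)), since stdtr is the CDF. The helper name two_tailed_p is hypothetical.

```python
from scipy import stats

def two_tailed_p(t_value, dof):
    # sf(|t|) is the upper-tail area; doubling it covers both tails
    return 2 * stats.t.sf(abs(t_value), dof)

print(two_tailed_p(2.0, 10))
```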

I have a data frame that I import using df = pd.read_csv('my.csv',sep=','). In that CSV file, the first row holds the column names, and the first column holds the observation names. I know how to select a subset of the pandas DataFrame, using: df.iloc[:,1::] which gives me only the numeric...

I'm very new to Python but I'm trying to produce a 2D Gaussian fit for some data. Specifically, stellar fluxes linked to certain positions in a coordinate system/grid. However not all of the positions in my grid have corresponding flux values. I don't really want to set these values to...

I would like to use Delaunay Triangulation in Python to interpolate the points in 3D. What I have is # my array of points points = [[1,2,3], [2,3,4], ...] # my array of values values = [7, 8, ...] # an object with triangulation tri = Delaunay(points) # a set...
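One way to do this, sketched with hypothetical points: scipy.interpolate.LinearNDInterpolator accepts an existing Delaunay object in place of the point array, reuses its triangulation, and interpolates linearly (barycentrically) inside each tetrahedron. Query points outside the convex hull get fill_value, which defaults to NaN. Because the hypothetical values come from a linear field, the interpolation is exact here.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator
from scipy.spatial import Delaunay

# Hypothetical points: the corners of a single tetrahedron
points = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
values = 1 + 2 * points[:, 0] + 3 * points[:, 1] + 4 * points[:, 2]   # a linear field

tri = Delaunay(points)
interp = LinearNDInterpolator(tri, values)   # reuses the existing triangulation

out = interp([[0.1, 0.2, 0.3], [2.0, 2.0, 2.0]])   # inside, then outside the hull
print(out)
```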

I have trained a machine learning binary classifier on a 100x85 array in sklearn. I would like to be able to vary 2 of the features in the array, say column 0 and column 1, and generate contour or surface graph, showing how the predicted probability of falling in one...

I have a function called calculate_cost which calculates the performance of a supplier for different S_range values (stocking level). The function works, but the plots are not smooth. Is there a way to smooth them in Python? import numpy import scipy.stats import scipy.integrate import scipy.misc import matplotlib import math import pylab from...

I am currently using Scipy 0.7.2 with Numpy 1.4.1. My Python version is 2.6.6. I have written a simple code to read a coo sparse matrix from a .mtx file as follows: data = scipy.io.mmread('matrix.mtx') On running the code, I got the following error: Traceback (most recent call last): File...

I have Python 2.7.9 on Windows 7 64-bit. I'm trying to install scipy using pip. I used pip install scipy but I get the following error: Command "C:\Python27\python.exe -c "import setuptools, tokenize;__file__='c:\\users\\admin\\appdata\\local\\temp\\pip-build-xpl5cw\\scipy\\setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record c:\users\admin\appdata\local\temp\pip-b68pfc-record\install-record.txt --single-version-externally-managed --compile" failed with error...

Is there any way to do anomaly detection in a dataset using recursive curve fitting and removing the points with the largest squared error with respect to the curve, up to an acceptable threshold? I am using the scipy.optimize.curve_fit function for Python 2.7, and I need to work with Python preferably. ...

I would like to solve a nonlinear first order differential equation using Python. For instance, df/dt = f**4 I wrote the following program, but I have an issue with matplotlib, so I don't know if the method I used with scipy is correct. from scipy.integrate import odeint import numpy as...
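A sketch for checking the odeint part independently of any plotting: odeint calls the right-hand side as func(state, t), and df/dt = f**4 is separable, giving f(t) = (f0**-3 - 3t)**(-1/3), which blows up at t = 1/(3*f0**3). The initial value f0 = 0.5 is hypothetical and the time grid stops well before the blow-up at t = 8/3.

```python
import numpy as np
from scipy.integrate import odeint

def rhs(f, t):
    # odeint passes the state first, then time: df/dt = f**4
    return f**4

f0 = 0.5
t = np.linspace(0.0, 1.0, 11)      # well before the blow-up time 1/(3*f0**3) = 8/3
sol = odeint(rhs, f0, t)[:, 0]     # odeint returns shape (len(t), 1)

exact = (f0**-3 - 3 * t) ** (-1.0 / 3.0)   # separable solution
print(np.max(np.abs(sol - exact)))
```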

I want to find the 1st and 2nd largest eigenvalues of a big, sparse and symmetric matrix (in python). scipy.sparse.linalg.eigsh with k=2 gives the second largest eigenvalue with respect to the absolute value - so it's not a good solution. In addition, I can't use numpy methods because my...
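A sketch of the relevant option: eigsh's which parameter selects which end of the spectrum ARPACK targets, and which="LA" asks for the largest algebraic eigenvalues rather than the largest magnitude ("LM", the default). The diagonal test matrix is hypothetical, built so that its largest-magnitude eigenvalue is negative.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import eigsh

# Hypothetical symmetric matrix: eigenvalues -500, 2, 3, ..., 100
d = np.arange(1, 101, dtype=float)
d[0] = -500.0
A = sparse.diags(d, format="csr")

# 'LA' = largest algebraic (signed) eigenvalues, not largest |lambda|
vals, vecs = eigsh(A, k=2, which="LA")
print(np.sort(vals))
```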

I created this toy problem that reflects my much bigger problem: import numpy as np ind = np.ones((3,2,4)) # shape=(3L, 2L, 4L) dist = np.array([[0.1,0.3],[1,2],[0,1]]) # shape=(3L, 2L) ans = np.array([np.dot(dist[i],ind[i]) for i in xrange(dist.shape[0])]) # shape=(3L, 4L) print ans """ prints: [[ 0.4 0.4 0.4 0.4] [ 3. 3....
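One vectorised equivalent of the per-row np.dot loop, sketched with the toy arrays from the question: np.einsum with the subscripts "ij,ijk->ik" sums over j separately for each i, which is exactly np.dot(dist[i], ind[i]) stacked over i.

```python
import numpy as np

ind = np.ones((3, 2, 4))
dist = np.array([[0.1, 0.3], [1, 2], [0, 1]])

# For each i: ans[i, k] = sum_j dist[i, j] * ind[i, j, k]
ans = np.einsum("ij,ijk->ik", dist, ind)
loop = np.array([np.dot(dist[i], ind[i]) for i in range(dist.shape[0])])
print(ans)
```

An equivalent broadcasting form is (dist[:, :, None] * ind).sum(axis=1).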

I think the title says it all, but just to be specific, say I have some list of numbers named "coeffs". Assuming the polynomial with said coefficients has exactly k unique roots, will the following code ever set number_of_unique_roots to be a number greater than k? import numpy as np...

I want to check if two csr_matrix objects are equal. If I do: x.__eq__(y) I get: raise ValueError("The truth value of an array with more than one " ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all(). This, however, works well: assert...
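A sketch of one way to get a single boolean: comparisons between sparse matrices are element-wise, so the result has to be reduced. Here x != y is itself sparse and stores only the positions that differ, so an empty result (nnz == 0) together with matching shapes means the matrices are equal. The helper name csr_equal is hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix

def csr_equal(x, y):
    # Same shape and no position where the two matrices differ
    return x.shape == y.shape and (x != y).nnz == 0

x = csr_matrix(np.array([[1, 0], [0, 2]]))
y = csr_matrix(np.array([[1, 0], [0, 2]]))
z = csr_matrix(np.array([[1, 0], [0, 3]]))
print(csr_equal(x, y), csr_equal(x, z))
```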

I'm trying to create a class which uses fftconvolve from scipy.signal to convolve some data with a Gaussian inside a method of the class instance. However, every time I create an instance and call the method enlarge_smooth (which happens upon right arrow key press), I get an error from fftconvolve stating:...

Say I want to fit two arrays x_data_one and y_data_one with an exponential function. In order to do that I might use the following code (in which x_data_one and y_data_one are given dummy definitions): import numpy as np from scipy.optimize import curve_fit def power_law(x, a, b, c): return a *...

Say my list is the following: ['cat','elephant'] How can I efficiently convert my list into an array of boolean elements, where each index represents whether a given animal (of 10^n animals) is present in my list? That is, if cat is present index x is true and if elephant is...
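A minimal sketch, assuming the full set of animals is available as an array in a fixed order: np.isin returns a boolean mask aligned with that array, True wherever the animal appears in the list. The short vocabulary below is hypothetical.

```python
import numpy as np

# Hypothetical vocabulary standing in for the full list of 10^n animals
all_animals = np.array(["ant", "cat", "dog", "elephant", "zebra"])
my_list = ["cat", "elephant"]

present = np.isin(all_animals, my_list)   # boolean mask aligned with all_animals
print(present)
```

When the vocabulary is huge and the list is short, a precomputed dict from animal to index and a few direct assignments into a zero mask may be cheaper.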

I'm running this SVD solver from scipy with the below code: import numpy as np from scipy.sparse.linalg import svds features = np.arange(9,dtype=np.float64).reshape((3,3)) for i in range(10): _,_,V = svds(features,2) print i,np.mean(V) I expected the printed mean value to be the same each time, however it changes and seems to cycle...

It seems that if a file is called io.py and it imports scipy.ndimage, the latter somehow ends up failing to find its own submodule, also called io: $ echo "import scipy.ndimage" > io.py $ python io.py Traceback (most recent call last): File "io.py", line 1, in <module> import scipy.ndimage File...

So, I have a set of texts I'd like to do some clustering analysis on. I've taken a Normalized Compression Distance between every text, and now I have basically built a complete graph with weighted edges that looks something like this: text1, text2, 0.539 text2, text3, 0.675 I'm having tremendous...

I have a series of methods that take an image 89x22 pixels (although the size, theoretically, is irrelevant) and fits a curve to each row of pixels to find the location of the most significant signal. At the end, I have a list of Y-values, one for each row of...

I have been trying to use QuTiP to solve a quantum mechanics matrix differential equation (a Lindblad equation). Here is the code: from qutip import * from matplotlib import * import numpy as np hamiltonian = np.array([[215, -104.1, 5.1, -4.3 ,4.7,-15.1 ,-7.8 ], [-104.1, 220.0, 32.6 ,7.1, 5.4, 8.3, 0.8],...

I'm trying to make a Voronoi plot update in real time as the generating points change position. My problem is how to reuse the same figure, since currently I get a new window each time I call voronoi_plot_2d. See code: #!/usr/bin/env python import numpy as np import time from scipy.spatial...

I am testing scipy.spatial.Delaunay and am not able to solve two issues: the mesh has errors, and the mesh doesn't include all points. Code and image of plot: import numpy as np from scipy.spatial import Delaunay,delaunay_plot_2d import matplotlib.pyplot as plt #input_xyz.txt contains 1000 pts in "X Y Z" (float numbers) format points...

I am experimenting with scipy.spatial's implementation of Qhull's Delaunay triangulation. Is it possible to generate the triangulation in a manner that preserves the edges defined by the input vertices? (EDIT: i.e. a constrained Delaunay triangulation.) As can be done with the triangle package for Python. For example, in the picture...

I am trying to use python to find the values of three unknowns (x,y,z) in a nonlinear equation of the type: g(x) * h(y) * k(z) = F where F is a vector with hundreds of values. I successfully used scipy.optimize.minimize where F only had 3 values, but that failed...

The full mathematical problem is here. Briefly I want to integrate a function with a double integral. The inner integral has boundaries 20 and x-2, while the outer has boundaries 22 and 30. I know that with Scipy I can compute the double integral with scipy.integrate.nquad. I would like to...
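A sketch of how the variable inner limit can be expressed: nquad integrates over the first argument of the integrand innermost, and an element of ranges may be a callable that receives the remaining (outer) variables. The integrand below is a hypothetical placeholder (the constant 1), chosen because the result is easy to check: the region's area is the integral of (x - 22) from 22 to 30, which is 32.

```python
from scipy.integrate import nquad

def integrand(y, x):
    # Hypothetical integrand; replace with the real function
    return 1.0

def inner_range(x):
    # Limits of the inner variable y depend on the outer variable x
    return [20.0, x - 2.0]

outer_range = [22.0, 30.0]
result, abserr = nquad(integrand, [inner_range, outer_range])
print(result)
```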

I am trying to integrate over the sum of two 'half' normal distributions. scipy.integrate.quad works fine when I try to integrate over a small range but returns 0 when I do it for large ranges. Here's the code: mu1 = 0 mu2 = 0 std1 = 1 std2 = 1...

I want to get the dot product of N vector pairs (a_vec[i, :], b_vec[i, :]). a_vec has shape [N, 3], and b_vec has the same shape (N 3D vectors). I know that it can easily be done in a loop with numpy.dot, but can't it be done more simply and faster?...
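Two loop-free ways to get all N dot products at once, sketched with hypothetical random data: np.einsum("ij,ij->i", ...) multiplies matching rows and sums along the vector axis, and the element-wise product followed by sum(axis=1) does the same through broadcasting.

```python
import numpy as np

rng = np.random.default_rng(0)
a_vec = rng.normal(size=(5, 3))   # hypothetical N = 5 three-dimensional vectors
b_vec = rng.normal(size=(5, 3))

dots = np.einsum("ij,ij->i", a_vec, b_vec)   # row-wise dot products, shape (N,)
dots_alt = (a_vec * b_vec).sum(axis=1)       # same result via broadcasting
print(dots)
```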

I want to configure the QN-Minimizer from the Stanford CoreNLP library to get optimization results nearly identical to scipy.optimize's L-BFGS-B implementation, or to get a standard L-BFGS configuration that is suitable for most things. I set the standard parameters as follows: The python example I want to copy: scipy.optimize.minimize(neuralNetworkCost,...

I have a function that takes a csr_matrix and does some calculations on it. The behavior of these calculations requires the shape of this matrix to be specific (say NxM). The input...
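A sketch of one way to bring a CSR matrix up to a required number of columns, assuming the input never has more columns than required: horizontally stack an empty sparse block of the missing width, which adds columns without storing any zeros. The helper name pad_columns is hypothetical.

```python
import numpy as np
from scipy import sparse

def pad_columns(X, n_cols):
    """Return X as CSR with all-zero columns appended so that it has n_cols columns."""
    missing = n_cols - X.shape[1]
    if missing < 0:
        raise ValueError("X already has more than n_cols columns")
    if missing == 0:
        return X.tocsr()
    zeros = sparse.csr_matrix((X.shape[0], missing))   # empty block, no stored entries
    return sparse.hstack([X, zeros], format="csr")

X = sparse.csr_matrix(np.array([[1, 0], [0, 2]]))
Y = pad_columns(X, 4)
print(Y.toarray())
```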

I've just checked this simple linear programming problem with scipy.optimize.linprog:

1*x[1] + 2*x[2] -> max
1*x[1] + 0*x[2] <= 5
0*x[1] + 1*x[2] <= 5
1*x[1] + 0*x[2] >= 1
0*x[1] + 1*x[2] >= 1
1*x[1] + 1*x[2] <= 6

and got a very strange result. I expected that x[1]...
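A sketch of how this problem can be passed to linprog, which always minimises and accepts inequality rows only in the form A_ub @ x <= b_ub: negate the objective to turn the maximisation into a minimisation, and multiply each ">=" row by -1. By hand, the optimum is x[1] = 1, x[2] = 5 with objective 11, which the sketch should reproduce.

```python
from scipy.optimize import linprog

# Maximise x1 + 2*x2  <=>  minimise -(x1 + 2*x2)
c = [-1, -2]
# ">=" rows are multiplied by -1 to become "<=" rows
A_ub = [[1, 0],    #  x1       <= 5
        [0, 1],    #       x2  <= 5
        [-1, 0],   # -x1       <= -1   (x1 >= 1)
        [0, -1],   #      -x2  <= -1   (x2 >= 1)
        [1, 1]]    #  x1 + x2  <= 6
b_ub = [5, 5, -1, -1, 6]

res = linprog(c, A_ub=A_ub, b_ub=b_ub)
print(res.x, -res.fun)   # undo the sign flip to report the maximum
```

The simple one-variable limits could equally be given through the bounds argument, e.g. bounds=[(1, 5), (1, 5)].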

I may be misunderstanding how broadcasting works in Python, but I am still running into errors. scipy offers a number of "special functions" which take in two arguments, in particular the eval_XX(n, x[,out]) functions. See http://docs.scipy.org/doc/scipy/reference/special.html My program uses many orthogonal polynomials, so I must evaluate these polynomials at distinct...
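A sketch of the broadcasting pattern, using eval_legendre as a representative of the eval_XX family: passing the degrees as a column vector and the points as a row vector makes the ufunc broadcast them to a (number of degrees, number of points) table in one call.

```python
import numpy as np
from scipy.special import eval_legendre

n = np.arange(4)                # polynomial degrees 0..3
x = np.linspace(-1, 1, 5)       # evaluation points

# (4, 1) against (1, 5) broadcasts to (4, 5): row k holds P_k evaluated at every x
P = eval_legendre(n[:, None], x[None, :])
print(P.shape)
```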

I have compared the different methods for convolving/correlating two signals using numpy/scipy. It turns out that there are huge differences in speed. I compared the following methods: correlate from the numpy package (np.correlate in plot) correlate from the scipy.signal package (sps.correlate in plot) fftconvolve from scipy.signal (sps.fftconvolve in plot) Now...

It is simple, I know, but I have little understanding of convex optimization yet. Problem definition: the objective function is ||b - Aw|| (the 2-norm), with a vector of unknowns [w1, w2, ..., wn] and a data matrix A (m x n), where each row has n components ([ai1, ai2, ..., ain]), m...

This is my first time posting on Stack Overflow, so I apologize if I don't follow the correct etiquette. Before I go into my problem: I've searched the relevant threads on Stack Overflow with the same problem: input/output error in scipy.optimize.fsolve; Python fsolve() complains about shape. Why?; fsolve - mismatch between...

I am trying to find which category each item belongs to, based on the mode, using the pandas data frame below:

   ITEM       CATEGORY
1  red saree  actual
2  red saree  actual
3  glass      lbh
4  glass      lbh
5  red saree  actual
6  red saree  lbh
7  glass      actual
8  bottle...
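A sketch using only rows 1 to 7 shown above: grouping by ITEM and aggregating CATEGORY with Series.mode() returns the most frequent category per item. mode() can return several values on a tie; taking iloc[0] picks the first of its sorted result, which is an assumption about how ties should be resolved.

```python
import pandas as pd

# Rows 1-7 of the question's data frame
data = pd.DataFrame({
    "ITEM": ["red saree", "red saree", "glass", "glass", "red saree", "red saree", "glass"],
    "CATEGORY": ["actual", "actual", "lbh", "lbh", "actual", "lbh", "actual"],
})

# Most frequent CATEGORY for each ITEM (first sorted value on ties)
item_category = data.groupby("ITEM")["CATEGORY"].agg(lambda s: s.mode().iloc[0])
print(item_category)
```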

I have been working on loading some files in python, and then once the files are loaded I want to export them into a .mat file and do the rest of the processing in MATLAB. I understand that I can do this with: import scipy.io as sio # load some...

Good day. I am calculating the following integral using scipy: from scipy.stats import norm def integrand(y, x): # print "y: %s x: %s" % (y,x) return (du(y)*measurment_outcome_belief(x, 3)(y))*fv_belief(item.mean, item.var)(x) return dblquad( integrand, norm.ppf(0.001, item.mean, item.var), norm.ppf(0.999, item.mean, item.var), lambda x: norm.ppf(0.001, x, 3), lambda x: norm.ppf(0.999, x, 3))[0] I have belief state...

Is there a way to get scipy's interp1d (in linear mode) to return the derivative at each interpolated point? I could certainly write my own 1D interpolation routine that does, but presumably scipy's is internally in C and therefore faster, and speed is already a major issue. I am ultimately...
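As far as the documented interp1d API goes, it does not return derivatives, but for linear interpolation the derivative is just the slope of the segment containing each query point, which a vectorised searchsorted lookup gives directly. This is a hand-written sketch (the helper name is hypothetical), assuming x is sorted ascending. At an exact breakpoint the derivative is undefined; this version reports the slope of the segment to the right, except at the last point.

```python
import numpy as np

def linear_interp_with_slope(x, y, xi):
    """Piecewise-linear interpolation plus the slope of the segment containing each xi."""
    x, y, xi = map(np.asarray, (x, y, xi))
    slopes = np.diff(y) / np.diff(x)
    # Index k of the segment [x[k], x[k+1]] containing xi, clipped to valid segments
    k = np.clip(np.searchsorted(x, xi, side="right") - 1, 0, len(slopes) - 1)
    values = y[k] + slopes[k] * (xi - x[k])
    return values, slopes[k]

vals, ders = linear_interp_with_slope([0, 1, 3], [0, 2, 3], [0.5, 2.0, 3.0])
print(vals, ders)
```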

I'm trying to fit a surface model to a 3D data set (x,y,z) using matplotlib, where z = f(x,y). So, I'm going for quadratic fitting with the equation: f(x,y) = ax^2+by^2+cxy+dx+ey+f So far, I have successfully plotted the 3D fitted surface using the least-squares method: # best-fit quadratic curve A = np.c_[np.ones(data.shape[0]), data[:,:2],...

I've got a list of sorted samples. They're sorted by their sample time, where each sample is taken one second after the previous one. I'd like to find the minimum value in a neighborhood of a specified size. For example, given a neighborhood size of 2 and the following sample...
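A sketch using scipy.ndimage.minimum_filter1d, which computes a sliding-window minimum in compiled code. How "neighborhood size" maps onto the window is an assumption here: size=3 means the sample itself plus one neighbour on each side (use size=2*k+1 for k neighbours per side), and mode="nearest" repeats the edge samples at the boundaries. The sample values are hypothetical.

```python
import numpy as np
from scipy.ndimage import minimum_filter1d

samples = np.array([5, 3, 8, 1, 9, 7, 6])   # hypothetical samples, one per second
# Centred window of 3: each output is the min over [i-1, i, i+1]
window_min = minimum_filter1d(samples, size=3, mode="nearest")
print(window_min)
```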

I am trying to maximize x^(0.5)y^(0.5) subject to x+y=10 using scipy. I can't figure out which method to use. I would really appreciate it if someone could guide me on this. ...
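One suitable choice, sketched below: scipy.optimize.minimize with method="SLSQP" supports equality constraints and bounds, and maximising is done by minimising the negative. The bounds keep the square roots real; the starting point is arbitrary. By symmetry the optimum is x = y = 5 with value 5.

```python
import numpy as np
from scipy.optimize import minimize

def neg_objective(v):
    x, y = v
    return -(np.sqrt(x) * np.sqrt(y))   # minimise the negative to maximise

constraints = [{"type": "eq", "fun": lambda v: v[0] + v[1] - 10}]   # x + y = 10
bounds = [(0, 10), (0, 10)]   # keeps the square roots real
res = minimize(neg_objective, x0=[1.0, 9.0], method="SLSQP",
               bounds=bounds, constraints=constraints)
print(res.x, -res.fun)
```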

I have a series of images which serve as my raw data which I am trying to prepare for publication. These images have a series of white specks randomly throughout which I would like to replace with the average of some surrounding pixels. I cannot post images, but the following...

I am trying to do some statistics in Python. I have data with several missing values, filled with np.nan, and I am not sure whether I should remove them manually or whether scipy can handle them. So I tried both: import scipy.stats, numpy as np a = [0.75, np.nan, 0.58337, 0.75, 0.75,...

I am using the following code to perform t-test: def t_stat(na,abar,avar,nb,bbar,bvar): logger.info("T-test to be performed") logger.info("Set A count = %f mean = %f variance = %f" % (na,abar,avar)) logger.info("Set B count = %f mean = %f variance = %f" % (nb,bbar,bvar)) adof = na - 1 bdof = nb -...
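If only summary statistics are available, scipy.stats.ttest_ind_from_stats performs the two-sample test directly. Note that it expects sample standard deviations, not variances, so variances such as avar and bvar would need a square root first. The sketch below uses hypothetical samples and checks the result against ttest_ind on the raw data.

```python
import numpy as np
from scipy import stats

a = np.array([5.1, 4.9, 6.2, 5.8, 6.0, 5.5])   # hypothetical samples
b = np.array([4.2, 4.8, 5.0, 4.4, 4.9])

# Summary statistics only; std with ddof=1 is the sample standard deviation
res = stats.ttest_ind_from_stats(
    mean1=a.mean(), std1=a.std(ddof=1), nobs1=len(a),
    mean2=b.mean(), std2=b.std(ddof=1), nobs2=len(b),
    equal_var=True,
)
raw = stats.ttest_ind(a, b)   # same pooled-variance test on the raw data
print(res.statistic, res.pvalue)
```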

I have a few large sets of data which I have used to create non-standard probability distributions (using numpy.histogram to bin the data, and scipy.interpolate's interp1d function to interpolate the resulting curves). I have also created a function which can sample from these custom PDFs using the scipy.stats package. My...

After interpolating data to a target grid, I am not able to reshape my data to match the original shape. The original shape of my data is 900x900 (rows x columns). After the interpolation I have a 1-D array of interpolated values in the new size of the...

I've solved a simple LP problem where all constraints are "less than or equal to". I used scipy.optimize.linprog for those. The problem is when one or more of the constraint equations is "greater than or equal to". How do I specify that? I need to use the two-phase approach provided...

Using ipython notebook, a pandas dataframe has 4 columns: numerator1, numerator2, denominator1 and denominator2. Without iterating through each record, I am trying to create a fifth column titled FishersExact. I would like the value of the column to store the tuple returned by scipy.stats.fisher_exact using values (or some derivation of...

Recently I've begun to receive SyntaxErrors when running parallel neural-network simulations using brian2. These are being raised by calls to scipy.weave.inline when it tries to evaluate lines of code in a cache file. The full description of the problem and my guess at its cause is here. And here is...

How to make a scipy sparse matrix from a list of lists with integers (or strings)? [[1,2,3], [1], [1,4,5]] Should become: [[1, 1, 1, 0, 0], [1, 0, 0, 0, 0], [1, 0, 0, 1, 1]] But then in scipy's compressed sparse format?...
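A sketch that builds the CSR arrays directly: each inner list supplies the column indices of one row, and indptr records where each row starts in the flattened index array. Following the example, value v goes to column v - 1. For string tokens, map them to integer ids with a dict first; sklearn's MultiLabelBinarizer(sparse_output=True) is another option if scikit-learn is available.

```python
from itertools import chain

import numpy as np
from scipy.sparse import csr_matrix

rows = [[1, 2, 3], [1], [1, 4, 5]]

# Column index of every stored value (value v -> column v - 1, as in the example)
indices = np.fromiter(chain.from_iterable(rows), dtype=np.int64) - 1
# indptr[i]:indptr[i+1] delimits row i inside indices
indptr = np.concatenate(([0], np.cumsum([len(r) for r in rows])))
data = np.ones(len(indices), dtype=np.int64)

m = csr_matrix((data, indices, indptr), shape=(len(rows), int(indices.max()) + 1))
print(m.toarray())
```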

I was trying to filter a signal using the scipy module of Python and I wanted to see which of lfilter or filtfilt is better. I tried to compare them and I got the following plot from my MWE: import numpy as np import scipy.signal as sp import matplotlib.pyplot as...

I have this numpy array with points, something like [(x1,y1), (x2,y2), (x3,y3), (x4,y4), (x5,y5)] What I would like to do, is to get an array of all minimum distances. So for point 1 (x1, y1), I want the distance of the point closest to it, same for point 2 (x2,y2),...
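A sketch using a k-d tree: querying every point for its 2 nearest neighbours returns the point itself first (distance 0) and its closest other point second, so the second column holds the minimum distances. The points are hypothetical. If the data contain exact duplicates, the second distance will be 0 for them.

```python
import numpy as np
from scipy.spatial import cKDTree

points = np.array([(0, 0), (1, 0), (5, 0), (5, 3)], dtype=float)   # hypothetical data

tree = cKDTree(points)
# k=2: column 0 is each point itself, column 1 its nearest other point
dist, idx = tree.query(points, k=2)
min_dist = dist[:, 1]
print(min_dist)
```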

If one uses the scipy.stats.mstats.theilslopes routine on a data set with missing values, the results of the lower and upper bounds for the slope estimate are incorrect. The upper bound is often/always(?) NaN, while the lower bound is simply wrong. This happens because the theilslopes routine computes an index into...

Is it possible to do the following in scikit-learn? We train an estimator A using the given mapping from features to targets, then we use the same data (or mapping) to train another estimator B, then we use outputs of the two trained estimators (A and B) as inputs for...

I am trying to figure out how to speed up the following Python code. Basically, the code builds the matrix of outer products of a matrix C and stores it as a block-diagonal sparse matrix. I use numpy.repeat() to build indices into the block diagonal. Profiling the code revealed that...

I have 3 arrays: an array "words" of pairs ["id": "word"] of length 5000000, an array "ids" of unique ids of length 13000, and an array "dict" of unique words (dictionary) of length 500000. This is my code: matrix = sparse.lil_matrix((len(ids), len(dict))) for i in words: matrix[id.index(i['id']), dict.index(i['word'])] += 1.0...

I have the following code snippet from SciPy: resDat = data[scipy.random.randint(0,N,(N,))] What I am trying to understand is how and why this line works. The randint function seems to return a list of N integer values in the range of the data indices, so what I interpret this line of code...

I am trying to smooth a scatter plot shown below using SciPy's B-spline representation of a 1-D curve. The data is available here. The code I used is: import matplotlib.pyplot as plt import numpy as np from scipy import interpolate data = np.genfromtxt("spline_data.dat", delimiter = '\t') x = 1000 / data[:,...

I'm trying to learn more about image processing in Python and, as part of the process, am doing some of the exercises in a book that I am reading. In one exercise I'm trying to do kmeans clustering of average pixel color in an image. The code below is pretty...

I am wondering if there is any difference (advantage/disadvantage) between using .toarray() and .todense() on SciPy sparse matrices. E.g., import scipy as sp import numpy as np sparse_m = sp.sparse.bsr_matrix(np.array([[1,0,0,0,1], [1,0,0,0,1]])) %timeit sparse_m.toarray() 1000 loops, best of 3: 299 µs per loop %timeit sparse_m.todense() 1000 loops, best of 3: 305...
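The main practical difference, sketched below, is the return type rather than speed: for the sparse matrix classes, toarray() returns a plain numpy.ndarray, while todense() returns a numpy.matrix, which is always 2-D and treats * as matrix multiplication. The values are identical. For the newer sparse array classes (such as bsr_array), todense() returns an ndarray as well.

```python
import numpy as np
from scipy import sparse

sparse_m = sparse.bsr_matrix(np.array([[1, 0, 0, 0, 1], [1, 0, 0, 0, 1]]))
a = sparse_m.toarray()   # numpy.ndarray
d = sparse_m.todense()   # numpy.matrix for the *_matrix classes
print(type(a).__name__, type(d).__name__)
```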

I have two arrays that have the shapes N X T and M X T. I'd like to compute the correlation coefficient across T between every possible pair of rows n and m (from N and M, respectively). What's the fastest, most pythonic way to do this? (Looping over N...
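A vectorised sketch: after centring each row and scaling it to unit length, the Pearson correlation of two rows is just their dot product, so one matrix product A @ B.T yields the full N x M table without a Python loop. The helper name and random data are hypothetical; rows with zero variance would cause a division by zero.

```python
import numpy as np

def rowwise_corr(A, B):
    """Pearson correlation between every row of A (N x T) and every row of B (M x T)."""
    A = A - A.mean(axis=1, keepdims=True)          # new arrays; the inputs are untouched
    B = B - B.mean(axis=1, keepdims=True)
    A /= np.linalg.norm(A, axis=1, keepdims=True)  # unit-length rows
    B /= np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T                                 # shape (N, M)

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 50))
B = rng.normal(size=(4, 50))
C = rowwise_corr(A, B)
```

The result matches the off-diagonal block of np.corrcoef(A, B), which computes the full (N + M) x (N + M) matrix.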

I have two dictionaries that I use as sparse vectors: dict1 = {'a': 1, 'b': 4} dict2 = {'a': 2, 'c': 2} I wrote my own __add__ function to get this desired result: dict1 = {'a': 3, 'b': 4, 'c': 2} It is important that I know the strings 'a',...

I'm trying to efficiently select a random non-zero column index for each row of a large sparse SciPy matrix. I can't seem to figure out a vectorized way of doing it, so I'm resorting to a very slow Python loop: random_columns = np.zeros((sparse_matrix.shape[0])) for i,row in enumerate(sparse_matrix): random_columns[i] = (np.random.choice(row.nonzero()[1]))...
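A vectorised sketch relying on the CSR layout: the stored column indices of row i are m.indices[m.indptr[i]:m.indptr[i+1]], so drawing one random offset per row, bounded by that row's count, selects a random non-zero column for every row at once. Returning -1 for empty rows is an assumption, as is dropping explicitly stored zeros first. The helper name and test matrix are hypothetical.

```python
import numpy as np
from scipy import sparse

def random_nonzero_column(matrix, rng):
    """Pick one non-zero column index per row uniformly at random; -1 marks empty rows."""
    m = sparse.csr_matrix(matrix, copy=True)
    m.eliminate_zeros()                            # ignore explicitly stored zeros
    counts = np.diff(m.indptr)                     # number of non-zeros in each row
    # One offset per row in [0, counts[i]); empty rows get a dummy bound of 1
    offsets = rng.integers(0, np.maximum(counts, 1))
    out = np.full(m.shape[0], -1, dtype=np.int64)
    nonempty = counts > 0
    out[nonempty] = m.indices[m.indptr[:-1][nonempty] + offsets[nonempty]]
    return out

M = sparse.csr_matrix(np.array([[0, 3, 0, 4],
                                [0, 0, 0, 0],
                                [5, 0, 0, 0]]))
cols = random_nonzero_column(M, np.random.default_rng(42))
print(cols)
```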

Unfortunately, the power fit with scipy does not return a good fit. I tried to use p0 as an input argument with close values, which did not help. I would be very glad if someone could point out my problem. # Imports from scipy.optimize import curve_fit import numpy...

I am running some goodness of fit tests using scipy.stats in Python 2.7.10. for distrName in distrNameList: distr = getattr(distributions, distrName) param = distr.fit(sample) pdf = distr.pdf(???) What do I pass into distr.pdf() to get the values of the best-fit pdf on the list of sample points of interest, called...
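A sketch of the usual pattern: fit returns a tuple ordered as (shape parameters..., loc, scale), so unpacking it with *param passes every fitted parameter to pdf in the right order, whatever the number of shape parameters. The normal distribution, the synthetic sample and the evaluation grid x are hypothetical stand-ins for the loop over distrNameList.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=2.0, scale=0.5, size=500)   # hypothetical data
x = np.linspace(0.0, 4.0, 9)                        # points at which to evaluate the pdf

distr = getattr(stats, "norm")   # as in the question's getattr lookup
param = distr.fit(sample)        # (shape params..., loc, scale)
pdf = distr.pdf(x, *param)       # unpack so all fitted parameters are passed through
print(pdf)
```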

I need to solve a system of linear equations Lx=b, where x is always a vector (3x1 array), L is an Nx3 array, and b is an Nx1 vector. N usually ranges from 4 to something like 10. I have no problems solving this using scipy.linalg.lstsq(L,b) However, I need to...