Pandas Dataframe Complex Calculation

python,python-2.7,pandas,dataframes
I have the following dataframe,df: Year totalPubs ActualCitations 0 1994 71 191.002034 1 1995 77 2763.911781 2 1996 69 2022.374474 3 1997 78 3393.094951 I want to write code that would do the following: Citations of currentyear / Sum of totalPubs of the two previous years I want something to...
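One possible way to compute this ratio is sketched below. It assumes the rows are sorted by Year with exactly one row per year, and the column name `ratio` is made up for illustration. The first two years have no full two-year history, so they come out as NaN.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    "Year": [1994, 1995, 1996, 1997],
    "totalPubs": [71, 77, 69, 78],
    "ActualCitations": [191.002034, 2763.911781, 2022.374474, 3393.094951],
})

# Sum of totalPubs over the two previous years.
# Assumes rows are sorted by Year with one row per year.
prev_two = df["totalPubs"].shift(1) + df["totalPubs"].shift(2)

# "ratio" is a hypothetical column name, not one from the question.
df["ratio"] = df["ActualCitations"] / prev_two
```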

Calculate days before next third Friday in a month for a Dataframe

python,pandas,dataframes
I have a time series in pandas Dataframe which looks like following: time A B 2012-06-11 09:25:00.005001 2572.4 2.589 2012-06-11 09:30:00.005004 2573.2 2.592 2012-06-11 09:31:00.005000 2572.6 2.592 2012-06-11 09:32:00.004996 2572.2 2.591 2012-06-11 09:33:00.005003 2570.0 2.589 2012-06-18 09:34:00.004999 2571.2 2.590 2012-06-18 09:35:00.004996 2572.0 2.591 2012-06-18 09:36:00.005002 2572.2 2.590 Is there a...
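A minimal sketch of the date arithmetic, using only the standard library. The helper names are hypothetical, and the rollover rule (a date after this month's third Friday counts toward next month's) is an assumption about what the question wants.

```python
from datetime import date, timedelta

def third_friday(year, month):
    """Return the date of the third Friday of the given month."""
    first = date(year, month, 1)
    # weekday(): Monday is 0, so Friday is 4.
    days_to_first_friday = (4 - first.weekday()) % 7
    return first + timedelta(days=days_to_first_friday + 14)

def days_to_next_third_friday(day):
    """Days from `day` to the next third Friday (0 if `day` is one)."""
    target = third_friday(day.year, day.month)
    if target < day:
        # Already past this month's third Friday: roll over to next month.
        if day.month == 12:
            year, month = day.year + 1, 1
        else:
            year, month = day.year, day.month + 1
        target = third_friday(year, month)
    return (target - day).days

early = days_to_next_third_friday(date(2012, 6, 11))  # 4 (target is 2012-06-15)
late = days_to_next_third_friday(date(2012, 6, 18))   # 32 (target is 2012-07-20)
```

With a DatetimeIndex, the helper could be applied per row, for example with `df.index.map(lambda ts: days_to_next_third_friday(ts.date()))`.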

pandas groupby access last group

python,pandas,grouping,dataframes
I have a pandas DataFrame looking like this: date info A x A y B z B x C y I only want to know the last date. In this case it is C. I thought that I can get this by grouping and sorting by the Date column: df.groupby('date',...

Filling missing values pandas dataframe

python,numpy,pandas,dataframes
I'm trying to fill missing data values in a pandas dataframe based on the date column. df.head() col1 col2 col3 date 2014-06-20 3 752 4028 2014-06-21 4 752 4028 2014-06-22 32 752 4028 2014-06-25 44 882 4548 2014-06-26 32 882 4548 I tried the following idx = pd.date_range(df.index[0], df.index[-1]) df = df.reindex(idx).reset_index()...

How to compute variance with missing value in a DataFrame - Python Pandas?

python,join,pandas,merge,dataframes
To be concrete, say we have a dataframe df1: name date valueA valueB color A 12/1/14 3 10 red A 12/2/14 1 30 red B 12/1/14 2 30 green B 12/3/14 3 20 green C 12/3/14 4 40 white The range of date is from 12/1/14 to 12/4/14. Each group...

How to calculate mean values grouped on another column in Pandas

python,pandas,dataframes
For the following dataframe: StationID HoursAhead BiasTemp SS0279 0 10 SS0279 1 20 KEOPS 0 0 KEOPS 1 5 BB 0 5 BB 1 5 I'd like to get something like: StationID BiasTemp SS0279 15 KEOPS 2.5 BB 5 I know I can script something like this to get the...
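A groupby followed by a mean is probably the shortest route here. The sketch below rebuilds the frame from the question; `sort=False` is an optional choice that keeps the stations in their original order.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    "StationID": ["SS0279", "SS0279", "KEOPS", "KEOPS", "BB", "BB"],
    "HoursAhead": [0, 1, 0, 1, 0, 1],
    "BiasTemp": [10, 20, 0, 5, 5, 5],
})

# Average BiasTemp per station; sort=False keeps order of first appearance.
means = df.groupby("StationID", sort=False)["BiasTemp"].mean().reset_index()
```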

Python pandas : dataframe read rows (readlines)

pandas,dataframes,readlines
I have a dataframe that was produced by pandas, for example: d = {'one':[1,1],'two':[2,2], 'three':[3,3]} i = ['a','b','c'] df = pd.DataFrame(data = d, index = i) df one two three a 1 2 3 b 1 2 3 c 1 2 3 I now need to read each row with...

Restructuring Dataframe in Python

python,pandas,dataframes
I have gathered data from the penultimate worksheet in this Excel file along with all the data in the last Worksheet from "Maturity Years" of 5.5 onward. I have code that does this. However, I am now looking to restructure the dataframe such that it has the following columns and...

fillna in clustered data in large pandas dataframes

python,numpy,pandas,dataframes
Considering the following dataframe: index group signal 1 1 1 2 1 NAN 3 1 NAN 4 1 -1 5 1 NAN 6 2 NAN 7 2 -1 8 2 NAN 9 3 NAN 10 3 NAN 11 3 NAN 12 4 1 13 4 NAN 14 4 NAN I...

How to remove 1 of 2 decimal points in a number using Pandas

python,pandas,dataframes
I have a column of numbers in Excel ... e.g. 1.2345.678 I want to remove the second decimal point from all data. Is this possible via import from csv to dataframe? Thanks. ...
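A rough sketch of one approach: read the column as text, keep the first point, and drop any later ones before converting to float. The function name is hypothetical, and the code assumes that the first point is the real decimal separator.

```python
def drop_extra_points(text):
    """Keep the first '.' in `text` and remove any later ones."""
    head, sep, tail = text.partition(".")
    return head + sep + tail.replace(".", "")

# Example value taken from the question.
cleaned = float(drop_extra_points("1.2345.678"))
```

If the CSV column is loaded as strings (for example with `dtype=str` in `pd.read_csv`), the same function could be applied with `Series.map` before converting the column to float.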

Ignoring future dates in python

python,datetime,pandas,dataframes
I have a large database and I am looking to read only the last week for my python code. However, somebody made a typo in the database so there is a date in the future that is throwing everything off. Input: recvd_dttm 6/5/2015 18:28:50 PM 6/5/2015 14:25:43 PM 9/10/2015 21:45:12...

Pandas groupby category, rating, get top value from each category?

python,pandas,dataframes
First question on SO, very new to pandas and still a little shaky on the terminology: I'm trying to figure out the proper syntax/sequence of operations on a dataframe to be able to group by column B, find the max (or min) corresponding value for each group in column C,...

How to get a value from a Pandas DataFrame and not the index and object type

python,pandas,dataframes
Say I have the following DataFrame Letter Number A 1 B 2 C 3 D 4 Which can be obtained through the following code import pandas as pd letters=pd.Series(('A', 'B', 'C', 'D')) numbers=pd.Series((1, 2, 3, 4)) keys=('Letters', 'Numbers') df=pd.concat((letters, numbers), axis=1, keys=keys) Now I want to get the value C...

pandas dataframe join with where restriction

python,join,pandas,dataframes
I have 2 pandas DataFrames looking like this: ranks: year name rank 2015 A 1 2015 B 2 2015 C 3 2014 A 4 2014 B 5 2014 C 6 and tourneys: date name 20150506 A 20150708 B 20150910 C 20141212 A 20141111 B 20141010 C I want to join...

ValueError: invalid literal for float(): when inserted substring from “2015-05-21T18:11:55” into dataframe

python,pandas,dataframes
I have a key value pair in a JSON-derived dictionary that looks like this: u'local_start_time': u'2015-05-21T18:11:55.000Z' When I try to insert a portion of this string into a dataframe I get this error: File "fix_runs_prepare.py", line 63, in <module> df.set_value(i, name, str(g[name])[0:19]) File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 1679, in set_value engine.set_value(series.values, index,...

pandas - sort by absolute value without changing the data

python,pandas,dataframes
I'm looking for a simple way to sort a pandas dataframe by the absolute value of a particular column, but without actually changing the values within the dataframe. Something similar to sorted(df, key=abs). So if I had a dataframe like: a b 0 1 -3 1 2 5 2 3...
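One way this could be done is to sort the absolute values separately and reuse the resulting index order. The data below is made up, since the frame in the question is cut off.

```python
import pandas as pd

# Hypothetical data; the frame in the question is truncated.
df = pd.DataFrame({"a": [1, 2, 3], "b": [-3, 5, -1]})

# Sort the absolute values, then use that index order to reorder the
# original rows. The values stored in df keep their signs.
order = df["b"].abs().sort_values().index
sorted_df = df.loc[order]
```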

performing math on dataframe variables after groupby in pandas and bringing results back to original dataframe

python,pandas,group-by,dataframes
First the data: df City Date Sex Weight 0 A 6/12/2015 M 185 1 A 6/12/2015 F 120 2 A 7/12/2015 M 210 3 A 7/12/2015 F 105 4 B 6/12/2015 M 225 5 B 6/12/2015 F 155 6 B 6/19/2015 M 167 7 B 6/19/2015 F 121 I am...

How to stack data frames on top of each other in Pandas

python,pandas,dataframes
I have a dataframe with 96 columns: df.to_csv('result.csv') out (excel): Run 1 Run 2 Run 3 Run 4 Run 5 Run 6 Run 7 Run 8 Run 9 Run 10 Run 11 Run 12 Run 13 Run 14 Run 15 Run 16 Run 17 Run 18 Run 19 Run 20...

Python : Pandas DataFrame to CSV

python,csv,pandas,dataframes
I want to simply create a csv file from the constructed DataFrame so I do not have to use the internet to access the information. The rows are the lists in the code: 'CIK' 'Ticker' 'Company' 'Sector' 'Industry' My current code is as follows: def stockStat(): doc = pq('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies') for...

Select pandas frame rows based on two columns' values

python,arrays,numpy,pandas,dataframes
I wish to select some specific rows based on two column values. For example: d = {'user' : [1., 2., 3., 4] ,'item' : [5., 6., 7., 8.],'f1' : [9., 16., 17., 18.], 'f2':[4,5,6,5], 'f3':[4,5,5,8]} df = pd.DataFrame(d) print df Out: f1 f2 f3 item user 0 9 4 4...

Create a new field based on the file name in R

r,dataframes
I have a number of .csv files that all contain the same fields that are housed in the same directory, but the values in each file are for a specific date. However, the data in the .csv files does not contain the date - only the file names contain the...

Pandas rank by column value [duplicate]

pandas,dataframes,rank
This question already has an answer here: python pandas rank by column 1 answer I have a dataframe that has auction IDs and bid prices. The dataframe is sorted by auction id (ascending) and bid price (descending): Auction_ID Bid_Price 123 9 123 7 123 6 123 2 124 3...

DataFrame only has last item after construction

python,pandas,dataframes
The dataframe that I have constructed will only return the last item in the list. I am not sure what I am doing wrong. def stockStat(): for heading in doc(".mw-headline:contains('S&P 500 Component Stocks')").parent("h2"): rows = pq(heading).next("table tr") for row in rows: tds = pq(row).find("td") cik = [tds.eq(7).text()] ticker = [tds.eq(0).text()]...

conditional column output for pandas dataframe

python,pandas,dataframes
I have a pandas DataFrame looking like this: nameA statusA nameB statusB a Q x X b Q y X c X z Q d X o Q e Q p X f Q r Q I want to print the rows of this dataframe based on the following rule:...

Delete rows from python pandas dataframe

python,pandas,dataframes
My dataframe has columns like ticket, host, drive model, Chassis, Rack, etc. I want all the rows with value in the Chassis column equal to '1025C-M3B', '1026T-M3FB', '2026TT-DLRF' or 'SYS-2027TR-D70RF+'. I want to delete the rest. I tried data2 = data1[data1.Chassis == '1025C-M3B' or data1.Chassis == '1026T-M3FB' or data1.Chassis ==...
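Python's `or` cannot combine whole Series, which is likely why the attempt above fails. A membership test with `isin` is one alternative. In this sketch only the Chassis values come from the question; the other rows and columns are invented.

```python
import pandas as pd

# Hypothetical rows; only the wanted Chassis values come from the question.
data1 = pd.DataFrame({
    "ticket": [101, 102, 103, 104],
    "Chassis": ["1025C-M3B", "OTHER-MODEL", "2026TT-DLRF", "1026T-M3FB"],
})

wanted = ["1025C-M3B", "1026T-M3FB", "2026TT-DLRF", "SYS-2027TR-D70RF+"]

# isin() builds one boolean mask for all wanted values at once.
data2 = data1[data1["Chassis"].isin(wanted)]
```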

How (in a vectorized manner) to retrieve single value quantities from dataframe cells containing numeric arrays?

r,dataframes,vectorization
I've got a dataframe that includes columns like the one on the right here: lengthArray speed_max 1 4 24, 18, 24, 18 2 10 2, 2, 2, 2, 2, 2, 2, 2, 2, 2 3 4 -999, -999, -999, -999 4 2 -999, -999 5 2 18, 18 6 1...

Save vars per iteration to df and when done save df to csv

python,csv,pandas,dataframes
I need to make a DataFrame (df_max_res) with the 15 best performances from my stock strategies combined with company tickers (AAPL for Apple Computers etc.). I have a list of more than 500 stock tickers that I fetch and on which I analyze using four of my own strategies. In...

How to create a dataframe of summary statistics?

python,pandas,matplotlib,dataframes
I have a dataframe with IDs and numerous test results relating to each ID. What I want to do is create a second dataframe which summarises the average score and the standard deviation for a particular test, which I can then plot on a graph. Below is the code I...

pySpark DataFrames Aggregation Functions with SciPy

apache-spark,dataframes,pyspark
I've tried a few different scenarios to use Spark 1.3's DataFrames to handle things like SciPy kurtosis or NumPy std. Here is the example code, but it just hangs on a 10x10 dataset (10 rows with 10 columns). I've tried: print df.groupBy().agg(kurtosis(df.offer_id)).collect() print df.agg(kurtosis(df.offer_ID)).collect() But this works no...

Sort Date Year-Month using Pandas

python,sorting,dataframes
I have a Python pandas data frame like the following: +---------+-------+ | Date | value | +---------+-------+ | 2013-12 | A | | 2013-01 | B | | 2013-04 | C | | 2014-06 | D | +---------+-------+ How can I sort this data frame? I tried to...
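Because the dates are zero-padded "YYYY-MM" strings, a plain text sort already gives chronological order. The sketch below assumes the Date column really is stored as strings in that format.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    "Date": ["2013-12", "2013-01", "2013-04", "2014-06"],
    "value": ["A", "B", "C", "D"],
})

# Zero-padded year-month strings sort chronologically as plain text.
sorted_df = df.sort_values("Date")
```

If the column held mixed or unpadded formats, converting it with `pd.to_datetime` before sorting would be the safer route.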

How do I copy a row from one pandas dataframe to another pandas dataframe?

python,python-2.7,pandas,dataframes
I have a dataframe of data that I am trying to append to another dataframe. I have tried various ways with .append() and there has been no successful way. When I print the data from iterrows. I provide 2 possible ways I tried to solve the issue below, one creates...

python pandas: drop a df column if condition

python,pandas,delete,dataframes
I would like to drop a given column from a pandas dataframe IF all the values in the column are "0%". my df: data = {'UK': ['11%', '16%', '7%', '52%', '2%', '5%', '3%', '3%'], 'US': ['0%', '0%', '0%', '0%', '0%', '0%', '0%', '0%'], 'DE': ['11%', '16%', '7%', '52%', '2%', '5%',...
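A boolean mask over the columns is one way to express this. The sketch uses only the UK and US columns, since the DE column is cut off in the question.

```python
import pandas as pd

# UK and US are copied from the question; DE is truncated there and omitted.
df = pd.DataFrame({
    "UK": ["11%", "16%", "7%", "52%", "2%", "5%", "3%", "3%"],
    "US": ["0%"] * 8,
})

# True for each column whose every value equals "0%".
all_zero = (df == "0%").all()

# Keep only the columns that are not entirely "0%".
trimmed = df.loc[:, ~all_zero]
```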

SparkSQL UDF Registration in Java8

apache-spark,java-8,dataframes,apache-spark-sql
I'm using Spark 1.3.0 on Java 8. I've got no issues setting up my SQLContext and creating dataframes, the spark DSL is pretty smooth. But I want to use a custom UDF. According to the spark documentation: https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#udf-registration-moved-to-sqlcontextudf-java--scala sqlCtx.udf().register("strLen", (String s) -> { s.length(); }); Should do it for registering...

Parsing two different formats of dates in data frame

python,parsing,datetime,pandas,dataframes
I have a column having 2 different formats of the date which I'm trying to convert to datetime using to_datetime of pandas. Here's the code: import pandas as pa pa.to_datetime(data["servertime"], format="%a %b %d %H:%M:%S %Y") e.g. servertime Tue Nov 4 12:01:15 2014 But a few rows have data in the following...

How to select row and column from dataframe using pandas

python,pandas,dataframes
I currently have the following code: import glob import pandas as pd path_pattern = 'C:/Users/Joey/Desktop/GC results/Results/FID_00*' files = glob.glob(path_pattern) dataframes = [pd.DataFrame.from_csv(f, index_col=None) for f in files] new_df = pd.DataFrame() for i in dataframes: selected_data = i['Unnamed: 3'].ix[12:16] new_df['Run'] = selected_data print new_df out: Run 12 5187666.22 13 1453339.93 14...

Joining two Pandas DataFrames does not work anymore?

join,pandas,merge,dataframes
I have 2 Pandas Dataframes. The first one looks like this: date rank id points 2010-01-04 1 100001 10550 2010-01-04 2 100002 9205 The second one like this: id name 100001 A 100002 B I want to join both dataframes via the id column. So the result should look like:...
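For reference, a plain merge on the id column might look like the sketch below, with the frames rebuilt from the excerpt. The excerpt does not say why the join fails; one thing worth checking, purely as a guess, is whether id has the same dtype in both frames (for example integers in one and strings in the other).

```python
import pandas as pd

# Frames rebuilt from the question.
ranks = pd.DataFrame({
    "date": ["2010-01-04", "2010-01-04"],
    "rank": [1, 2],
    "id": [100001, 100002],
    "points": [10550, 9205],
})
names = pd.DataFrame({"id": [100001, 100002], "name": ["A", "B"]})

# A left join keeps every row of ranks and adds the matching name.
merged = pd.merge(ranks, names, on="id", how="left")
```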

R: Data frame operations: filtering common rows and removing rows of several data frames

r,merge,dataframes,subset
dfA <- data.frame(Efficiency=c(7,2,8,9), Value=c(3, 4, 7, 8)) dfB <- data.frame(Efficiency=c(7,2,4,2,8,9), Value=c(3, 4, 4, 1, 7, 8)) dfC <- data.frame(Efficiency=c(7,9), Value=c(3, 8)) I want to get the common rows of dfA and dfB. From the resulting data.frame I want to remove the rows which have the same values as dfC....

Python: Eliminating rows in Pandas DataFrame based on boolean condition

python,pandas,boolean,dataframes
Suppose I have a DataFrame in Pandas like c1 c2 0 'ab' 1 1 'ac' 0 2 'bd' 0 3 'fa' 1 4 'de' 0 and I want it to show all rows such that c1 doesn't contain 'a'. My desired output would be: c1 c2 2 'bd' 0 4...
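Negating a `str.contains` mask is one way to get this. The sketch assumes the c1 values are plain strings without the surrounding quotes shown in the printout.

```python
import pandas as pd

# Data from the question, assuming c1 holds plain strings.
df = pd.DataFrame({
    "c1": ["ab", "ac", "bd", "fa", "de"],
    "c2": [1, 0, 0, 1, 0],
})

# regex=False treats "a" as a literal substring; ~ inverts the mask.
kept = df[~df["c1"].str.contains("a", regex=False)]
```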

Pandas / SQLITE DataFrame plot

python,sqlite,pandas,dataframes
I am trying to plot data from sqlite but I can't achieve this :-/ p2 = sql.read_sql('select DT_COMPUTE_FORCAST,VALUE_DEMANDE,VALUE_FORCAST from PCE', cnx) # Data frame p2 show the datas DT_COMPUTE_FORCAST VALUE_DEMANDE VALUE_FORCAST 0 27/06/2014 06:00 5.128 5.324 1 27/06/2014 07:00 5.779 5.334 2 27/06/2014 08:00 5.539 5.354 df = pd.DataFrame({'Demande' : p2['VALUE_DEMANDE'],'Forcast'...

How to attribute to a slice of a pandas Dataframe pointed by a variable?

python,pandas,copy,dataframes,slice
The following commands show how to attribute to a slice: In [81]: a=pd.DataFrame([[1,2,3],[4,5,6],[7,8,9]]) In [82]: a Out[82]: 0 1 2 0 1 2 3 1 4 5 6 2 7 8 9 In [83]: a.loc[1] = [10,11,12] In [84]: a Out[84]: 0 1 2 0 1 2 3 1 10...

python pandas time series select day of year

python,date,pandas,dataframes
I want to select data from a dataframe for a particular day of the year. Here is what I have so far as a minimal example. import pandas as pd from datetime import datetime from datetime import timedelta import numpy.random as npr rng = pd.date_range('1/1/1990', periods=365*10, freq='D') df1 = pd.DataFrame(npr.randn(len(rng)),...

python pandas extract year from datetime — df['year'] = df['date'].year is not working

python,datetime,pandas,extract,dataframes
Sorry for this question that seems repetitive - I expect the answer will make me feel like a bonehead... but I have not had any luck using answers to the similar questions on SO. I am importing data in through read_csv, but for some reason which I cannot figure out,...

pandas python: adding blank columns to df

python,pandas,insert,dataframes,col
I'm trying to add x number of blank columns to a dataframe. Here is my function: def fill_with_bars(df, number=10): ''' Add blank, empty columns to dataframe, at position 0 ''' numofcols = len(df.columns) while numofcols < number: whitespace = '' df.insert(0, whitespace, whitespace, allow_duplicates=True) whitespace += whitespace return df but...

How to split string from column to create long format dataframe

python,pandas,dataframes
If I have the dataframe shown below, how do I make a long format dataframe (I.e. one term per gene per row). I guess I will have to apply or map a split(",") to the Term column, but what do I do after that? import pandas as pd from StringIO...

Segmenting a series of Timedeltas to a minute by minute graph (pandas)

python,pandas,matplotlib,dataframes
I have a dataframe with the index as a Timedelta, ranging from 0 to 5 minutes, and a column of floating point numbers. Here's an example subset: 32 0.740283 34 0.572126 36 0.524788 38 0.509685 40 0.490219 42 0.545977 44 0.444170 46 1.098387 48 2.209113 51 1.426835 53 1.536439 55...

Selecting Data from Last Week in Python

python,datetime,pandas,format,dataframes
I have a large database and I am looking to read only the last week for my python code. My first problem is that the column with the received date and time is not in the format for datetime in pandas. My input (Column 15) looks like this: recvd_dttm 1/1/2015...

Iteratively change every cell in a column of a Pandas dataframe

python,pandas,dataframes
I am trying to change the value of every cell in a Pandas data frame. Expecting .loc to allow me to identify a cell with the paradigm df.loc[row_index, column_name] = cell value, I've used the following loop: table["field"] = 6 #placeholder value used only to create the column for field,...

Concat Columns produces NAN even though axis is the same for all datasets

python,pandas,dataframes,concat
I am trying to concat columns from multiple dataframes. `AUD = OHLC_AUDUSD['bid']['close'];` `AUD = AUD.dropna()` `CAD = OHLC_USDCAD['bid']['close'];` `CAD = CAD.dropna()` `print AUD` symbol timestamp AUDUSD 2015-01-05 0.8096 2015-01-06 0.8077 2015-01-07 0.8074 2015-01-08 0.8112 2015-01-09 0.8200 Name: close, dtype: float64 `print CAD` symbol timestamp USDCAD 2015-01-05 1.1756 2015-01-06 1.1838 2015-01-07...

Construct bipartite graph from columns of python dataframe

python,graph,dataframes,networkx
I have a dataframe with three columns. data['subdomain'], data['domain'], data ['IP'] I want to build one bipartite graph for every element of subdomain that corresponds to the same domain, and the weight to be the number of times that it corresponds. For example my data could be: subdomain , domain,...

Deedle dataframe slicing by rows in C#

c#,dataframes,deedle
How do I slice by rows in a Deedle dataframe using C#? For example, I want the first three rows in a Deedle dataframe using C#.

Pandas difference between dataframes on column values

python,pandas,dataframes,difference
I couldn't find a way to have a dataframe that has the difference of 2 dataframes based on a column. So basically: dfA = ID, val 1, test 2, other test dfB = ID, val 2, other test I want to have a dfC that holds the difference dfA -...
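If "difference" means the rows of dfA whose ID does not appear in dfB, a negated `isin` mask could do it. That reading is an assumption, since the excerpt is cut off before the expected result.

```python
import pandas as pd

# Frames rebuilt from the question.
dfA = pd.DataFrame({"ID": [1, 2], "val": ["test", "other test"]})
dfB = pd.DataFrame({"ID": [2], "val": ["other test"]})

# Keep the rows of dfA whose ID is absent from dfB.
dfC = dfA[~dfA["ID"].isin(dfB["ID"])]
```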

Is there an easy way to group columns in a Pandas DataFrame?

pandas,dataframes,indices,columnname
I am trying to use Pandas to represent motion-capture data, which has T measurements of the (x, y, z) locations of each of N markers. For example, with T=3 and N=4, the raw CSV data looks like: T,Ax,Ay,Az,Bx,By,Bz,Cx,Cy,Cz,Dx,Dy,Dz 0,1,2,1,3,2,1,4,2,1,5,2,1 1,8,2,3,3,2,9,9,1,3,4,9,1 2,4,5,7,7,7,1,8,3,6,9,2,3 This is really simple to load into a DataFrame,...

apply pandas qcut function to subgroups

python,pandas,dataframes
Let us assume we created a dataframe df using the code below. I have created a bin frequency count based on the 'value' column in df. Now how do I get the frequency count of the label=1 samples based on the previously created bins? Obviously, I should not use...

Python: How can I get the previous 5 values in a Pandas dataframe after skipping the very last one?

python,pandas,dataframes
I have a Pandas dataframe, df as follows: 0 1 2 0 k86e 201409 180 1 k86e 201410 154 2 k86e 201411 157 3 k86e 201412 153 4 k86e 201501 223 5 k86e 201502 166 6 k86e 201503 163 7 k86e 201504 169 8 k86e 201505 157 I know that...

how to drop dataframe in pandas?

python,pandas,dataframes
There are tips for dropping columns and rows depending on some condition, but I want to drop the whole dataframe created in pandas, like rm(dataframe) in R or drop table in SQL. This will help to reduce RAM utilization....
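The closest Python equivalent is probably deleting the name and letting the garbage collector reclaim the object. This is only a sketch: the memory is freed only when no other reference to the frame remains, and the interpreter may not hand it back to the operating system right away.

```python
import gc

import pandas as pd

# Hypothetical frame standing in for the one to be dropped.
df = pd.DataFrame({"a": range(1000)})

# Remove the name; the frame becomes collectable once nothing else refers to it.
del df

# Optionally force a collection pass; returns the number of objects found.
collected = gc.collect()
```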

Formatting a Pivot Table in Python

python,sorting,pandas,format,dataframes
I am trying to reformat a table based on counts in different columns. df = pd.DataFrame({'Number': [1, 2, 3, 4, 5], 'X' : ['X1', 'X2', 'X3', 'X3', 'X3'], 'Y' : ['Y2','Y1','Y1','Y1', 'Y2'], 'Z' : ['Z3','Z1','Z1','Z2','Z1']}) Number X Y Z 0 1 X1 Y2 Z3 1 2 X2 Y1 Z1 2...

Move given row to end of DataFrame

python,pandas,dataframes,concat
I would like to take a given row from a DataFrame and prepend or append to the same DataFrame. My code below does just that, but I'm not sure if I'm doing it the right way or if there is an easier, better, faster way? testdf = df.copy() #get row...

Python Pandas matching closest index from another Dataframe

python,pandas,dataframes
df.index = 10,100,1000 df2.index = 1,2,11,50,101,500,1001 This is just a sample. I need to match the closest index from df2 against df under these conditions: df2.index has to be > df.index, and only the one closest value is kept. For example, output: df | df2 10 | 11 100 | 101 1000 | 1001 Now I can do...

Python Pandas Dataframe Conditional If, Elif, Else

python,if-statement,pandas,dataframes
In a Python Pandas DataFrame, I'm trying to apply a specific label to a row if a 'Search terms' column contains any possible strings from a joined, pipe-delimited list. How can I do conditional if, elif, else statements with Pandas? For example: df = pd.DataFrame({'Search term': pd.Series(['awesomebrand inc', 'guy boots',...

populating a dataframe column in pandas with another dataframe's column

python-2.7,pandas,dataframes
I'm generating a list of conference names from the database and trying to populate them into a column in another dataframe. For some reason it's not working and returning NaN. Can anyone help explain why it's doing that? Why the last line isn't doing what it's supposed to? df_conf =...

Join 2 DataFrames on an index without introducing nans on missing indices

python,pandas,dataframes
I have 2 DataFrames: df1 = pandas.DataFrame([[1,2,3],[4,5,6],[7,8,9]], columns=['a','b','c']) df2 = df1*2 df2.index = [1,2,3] >>> df1 a b c 0 1 2 3 1 4 5 6 2 7 8 9 >>> df2 a b c 1 2 4 6 2 8 10 12 3 14 16 18 Say I...

Python Pandas two conditional dataframe groupby running sort

python,pandas,group-by,dataframes
I'm looking for a way to run a two conditional pandas DataFrame groupby method. I have many logs to parse and I have the following single condition groupby method, but is there a way to have a two conditional groupby method? DF[DF['Feature Enabled'] == 1].groupby(['Feature Active'])[['Value1','Value2']].mean() Is there a way...

How to create separate Pandas DataFrames for each CSV file and give them meaningful names?

python,python-2.7,csv,pandas,dataframes
I've searched thoroughly and can't quite find the guidance I am looking for on this issue so I hope this question is not redundant. I have several .csv files that represent raster images. I'd like to perform some statistical analysis on them so I am trying to create a Pandas...

join two pandas dataframe using a specific column

python,join,pandas,dataframes
I am new to pandas and I am trying to join two dataframes based on the equality of one specific column. For example, suppose that I have the following: df1 A B C 1 2 3 2 2 2 df2 A B C 5 6 7 2 8 9 Both...

conditionally dividing columns in dataframes based in values pandas

python,pandas,dataframes
So I have a dataframe that looks like this...let's call it df1 Disease Gene1 Gene2 Gene3 Gene4 0 D1 1 1 26 1 1 D2 1 1 1 1 2 D3 1 18 1 17 3 D4 25 1 1 1 4 D5 1 1 1 1 5 D6 1...

How to return all opposite pairs in a Pandas DataFrame?

python,pandas,match,dataframes
For the dataframe below, how to return all opposite pairs? import pandas as pd df1 = pd.DataFrame([1,2,-2,2,-1,-1,1,1], columns=['a']) a 0 1 1 2 2 -2 3 2 4 -1 5 -1 6 1 7 1 The output should be as below: (1) sum of all rows is 0 (2) as...

Convert python pandas series to a dataframe and split the column into two

python,pandas,dataframes
This is my series named table Host cds170.yyz.llnw.net 1 cds172.yyz.llnw.net 3 cds180.yyz.llnw.net 1 cds182.yyz.llnw.net 1 cds183.yyz.llnw.net 3 fcds113.yyz.llnw.net 1 Name: Host, dtype: int64 This is the dataframe that I want Host count cds170.yyz.llnw.net 1 cds172.yyz.llnw.net 3 cds180.yyz.llnw.net 1 cds182.yyz.llnw.net 1 cds183.yyz.llnw.net 3 fcds113.yyz.llnw.net 1 I have tried table = pd.DataFrame(table)...
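One option is to build the frame explicitly from the Series index and values, which sidesteps the clash between the index name and the Series name. The sketch rebuilds the Series from the printout above.

```python
import pandas as pd

# Series rebuilt from the question: hosts as the index, counts as the values.
table = pd.Series(
    [1, 3, 1, 1, 3, 1],
    index=[
        "cds170.yyz.llnw.net", "cds172.yyz.llnw.net", "cds180.yyz.llnw.net",
        "cds182.yyz.llnw.net", "cds183.yyz.llnw.net", "fcds113.yyz.llnw.net",
    ],
    name="Host",
)

# Turn the index and the values into two ordinary columns.
result = pd.DataFrame(
    {"Host": table.index, "count": table.values},
    columns=["Host", "count"],
)
```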

Apache Spark, add an “CASE WHEN … ELSE …” calculated column to an existing DataFrame

scala,apache-spark,dataframes,apache-spark-sql
I'm trying to add an "CASE WHEN ... ELSE ..." calculated column to an existing DataFrame, using Scala APIs. Starting dataframe: color Red Green Blue Desired dataframe (SQL syntax: CASE WHEN color == Green THEN 1 ELSE 0 END AS bool): color bool Red 0 Green 1 Blue 0 How...

selecting a column value based on a different column value after applying Groupby

python,group-by,dataframes
I am able to get this to work, but not after I apply a groupby. In this example I simply want to have the last column contain the lowest value from column x. I have populated the df with a column called yminx which is what I would like my...

Pandas: creating dataframe rows from other dataframe information

python,pandas,dataframes
I'm working with aggregated data, which I need to dis-aggregate in order to process it further. The original df contains a value 'no. of students' per row and I need one row in the new df per student: Original df: faculty A faculty B faculty x male students 2 7...

How to call more than one function in python

python,pandas,dataframes
For some reason, when I call all four functions at once I get an error with the newly named dataframes, specifically the empty dataframes that I want to fill. I have no idea why. I've tried to move all empty dataframes outside the function and that didn't work. Any help appreciated....

Python- Compute the sum of numerical characters of every string in a dataframe

python,pandas,dataframes
So I have a dataframe with a column "dname". It contains many rows of 2LD domain names, e.g. [123ask , example92 , what3ver]. I want to find the number of digits for every string in every row and create a new column in the dataframe with those values, e.g. [3...
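The title asks for a sum of the digits while the body asks for a count, so this sketch computes both. The column names `n_digits` and `digit_sum` are made up for illustration.

```python
import pandas as pd

# Sample values from the question.
df = pd.DataFrame({"dname": ["123ask", "example92", "what3ver"]})

# Number of digit characters in each string (regex \d matches one digit).
df["n_digits"] = df["dname"].str.count(r"\d")

# Sum of the digit values in each string, in case that is what is wanted.
df["digit_sum"] = df["dname"].map(
    lambda s: sum(int(ch) for ch in s if ch in "0123456789")
)
```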

extracting days between a dataframe date and today

datetime,pandas,dataframes
I have a date in a dataframe and I would like to create another dataframe column that has the number of days between this date (option expiration) and today. The date looks like this: df2.expiration.head(1) 0 05/29/15 Name: expiration, dtype: object I tried this: from datetime import datetime def compare_dates(date):...
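A vectorised sketch: parse the strings once, then subtract a reference date. The column name `days_left` is hypothetical, and a fixed reference date is used here so the result is reproducible; in practice `pd.Timestamp.today().normalize()` could take its place.

```python
import pandas as pd

# Only the first value is shown in the question.
df2 = pd.DataFrame({"expiration": ["05/29/15"]})

# Parse month/day/two-digit-year strings into timestamps.
expiry = pd.to_datetime(df2["expiration"], format="%m/%d/%y")

# Fixed reference date for this example only.
today = pd.Timestamp("2015-05-20")

# "days_left" is a hypothetical column name.
df2["days_left"] = (expiry - today).dt.days
```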

How to use a dataframe as a map to change values in another dataframe

python,pandas,dataframes
I have one large dataframe that acts as a map between integers and names: from StringIO import StringIO import pandas as pd gene_int_map = pd.read_table(StringIO("""Gene Int Mt-nd1 2 Cers2 4 Nampt 10 Madd 20 Zmiz1 21 Syt1 26 Syt5 30 Syt7 32 Cdca7 34 Ablim2 42 Elp5 43 Clic1 98...

How to run strings as a command in python

python,csv,pandas,dataframes
I added two strings together and now I want to use those strings to create multiple dataframes.... I currently have: filepath = 'C:/........GC results/lng_11169_fid00' l = [] for i in range(1,7): newpath = filepath + str(i) l.append(newpath) print l d =[] for i in range(1,7): dataframes = "df" + str(i)...

Pandas - Convert Unbalanced Panel Data to Cross Section

python,pandas,dataframes
Actually am unsure if the end of this is cross-section because it's over a time period, but I think it is still. I have a data frame that looks like this: Player Finish Tournament Year id ------------------------------------------------ Aaron Baddeley 9 Memorial 2012 1 Aaron Baddeley 17 Masters 2013 1 Aaron...

Pandas - How do I subset a column composed of list objects?

python,json,pandas,dataframes
I am working with a JSON file that I pulled from Github using: curl https://api.github.com/repos/mbostock/d3/stats/commit_activity > d3_commit-activity.json Then, within Pandas I ran the following commands: import pandas as pd import numpy as np import matplotlib.pylab as plt df = pd.io.json.read_json("d3_commit-activity.json") One of the columns in df is called "days" and...

Using Pandas to Iteratively Add Columns to a Dataframe

python,loops,pandas,dataframes
I have some relatively simple code that I'm struggling to put together. I have a CSV that I've read into a dataframe. The CSV is panel data (i.e., unique company and year observations for each row). I have two columns that I want to perform a function on and then...

Pandas dataframe first x columns [duplicate]

pandas,dataframes
This question already has an answer here: Is there a pandas function to display the first/last n columns, as in .head() & .tail()? 5 answers I have a dataframe with about 500 columns and that's why I am wondering if there is any way that I could use head() function...

Write a user defined fillna function in pandas dataframe to fill np.nan different values with conditions

python,pandas,dataframes,user-defined-functions,nan
Considering the following pandas dataframe: import pandas as pd change = [0.475, 0.625, 0.1, 0.2, -0.1, -0.75, 0.1, -0.1, 0.2, -0.2] position = [1.0, 1.0, nan, nan, nan, -1.0, nan, nan, nan, nan] date = ['20150101', '20150102', '20150103', '20150104', '20150105', '20150106', '20150107', '20150108', '20150109', '20150110'] pd.DataFrame({'date': date, 'position': position, 'change':...

multiply pandas dataframe column with a constant

python-2.7,pandas,dataframes
I have two dataframes: df: Conference Year SampleCitations Percent 0 CIKM 1995 373 0.027153 1 CIKM 1996 242 0.017617 2 CIKM 1997 314 0.022858 3 CIKM 1998 427 0.031084 And another dataframe which returns to me the total number of citations: allcitations= pd.read_sql("Select Sum(Citations) as ActualCitations from publications " I...

How to find matching rows in Pandas DataFrame with identical values with same/opposite signs in certain columns?

python,pandas,duplicates,dataframes,matching
For the dataframe below, how can I return first and third row, as they have identical values in column "c" and "d", and have values opposite of each other in "a" and b"? df1=pd.DataFrame([ [1,2,3,4],[5,6,7,8], [-1,-2,3,4]], columns=['a', 'b', 'c', 'd']) a b c d 0 1 2 3 4 1...

Python: how to get values from a dictionary from pandas series

python,dictionary,pandas,key,dataframes
I am very new to python and trying to get value from dictionary where keys are defined in a dataframe column (pandas). I searched quite a bit and the closest thing is a question in the link below, but it doesn't come with an answer. So, here I am trying...

trouble selecting a period of dates in Pandas with < and >

python,pandas,dataframes
A really simple question perhaps, but it's my first time working with pandas and I'm having trouble slicing up my dataframes into smaller ones based on dates. So, I have a dataframe (named firstreadin) that looks like this (and thousands more rows): date numbers megaball 0 1999-01-12 [5, 7,...

Python Spark Dataframes: Better way to export groups to text file

python,apache-spark,dataframes
I want to export data to separate text files; I can do it with this hack: for r in sqlContext.sql("SELECT DISTINCT FIPS FROM MY_DF").map(lambda r: r.FIPS).collect(): sqlContext.sql("SELECT * FROM MY_DF WHERE FIPS = '%s'" % r).rdd.saveAsTextFile('county_{}'.format(r)) What is the right way to do it with Spark 1.3.1/Python dataframes? I want...

create new column based on other column but stripping

python,pandas,dataframes
I have a pandas DataFrame with an id column looking like this: id A2015 B2016 C2017 I want two new columns as follows: id year name A2015 2015 A Q B2016 2016 B Q C2017 2017 C Q so the year column should take the four last characters of the...
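A minimal sketch using the `.str` accessor. The year rule (last four characters) is stated in the question; the rule used here for `name` (everything before the year) is an assumption, because the excerpt is cut off before it is described.

```python
import pandas as pd

df = pd.DataFrame({"id": ["A2015", "B2016", "C2017"]})

# Last four characters of each id, as described in the question.
df["year"] = df["id"].str[-4:]

# Assumption: the name is whatever precedes the four-character year.
df["name"] = df["id"].str[:-4]
```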

df.fillna(0) command won't replace NaN values with 0

python-2.7,pandas,dataframes,nan
I'm trying to replace the NaN values generated in the code below with 0. I don't understand why the code below won't work. It still keeps the NaN values. df_pubs=pd.read_sql("select Conference, Year, count(*) as totalPubs from publications where year>=1991 group by conference, year", db) df_pubs['Conference'] = df_pubs['Conference'].str.encode('utf-8') df_pubs = df_pubs.pivot(index='Conference', columns='Year',...

Python pandas: exclude rows below a certain frequency count

python,pandas,filter,dataframes
So I have a pandas DataFrame that looks like this: r vals positions 1.2 1 1.8 2 2.3 1 1.8 1 2.1 3 2.0 3 1.9 1 ... ... I would like to filter out all rows whose position does not appear at least 20 times. I have seen...
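One way this kind of frequency filter can be sketched is with a group-size transform. The column name `r_vals` and the threshold `min_count = 2` are assumptions chosen so the toy data stays small; the question itself asks for 20.

```python
import pandas as pd

df = pd.DataFrame({
    "r_vals": [1.2, 1.8, 2.3, 1.8, 2.1, 2.0, 1.9],
    "positions": [1, 2, 1, 1, 3, 3, 1],
})

min_count = 2  # the question uses 20; lowered here for the small example

# For every row, the number of rows that share its position.
counts = df.groupby("positions")["positions"].transform("size")

# Keep only rows whose position occurs at least min_count times.
filtered = df[counts >= min_count]
```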

Create multiple dataframes in loop

python,pandas,dataframes
I have a list, with each entry being a company name companies = ['AA', 'AAPL', 'BA', ....., 'YHOO'] I want to create a new dataframe for each entry in the list. Something like (pseudocode) for c in companies: c = pd.DataFrame() I have searched for a way to do this...
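A common pattern for this, sketched here as one option, is to keep the frames in a dictionary keyed by company name instead of creating separate variables. The name `frames` is illustrative, and the list is shortened from the question.

```python
import pandas as pd

companies = ["AA", "AAPL", "BA", "YHOO"]  # shortened version of the question's list

# One empty DataFrame per company, looked up by name.
frames = {name: pd.DataFrame() for name in companies}

aapl = frames["AAPL"]  # access a single company's frame
```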

Pandas logical indexing on a single column of a dataframe to assign values

python,r,pandas,dataframes
I am an R programmer and looking for a similar way to do something like this in R: data[data$x > value, y] <- 1 (basically, take all rows where the x column is greater than some value and assign the y column at those rows the value of 1) In...
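A minimal sketch of the pandas counterpart to the R expression above, using `.loc` with a boolean mask and a column label. The data and the threshold are made up for illustration.

```python
import pandas as pd

data = pd.DataFrame({"x": [1, 5, 10], "y": [0, 0, 0]})
value = 4

# Rows where x > value, column 'y': assign 1 in place.
data.loc[data["x"] > value, "y"] = 1
```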

adding one to all the values in a dataframe

pandas,dataframes
I have a dataframe like the one below. I would like to add one to all of the values in each row. I am new to this forum and Python so I can't conceptualise how to do this. I need to add 1 to each value. I intend to use...
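A sketch assuming every column is numeric: arithmetic on a DataFrame is applied element-wise, so adding a scalar adds it to every cell. The data is hypothetical, since the question's frame is not shown.

```python
import pandas as pd

# Hypothetical all-numeric frame.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Scalar addition is broadcast to every element; df itself is left unchanged.
plus_one = df + 1
```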

Pandas get previous dataframe row by date

python,pandas,dataframes
I'm working with some data where I have to get the date of occurrence. For example, say we're working with medical data. Each row is a unique visit from a patient, though the same patient can have multiple rows. Each row also contains info on the type of visit, whether...

Trying to create a new dataframe based on internal sums of a column from another dataframe using Python/pandas

python,indexing,pandas,sum,dataframes
Let's assume I have a pandas dataframe df as follows: df = DataFrame({'Col1':[1,2,3,4], 'Col2':[5,6,7,8]}) Col1 Col2 0 1 5 1 2 6 2 3 7 3 4 8 Is there a way for me to change a column into the sum of all the following elements in the column? For...
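A sketch using a reversed cumulative sum. Because the excerpt is cut off before its example, it is unclear whether the current element should be counted, so both readings are shown; the names `inclusive` and `exclusive` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Col1': [1, 2, 3, 4], 'Col2': [5, 6, 7, 8]})

# Reverse, accumulate, reverse back: each row holds the sum of itself
# and every element after it.
inclusive = df['Col1'].iloc[::-1].cumsum().iloc[::-1]

# Subtracting the element itself leaves only the elements that follow it.
exclusive = inclusive - df['Col1']
```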

Python pandas - grouping by and summarizing on a field

python,pandas,dataframes
I've been playing with pandas DataFrames recently, and I'm struggling to analyze some multi-dimensional data. Let's say I have some data such as below: order | sample | feature1 | feature2 ------------------------------------- 1234 | A | 0.20 | 0.45 1234 | B | 0.71 | 0.08 1234 | C | 0.21...

Apache Spark: get elements of Row by name

scala,apache-spark,schema,dataframes
In a DataFrame object in Apache Spark (I'm using the Scala interface), if I'm iterating over its Row objects, is there any way to extract values by name? I can see how to do some really awkward stuff: def foo(r: Row) = { val ix = (0 until r.schema.length).map( i...

How can I add columns in a data frame?

python,pandas,dataframes
I have the following data: Example: DRIVER_ID;TIMESTAMP;POSITION 156;2014-02-01 00:00:00.739166+01;POINT(41.8836718276551 12.4877775603346) I want to create a pandas dataframe with 4 columns that are the id, time, longitude, latitude. So far, I got: cur_cab = pd.DataFrame.from_csv( path, sep=";", header=None, parse_dates=[1]).reset_index() cur_cab.columns = ['cab_id', 'datetime', 'point'] path specifies the .txt file containing the...

Nested if loop with DataFrame is very, very slow

python,if-statement,nested,dataframes
I have 10 million rows to go through and it will take many hours to process, so I must be doing something wrong. I converted the names of my df variables for ease in typing: Close=df['Close'] eqId=df['eqId'] date=df['date'] IntDate=df['IntDate'] expiry=df['expiry'] delta=df['delta'] ivMid=df['ivMid'] conf=df['conf'] The code below works fine, just ungodly slow,...

Python Pandas sum of dataframe with one column

python-2.7,pandas,sum,dataframes,calculated-columns
I have a Python Pandas DataFrame: df = pd.DataFrame(np.random.rand(5,3),columns=list('ABC')) print df A B C 0 0.041761178 0.60439116 0.349372206 1 0.820455992 0.245314299 0.635568504 2 0.517482167 0.7257227 0.982969949 3 0.208934899 0.594973111 0.671030326 4 0.651299752 0.617672419 0.948121305 Question: I would like to add the first column to the whole dataframe. I would like...
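A sketch of one reading of this question, namely adding column A row by row to every column. It uses small fixed values instead of `np.random.rand` so the result is easy to follow, and under this reading column A itself ends up doubled; the truncated question may want something different.

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0], "B": [10.0, 20.0], "C": [100.0, 200.0]})

# axis=0 aligns the Series with the row index, so A[i] is added
# to every value in row i (including column A itself).
result = df.add(df["A"], axis=0)
```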

How to apply a function to the elements of a pandas dataframe

pandas,lambda,dataframes
I want to apply a lambda function to the elements of a dataframe, in the same way as np.sqrt returns a dataframe with the sqrt of each element. However, pd.DataFrame.apply applies the function to a row or a column. Is there a similar command that applies a lambda function on...
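A minimal sketch of element-wise application. `DataFrame.applymap` is the long-standing method for this, and newer pandas releases also provide `DataFrame.map` for the same purpose, so the sketch picks whichever is available. The data and the lambda are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 4], "b": [9, 16]})

# Use DataFrame.map where it exists (newer pandas), otherwise applymap.
elementwise = df.map if hasattr(df, "map") else df.applymap

# The lambda receives one scalar at a time and returns one scalar.
squared = elementwise(lambda v: v ** 2)
```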

Python data frame column string extraction efficient way?

python,pandas,pattern-matching,dataframes
I have a data frame df with column ID in the following pattern. What I want is to return a string column with the number after the dash sign. For the example below, I need 01,01,02. I used the command below and it failed. Since it is a very large...
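A sketch of vectorized extraction with the `.str` accessor. The ID values below are hypothetical, since the question's actual pattern is not shown, and the sketch assumes the wanted number is whatever follows the last dash.

```python
import pandas as pd

# Hypothetical IDs; the real pattern from the question is not available.
df = pd.DataFrame({"ID": ["ab-01", "cd-01", "ef-02"]})

# Split on the dash and take the last piece, keeping it as a string
# so leading zeros are preserved.
df["number"] = df["ID"].str.split("-").str[-1]
```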