How to filter a pandas DataFrame for a certain column value and only return columns that do not have NaN?

python,pandas,data-analysis
Example data: In [42]: data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'], 'year': [2000, 2001, 2002, 2001, 2002], 'pop': [1.5, 1.7, 3.6, 2.4, 2.9]} pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt']) Out[42]: year state pop debt 0 2000 Ohio 1.5 NaN 1 2001 Ohio 1.7 NaN 2 2002 Ohio 3.6 NaN 3...
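One possible approach, sketched under the assumption that "columns that do not have NaN" means columns with no NaN among the filtered rows. The filter value 'Ohio' is picked only for illustration; the excerpt does not say which value is wanted.

```python
import pandas as pd

data = {
    'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
    'year': [2000, 2001, 2002, 2001, 2002],
    'pop': [1.5, 1.7, 3.6, 2.4, 2.9],
}
# 'debt' is not a key in data, so pandas fills that column with NaN.
df = pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt'])

# Step 1: keep only the rows where 'state' matches the wanted value
# ('Ohio' is an illustrative choice, not taken from the question).
ohio = df[df['state'] == 'Ohio']

# Step 2: drop every column that holds at least one NaN in those rows.
ohio_no_nan = ohio.dropna(axis=1, how='any')

print(ohio_no_nan)
```

Filtering first matters here: a column that has NaN only in other rows survives, because dropna sees just the filtered subset.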

How to create a Venn diagram in RStudio from a group of three frequency columns

r,rstudio,data-analysis
How to create a Venn diagram in R from a data frame: user has_1 has_2 has_3 3431 true false true 3432 false true false 3433 true false false 3434 true false false 3435 true false false 3436 true false false There are thousands of such rows. I want to show how many users...

QlikView Resources for Beginner Developer

business-intelligence,data-analysis,qlikview,qliksense
I am looking to add QlikView development to my skill set. I have a C# and SQL background. Are there any free online resources to get me going at the developer level, not the end-user level? What is the best starting place for me, and what level of difficulty is involved? I have seen the capabilities...

Favorite tool for word/phrase counting

full-text-search,text-mining,data-analysis,word-count,text-analysis
I am looking for a tool that performs counting of words and, more importantly, phrases, in large amounts of open-ended text responses. I need the ability to exclude certain words (a, the, and, etc.) as well. I am aware of a few tools that do this: - http://www.mywritertools.com/default.asp - http://www.hermetic.ch/wfca/wfca.htm...

Python Pandas add column with relative order numbers

python,pandas,data-analysis
How do I add an order number column to an existing DataFrame? This is my DataFrame: import pandas as pd import math frame = pd.DataFrame([[1, 4, 2], [8, 9, 2], [10, 2, 1]], columns=['a', 'b', 'c']) def add_stats(row): row['sum'] = sum([row['a'], row['b'], row['c']]) row['sum_sq'] = sum(math.pow(v, 2) for v in...
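A rough sketch of one reading of the question. The excerpt is truncated, so two things here are assumptions: that the order number should be the 1-based rank of each row by its 'sum' value, and that the column is called 'order' (a hypothetical name).

```python
import pandas as pd

frame = pd.DataFrame([[1, 4, 2], [8, 9, 2], [10, 2, 1]], columns=['a', 'b', 'c'])

# Row-wise sum of the three original columns, computed without a per-row function.
frame['sum'] = frame[['a', 'b', 'c']].sum(axis=1)

# Assumption: "order number" means the 1-based rank of each row by 'sum'.
# method='first' breaks ties by order of appearance, so ranks stay unique.
# 'order' is a hypothetical column name.
frame['order'] = frame['sum'].rank(method='first').astype(int)

print(frame)
```

If the ordering should run from largest to smallest instead, passing ascending=False to rank would flip it.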

Optimal way to analyze user qualifications data (C# and SQL server)

c#,sql-server,algorithm,data-analysis
I need to analyze data from a SQL Server table. The table contains data on the qualifications of all employees in the company and has the following structure (simplified): | User | Qualification | DateOfQualificationAssignment | | user000 | Junior | 2014-01-15 | | user000 | Middle | 2014-02-15 | | user001...

How to concatenate a specific column from a pandas.DataFrame()?

python,pandas,data-analysis
I have a list of files, and I want to combine a specific column from it for all my files, to run some cumulative analysis. import pandas as pd import numpy as np all_data_sets = pd.DataFrame([]) for file_name in file_list: my_data = pd.DataFrame([]) my_data = pd.read_csv(file_name, delimiter=',', names=header_row) my_data =...
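A minimal sketch of one common pattern for this: collect the wanted column from each file in a list and concatenate once at the end, rather than growing a DataFrame inside the loop. The in-memory files, the header names, and the column name 'value' are hypothetical stand-ins, since the question's own file_list and header_row are not shown.

```python
import io
import pandas as pd

# Hypothetical stand-ins for header_row and file_list from the question;
# in-memory text buffers replace real file names so the sketch is self-contained.
header_row = ['time', 'value']
file_list = [
    io.StringIO("0,1.0\n1,2.0\n"),
    io.StringIO("0,3.0\n1,4.0\n"),
]

columns = []
for file_obj in file_list:
    my_data = pd.read_csv(file_obj, delimiter=',', names=header_row)
    # Keep only the column of interest ('value' is an assumed name).
    columns.append(my_data['value'])

# One concat at the end; ignore_index renumbers the combined rows 0..n-1.
all_values = pd.concat(columns, ignore_index=True)

print(all_values)
```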

JavaScript JSON Combination and Data Analysis

javascript,json,for-loop,data-analysis
I am trying to add values from multiple JSON responses that are saved in a .txt file. The .txt file has about 4000 entries. Each has the same format, as follows: {"id":"8f546dcf-b66a-4c53-b3d7-7290429483b8","price":"247.96000000","size":"0.03121005","product_id":"BTC-USD","side":"sell","stp":"dc"} {"id":"0ec4b63a-b736-42af-a0aa-b4581bf12955","price":"247.90000000","size":"0.03910014","product_id":"BTC-USD","side":"sell","stp":"dc"}...
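The question is tagged JavaScript, but the idea can be sketched in Python and carries over directly. This assumes one JSON object per line and that the goal is to total a numeric field such as "size"; the excerpt does not say which field is wanted. The helper name sum_field is hypothetical.

```python
import json
from decimal import Decimal

# Two sample records copied from the question; a real run would read the
# lines from the .txt file instead.
lines = [
    '{"id":"8f546dcf-b66a-4c53-b3d7-7290429483b8","price":"247.96000000","size":"0.03121005","product_id":"BTC-USD","side":"sell","stp":"dc"}',
    '{"id":"0ec4b63a-b736-42af-a0aa-b4581bf12955","price":"247.90000000","size":"0.03910014","product_id":"BTC-USD","side":"sell","stp":"dc"}',
]


def sum_field(lines, field):
    """Total one string-encoded numeric field across JSON lines (hypothetical helper)."""
    total = Decimal('0')
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        # The values are strings, so convert before adding; Decimal avoids
        # binary floating-point rounding on the 8-digit fractions.
        total += Decimal(record[field])
    return total


total_size = sum_field(lines, 'size')
print(total_size)
```

The key point is that the values arrive as strings, so they need a numeric conversion before adding; otherwise a plain `+` would concatenate them.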

Changing values of multiple column elements for dataframe in R

r,data-analysis,data-manipulation
I'm trying to update a bunch of columns by adding and subtracting the SD to each value of the column. The SD is for the given column. Below is the reproducible code that I came up with, but I feel this is not the most efficient way to do it....

Mean and standard deviation by groups where a condition is satisfied

r,data-analysis
I have a data frame (df) like the following, which is just a sample: group condition values 1 0 12 1 1 15 1 1 23 1 1 14 2 1 34 2 1 37 2 0 31 2 0 36 2 1 35 Namely: df<-data.frame(group=c(1, 1, 1, 1, 2, 2, 2, 2,...

Ambiguous truth value with boolean logic

python,excel,algorithm,pandas,data-analysis
I am trying to use some boolean logic in a function on a dataframe, but get an error: In [4]: data={'level':[20,19,20,21,25,29,30,31,30,29,31]} frame=DataFrame(data) frame Out[4]: level 0 20 1 19 2 20 3 21 4 25 5 29 6 30 7 31 8 30 9 29 10 31 In [35]: def...
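The function in the question is cut off, so the following only illustrates the usual cause of this error with a made-up condition (levels strictly between 20 and 30). It is a sketch of the general pattern, not a fix for the original function.

```python
import pandas as pd

data = {'level': [20, 19, 20, 21, 25, 29, 30, 31, 30, 29, 31]}
frame = pd.DataFrame(data)

# Hypothetical condition (the question's own function is truncated):
# select rows whose level is above 20 and below 30.
#
# Python's `and` / `or` need a single True/False, so applying them to a
# whole column raises "The truth value of a Series is ambiguous".
# `&` / `|` compare element by element instead. The parentheses are
# required because `&` binds more tightly than `>` and `<`.
mask = (frame['level'] > 20) & (frame['level'] < 30)
in_range = frame[mask]

print(in_range)
```

The same rule applies inside a function: if it receives a whole column rather than a single value, any `if`, `and`, or `or` on a comparison will hit this error.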

Counting the number of join symptoms

sql,data-analysis
I'm trying to do something that's very simple to do in other languages but in SQL it's proving rather puzzling. I have a database with the patient ID as row, and 100 symptoms as columns. Symptoms are binary, 0 or 1 if the patient has it or not. Let's say...

How to reformat a dataset using the values of rows as new columns?

postgresql,pivot,crosstab,data-analysis
I have a dataset that looks like this: id | test_id ---+-------- 1 | a 1 | b 1 | u 2 | a 2 | u 3 | a 3 | b 3 | u And I would like to roll it up into a new table such that...

Fourier transform with python

python,python-2.7,scipy,data-analysis
I have a set of data that obviously has some periodic nature. I want to find its frequency using the Fourier transform and plot it. Here is my attempt, but it does not look right. This is the corresponding code, I don't...
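A minimal sketch of finding a dominant frequency with NumPy's real FFT. The question's data and code are not shown, so a synthetic 5 Hz sine sampled at 100 Hz stands in for them; the sample rate, signal, and variable names are all assumptions.

```python
import numpy as np

# Synthetic stand-in for the question's data: a 5 Hz sine with a constant
# offset, sampled at 100 Hz for 200 samples (2 seconds).
sample_rate = 100.0
t = np.arange(200) / sample_rate
signal = np.sin(2 * np.pi * 5.0 * t) + 0.5

# Subtract the mean first; otherwise the zero-frequency bin can dominate
# the spectrum and hide the periodic component.
centered = signal - signal.mean()

# rfft returns only the non-negative frequencies of a real-valued signal.
spectrum = np.abs(np.fft.rfft(centered))

# rfftfreq gives the frequency (in Hz here) that belongs to each bin.
freqs = np.fft.rfftfreq(len(centered), d=1.0 / sample_rate)

dominant_freq = freqs[np.argmax(spectrum)]
print(dominant_freq)
```

Plotting `freqs` against `spectrum` gives the frequency-domain picture. Two common reasons such a plot looks wrong are a large spike at zero frequency from an unremoved mean and an x-axis in bin numbers instead of Hz.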

How to use multiple data sets to train a linear regression model in R

r,linear-regression,data-analysis
I am building a linear regression model to predict 2015 values. I have data from 2013 and 2014. My question is, how can I use both the data from 2013 and 2014 to train my linear regression model in R? I have: model1 = lm(x ~ y, data = data2013)...

How to select column values to display in pandas groupby

python,pandas,data-analysis
I am using pandas 0.16.0 and have this data: id A B 1 10 100 2 10 101 3 20 102 When I call df.groupby(['A']).groups I get {10: [1 2], 20: [3]}, but I want this instead (values from column B): {10: [100, 101], 20: [102]} Please help...
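One way to get the column B values per group is to select B after grouping and collect each group into a list. This sketch assumes `id` is the index of the frame, which matches the row labels shown in the `.groups` output; the name `b_by_a` is hypothetical.

```python
import pandas as pd

# Data from the question, with 'id' assumed to be the index.
df = pd.DataFrame({'A': [10, 10, 20], 'B': [100, 101, 102]}, index=[1, 2, 3])
df.index.name = 'id'

# .groups maps each key to row labels; to get column values instead,
# select column B from the grouped frame and turn each group into a list.
b_by_a = df.groupby('A')['B'].apply(list).to_dict()

print(b_by_a)
```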

Python - What are the major improvements of Pandas over Numpy/Scipy

python,numpy,pandas,scipy,data-analysis
I have been using numpy/scipy for data analysis. I recently started to learn Pandas. I have gone through a few tutorials and I am trying to understand what the major improvements of Pandas over Numpy/Scipy are. It seems to me that the key idea of Pandas is to wrap up...