FAQ Database Discussion Community

Using dplyr window functions to calculate percentiles

r,dplyr,tidyr
I have a working solution but am looking for a cleaner, more readable solution that perhaps takes advantage of some of the newer dplyr window functions. Using the mtcars dataset, if I want to look at the 25th, 50th, 75th percentiles and the mean and count of miles per gallon...
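
One possible approach is a sketch that assumes the summary is wanted per cylinder group (`cyl`), since the question is cut off. It calls `quantile()` inside dplyr's `summarise()`, so no reshaping is needed. The column names (`mean_mpg`, `p25`, ...) are illustrative choices.

```r
library(dplyr)

# Assumption: summarise mpg within each cylinder group (cyl).
mpg_summary <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    n        = n(),                          # number of cars in the group
    mean_mpg = mean(mpg),
    p25      = unname(quantile(mpg, 0.25)),  # unname() drops the "25%" label
    p50      = unname(quantile(mpg, 0.50)),
    p75      = unname(quantile(mpg, 0.75))
  )

print(mpg_summary)
```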

Using gather to tidy a dataset in R: attributes are not identical

r,plyr,tidyr
My end goal is to compute ratios between two values (T/D) in my dataset, and it seemed the best way to do that would be to tidy the dataset with something like tidyr. I have been trying to use gather and separate but ran into some hiccups. The data looks...

Create conditional new variables in R

r,dplyr,tidyr
I need to recreate the original variables of a very large data frame (900+ variables). Here is an example of what I'm trying to do: dat <- data.frame( id=c('user1','user2','user3'), agePanel1=c(20,25,32), agePanel2=c(21,NA,33), favColPanel1=c('blue','red','blue'), favColPanel2=c('red',NA,'red') ) id agePanel1 agePanel2 favColPanel1 favColPanel2 1 user1 20 21 blue red 2 user2 25 NA red...

R spreading multiple columns with tidyr [duplicate]

r,dplyr,tidyr
This question already has an answer here: tidyr repeated measures multiple variables (wide format). Take this sample data frame: df <- data.frame(month=rep(1:3,2), student=rep(c("Amy", "Bob"), each=3), A=c(9, 7, 6, 8, 6, 9), B=c(6, 7, 8, 5, 6, 7)) I can use spread from tidyr to change this to wide...
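
A possible sketch, assuming tidyr >= 1.0.0: `pivot_wider()`, which supersedes `spread()`, accepts several value columns at once. It builds one output column per value/student combination.

```r
library(tidyr)

df <- data.frame(month = rep(1:3, 2),
                 student = rep(c("Amy", "Bob"), each = 3),
                 A = c(9, 7, 6, 8, 6, 9),
                 B = c(6, 7, 8, 5, 6, 7))

# One row per month; each value column (A, B) is spread across students,
# giving columns A_Amy, A_Bob, B_Amy, B_Bob.
wide <- pivot_wider(df, names_from = student, values_from = c(A, B))
print(wide)
```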

Loop through each column and row, do stuff

r,dplyr,tidyr
I think this is the best way to describe what I want to do: df$column <- ifelse(is.na(df$column) == TRUE, 0, 1), but where column is dynamic. This is because I have about 45 columns, all with the same kind of content, and all I want to do is check each...
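
One way to avoid writing an explicit loop is a sketch assuming dplyr >= 1.0.0, with a small hypothetical data frame (`id`, `q1`, `q2`, `q3`) standing in for the 45 real columns. `across()` applies the same NA-to-0, otherwise-1 rule to every column except `id`.

```r
library(dplyr)

# Hypothetical example: an id column plus several columns of the same kind.
df <- data.frame(id = 1:3,
                 q1 = c(5, NA, 2),
                 q2 = c(NA, NA, 7),
                 q3 = c(1, 3, NA))

# Same rule as ifelse(is.na(x), 0, 1), applied to every column except id.
flags <- df %>%
  mutate(across(-id, ~ as.numeric(!is.na(.x))))

print(flags)
```

In base R the equivalent is `df[cols] <- lapply(df[cols], function(x) as.numeric(!is.na(x)))`, where `cols` is a vector of the relevant column names.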

Grouping key/value columns into single rows

r,data.frame,data.table,tidyr
I'm trying to take key-value combinations and put all the values on the same row as the keys. I'm pretty sure I knew how to do this at one point (I think with data.table), and I've been looking at the usual suspects (reshape2, tidyr, data.table, etc.), but I can't seem...

Separate a column into multiple columns using tidyr::separate with sep = ""

r,tidyr
df <- data.frame(category = c("X", "Y"), sequence = c("AAT.G", "CCG-T"), stringsAsFactors = FALSE) df category sequence 1 X AAT.G 2 Y CCG-T I want to separate the column sequence into 5 columns (one for each character). I tried to do that with tidyr::separate but it internally uses stringi::stri_split_regex which doesn't...
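
A possible workaround avoids the regex split entirely. It is a sketch that assumes every sequence has the same length (5 here), and the output names `pos1` to `pos5` are illustrative. When `sep` is numeric, `separate()` treats it as character positions.

```r
library(tidyr)

df <- data.frame(category = c("X", "Y"),
                 sequence = c("AAT.G", "CCG-T"),
                 stringsAsFactors = FALSE)

# A numeric sep is read as character positions, so no regex is involved:
# splitting after positions 1-4 yields one column per character.
out <- separate(df, sequence, into = paste0("pos", 1:5), sep = 1:4)
print(out)
```

For variable-length strings, base R's `strsplit(df$sequence, "")` splits each string into individual characters instead.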

Converting columns into rows without specifying the column names

r,reshape2,tidyr
I have a data frame with following structure: bad_df <- data.frame( id = c("id001", "id002", "id003"), participant.1 = c("Jana", "Marina", "Vasilei"), participant.2 = c("Niko", "Micha", "Niko"), role.1 = c("writer", "writer", "speaker"), role.2 = c("observer", "observer", "observer"), stringsAsFactors = F ) bad_df I would need to gather it into something like...
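
A sketch of one option, assuming tidyr >= 1.0.0. `pivot_longer()` with the special `".value"` name splits each column name at the dot, so no column has to be listed by name. The key column name `num` is an illustrative choice.

```r
library(tidyr)

bad_df <- data.frame(
  id = c("id001", "id002", "id003"),
  participant.1 = c("Jana", "Marina", "Vasilei"),
  participant.2 = c("Niko", "Micha", "Niko"),
  role.1 = c("writer", "writer", "speaker"),
  role.2 = c("observer", "observer", "observer"),
  stringsAsFactors = FALSE
)

# Every column except id is gathered. The part of each name before the dot
# becomes an output column (".value"); the part after it goes into "num".
long <- pivot_longer(bad_df, -id,
                     names_to = c(".value", "num"),
                     names_sep = "\\.")
print(long)
```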

In R: get multiple rows by splitting a column using tidyr and reshape2

r,split,reshape2,tidyr
What is the simplest way, using tidyr or reshape2, to turn this data: data <- data.frame( A=c(1,2,3), B=c("b,g","g","b,g,q")) into the following (i.e. one row for each comma-separated value in variable B): A B 1 1 b 2 1 g 3 2 g 4 3 b 5 3 g 6...
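
In tidyr, `separate_rows()` does this in a single call. The sketch adds `stringsAsFactors = FALSE` so that B is character on any R version.

```r
library(tidyr)

data <- data.frame(A = c(1, 2, 3),
                   B = c("b,g", "g", "b,g,q"),
                   stringsAsFactors = FALSE)

# One output row per comma-separated value of B; A is repeated as needed.
long <- separate_rows(data, B, sep = ",")
print(long)
```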

Skip the empty data frames and produce the output

r,merge,ggplot2,tidyr,rpostgresql
sessionInfo() R version 3.2.0 (2015-04-16) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 7 x64 (build 7601) Service Pack 1 locale: [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C [5] LC_TIME=German_Germany.1252 attached base packages: [1] grid stats graphics grDevices utils datasets methods [8] base other attached packages: [1] WriteXLS_3.5.1 tidyr_0.2.0 scales_0.2.4 gridExtra_0.9.1 [5]...

Creating factor-level indicator variables in R using spread or cast

r,reshape,dplyr,factors,tidyr
Assume a data structure like the following MemberID <- c(123,123,234,234) nbin <- 4 imatrix <- matrix(sample(c(0,1), size=nbin * length(MemberID), replace=TRUE), nrow=length(MemberID)) colnames(imatrix) <- letters[1:nbin] years <- c("Y1","Y2","Y1","Y2") mydf <- data.frame(cbind(MemberID, years, imatrix)) How can I make a similar data structure such that I have an indicator for each level of...

I have a dataframe as follows: ddd <- structure(list(sample_date = structure(c(1400612280, 1400612280, 1400612280, 1400612280, 1400612280, 1400612280, 1400616420, 1400616420, 1400616420, 1400616420, 1400616420, 1400616420, 1400604780, 1400604780, 1400604780, 1400604780, 1400604780, 1400604780, 1400604780, 1400604780, 1400604780, 1400604780, 1400604780, 1400604780, 1400604780, 1400604780, 1400604780, 1400604780, 1400604780, 1400604780, 1400604780, 1400604780, 1400604780, 1400604780, 1400604780, 1400604780, 1400604780, 1400604780,...

Difficulty with wide data frame in R

r,dplyr,tidyr
I have a data frame (as follows) that contains cases (ID) which received different diagnoses (DX), both during a single admission and across different admissions. I want to widen this data frame so that every single admission has all of its diagnoses in separate columns. I have tried tidyr's spread function, but it doesn't...

tidyr gather: simultaneously gather and rename key?

r,tidyr
Suppose I have the following data frame: > a <- data_frame(my_type_1_num_widgets = c(1, 2, 3), my_type_2_num_widgets = c(4, 5, 6)) > a Source: local data frame [3 x 2] my_type_1_num_widgets my_type_2_num_widgets 1 1 4 2 2 5 3 3 6 I want to do two things: gather the "num_widgets" columns....
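
A sketch assuming tidyr >= 1.0.0. The question is truncated, so this guesses that the new key should be just the type number. `names_pattern` in `pivot_longer()` gathers the columns and rewrites the key in the same step. The names `type` and `num_widgets` are illustrative.

```r
library(tidyr)
library(tibble)

# tibble() replaces the deprecated data_frame().
a <- tibble(my_type_1_num_widgets = c(1, 2, 3),
            my_type_2_num_widgets = c(4, 5, 6))

# Gather both columns and, in the same step, keep only the type number
# captured by the regex group as the new key.
long <- pivot_longer(a, everything(),
                     names_to = "type",
                     names_pattern = "my_type_(\\d+)_num_widgets",
                     values_to = "num_widgets")
print(long)
```

Adding `names_transform = list(type = as.integer)` would store the key as an integer instead of a character.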

What is the dplyr equivalent of plyr::ldply(tapply) in R?

r,plyr,dplyr,tidyr
Ultimately, I am trying to achieve something similar to the following, but leveraging dplyr instead of plyr: library(dplyr) probs = seq(0, 1, 0.1) plyr::ldply(tapply(mtcars$mpg, mtcars$cyl, function(x) { quantile(x, probs = probs) })) # .id 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% # 1 4 21.4 21.50...
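
A possible dplyr sketch, assuming dplyr >= 1.0.0 and tibble >= 3.0.0. Each group's named quantile vector is turned into a one-row tibble, which `summarise()` then splices into columns.

```r
library(dplyr)
library(tibble)

probs <- seq(0, 1, 0.1)

# For each cyl group, quantile() returns a named vector ("0%", "10%", ...);
# as_tibble_row() turns it into a one-row tibble, and summarise() splices an
# unnamed data-frame result into separate columns.
deciles <- mtcars %>%
  group_by(cyl) %>%
  summarise(as_tibble_row(quantile(mpg, probs = probs)))

print(deciles)
```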

Speedy/elegant way to unite many pairs of columns

r,reshape,dplyr,tidyr
Is there an elegant/faster way to combine all pairs of columns in a data.frame? For example, using mapply() and paste(), we can turn this data.frame: mydf <- data.frame(a.1 = letters, a.2 = 26:1, b.1 = letters, b.2 = 1:26) head(mydf) a.1 a.2 b.1 b.2 1 a 26 a 1 2...
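
A base R sketch that generalizes to any number of pairs. It groups columns by the prefix before the final dot and pastes each group row-wise. The `"_"` separator is an assumption, since the desired output is cut off.

```r
mydf <- data.frame(a.1 = letters, a.2 = 26:1, b.1 = letters, b.2 = 1:26)

# The column prefix before the final "." identifies each pair: "a", "a", "b", "b".
prefix <- sub("\\.[^.]*$", "", names(mydf))

# split.default() on a data frame groups its columns by prefix; each group is
# then pasted row-wise. The "_" separator is an assumption.
united <- as.data.frame(
  lapply(split.default(mydf, prefix),
         function(g) do.call(paste, c(unname(as.list(g)), sep = "_"))),
  stringsAsFactors = FALSE
)
print(head(united))
```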

Collapse multiple column values into one factor

r,reshape,tidyr
Is there a function to collapse several columns' values into one factor? Every record has exactly one TRUE value in columns 2:4. The resulting value for a record should be the name of the column that holds the TRUE value. Input data frame: data <- data.frame(user=c(1,2,3,4), blue=c(T,F,T,F), green=c(F,F,F,T), red=c(F,T,F,F)) user blue green...
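
A sketch of one tidyr/dplyr approach, assuming tidyr >= 1.0.0 and relying on the stated guarantee of exactly one TRUE per record. Reshape to long form and keep only the TRUE rows. The names `color` and `flag` are illustrative.

```r
library(dplyr)
library(tidyr)

data <- data.frame(user  = c(1, 2, 3, 4),
                   blue  = c(TRUE, FALSE, TRUE, FALSE),
                   green = c(FALSE, FALSE, FALSE, TRUE),
                   red   = c(FALSE, TRUE, FALSE, FALSE))

# Reshape the flag columns to long form, keep the TRUE row for each user,
# and turn the former column name into a factor.
colors <- data %>%
  pivot_longer(-user, names_to = "color", values_to = "flag") %>%
  filter(flag) %>%
  transmute(user, color = factor(color))
print(colors)
```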

Transposing data frames

r,reshape2,tidyr
Happy weekend. I've been trying to replicate the results from this blog post in R. I am looking for a way to transpose the data without using t(), preferably with tidyr or reshape. In the example below, metadata is obtained by transposing data: metadata <- data.frame(colnames(data), t(data[1:4, ]) ) colnames(metadata) <-...

r,dplyr,reshape2,tidyr
First, here is the data tbl_df (simplified) I am using : > mytbldf Source: local data frame [6 x 5] iso2c country year var1 var2 1 BI Burundi 2011 4.486265 6.693711 2 BI Burundi 2012 3.939242 5.330326 3 BI Burundi 2013 4.286439 5.747370 4 UG Uganda 2011 3.998849 10.025680 5...

Tidy data frame to matrix in R

r,matrix,dplyr,tidyr
This is the glimpse() of my data frame:
$ Row (int) 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 1, 1, 1, 1,...
$ Col (int) 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4,...

R: create data.table with periodic function

r,function,data.frame,data.table,tidyr
I would like to create a data.table in tidy form containing the columns articleID, period and demand (with articleID and period as key). The demand is subject to a random function with input data from another data.frame (params). It is created at runtime for differing numbers of periods. It is...

adding default values to item x group pairs that don't have a value (df %>% spread %>% gather seems strange)

r,dplyr,tidyr
Short version How to do the operation df1 %>% spread(groupid, value, fill = 0) %>% gather(groupid, value, one, two) in a more natural way? Long version Given a data frame df1 <- data.frame(groupid = c("one","one","one","two","two","two", "one"), value = c(3,2,1,2,3,1,22), itemid = c(1:6, 6)) for many itemid and groupid pairs we...
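
tidyr's `complete()` is built for this case. It adds every missing combination of the listed columns and fills the new rows, so the spread/gather round trip is not needed. The sketch uses the `df1` from the question.

```r
library(tidyr)

df1 <- data.frame(groupid = c("one", "one", "one", "two", "two", "two", "one"),
                  value   = c(3, 2, 1, 2, 3, 1, 22),
                  itemid  = c(1:6, 6))

# complete() adds every missing groupid x itemid combination and fills the
# new rows' value with 0.
full <- complete(df1, groupid, itemid, fill = list(value = 0))
print(full)
```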

Using spread to create two value columns with tidyr

I have a data frame that looks just like this (see link). I'd like to take the output that is produced below and go one step further by spreading the tone variable across both the n and the average variables. It seems like this topic might bear on this, but...

Merging two columns, but changing the name of specific variables

r,tidyr
I have two factor columns with a number of missing values. The names of specific variables were changed during data collection. I am trying to merge the two columns, renaming specific old variables to match the new names. I used help <- data.frame(var1...

Plotting (ggplot) numeric values from mixed long format column of class character

r,ggplot2,dplyr,tidyr
Following the tidy data standard, I have my data in long format with a key and a value column. The values for some keys are numeric, for others are characters, and so R has the entire column set as character class. When I use filter() to pipe only the numeric...

r,data.frame,pivot,melt,tidyr
Continuing from my previous post, I now have one more column, ID, that I need to use when pivoting rows into columns. NUM <- c(1,2,3,1,2,3,1,2,3,1) ID <- c("DJ45","DJ45","DJ45","DJ46","DJ46","DJ46","DJ47","DJ47","DJ47","DJ48") Type <- c("A", "F", "C", "B", "D", "A", "E", "C", "F", "D") Points <- c(9.2,60.8,22.9,1012.7,18.7,11.1,67.2,63.1,16.7,58.4) df1 <- data.frame(ID,NUM,Type,Points) df1:...

replace a numerical column by a character column in R with a defined mapping

r,mapping,dplyr,tidyr
I did some searching but could not find an answer to my problem. Say I have a data frame with a column student_id in integers and some other columns. I also have another mapping table containing two columns, with the 1st one being student_id and the 2nd one student_name in...
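
A base R sketch using `match()`. The tables `scores` and `mapping` and the names in them are hypothetical, since the question shows no data. Only the column names `student_id` and `student_name` come from the question.

```r
# Hypothetical tables: `scores` holds student_id plus another column,
# `mapping` pairs each student_id with a student_name.
scores  <- data.frame(student_id = c(3L, 1L, 2L, 1L), score = c(90, 75, 82, 68))
mapping <- data.frame(student_id = 1:3,
                      student_name = c("Ann", "Ben", "Cara"),
                      stringsAsFactors = FALSE)

# match() finds each id's row in the mapping table; the numeric column is then
# replaced by the corresponding name (unmatched ids would become NA).
scores$student_id <- mapping$student_name[match(scores$student_id, mapping$student_id)]
names(scores)[names(scores) == "student_id"] <- "student_name"
print(scores)
```

A `dplyr::left_join(scores, mapping, by = "student_id")` followed by dropping the id column gives the same result and keeps all other columns.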

Tidyr's gather() with NAs

r,lubridate,tidyr
I am using tidyr and lubridate to convert a wide table to a long table. The following works just fine. > (df <- data.frame(hh_id = 1:2, bday_01 = ymd(20150309), bday_02 = ymd(19850911), bday_03 = ymd(19801231))) hh_id bday_01 bday_02 bday_03 1 1 2015-03-09 1985-09-11 1980-12-31 2 2 2015-03-09 1985-09-11 1980-12-31 >...

extracting values from column using tidyr

regex,r,gsub,tidyr
I have a data.frame annot, which contains the following columns: annot Name GOs dd_1 C:extracellular space; C:cell body; P:cell migration process; P:NF/ß pathway dd_2 C:Signal transduction; C:nucleus; F:positive regulation; P:single organism; P:positive(+) regulation dd_3 C:cardiomyceltes; C:intracellular pace; F:putative; F:magnesium ion binding; F:calcium ion binding; P:visual perception; P:blood coagulation dd_4 F:poly(A) RNA binding; P:DNA-templated...

Retain attributes when using gather from tidyr (attributes are not identical)

r,tidyr
I have a data frame that needs to be split into two tables to satisfy Codd's 3rd normal form. In a simple case, the original data frame looks something like this: library(lubridate) > (df <- data.frame(hh_id = 1:2, income = c(55000, 94000), bday_01 = ymd(c(20150309, 19890211)), bday_02 = ymd(c(19850911, 20000815)),...
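
A sketch of the two-table split, assuming tidyr >= 1.0.0. It uses only the columns visible in the excerpt, and `as.Date()` stands in for `lubridate::ymd()` to avoid the lubridate dependency. `pivot_longer()` combines the Date columns into one Date column rather than dropping their attributes. The names `member` and `bday` are illustrative.

```r
library(dplyr)
library(tidyr)

# Only the columns visible in the excerpt; as.Date() replaces lubridate::ymd().
df <- data.frame(hh_id   = 1:2,
                 income  = c(55000, 94000),
                 bday_01 = as.Date(c("2015-03-09", "1989-02-11")),
                 bday_02 = as.Date(c("1985-09-11", "2000-08-15")))

# Household-level table: one row per household.
households <- select(df, hh_id, income)

# Person-level table: one row per household member; the combined bday
# column stays of class Date. values_drop_na drops absent members.
members <- df %>%
  select(hh_id, starts_with("bday_")) %>%
  pivot_longer(starts_with("bday_"),
               names_to = "member",
               names_prefix = "bday_",
               values_to = "bday",
               values_drop_na = TRUE)
print(members)
```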

How do you compare data from two experiments

r,dplyr,tidyr
I often need to measure percentage changes between two distinct scenarios (tests or periods). An example dataset: library(dplyr) set.seed(11) toy_dat <- data.frame(state = sample(state.name,3, replace=F), experiment=c('control','measure'), accuracy=sample(30:50, size=6, replace=T), speed=sample(21:39, size=6, replace=T)) %>% arrange(state) state experiment accuracy speed 1 Alabama measure 31 24 2 Alabama control 36 37 3 Indiana control 30...
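
A possible sketch, assuming tidyr >= 1.0.0. Widen so that control and measure sit side by side per state, then compute the percentage change. The `*_pct` column names are illustrative. The random values may differ from the printout above depending on the R version's sampling algorithm.

```r
library(dplyr)
library(tidyr)

set.seed(11)
toy_dat <- data.frame(state = sample(state.name, 3, replace = FALSE),
                      experiment = c("control", "measure"),
                      accuracy = sample(30:50, size = 6, replace = TRUE),
                      speed = sample(21:39, size = 6, replace = TRUE)) %>%
  arrange(state)

# One row per state with control and measure side by side, then the
# percentage change from control to measure for each metric.
changes <- toy_dat %>%
  pivot_wider(names_from = experiment, values_from = c(accuracy, speed)) %>%
  mutate(accuracy_pct = 100 * (accuracy_measure - accuracy_control) / accuracy_control,
         speed_pct    = 100 * (speed_measure - speed_control) / speed_control)
print(changes)
```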

tidyr: spread without expanding all columns

r,tidyr
Getting around to learning tidyr and having trouble with spread(). Here's a fake experimental dataset: library(tidyr) df <- structure(list(mood = c(0.855, -0.103, 0.421, -0.222, 0.772, -0.027, -1.088, 0.923, -1.516, -1.503, -0.358, -0.357, -0.344, 0.294, 0.348, -0.174, 0.872, -1.188, 0.842, -0.246, -0.758, 0.674, 0.045, 0.72, -1.253, 0.00599999999999995, -0.0749999999999999,1.623, -1.754, -0.44, -0.607,...