

Generate pairings within World Cup tournament groups

python,r,pandas,plyr,split-apply-combine
I put some data together for the 2015 FIFA Women's World Cup: import pandas as pd df = pd.DataFrame({ 'team':['Germany','USA','France','Japan','Sweden','England','Brazil','Canada','Australia','Norway','Netherlands','Spain', 'China','New Zealand','South Korea','Switzerland','Mexico','Colombia','Thailand','Nigeria','Ecuador','Ivory Coast','Cameroon','Costa Rica'], 'group':['B','D','F','C','D','F','E','A','D','B','A','E','A','A','E','C','F','F','B','D','C','B','C','E'],...
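A minimal sketch of one way to generate within-group pairings, assuming the goal is every unordered pair of teams that share a group. It uses only two of the groups from the excerpt (B and D), because the rest of the question's DataFrame is truncated; the helper name `group_pairings` and the output column names `team1` and `team2` are hypothetical.

```python
from itertools import combinations

import pandas as pd

# Subset of the question's data: groups B and D only (the original frame is truncated).
df = pd.DataFrame({
    'team': ['Germany', 'Norway', 'Thailand', 'Ivory Coast',
             'USA', 'Sweden', 'Australia', 'Nigeria'],
    'group': ['B', 'B', 'B', 'B', 'D', 'D', 'D', 'D'],
})


def group_pairings(frame):
    """Return one row per unordered pair of teams within each group."""
    rows = [
        (group, team1, team2)
        for group, teams in frame.groupby('group')['team']
        for team1, team2 in combinations(teams, 2)
    ]
    return pd.DataFrame(rows, columns=['group', 'team1', 'team2'])


pairings = group_pairings(df)
print(pairings)
```

With four teams per group, `combinations(teams, 2)` yields six pairings per group, which matches a single round-robin group stage.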

Find top deciles from dataframe by group

r,data.frame,rank,quantile,split-apply-combine
I am attempting to create new variables using a function and lapply rather than working right in the data with loops. I used to use Stata and would have solved this problem with a method similar to that discussed here. Since naming variables programmatically is so difficult or at least...

Programmatically calling group_by on a varying variable

r,dplyr,split-apply-combine
Using dplyr, I'd like to summarize [sic] by a variable that I can vary (e.g. in a loop or apply-style command). Typing the names in directly works fine: library(dplyr) ChickWeight %>% group_by( Chick, Diet ) %>% summarise( mw = mean( weight ) ) But group_by wasn't written to take a...

R: subsetting and ordering large data.frame without a for loop

r,for-loop,data.table,dplyr,split-apply-combine
I have a long table with 97M rows. Each row contains the information of an action taken by a person and the timestamp for that action, in the form: actions <- c("walk","sleep", "run","eat") people <- c("John","Paul","Ringo","George") timespan <- seq(1000,2000,1) set.seed(28100) df.in <- data.frame(who = sample(people, 10, replace=TRUE), what = sample(actions, 10,...