I have a dataframe consisting of a series of paired columns. Here is a small example. df1 <- as.data.frame(matrix(sample(0:1000, 36*10, replace=TRUE), ncol=1)) df2 <- as.data.frame(rep(1:12, each=30)) df3 <- as.data.frame(matrix(sample(0:500, 36*10, replace=TRUE), ncol=1)) df4 <- as.data.frame(c(rep(5:12, each=30),rep(1:4, each=30))) df5 <- as.data.frame(matrix(sample(0:200, 36*10, replace=TRUE), ncol=1)) df6 <- as.data.frame(c(rep(8:12, each=30),rep(1:7, each=30))) Example <-...

How can I generate more columns in a dataframe using apply with more columns? My df is: A B C 0 11 21 31 1 12 22 31 If I want to generate only one column that works perfectly: df['new_1']=df[['A','C','B']].apply(lambda x: x[1]/2, axis=1) The result is: A B C new_1...
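
One way to get several new columns out of a single row-wise apply is to return a pandas Series from the function and join the result back onto the frame. The following is a minimal sketch, assuming the small frame shown in the question; the second column `new_2` and its formula are hypothetical, since the excerpt is cut off before it says what the extra columns should contain.

```python
import pandas as pd

df = pd.DataFrame({"A": [11, 12], "B": [21, 22], "C": [31, 31]})

def make_columns(row):
    # new_1 mirrors the question's x[1] / 2 (the second selected column, "C").
    # new_2 is a hypothetical extra column, added only for illustration.
    return pd.Series({"new_1": row["C"] / 2, "new_2": row["B"] * 2})

# Returning a Series per row makes apply produce a DataFrame of new columns.
new_cols = df[["A", "C", "B"]].apply(make_columns, axis=1)
df = df.join(new_cols)
print(df)
```

Naming the fields in the returned Series avoids relying on positional access such as `x[1]`, which depends on the order of the selected columns.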

I'm trying to avoid using loops by using apply to apply a user-defined function to a matrix. The problem I have is that there are additional parameters that my function uses and they differ for each column of the matrix. Below is a toy example. Say I have the following...

I reproduced in R a simulation that was originally done in Stata. I used 'for' loops since this is the only way I know how to make this work. It takes quite a long time to run, so I would like to use one of the 'apply' commands instead to...

I'm trying to prepare some demographic data retrieved from Eurostat for further processing, amongst others replacing any missing data with corresponding approximated ones. First I was using data.frames only, but then I got convinced that data.tables might offer some advantages over regular data.frames, so I migrated to data.tables. One thing...

I'm starting out in R, and pretty sure this is achievable via one of the apply functions. I have two differently sized vectors, a <- c('A', 'B', 'C') and b <- c('A', 'B', 'C', 'D', 'E'). I want to compare the values of a and b, and where they match,...

I have the following data: ID Value 1 3 1 5 How can I compute the mean by ID, and put the mean in the data frame as a new variable such that it is repeated for the same ID. The result should look like this: ID Value Mean 1...

I have a data set that looks like this birds[,1:3] Source: local data frame [15 x 3] year month day 1 2015 5 13 2 2015 5 14 3 2015 5 15 4 2015 5 16 5 2015 5 17 6 2014 5 28 7 2014 5 29 8 2014...

I have a dataframe with a dimension column and 4 value columns. How can I subset the column such that all 4 columns for each record are less than a given x? I know I could do this manually using subset and specifying the condition for each column, but is...

I am trying to make a new column in my dataset give a single output for each and every row, depending on the inputs from pre-existing columns. In this output column, I desire "NA" if any of the input values in a given row are "0". Otherwise (if none of...

I want to use apply instead of a for loop to speed up a function that creates a character string vector from paste-collapsing each row in a data frame, which contains strings and numbers with many decimals. The speed up is notable, but apply forces the numbers to fill the...

In reference to this question, I was trying to figure out the simplest way to apply a list of functions to a list of values. Basically, a nested lapply. For example, here we apply sd and mean to built in data set trees: funs <- list(sd=sd, mean=mean) sapply(funs, function(x) sapply(trees,...

I have a matrix of lists. How do I apply a function to each set of the lists and return a matrix of the same dimensions as my original matrix? I tried apply(X=data.matrix , MARGIN=c(1,2) , function(x) min(x$P) ) but it returned Error in min(x$P) : (converted from warning) no...

I have a matrix: mat <- matrix(c(0,0,0,0,1,1,1,1,-1,-1,-1,-1), ncol = 4 , nrow = 4) and I apply the following functions to filter out the columns with only positive entries, but for the columns that have negative entries I get a NULL. How can I suppress the NULLs from the output...

I have a subset of a data frame of 16 columns. They are all factors, with the same levels and labels. I am trying to use one of the apply() functions to assign the levels and labels at once, but my function is printing the results rather than assigning them...

How do I use apply in Scheme to multiply the first element of each tuple by a number? Example, if my list x = ( (1 2) (3 4) ) I want to do something like: (apply * 2 (car x)) so that it would return ( (2 2) (6...

Why does apply() convert my date objects to numeric before calling the user function? apply(matrix(seq(as.Date("2010-01-01"), as.Date("2010-01-05"), 1)), 1, function(x) { return(class(x)) }) [1] "numeric" "numeric" "numeric" "numeric" "numeric" And why as.Date() doesn't have...

Mapply applies a 2-dimensional function to the 1st elements of each m-dimensional vector, and then to the 2nd elements of each, etc. The result is an m-dimensional vector. For example > mapply(sum, 1:5, 12:16) [1] 13 15 17 19 21 Now, is there a DIRECT alternative to mapply that applies...

I have these loops : xall = data.frame() for (k in 1:nrow(VectClasses)) { for (i in 1:nrow(VectIndVar)) { xall[i,k] = sum(VectClasses[k,] == VectIndVar[i,]) } } The data: VectClasses = Data Frame containing the characteristics of each class VectIndVar = Data Frame containing each record of the database The two...

I'm running a large number of data frames with variable dimensions through a series of apply() calls that look something like the code below. df1 = t(data.frame('test'=c(0,0,1,0))) df1 = apply(df1,2,function(j){sub(0,'00',j)}) df1 = apply(df1,2,function(j){sub(1,'01',j)}) df1 = apply(df1,2,function(j){sub(2,'10',j)}) df1 In some rare cases where the data frame is size 1xn the first...

What is the proper way to do this? I have a function that works great on its own given a series of inputs and I'd like to use this function on a large dataset rather than singular values by looping through the data by row. I have tried to update...

I am attempting to use the count with zero occurrences based on a defined list within the apply function. I have managed to do these separately, but would ideally like to have them in a single line. Here is my aim: list <- c("x", "y", "z") df V1 V2 V3...

I am using distHaversine, which takes two points and gives a distance, i.e. distHaversine(c(35,-75),c(35.1,-74.9)) prints: [1] 11501.11 I have two matrices, A and B that are (m by 2) and (n by 2), i.e. A has m points and B has n points. How can I use distHaversine on A...

I would like to transform a matrix of latent scores to observed scores. One can do so by applying break points/thresholds to the original matrix, thus ending up with a new, categorical matrix. Doing so is simple, for example: #latent variable matrix true=matrix(c(1.45,2.45,3.45, 0.45,1.45,2.45, 3.45,4.45,5.45) ,ncol=3,byrow=TRUE) #breaks for the cut...

By matrix multiplication I get the following matrix, which, let's say, shows how many customers who purchased product A, sooner or later, also purchased product B, product C and so on. Obviously, the diagonal values represent 100% of all purchases of a particular product. I'm looking for a way of...

I have a data frame and a vector of unequal lengths. They do not share an id. df <- data.frame( id = factor(rep(1:24, each = 10)), x = runif(20)*100 ) a <- sort(runif(100*100)) Now, I would really like run over each row of the data frame and find the location...

I saw in a recent answer an apply family function with assignments built-in and can't generalize it. lst <- list(a=1, b=2:3) lst $a [1] 1 $b [1] 2 3 This can't yet be made into a data.frame because of the unequal lengths. But by coercing the max length to the...

I'm using Elastic Search, with query match_all and filtering. In my situation I want to apply a general filter and filters by condition. Here in pseudo: query: match all (works fine) filter range date between d1 and d2 (works fine without bullet 3) filter (apply only if field exists, but...

I have a data frame with measurements. One column shows the measurements in mm, and the other in units (which is a relative scale varying with zoom-level on the stereo microscope I was using). I want to go through every row of my data frame, and for each "length_mm" that...

I am trying to apply an IDW (inverse distance weighting) to different groups in a database. I am trying to use dplyr to apply this function to each group, but I am making a mistake in the Split-Apply-Combine. The current function returns 10 values for each group of 10 observations,...

I am attempting to create a df with a new variable called 'epi' (stands for episode)... which is based on the 'days.since.last' variable. when the value of 'days.since.last' is greater than 90, I want the episode variable to increase by 1. Here is the original df deid session.number days.since.last 1...

for nested data. I tried <?php $names = array('firstnames' => array("Baba", "Billy"), 'lastnames' => array("O'Riley", "O'Reilly")); array_walk_recursive($names, function (&value, $key) { $value = htmlentities($value, ENT_QOUTES); }) foreach ($names as $nametypes) { foreach ($nametypes as $name) { print "$name\n"; } } ?> (An example from the book O'reilly PHP Cookbook 3rd...

I was given a large csv that is 115 columns across and 1000 rows. The columns have a variety of data, some is character-based, some is integer, etc. However, the data has a LOT of null variables of varying types (NA, -999, NULL, etc.). What I want to do is...

I need to add a new column with a query result. I have this query: SELECT DISTINCT Arrival , Flight , TotalPax.SumPassengers , TotalPaxLocal.SumLocalPassengers , STD , STA --, PassengerID --, Departure --, JourneyNumber --, SegmentNumber --, LegNumber --, InventoryLegKey --, RecordLocator FROM #TempLocalOrg tmp CROSS APPLY ( SELECT COUNT(1) AS SumPassengers FROM...

I am looking to split my dataframe into subsets according to the column "Height" with each subset having one row with a value and 0-Inf rows with NAs. This is, to be able to apply functions to the subsets afterwards, specifically order the rows according to their "Diameter" value,...

Given this code: test=matrix(c(1,2,3,4,5,6,7,8,9,10,11,12),4) splitData=data.frame(first=c(1,3),second=c(2,4)) apply(splitData,1,function (x) {test[x[1]:x[2],]}) I get this matrix: [,1] [,2] [1,] 1 3 [2,] 2 4 [3,] 5 7 [4,] 6 8 [5,] 9 11 [6,] 10 12 Why don't I get a list of matrices? Intended result: [[1]] [,1] [,2] [,3] [1,] 1 5 9...

I have the following data frame C. >>> C a b c 2011-01-01 0 0 NaN 2011-01-02 41 12 NaN 2011-01-03 82 24 NaN 2011-01-04 123 36 NaN 2011-01-05 164 48 NaN 2011-01-06 205 60 2 2011-01-07 246 72 4 2011-01-08 287 84 6 2011-01-09 328 96 8 2011-01-10 369...

I've hacked together a quick solution to my problem, but I have a feeling it's quite obtuse. Moreover, it uses for loops, which from what I've gathered, should be avoided at all costs in R. Any and all advice to tidy up this code is appreciated. I'm still pretty new...

I have a for loop like so: var myary = []; for(i=0; i<3; i++){ myary[i] = i; } //yields [0, 1, 2] I'd like to accomplish the same with myary.apply() or a functional equivalent, but I am not familiar with generating arithmetic sequences via functional methods in JavaScript. Is this...

I have data for different tissues like so tissueA tissueB tissueC gene1 4.5 6.2 5.8 gene2 3.2 4.7 6.6 And I want to calculate a summary statistic that is x = Σ [1-log2(i,j)/log2(i,max)]/n-1 where n is the number of tissues (here it is 3), (i,max) is the highest value for...

I have two dataframes that contain id, score, and studentName. I would like to create a dataframe that contains only ids that appear in both test1 and test2. Then, I would like to average the students' scores. Here is some sample data: test1 <- data.frame(id = numeric(0), score = integer(0),...

This solution is almost what I need, but it did not work in my case. Here is what I have tried: comb_apply <- function(f,...){ exp <- expand.grid(...,stringsAsFactors = FALSE) apply(exp,1,function(x) do.call(f,x)) } #--- Testing Code l1 <- list("val1","val2") l2 <- list(2,3) testFunc<-function(x,y){ list(x,y) } #--- Executing Test Code comb_apply(testFunc,l1,l2) comb_apply(paste,l1,l2) It...

I have a data frame df with four columns, e.g. A B C D x a 1 3 x a 3 4 x b 5 5 x b 6 8 y a 6 5 y a 8 9 y b 7 0 y b 4 2 I want to aggregate...

I have a matrix with n rows of observations. Observations are frequency distributions of the features. I would like to transform the frequency distributions to probability distributions where the sum of each row is 1. Therefore each element in the matrix should be divided by the sum of the row...

I would like to combine large rectangular matrices stored in multiple lists. E.g. rbind.fill.matrix {plyr} the i:th matrix from all N lists. The number of matrices n within each list is equal across N lists. #Dummy data using N=2, n=2 # binary matrices ls1 <- replicate(n=2, list(matrix(rbinom(1,0.5,n=20), nrow=2))) ls2 <-...

I was wondering if it was possible to produce a set of boxplots similar to that produced by this nested loop combinations using an apply function. It may not be possible/necessary but I thought it should be possible, I just can't wrap my head around how to do it. I...

I am looking to apply a function to a data frame and then store the results of that function in a new column in the data frame. Here is a sample of my data frame, tradeData: Login AL Diff a 1 0 a 1 0 a 1 0 a 0...

I have a list of data.tables library(data.table) set.seed(27) test <- list() test$a <- data.table(x = rnorm(n = 10), y = rnorm (n = 10)) test$b <- data.table(x = rnorm(n = 10), y = rnorm (n = 10)) Each member of the list has a unique name test In preparation to...

I want to create a program that generates random things when requested, such as letters in the below example. I've defined an abstract class with a companion object to return one of the subclasses. abstract class Letter class LetterA extends Letter { override def toString = "A" } class LetterB...

I'd like to know how I may speed up the following function, e.g. with Cython? def groupby_maxtarget(df, group, target): df_grouped = df.groupby([group]).apply(lambda row: row[row[target]==row[target].max()]) return df_grouped This function groups by a single column and returns all rows where each group's target achieves its max value; the resulting dataframe is returned....
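
Before reaching for Cython, one commonly suggested alternative is to avoid the Python-level lambda altogether: broadcast each group's maximum back onto its rows with `transform` and filter once. The sketch below assumes the same `df`, `group`, and `target` arguments as the question's function; the name `groupby_maxtarget_fast` and the sample data are hypothetical.

```python
import pandas as pd

def groupby_maxtarget_fast(df, group, target):
    # Broadcast each group's maximum back onto its rows, then filter once.
    group_max = df.groupby(group)[target].transform("max")
    return df[df[target] == group_max]

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"g": ["a", "a", "b", "b", "b"], "v": [1, 3, 2, 2, 1]})
result = groupby_maxtarget_fast(df, "g", "v")
print(result)
```

Note that this version keeps the original row index rather than the group-keyed index that `groupby(...).apply(...)` typically produces, so it is not a drop-in replacement if downstream code relies on that index.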

I would like to use an objective function based on a list of elements, each of which is the result of applying a function over a dataframe (df) ((function is, say, variance of df's observations' "measure")). That is, I have a list of dfs. I naturally want to sapply my...

So, my goal is to take an input vector and to make an output matrix of different counters. So every time a value appears in my inputs, I want to find that counter and iterate it by 1. I understand that I'm not good at explaining this, so I illustrated...

Here is the data source: https://www.dropbox.com/s/z5jsvwbzz5fumqp/countyComplete.csv?dl=0 I want to multiply 2 columns (pop2010 * percapitaincome) for each county and then divide it by the count of state, grouped by state. How can I do it using any of the apply functions in R? Here is my attempt: myfun<-function(x,y){ x*y } y<-county$per_capita_income...

I've searched here and on Google and haven't found an answer that I can apply to my situation. Let's say I have a dataframe with columns for Element 1, Element 2, Element 3, Metric, Other. I have another internal function that has three arguments (input_dataframe, element_position, metric_position) that I use...

I constantly struggle with cleanly iterating or applying a function to Pandas DataFrames of variable length. Specifically, a length 1 DataFrame slice (Pandas Series). Simple example, a DataFrame and a function that acts on each row of it. The format of the dataframe is known/expected. def stringify(row): return "-".join([row["y"], str(row["x"]),...
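
The difference between a one-row DataFrame and a Series usually comes down to how the row was selected. The sketch below is a minimal illustration, assuming columns named `x` and `y` as in the question; the `stringify` shown here is a shortened, hypothetical two-field version, because the original function is cut off in the excerpt.

```python
import pandas as pd

def stringify(row):
    # Shortened, hypothetical version of the question's function (two fields only).
    return "-".join([row["y"], str(row["x"])])

df = pd.DataFrame({"x": [1, 2], "y": ["a", "b"]})

one_row_frame = df.iloc[[0]]   # list indexer: still a DataFrame with one row
one_row_series = df.iloc[0]    # scalar indexer: a Series

# A one-row DataFrame can go through the same row-wise apply as a longer one.
from_frame = one_row_frame.apply(stringify, axis=1).tolist()
# A Series already looks like a single row, so the function is called directly.
from_series = stringify(one_row_series)
print(from_frame, from_series)
```

Selecting with a list (`df.iloc[[0]]`) keeps the result two-dimensional, so the same `apply(..., axis=1)` call works regardless of how many rows the slice has.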

I have a Pandas DataFrame with numeric data. For each non-binary column, I want to identify the values larger than its 99th percentile and create a boolean mask that I will later use to remove the rows with outliers. I am trying to create this boolean mask using the apply...
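
A column-wise apply can build such a mask in one pass. The following is a sketch under stated assumptions: the sample data is hypothetical, and "non-binary" is taken to mean "more than two distinct values", which may differ from the definition the question has in mind.

```python
import pandas as pd

# Hypothetical data: one continuous column and one binary column.
df = pd.DataFrame({"a": list(range(100)), "flag": [0, 1] * 50})

# Treat columns with more than two distinct values as non-binary (an assumption).
non_binary = [c for c in df.columns if df[c].nunique() > 2]

# Column-wise apply: True where a value exceeds that column's 99th percentile.
mask = df[non_binary].apply(lambda col: col > col.quantile(0.99))

# Drop every row flagged in at least one column.
cleaned = df[~mask.any(axis=1)]
print(len(df), len(cleaned))
```

With the default `axis=0`, the lambda receives one column at a time, so each column is compared against its own quantile.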

I have a problem with using the apply function in R. I made the following function: TrainSupportVectorMachines <- function(trainingData,kernel,G,C){ ####train the model fit<-svm(Device~.,data=trainingData,kernel=kernel,probability=TRUE, gamma =G, costs=C) return(fit); } I want to train the model with different values of Cost(c). Therefore, I tried the following command: cst = matrix(2^(-4:-2),ncol=3) kernl =...

I would like to apply the distancePointSegment function to all points in my vector, both are given in the code snippet below. The function takes in 6 values, 2 of which are dynamic (column/row specific) and 4 are static. # Function that I want to apply: distancePointSegment <- function(px, py,...

Given a trivial function returning an array: scala> def methodReturnsArray() = { Array(1.0, 2.0) } methodReturnsArray: ()Array[Double] We can go ahead and invoke the function: scala> val myarr = methodReturnsArray myarr: Array[Double] = Array(1.0, 2.0) scala> myarr(0) res21: Double = 1.0 However, it is not possible to use the apply...

Here is duration data by time intervals. id <- c("A", "B", "B", "B", "C", "C", "D", "E", "F", "F", "F", "F") start <- c(368, 200, 230, 788, 230, 521, 272, 306, 0, 162, 337, 479) end <- c(373.98, 229.98, 233.98, 842.98, 239.98, 639.98, 285.98, 306.98, 95.98, 162.98, 339.98, 539.98) value...

It occurs when I use apply.daily for an asset that has a total of 10 days worth of intraday data called rs packages needed are: library("xts") library("highfrequency") Where the error occurs: ts <- apply.daily(rs,function(x){ aggregatets(x ,on="minutes", k=15)}) ** REPRODUCIBLE DATA ** rs <- structure(c(222950, 222880, 222960, 222975, 222800, 222750, 222769,...

I am new to the R programming language and am currently working on some financial data. The problem is a bit complicated to describe, so I think it's better to go through it step by step. First, here is a small portion of the master dataframe (named log_return) I am working on: Date AUS.Yield...

I have what I fear may be a simple problem, to which I almost have the solution (indeed, I do have a solution, but it's clumsy). I have a data frame...

I have a list of files containing output from a large model. I load these as a datatable using: files <- list.files(path.expand("/XYZ/"), pattern = ".*\\.rds", full.names = TRUE) dt<- as.data.table(files) This datatable "dt" has just 1 column, the file name, e.g. XZY_00_34234.rds. The 50th and 51st character of each file...

I have a dataframe with sets of scores, and sets of grouping variables, something like: s1 s2 s3 g1 g2 g3 4 3 7 F F T 6 2 2 T T T 2 4 9 G G F 1 3 1 T F G I want to run an...

I have a dataframe datwe with 37 columns. I am interested in converting the integer values(1,2,99) in columns 23 to 35 to character values('Yes','No','NA'). datwe$COL23 <- sqldf("SELECT CASE COL23 WHEN 1 THEN 'Yes' WHEN 2 THEN 'No' WHEN 99 THEN 'NA' ELSE 'Name ittt' END as newCol FROM datwe")$newCol I...

I have a data frame that needs to be re-represented. The original data frame has each row as a unique search term and the columns are all the resulting products. So each row is a different length. I want to turn this into a rectangular dataframe (called rectangle in the...

A JSD matrix is a similarity matrix of distributions based on the Jensen-Shannon divergence. Given a matrix m whose rows represent distributions, we would like to find the JSD distance between each pair of distributions. The resulting JSD matrix is a square matrix with dimensions nrow(m) x nrow(m). This is a triangular matrix where each element contains JSD...

I want to apply a function that computes something similar to a weighted average absolute deviation of all the elements of my data frame. I already have a solution for it, but it seems quirky to me because I have to use groupby with a lambda function that always returns...

I'm sure there is an easy solution to this but I cannot seem to output the correct values. I have a dataframe and I would like to calculate an average based on values above a certain value, in this case 150. df1 <- as.data.frame(matrix(sample(0:1000, 36*10, replace=TRUE), ncol=1)) df2 <- as.data.frame(matrix(sample(0:500,...

Suppose (small numbers in this example) I have an array that is 3 x 14 x 5 call this set.seed(1) dfarray=array(rnorm(5*3*14,0,1),dim=c(3,14,5)) I have a matrix that corresponds to this and is 39 (which is 13*3) x 14 Call this matrix: dfmat = matrix(rnorm(13*3*14,0,1),39,14) dfmat = cbind(dfmat,rep(1:3,13)) dfmat = dfmat[order(dfmat [,15]),]...

In short, I have a list of expressions that I want to apply to each row of a dataframe. This is very similar to this question, but there is a subtle difference in that I do not have a list of functions, but have a list of expressions. Here's what...

I have a dataframe composed of 3 columns and ~2000 rows. ID DistA DistB 1 100 200 2 239 390 3 392 550 4 700 760 5 770 900 The first column (ID) is a unique identifier for each row. I'd like my script to read each row, and subtract/compare...

I have 2 objects: A data frame with 3 variables: v1 <- 1:10 v2 <- 11:20 v3 <- 21:30 df <- data.frame(v1,v2,v3) A numeric vector with 3 elements: nv <- c(6,11,28) I would like to compare the first variable to the first number, the second variable to the second number...

I have the following data prepared Timestamp Weighted Value SumVal Group 1 1600 800 1 2 1000 1000 2 3 1000 1000 2 4 1000 1000 2 5 800 500 3 6 400 500 3 7 2000 800 4 8 1200 1000 4 I want to calculate for each group...

I want to compute the mean over the third dimension of a multidimensional array. As this dimension represents time, I want to compute monthly means. For that, I tried to use apply, but I am not sure where the problem is. Let's say my data is as...

My UDF: testfn = function(x1, x2, x3){ if(x1 > 0){y = x1 + x2 + x3} if(x1 < 0){y = x1 - x2 - x3} return(y) } My Sample Test set: test = cbind(rep(1,3),c(2,4,6),c(1,2,3)) Running of apply: apply(test, 1, testfn, x1 = test[1], x2 = test[2], x3 = test[3]) This...

I am writing a retry function with async and await def awaitRetry[T](times: Int)(block: => Future[T]): Future[T] = async { var i = 0 var result: Try[T] = Failure(new RuntimeException("failure")) while (result.isFailure && i < times) { result = await { Try(block) } // can't compile i += 1 } result.get...

I am having problems with trying to create a new column using a conditional calculation based on a function. I have some small datasets that are used to interpolate a reference temperature (Tref) based on altitude (CalcAlt). The function works when I try to do a single calculation but I...

I'm trying to apply a function to a column in a dataframe that contains dates and keep getting an error. Not exactly sure what I'm doing wrong. Here is my df: dates total 1 2014-12-08 01:10:00 163.7 2 2014-12-08 01:10:00 163.9 3 2014-12-08 01:12:00 163.6 4 2014-12-08 08:27:00 163.0 5 2014-12-08 08:35:00...

I have a three dimensional data structure reflecting data at particular longitudes, latitudes, and depth. I would like to apply a function to this data. Normally, say I want to find the depth-averaged value I'd do the following: apply(MyData, MAR = c(1, 2), mean) which makes sense to me. What...

I'm working with a data frame resembling the extract below. sample.df Obs Var1 Var2 Var3 A0001 21 21 21 A0002 21 78 321 A0003 32...

I have data for 90 climate stations. For each station, I have made 100+ simulations using a statistical model. So, in R, I have 90 dataframes, each dataframe has 100+ simulations arranged column-wise. Now, I would like to fit an extreme value distribution (EVD) to each climate station. That is,...

My reproducible R example: f = runif(1500,10,50) p = matrix(0, nrow=1250, ncol=250) count = rep(0, 1250) for(i in 1:1250) { ref=f[i] for(j in 1:250) { p[i,j] = f[i + j - 1] / ref-1 if(p[i,j] == "NaN") { count[i] = count[i] } else if(p[i,j] > (0.026)) { count[i] = (count[i]...

I want to use regex to capture substrings - I already have a working solution, but I wonder if there is a faster solution. I am applying applyCaptureRegex on a vector with about 400.000 entries. exampleData <- as.data.frame(c("[hg19:21:34809787-34809808:+]","[hg19:11:105851118-105851139:+]","[hg19:17:7482245-7482266:+]","[hg19:6:19839915-19839936:+]")) captureRegex <- function(captRegEx,str){ sapply(regmatches(str,gregexpr(captRegEx,str))[[1]], function(m) regmatches(m,regexec(captRegEx,m))) } applyCaptureRegex <- function(mir,r){...