Subset dataframes in a list based on the content of a vector

r,list,subset
I have a list of five dataframes. Each dataframe contains one dimension column and 4 value columns. I would like to subset each dataframe in the list based on the contents of a vector. df <- data.frame(x = 1:100, y2 = runif(100, 0, 100), y3 = runif(100, 0, 100), y4...

Subset multiple data frames in a list that match a certain condition

r,list,subset
I am new to this and I am stuck. I have a list of data frames that have information about pressure, temperature and salinity. I want to subset all of them and keep only the values of temperature and salinity when the pressure is equal to 5. Below this is...

Selecting rows by offsetting

r,subset
I have this data frame, let's call it my_df. It looks like this: my_df <- data.frame(rnorm(n = 30,sd=.5),rep(c("a","b","c"),each=10)) names(my_df) <- c("num","let") head(my_df) num let 1 0.01202600 a 2 1.09025768 a 3 -0.08656178 a 4 -0.04847073 a 5 -0.63750258 a 6 0.58846135 a What I want to do is select all...

Multiple filter using grep and subset in R

r,grep,subset,grepl
I'm trying to create a filter to remove lines from a dataset using grep and subset together. Sample dataset: id <- 1:10 problem <- c("a" , "b", "c", "d", "a","b","c","a", "b", "a") solution1 <- c("eat", "sleep", "drink", "play", "sleep", "play", "play", "drink", "play", "eat") solution2 <- c("read", "read", "eat", "drink",...

How to subset two different periods in a text file?

r,text,subset
I have a text file with two years of data, from which I want to extract two different periods (July to September of each year). Read the file: wg=read.table("C:\\Users\\ERIE.txt", sep ='' , header =TRUE) head(wg) Year day hour mint valu1 valu2 date 105169 2008 1 7 30 0.045 0.014 2008-01-01...

(R) [] / subset() returns an empty data frame

r,subset
I have a large dataset that looks something like this with a few hundred thousand more entries, saved as data: Group1 dtm_Flight_Date Departure Arrival str_Fare_Category_Ident 1 8P104 06/11/2010 9:05 YYJ YVR B 2 8P104 06/11/2010 9:05 YYJ YVR K 3 8P104 06/11/2010 9:05 YYJ YVR L 4 8P104 06/11/2010 9:05...

Is one series of constraints a subset of the other?

prolog,constraints,subset
I have two series of constraints S and S', they describe possibly infinitely large sets. Say for example S: x <= 10 and y <= x and S': x <= 20 and y <= 20. Now I want to know if S is a subset of S'? I thought I...

subsetting 1-column matrix deletes rownames [duplicate]

r,matrix,subset
This question already has an answer here: How to subset matrix to one column, maintain matrix data type, maintain row/column names? 1 answer When I try to subset a 1-column matrix by its row names, the subsetting works but a numeric vector is returned. Can you somehow prevent that...

Subsetting rows by passing an argument to a function

r,subset
I have the following data frame which I imported into R using read.table() (I incorporated read.table() within read_data() which is a function I created that also throws messages in case the file name is not written appropriately): > raw_data <- read_data("n44.txt") [1] #### Reading txt file #### > head(raw_data) subject...

Subsetting Outliers from a data frame by using the results of a boxplot diagram

r,subset,rounding
I transformed my data into a boxplot (used geom_boxplot of ggplot), so that the outliers became visible. Afterwards I wanted to remove them from my data. That is why I used "ggplot_build" to get all the information about the plot and saved it under a new name. Outlier_boxplot<-ggplot_build(boxplot) Now...

Subsetting by summing number of values in clustered data in R [duplicate]

r,subset,multi-level
This question already has an answer here: subset() a factor by its number of observation 1 answer I am trying to solve a data formatting problem. I have a data frame where the variables are leveled into schools and students. For example: Schools Students SchoolA Student1 SchoolA Student2 SchoolA...

How to subtract a complete character vector with repeated characters from the other vector in R

r,vector,subset,subtract
I want to subtract y from x, which means remove one "A", three "B" and one "E" from x, so xNew will be c("A", "C", "A","B","D"). It also means length(xNew)=length(x) - length(y) x <- c("A","A","C","A","B","B","B","B","D","E") y <- c("A","B","B","B","E") setdiff doesn't work because xNew <- setdiff(x,y) xNew [1] "C" "D" match...
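The request above can be read as a multiset difference: each element of y removes exactly one matching element of x. A minimal base R sketch under that reading, assuming the first remaining match is the one to drop (the loop and the name `x_new` are illustrative, not taken from the original post):

```r
x <- c("A", "A", "C", "A", "B", "B", "B", "B", "D", "E")
y <- c("A", "B", "B", "B", "E")

x_new <- x
for (val in y) {
  idx <- match(val, x_new)   # position of the first remaining match, or NA
  if (!is.na(idx)) {
    x_new <- x_new[-idx]     # drop exactly one occurrence
  }
}
print(x_new)
```

Unlike setdiff(), which treats both vectors as sets and discards repeats, this keeps the surplus copies of "A" and "B" in their original order.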

Filtering rows by a criteria defined by variables

r,data.table,subset
I've the following data.table structure(list(val1 = c(1, 2, 1, 3, 4, 5, 3), val2 = c(4, 5, 6, 4, 2, 4, 5)), .Names = c("val1", "val2"), row.names = c(NA, -7L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0xedae28>) What I would like to do is filter the rows in it...

Keep the second occurrence in a column in R

r,conditional,subset,find-occurrences
I have quite a simple dataset: ID Value Time 1 censored 1 1 censored 2 1 uncensored 3 1 uncensored 4 1 censored 5 1 censored 6 2 censored 1 2 uncensored 2 2 uncensored 3 2 uncensored 4 2 censored 5 I want to keep the first uncensored occurrence,...

How to create a new variable with values from different variables if another variable equals a set value in R?

r,subset
I have a complicated question that I will try to simplify by simplifying my dataset. Say I have 5 variables: df$Id <- c(1:12) df$Date <- c(NA,NA,a,a,b,NA,NA,b,c,c,b,a) df$va <- c(1.1, 1.4, 2.5, ...) #12 randoms values df$vb <- c(5.9, 2.3, 4.7, ...) #12 other random values df$vc <- c(3.0, 3.3, 3.7,...

Combining the terms of two MuMIn subset of models to create new subset in R

r,subset,lm
I am conducting an analysis where I am choosing between variables in two steps. Step 1: choose the best variables and combinations of variables from each of two sets of variables (e.g., intrinsic & extrinsic variables). Step 2: take the best variable combinations of each subset and create new set...

Loop within dataframe subset

r,loops,subset
This is probably a basic question in R, but I am trying to loop data within subsets of a larger data frame. I have added the 'data=sub240' line within the 'while' command, but this leads to a brackets error, which I suspect is indicative of a larger problem. Can anyone...

Dynamic Programming approach for a subset sum

java,algorithm,recursion,dynamic-programming,subset
Given the following Input 10 4 3 5 5 7 Where 10 = Total Score 4 = 4 players 3 = Score by player 1 5 = Score by player 2 5 = Score by player 3 7 = Score by player 4 I am to print players whose combine...

Subset of a data set

r,subset
The question relates to iris data set: library(datasets) data(iris) How to extract column 'Sepal.Width' for the species virginica?...
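A minimal sketch of one way to do this with logical indexing in base R (the variable name `virginica_width` is illustrative):

```r
library(datasets)
data(iris)

# Logical index on Species, applied to the Sepal.Width column
virginica_width <- iris$Sepal.Width[iris$Species == "virginica"]

# Same selection, kept as a one-column data frame instead of a vector
virginica_df <- subset(iris, Species == "virginica", select = Sepal.Width)
```

The first form returns a plain numeric vector. The second keeps the data frame structure, which can be convenient if more columns are added to `select` later.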

R How to mutate a subset of rows

r,data.table,subset,dplyr
I am having trouble mutating a subset of rows in dplyr. I am using the chaining command: %>% to say: data <- data %>% filter(ColA == "ABC") %>% mutate(ColB = "XXXX") This works fine but the problem is that I want to be able to select the entire original table...

Subset duplicates based on two columns [duplicate]

r,duplicates,subset
This question already has an answer here: R - Remove all unique rows 3 answers My data looks like this: A B 1 2 1A 2 1A 2 2 3 2 4 2 4 3A 0 3A 0 4A 1 4A 1 5 5 I want to subset the...

Reactive subset in ddply for rmarkdown shiny

r,shiny,subset,rmarkdown
I am trying to calculate and plot % yield of some data based on user definable inputs. I am using rmarkdown and shiny to do this. I keep getting stuck when passing a reactive subset through ddply to count the number of rows in the subset.."invalid (null) left side of...

How to subset data with several conditions in R?

r,subset
I have a text file consisting of 6 columns as shown below. The measurements are taken every 30 minutes over several years (2001-2013), although on certain days the interval is 32 or 39 minutes. I want to extract a certain range from this data. To read the file: LR=read.table("C:\\Users\\dat.txt", sep...

How to subset rows with only odd numbers in 1 column in R [closed]

r,subset
I want to compile only rows containing odd numbers in one of the columns. An example of my data frame is below: V1 V2 V3 V4 V5 V6 V7 V8 14221 USDJPY 20030507 20:00:00 116.33 116.19 116.47 116.25 14222 USDJPY 20030507 21:00:00 116.24 116.24 116.42 116.32 14223 USDJPY 20030507 22:00:00...

How to subset a data.frame?

r,data.frame,subset
I have a data set like this a <- data.frame(var1 = c("patientA", "patientA", "patientA", "patientB", "patientB", "patientB", "patientB"), var2 = as.Date(c("2015-01-02","2015-01-04","2015-02-02","2015-02-06","2015-01-02","2015-01-07","2015-04-02")), var3 = c(F, T, F, F, F, T, F) ) sequ <- rle(as.character(a$var1)) a$sequ <- sequence(sequ$lengths) producing > a var1 var2 var3 sequ 1 patientA 2015-01-02 FALSE 1 2...

Creating data subset with a vector - why does data have to be sorted?

r,subset
I am hoping someone can help with the following problem I am having while creating subsets of my data: I have a data set titled 'LakeK_all'. One of the columns is titled 'Lake' and contains a list of lake names (S001-Out, S002-Out, Y001-Out, Y002-Out,...). I would like to pull out...

R to Python subsetting via vector

python,r,pandas,subset
I'm a python newbie but have some R experience. In R if I'd like to subset a data.frame I can use a variable to do something like this: # Columns # Assign column names to variable colsToUse <- c('col1','col2','col3') # Use variable to subset df2 <- df1[,colsToUse] # Rows #...

Subset rows that contain some sequence of characters in one variable

r,subset
I would like to subset the rows that have a concrete sequence of characters in one variable. For example I would like to subset the rows that have at least three consecutive 1 ("111"; e.g. "01110", "11111", "01111") in the variable history. Here is some example data: id <- c(1,2,3,4,5,6,7,8,9,10)...
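Since the example data in the excerpt is cut off, the following sketch uses a small hypothetical data frame with a `history` column. It assumes a plain substring test with grepl() is enough to detect three consecutive 1s:

```r
# Hypothetical data; only the column name `history` comes from the question
df <- data.frame(
  id = 1:5,
  history = c("01110", "11111", "01011", "00110", "01111"),
  stringsAsFactors = FALSE
)

# fixed = TRUE treats "111" as a literal substring rather than a regex
has_run <- grepl("111", df$history, fixed = TRUE)
runs <- df[has_run, ]
```

Any string containing "111" anywhere matches, so longer runs such as "11111" are included as well.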

Unexpected behavior in subsetting aggregate function in R

r,logic,aggregate,subset,subsetting
I have a data frame that contains with the following format: manufacturers pricegroup leads harley <2500 # honda <5000 # ... ... .. I am using the aggregate function to pull out data in the following way: aggregate( leads ~ manufacturer + pricegroup, data=leaddata, FUN=sum, subset=(manufacturer==c("honda","harley"))) I noticed this is...

R - Subsetting to group of matrices based on a condition

r,subset
I am trying to subset a matrix based on a specific value in a column. But I want my subsets in a number of separate matrices. For eg, say I have a matrix ccc which is aaa=c(1,1,1,2,5,1,2,1,1,3,1,1,1,1,1,1,4) bbb=c(4,4,4,4,3,3,3,3,2,2,2,2,3,4,5,6,7) ccc=cbind(aaa,bbb) I want to subset using a condition which is ccc[,1]==1 and...

How to Remove Patients with Less Than 2 Visits of Data [duplicate]

r,subset,panel-data
This question already has an answer here: subset() a factor by its number of observation 1 answer I have a longitudinal dataset structured as 1 row per visit. A numerical patient ID number indicates unique patients. How can I remove all patients with less than 2 observations from my...
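A minimal base R sketch of the idea, using a hypothetical `visits` data frame with a `patient_id` column (both names are assumptions, not from the original post). ave() attaches each patient's row count to every one of that patient's rows, which can then be used as a filter:

```r
# Hypothetical longitudinal data: one row per visit
visits <- data.frame(
  patient_id = c(101, 101, 102, 103, 103, 103),
  visit      = c(1, 2, 1, 1, 2, 3)
)

# Number of rows per patient, repeated on each row of that patient
n_obs <- ave(seq_along(visits$patient_id), visits$patient_id, FUN = length)

# Keep only patients with at least 2 visits
kept <- visits[n_obs >= 2, ]
```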

CPLEX/OPL model - constraints with subset index

constraints,scheduling,subset,cplex,opl
I am currently programming a CPLEX/OPL model using IBM ILOG CPLEX Optimization Studio. I have a problem with using sums or indexes which contain a subset and depend on another parameter/variable, e.g. check the following constraints: NB 2,3,4,8). Can anyone help me with incorporating these constraints properly? Please find the...

Subsetting data that starts with a letter

r,grep,subset
I have a dataset with 45 columns and >8000 observations. One of the variables in the columns is city-name. I want to remove all observations that are located in cities that begin with the letter "S". How would I do this? I'm pretty new to R, so sorry if this...
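One possible approach, sketched on a small hypothetical data frame (the column name `city` and the values are assumptions): negate a grepl() test anchored to the start of the string.

```r
# Hypothetical data standing in for the 45-column dataset
cities <- data.frame(
  city  = c("Seattle", "Boston", "San Diego", "Austin", "salem"),
  value = 1:5,
  stringsAsFactors = FALSE
)

# "^S" matches names beginning with an uppercase S; "!" keeps everything else
no_s <- cities[!grepl("^S", cities$city), ]
```

The match is case-sensitive, so "salem" is kept here. Passing `ignore.case = TRUE` to grepl() would drop it too.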

Calculating the slope of each row in a large data set using R

r,data.frame,subset,linear-regression
I have a large data set of the following format: First column is type, and the subsequent columns are different times that 'type' happens. I want to calculate the slope of each row (~7000 rows) for subset T0-T2 and then t0-t2 and output that information, then get the average of...

split dataframe in groups before each non-NA

r,split,subset,apply,na
I am looking to split my dataframe into subsets according to the column "Height" with each subset having one row with a value and 0-Inf rows with NAs. This is, to be able to apply functions to the subsets afterwards, specifically order the rows according to their "Diameter" value,...

Filtering a Dataset by another Dataset in R

r,data.frame,subset
The task I am trying to accomplish is essentially filtering one dataset by the entries in another dataset, matched on an "id" column. The data sets I am working with are quite large, having tens of thousands of entries and 30 or so variables. I have made toy datasets...

Subset column based on a range of time

r,time,filter,subset,plyr
I am trying to subset a data frame based on a range of time. Someone has asked this question in the past and the answer was to use R CMD INSTALL lubridate_1.3.1.tar.gz (see link: subset rows according to a range of time). The issue with this answer is that I...

Sum of all subparts of an array of integers

arrays,algorithm,sum,big-o,subset
Given an array {1,3,5,7}, its subparts are defined as {1357,135,137,157,357,13,15,17,35,37,57,1,3,5,7}. I have to find the sum of all these numbers in the new array. In this case sum comes out to be 2333. Please help me find a solution in O(n). My O(n^2) solution times out. link to the problem...

Slicing rows of pandas dataframe between

pandas,subset,slice
I have a pandas dataframe with a column that marks interesting points of data in another column (e.g. the locations of peaks and troughs). I often need to do some computation on the values between each marker. Is there a neat way to slice the dataframe using the markers as...

Include column after grouping using datatable

r,include,data.table,subset
My goal is to calculate a group % column by zip. I created the % column by zip, but keep losing my group ('cgrp') variable. How can I keep this in my end results? My data table script is giving me the below results: zip V1 1: 12007 19.35484 2:...

how to subset in r for this particular condition?

r,subset
df1 and df2 have columns a,b. I want to subset data from df1 such that each entry in df1$a along with df1$b is in df2$a along with df2$b. df1 a b c 1 m df1 2 f df1 3 f df1 4 m df1 5 f df1 6 m df1...

Return duplicates in a list based on 2 criteria

r,list,duplicates,subset
I have a list that contains 2 data sets. a = data.frame(c(1,1,1,1,1,2,2,2,2,2), c("a","b", "c", "d","e","e","f", "g", "h","i")) colnames(a) = c("Numbers","Letters") c = data.frame(c(3,3,3,3,3,4,4,4,4,4), c("q","r", "s", "t","u","u","v", "w", "x","y")) colnames(c) = c("Numbers","Letters") my.list = list(a,c) my.list I am interested in returning only the letters that are found in common between the...

How to choose N number of assets that satisfy a parameter in R

r,subset
I would like to choose 4 assets whose BETAdn values sum to 0. The large matrix is called ALPHABETA, and I run the following to get a subset of assets that satisfy the sum-to-zero condition. AB <- subset(data.frame((ALPHABETA)), BETAdn+BETAdn >= -0.01 & BETAdn+BETAdn <= .01) The...

R: Subsetting and plotting a SpatialPoints object

r,plot,subset,sapply,sp
It seems this question has been asked a couple of times in different forms, but I couldn't find the right solution. I have a SpatialPoint object with several Polygons and would like to subset and plot one polygon using the slot "ID". Using the example from this question: Sr1 =...

Subqueries: What am I doing fundamentally wrong?

sql,sqlite3,subquery,subset
I thought that selecting values from a subquery in SQL would only yield values from that subset until I found a very nasty bug in code. Here is an example of my problem. I'm selecting the rows that contain the latest(max) function by date. This correctly returns 4 rows with...

R: Avoid repeating lines of code using R subsets in scripts

r,loops,macros,subset
I'm very new to R - but have been developing SAS-programs (and VBA) for some years. Well, the thing is that I have 4 lines of R-code (scripts?) that I would like to repeat 44 times. Two times for each of 22 different train stations, indicating whether the train is...

R: Handling subsets using dynlm

r,time-series,subset
I want to compute the following two regressions using R: library("dynlm") zooX = zoo(test[, -1]) lmx <- dynlm(d(Euribor3)~d(Ois3)+d(CDS)+d(Vstoxx)+d(log(omo))+d(L(Euribor3, 1)), data=zooX[1:16]) summary(lmx) zooX = zoo(test[, -1]) lmx <- dynlm(d(Euribor3)~d(Ois3)+d(CDS)+d(Vstoxx)+d(log(omo))+d(L(Euribor3, 1)), data=zooX[17:31]) summary(lmx) The only difference between those two models is the subset (the first[1:16] and the second [17:31]). Now these two...

Extract series of observations from dataframe for complete sets of data

r,loops,data.frame,pattern-matching,subset
I have a data frame of values composed of 5 variables (class in brackets) 1) DateTime (as.POSIXct), 2) ID (character), 3) Sensor 1 (numeric), 4) Sensor 2 (numeric), 5) Sensor 3 (numeric) This data was collected from 5 tagged fish. Each fish has one tag with 3 sensors on it,...

efficient way to remove duplicates from list of custom objects in python

python,list,filtering,subset,subsetting
I have a custom class of objects with an assortment of various attributes of different types. I would like to remove duplicates from a list of these objects based on one of these attributes. Something like this, but actually get a list of the objects rather than a list of...

How to get subset of two arrays into a different array using javascript?

javascript,arrays,subset
I have 2 arrays. arr1=[1,8,1,3,2] arr2=[3,8,1] I want to put the subset of elements [8,1] into arr3. How can I do this using JavaScript? I used the following code, but it doesn't seem to be working. function subsetFind() { var arr1 = [1,8,1,3,2] var arr2 = [3,8,1] var arr3 = []; var arr1length =...

How to subset a longtable in R and create boxplots with ggplot

r,plot,ggplot2,subset,plyr
Given the following example: set.seed(1) tmp.data<-data.frame(group=rep(c("x","y","z"),8), year=rep(c(2000:2003),6), value=runif(24, 1, 100)) I can create a simple boxplot with group affiliations: boxplot.example<-ggplot(data=tmp.data) boxplot.example.simple<-boxplot.example + geom_boxplot(aes(x=group,y=value)) # plot boxplot.example.simple However I would like to create separate boxplots for each group and year in the same graphic. I tried it with the group function...

How to select columns by values in a row in R

r,subset,trigram
I have a large data frame marking occurrences of trigrams in a string, where the strings are the rows, the trigrams are the columns, and the values mark whether a trigram occurs in a string. So something like this: strs <- c('this', 'that', 'chat', 'chin') thi <- c(1, 0, 0,...

Subsetting / unmerging data frames in R based on a variable

r,merge,dataframes,subset
I have two data frames in R. one <- data.frame( x = letters[1:10] , y = 1:10, z = rnorm(10)) two <- data.frame( x = letters[1:20] , y = 1:20, z = one$z) I want to "un-merge" these data frames based on the variable x... What I mean is that......

R keep rows where all values are < a threshold

r,rows,subset
I have a data.frame with 19 columns and 2000+ rows. Column 1 is the dependent variable, and columns V1:V17 independent variables. I would like to keep only the rows where the value for EVERY independent variable listed is between 0 and 0.30. However, each row has a varying number of...

R returns the same number of rows after subsetting but clearly deletes rows

r,subset
Basically I have a matrix with 24028 rows and I want to extract a subset of this matrix that meets a certain condition. I use: Sin <- actulab[actulab[,"Atteint_Limite"] == "0",] Here's what I get when I use tail(Sin) INDEX Atteint_Limite Limite Sev_cen FRANC ANNEE MOISSIN MONTBATI 24019 24019 0 50000...

Convert a character variable into a logical expression in order to use it later inside the subset argument of the subset() function

r,subset,logical
I'm trying to convert a character variable into a logical expression in order to use it later inside the subset argument of the subset() function, and all of this is inside a bigger function called early_prep() I created. The problem is when I execute early_prep(file_name = "n44.txt", keep_rows = "block...

subset my df provided that each ID has >10 obs a month

r,subset
I am trying to clean my stocks' df and I need to get rid of the ones that have less than 10 observations per month. Already checked these 2 threads: subsetting-based-on-observations-in-a-month and ddply-for-sum-by-group-in-r But I'm a noob and I cannot figure it out yet. In short: Please, help me out...

Need to count the number of times a threshold value is met (or exceeded) per year (using R)

r,count,subset
I am working with several temperature datasets and trying to pull out when the temperature meets or exceeds a threshold value. Ideally, I want to know how many times (count) that value is met/ exceeded each year for ~100 yrs of data AND when (what date) that value is first...

Subset specific rows with vector of identifiers - warning messages

r,data.frame,subset,which
I would like to subset specific rows with a vector of identifiers. Here is my data data = rbind(c('B11008Z', 'Men', '13'), c('B11040Z', 'Women', '14'), c('B11040E', 'Women', '12') ) colnames(data) <- c('id', 'sex', 'age') data = as.data.frame(data) When I enter the personal id one by one, there is no problem. data[data$id...

combination of pair subsets from a list in lisp

list,lisp,subset
How to create all possible pair subsets from a list in Common Lisp. For example, the list A contains four elements, list A= ("A" "B" "C" "D"), and the expected output is as follows: (("A","B"),("A","C"), ("A","D"),("B","C"),("B","D"), ("C","D")) Could someone please help me generate these subsets? Thanks a lot...

R use string to refer to column

r,string,subset
I would like to subset a dataframe by referring to a column with a string and select the values of that column that fulfill a condition. From the following code employee <- c('John Doe','Peter Gynn','Jolie Hope') salary <- c(21000, 23400, 26800) startdate <- as.Date(c('2010-11-1','2008-3-25','2007-3-14')) employ.data <- data.frame(employee, salary, startdate) salary_string...

How do I query a mongo document containing subset of nested array

mongodb,meteor,subset
Here is a doc I have: var docIHave = { _id: "someId", things: [ { name: "thing1", stuff: [1,2,3,4,5,6,7,8,9] }, { name: "thing2", stuff: [4,5,6,7,8,9,10,11,12,13,14] }, { name: "thing3", stuff: [1,4,6,8,11,21,23,30] } ] } This is the doc I want: var docIWant = { _id: "someId", things: [ { name:...

Subset data with a weighting factor

r,subset,weight
I am trying to subset a data frame and use a column value as the weighting factor. For example, let's say we have these data. set.seed(123) Data <- data.frame(x1 = sample(c(0,1),100, replace = T), x2 = round(runif(100, min=0, max=100),0), Prob = round(runif(100),2)) head(Data) > head(Data) x1 x2 Prob 1 0...

Outputting various subsets from one data frame based on dates

r,loops,subset,lubridate
I want to create numerous subsets of data based on date sequences defined from a separate dataframe. For example, one dataframe will have dates and daily recorded values across multiple years. I have created a hypothetical dataframe below. I want to conduct various subsets from this dataframe based on start...

subset data from matching package

r,match,subset,dplyr
So following the example from the Matching package and in particular the GenMatch example Link to package description pp11. We have the following code library(Matching) data(lalonde) attach(lalonde) lalonde$ID <- 1:length(lalonde$age) X = cbind(age, educ, black, hisp, married, nodegr, u74, u75, re75, re74) BalanceMat <- cbind(age, educ, black, hisp, married, nodegr,...

Getting specific rows according to a subset in R

r,data.frame,subset
I've been trying for hours with this one. I have a dataset with two columns, let's call them V1 and V2. I also have a list of important V1 values - Vx. I managed to acquire a subset of V1 with the intersect function, so: intersect <- intersect(df$V1,Vx) Now I am desperately...

subset data.table keeping only elements greater than certain value applied to all columns

r,data.table,subset
I would like to subset news (below) to create news2 (further below) which will only include the rows/columns where the abs(value) in each element of news > 0.01. Below is the code that I have tried: gr <- data.frame(which(abs(news[, 1:ncol(news), with = FALSE]) > 0.01, arr.ind = TRUE)) news2a <-...

Filter rows in R using subset

r,filter,subset
I have 12 columns of data in a table called df. The first column contains several thousand strings such as AA150502-01, AA150502-02, BB150502-01, BB150502-03 etc. I want to filter the table so that I only see the rows ending with the suffix -01. How do I go about doing this? I so far...
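A small sketch of one way to express this filter, using the four identifiers quoted in the question. The column names `sample_id` and `value` are hypothetical stand-ins for the real 12-column table:

```r
# Hypothetical two-column stand-in for the table described above
df <- data.frame(
  sample_id = c("AA150502-01", "AA150502-02", "BB150502-01", "BB150502-03"),
  value     = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# "$" anchors the pattern to the end of the string, so only the "-01" suffix matches
first_runs <- subset(df, grepl("-01$", sample_id))
```

The same logical vector also works with bracket indexing, as in `df[grepl("-01$", df$sample_id), ]`.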

Make two random teams from dataset and get overall team score in R

r,random,dataset,subset
Let's say I have a soccer team of 10 players (players) from which I should make two subteams of 5 players each and then compute the overall score for each team. players <- read.table(text= "paul 3 ringo 3 george 5 john 5 mick 1 ron 2 charlie 3 ozzy 5...

Subset based on a criteria

macros,sas,subset
I want to subset my dataset (using sas) every time the count variable equals to 1. For example the following dataset would split into two datasets: Over Ball Bowling Runs_scored Count 39 1 Ali 1 1 39 2 Ali 1 2 39 3 Ali 2 3 39 4 Ali 1...

Arguments for Subset within a function in R colon v. greater or equal to

r,operators,subset
Suppose I have the following data. x<- c(1,2, 3,4,5,1,3,8,2) y<- c(4,2, 5,6,7,6,7,8,9) data<-cbind(x,y) x y 1 1 4 2 2 2 3 3 5 4 4 6 5 5 7 6 1 6 7 3 7 8 8 8 9 2 9 Now, if I subset this data to select...

How to subset by distinct rows in a data frame or matrix?

r,matrix,filter,data.frame,subset
Suppose I had the following matrix: matrix(c(1,1,2,1,2,3,2,1,3,2,2,1),ncol=3) Result: [,1] [,2] [,3] [1,] 1 2 3 [2,] 1 3 2 [3,] 2 2 2 [4,] 1 1 1 How can I filter/subset this matrix by whether or not each row has duplicate values? For example, in this case, I would only...
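The excerpt is cut off before it says which rows are wanted, so the sketch below assumes the goal is to keep rows whose values are all distinct. It applies anyDuplicated() to each row of the matrix from the question:

```r
m <- matrix(c(1, 1, 2, 1, 2, 3, 2, 1, 3, 2, 2, 1), ncol = 3)

# anyDuplicated() returns 0 when a vector has no repeated values
all_distinct <- apply(m, 1, function(row) anyDuplicated(row) == 0)

# drop = FALSE keeps the result a matrix even if only one row survives
distinct_rows <- m[all_distinct, , drop = FALSE]
```

Negating the index (`m[!all_distinct, , drop = FALSE]`) gives the opposite selection, the rows that do contain a repeated value.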

How to filter dataframe with multiple conditions?

r,data.frame,subset,dplyr
I have this dataframe that I'd like to subset (if possible, with dplyr or base R functions): df <- data.frame(x = c(1,1,1,2,2,2), y = c(30,10,8,10,18,5)) x y 1 30 1 10 1 8 2 10 2 18 2 5 Assuming x are factors (so 2 conditions/levels), how can I subset/filter...

Subset Data in R excluding 13

r,subset
Date Check 201006 1649.515 201007 1825.828 201008 1878.926 201009 1637.491 201010 1664.938 201011 1973.294 201012 2714.054 201013 24086.797 201101 2888.64 201102 2452.403 201103 2230.493 201104 1825.023 201105 1667.396 201106 1657.334 201107 1890.515 201108 1891.783 201109 1655.634 201110 1744.454 201111 2031.872 201112 2541.878 201113 24477.425 I have a dataset. All data...

Subsetting a data frame based on key spanning several columns in another (summary) data frame

r,data.frame,subset
I have a data frame a with 4 identifying columns: A, B, C, D. A second data frame b, created with ddply(), contains a summary of all the values for different Ds for every set of A,B,C. A third data frame c contains a subset of b with bad values...

Loop through various data subsets in lm() in R

r,loops,regression,subset
I would like to loop over various regressions referencing different data subsets, however I'm unable to appropriately call different subsets. For example: dat <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10) ) x.list <- list(dat$x1,dat$x2,dat$x3) dat1 <- dat[-9,] fit <- list() for(i in 1:length(x.list)){ fit[[i]]...

Subset a dataframe based on time variable

r,date,datetime,time,subset
I have the following dataframe which is already a subset of a much larger dataframe: Time X.N2O._ppm 1 15/05/2015 13:30:07.291 0.03941801 2 15/05/2015 13:30:08.307 0.01014003 3 15/05/2015 13:30:09.323 0.02577801 4 15/05/2015 13:30:10.338 0.02554231 5 15/05/2015 13:30:11.354 0.02489800 6 15/05/2015 13:30:12.370 0.02417584 7 15/05/2015 13:30:13.386 0.02489115 8 15/05/2015 13:30:14.402 0.02524912 9...

R - How to change values in one Matrix based on elements in another Matrix

r,if-statement,matrix,subset,covariance
I have the following covariance matrix in R: AB-2000 AB-2600 AB-3500 AC-0100 AD-0100 AF-0200 AB-2000 6.5 NA -1.8 3.65 -17.96 -26.5 AB-2600 NA 7.18 NA NA NA NA AB-3500 -1.79 NA 5.4 NA -4.63 NA AC-0100 3.65 NA NA 4.22 9.8 NA AD-0100 -17.96 NA -4.63 9.8 5.9 NA AF-0200...

Extract grouped Subset with condition

r,subset
I have following data structure: Group Count Value 1 1 1000 1 10 2000 2 6 1000 2 7 2000 Some groups that have a count value and a data value. Now I only want those rows where count > 0.25 * sum(count of group). For example group 1 has...
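A minimal base R sketch of the stated condition, built on the four rows shown in the question. ave() broadcasts each group's total back onto its rows so the comparison can be done row by row (the names `group_total` and `kept` are illustrative):

```r
df <- data.frame(
  Group = c(1, 1, 2, 2),
  Count = c(1, 10, 6, 7),
  Value = c(1000, 2000, 1000, 2000)
)

# Per-group sum of Count, repeated for every row of the group
group_total <- ave(df$Count, df$Group, FUN = sum)

# Keep rows whose Count exceeds a quarter of their group's total
kept <- df[df$Count > 0.25 * group_total, ]
```

With this data the group totals are 11 and 13, so the thresholds are 2.75 and 3.25, and only the first row (Count 1) falls below its threshold.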

Creating a subset in R using a double loop [closed]

r,loops,double,conditional,subset
I have a very large csv file I have imported into R and need to make a subset of data. The csv looks something like this: Julian_Day Id Year 52 1 1901 56 5 1901 200 1 1968 etc., where year is 1901-2010, Id 1-58 and Julian_Day 1-200 for about...

Subset data frames inside of a list based on column classes

r,list,data.frame,subset
I have a very large list comprised of data frames, every element of the list is a different data frame, where each column is comprised of different types of variables, and data frames of different lengths. I want to subset the data frames in this list, and keep only those...

R how to retrieve data from data.frame with multiple conditions

r,data.frame,subset
I am wondering how to perform some basic data manipulation in R. What I want to do is the following. I have a data table with the following pattern: V1 V2 V3 ABC X 24 ABC Y 30 EFG X 4 EFG Y 28 HIJ P 40 HIJ Y...

My recursion of getting subset doesn't print out the right answer

algorithm,recursion,combinations,subset
I am trying to implement a method to get all subsets of a set. I understand the logic of doing that, i.e. Subset(n) = n + Subset(n-1), but the code I wrote keeps printing out the wrong answers. Here is my code: void subset(vector<int> &input, vector<int> output, int current) {...

R - how to replace values in a subset of a vector

r,vector,replace,subset,readline
I have a bash script that is basically a series of commands to download a bunch of climate files. Among many other information on the script, lines 28 to 1027 determine the actual files that should be downloaded. See my file: # point to file file <- 'https://dl.dropboxusercontent.com/u/27700634/wget-ESG-files.sh' # read...

Calculating Function from Variable 1 after Eliminating Level(s) from Variable 2 Using R

r,subset,mean,subsetting
This question is asking which command to use given the following situation: Objective: Calculate mean of iris$Sepal.Length. Constraint: Do not include the iris$Species 'setosa'. My Work: data(iris) levels(iris$Species) output: setosa, versicolor, and virginica mean(iris$Sepal.Length, which(iris$Species != 'setosa')) output: error message 'incompatible dimensions' --- This demo is a stand-in for my...
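A short sketch of one way to meet the stated objective: subset the vector first, then take the mean. In the attempt quoted above, the second positional argument of mean() is not a row index (for the default method it is `trim`), which is why passing which(...) there does not select rows.

```r
data(iris)

# Logical index that excludes the setosa rows
keep <- iris$Species != "setosa"

# Subset the column, then average what is left
mean_len <- mean(iris$Sepal.Length[keep])
print(mean_len)
```

The same pattern works for other summary functions such as median() or sd(), since the filtering happens before the function is called.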

How to find all groups of subsets of set A? Set partitions in Python

python,algorithm,python-3.x,set,subset
I want to find an algorithm that given a set A to find all groups of subsets that satisfy the following condition: x ∪ y ∪ .... z = A, where x, y, ... z ∈ Group and ∀ x,y ∈ Group: x ⊆ A, y ⊆ A, x ∩...

How to create a new variable with values from different variables if another variable equals a set value in R?

r,conditional,condition,subset
I have a complicated question that I will try to simplify by simplifying my dataset. Say I have 5 variables: df$Id <- c(1:12) df$Date <- c(NA,NA,a,a,b,NA,NA,b,c,c,b,a) df$va <- c(1.1, 1.4, 2.5, ...) #12 random values df$vb <- c(5.9, 2.3, 4.7, ...) #12 other random values df$vc <- c(3.0, 3.3, 3.7,...

Why does subset cut the decimal part?

r,data.frame,subset
Hi, this is a sample of a data.frame / list with two columns containing X and Y. My problem is that when I call subset, it cuts off the decimal part. Can you help me figure out why? (row.names | X | Y) > var ... 9150 4246838.57 5785639.07 9152 4462019.15 5756344.11 9153...

R: create a new sorted dataframe using dplyr?

r,data.frame,subset
I am new to R and a bit overwhelmed by an assignment. I am asked to create a new dataframe out of an existing one (the diamonds data that comes preinstalled with ggplot2). The dataframe should look as follows: mean_price median_price min_price max_price n All sorted by clarity where...

Filtering a list of integer ranges to exclude subsets in Python

python,range,subset
I'm trying to find a faster way to filter my list of ranges, so that any range that can be covered completely by a larger range will be excluded. For example, #all ranges have width >1, which means no such case like xx=[1,1] in my list #each range itself is...
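
A sort-and-sweep pass is one common way to drop ranges that are completely covered by a single other range in O(n log n) time. The sketch below assumes each range is a `[start, end]` pair with inclusive bounds; the function name `drop_covered` is hypothetical, and the constraints cut off in the excerpt are not taken into account.

```python
def drop_covered(ranges):
    """Return the ranges that are not fully covered by another single range."""
    # Sort by start ascending; for equal starts put the longer range first,
    # so any range that could cover the current one has already been seen.
    ordered = sorted(ranges, key=lambda r: (r[0], -r[1]))
    kept = []
    max_end = float("-inf")  # largest end among the ranges seen so far
    for start, end in ordered:
        # Every earlier range starts at or before `start`, so the current
        # range is covered exactly when some earlier end reaches `end`.
        if end > max_end:
            kept.append([start, end])
            max_end = end
    return kept


print(drop_covered([[1, 10], [2, 5], [3, 12], [11, 12], [20, 25]]))
# [[1, 10], [3, 12], [20, 25]]
```

Exact duplicates are collapsed to a single copy, and a range covered only by the union of several others is kept.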

removing and aggregating duplicates

r,duplicates,subset,lapply
I've posted a sample of the data I'm working with here. "Parcel.." is the main indexing variable and there is a good number of duplicates. The duplicates are not consistent in all of the other columns. My goal is to aggregate the data set so that there is only one observation...

Maintain data frame rows after subset

r,subset
I am trying to calculate a % yield of some data based on a subset: # example data set set.seed(10) Measurement <- rnorm(1000, 5, 2) ID <- rep(c(1:100), each=10) Batch <- rep(c(1:10), each=100) df <- data.frame(Batch, ID, Measurement) df$ID <- factor(df$ID) df$Batch <- factor(df$Batch) # Subset data based on measurement...

Subsetting Column from Matrix in R without specific column names specified

r,matrix,subset
I made a matrix a with character names "0", ..., "10". Now I make a subset list of column names, S. I want to subset the matrix a so that it no longer has the columns with names in S. I am trying to do the following but it's giving an error....

Subset of a table that contains at least one element of another table

r,subset
I have two tables that are made up of intervals of bp. Table 1 has large intervals and Table 2 has short intervals (just 2 bp). I want to make a new table that contains only the Table 1 ranges that have at least one element of Table 2 contained in their...

Data cleaning using subset with 2 conditions on same variable

r,subset,data-cleansing
I am a newbie to R. I have a dataset ITEproduction_2014.2015 and I only want to see data points between 4 and 39 days. Currently I use 2 separate lines to create a subset. Can I do this in 1 line? Something like Date.Difference >3 and < 40? ITEproduction_2014.2015 <- subset(ITEproduction_2014.2015,Date.Difference>3)...

MySQL: Query a subset of rows

mysql,subset
Given the table below: TABLE : USER_ASSETS USER_ID | ASSET_ID ------------------- 1 | 1 ------------------- 1 | 2 ------------------- 1 | 3 ------------------- 2 | 2 ------------------- 2 | 3 ------------------- If I search for the USER_ID with ASSET_ID equal to 1 and 2, it should return USER_ID 1 as...

Mongodb query documents where nested array is equal or subset of fixed array

c#,arrays,mongodb,subset,mongodb-csharp
I keep banging my head against the wall trying to solve the following problem (I'm using the new C# 2.0 driver): The idea is to return all docs where the nested array is equal to or a subset of a fixed array. Example: Fixed array: [ "A", "B", "C" ] container...

R: Data frame operations: filtering common rows and removing rows of several data frames

r,merge,dataframes,subset
dfA <- data.frame(Efficiency=c(7,2,8,9), Value=c(3, 4, 7, 8)) dfB <- data.frame(Efficiency=c(7,2,4,2,8,9), Value=c(3, 4, 4, 1, 7, 8)) dfC <- data.frame(Efficiency=c(7,9), Value=c(3, 8)) I want to get the common rows of dfA and dfB. From the resulting data.frame I want to remove the rows which have the same values as dfC....