I'm very new to R - but have been developing SAS-programs (and VBA) for some years. Well, the thing is that I have 4 lines of R-code (scripts?) that I would like to repeat 44 times. Two times for each of 22 different train stations, indicating whether the train is...

I have two tables made up of intervals in bp: Table 1 has large intervals and Table 2 has short intervals (just 2 bp). I want to make a new table that contains only the Table 1 ranges that have at least one element of Table 2 contained in their...

This question already has an answer here: How to subset matrix to one column, maintain matrix data type, maintain row/column names? (1 answer) When I try to subset a 1-column matrix by its row names, the subsetting works but a numeric vector is returned. Can you somehow prevent that...

This is probably a basic question in R, but I am trying to loop data within subsets of a larger data frame. I have added the 'data=sub240' line within the 'while' command, but this leads to a brackets error, which I suspect is indicative of a larger problem. Can anyone...

I have 2 arrays. arr1=[1,8,1,3,2] arr2=[3,8,1] I want to put the subset of elements [8,1] into arr3. How can I do this using JavaScript? I used the following code, but it doesn't seem to be working. function subsetFind() { var arr1 = [1,8,1,3,2] var arr2 = [3,8,1] var arr3 = []; var arr1length =...
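
One possible reading of this question: since 3 also occurs in both arrays but is not wanted in arr3, the target [8,1] looks like the longest run of consecutive elements shared by both arrays. A minimal sketch of that interpretation, written in Python rather than the asker's JavaScript; the function name `longest_common_run` is hypothetical and the interpretation itself is an assumption:

```python
def longest_common_run(a, b):
    """Return the longest contiguous run of elements that appears in both lists.

    Assumption: "subset" in the question means a shared contiguous run.
    Classic dynamic programming, keeping only the previous row.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended at the previous pair of positions.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


arr1 = [1, 8, 1, 3, 2]
arr2 = [3, 8, 1]
arr3 = longest_common_run(arr1, arr2)
print(arr3)  # [8, 1]
```

If the intent was instead "every element of arr2 that occurs anywhere in arr1", a simple membership filter would return [3, 8, 1], which does not match the stated target.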

I have a very large CSV file I have imported into R and need to make a subset of data. The CSV looks something like this: Julian_Day Id Year 52 1 1901 56 5 1901 200 1 1968 etc., where year is 1901-2010, Id 1-58 and Julian_Day 1-200 for about...

Let's say I have a soccer team of 10 players (players) from which I should make two subteams of 5 players each and then compute the overall score for each team. players <- read.table(text= "paul 3 ringo 3 george 5 john 5 mick 1 ron 2 charlie 3 ozzy 5...

I am trying to calculate a % yield of some data based on a subset: # example data set set.seed(10) Measurement <- rnorm(1000, 5, 2) ID <- rep(c(1:100), each=10) Batch <- rep(c(1:10), each=100) df <- data.frame(Batch, ID, Measurement) df$ID <- factor(df$ID) df$Batch <- factor(df$Batch) # Subset data based on measurement...

Hi, this is a sample of a data.frame / list with two columns containing X and Y. My problem is that when I call subset, it cuts off the decimal part. Can you help me figure out why? (row.names | X | Y) > var ... 9150 4246838.57 5785639.07 9152 4462019.15 5756344.11 9153...

df1 and df2 have columns a,b. I want to subset data from df1 such that each entry in df1$a along with df1$b is in df2$a along with df2$b. df1 a b c 1 m df1 2 f df1 3 f df1 4 m df1 5 f df1 6 m df1...

I would like to subset specific rows with a vector of identifiers. Here is my data: data = rbind(c('B11008Z', 'Men', '13'), c('B11040Z', 'Women', '14'), c('B11040E', 'Women', '12') ) colnames(data) <- c('id', 'sex', 'age') data = as.data.frame(data) When I enter the personal id one by one, there is no problem. data[data$id...

dfA <- data.frame(Efficiency=c(7,2,8,9), Value=c(3, 4, 7, 8)) dfB <- data.frame(Efficiency=c(7,2,4,2,8,9), Value=c(3, 4, 4, 1, 7, 8)) dfC <- data.frame(Efficiency=c(7,9), Value=c(3, 8)) I want to get the common rows of dfA and dfB. From the resulting data.frame I want to remove the rows which have the same values as dfC....

Date Check 201006 1649.515 201007 1825.828 201008 1878.926 201009 1637.491 201010 1664.938 201011 1973.294 201012 2714.054 201013 24086.797 201101 2888.64 201102 2452.403 201103 2230.493 201104 1825.023 201105 1667.396 201106 1657.334 201107 1890.515 201108 1891.783 201109 1655.634 201110 1744.454 201111 2031.872 201112 2541.878 201113 24477.425 I have a dataset. All data...

How can I create all possible pair subsets from a list in Common Lisp? For example, the list A contains four elements: A = ("A" "B" "C" "D"). The expected output is as follows: (("A","B"),("A","C"), ("A","D"),("B","C"),("B","D"), ("C","D")) Could someone please help me generate these subsets? Thanks a lot...
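
The expected output above is the set of 2-element combinations of the list, in list order. As a language-neutral illustration (in Python, not Common Lisp), a minimal sketch using the standard library; the function name `all_pairs` is hypothetical:

```python
from itertools import combinations


def all_pairs(items):
    """Return every unordered pair of elements, preserving the input order."""
    return [list(pair) for pair in combinations(items, 2)]


print(all_pairs(["A", "B", "C", "D"]))
# [['A', 'B'], ['A', 'C'], ['A', 'D'], ['B', 'C'], ['B', 'D'], ['C', 'D']]
```

The same idea carries over to a nested loop in any language: pair each element with every element that comes after it.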

I am trying to implement a method to get all subsets of a set. I understand the logic of doing that, i.e. Subset(n) = n + Subset(n-1), but the code I wrote keeps printing out the wrong answers. Here is my code: void subset(vector<int> &input, vector<int> output, int current) {...
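
The recurrence mentioned above can be read as: the subsets of the first n elements are the subsets of the first n-1 elements, plus each of those with the n-th element added. A minimal sketch of that recurrence in Python (not the asker's C++; the truncated original code is not reproduced or corrected here, and the function name `subsets` is hypothetical):

```python
def subsets(items):
    """Return all subsets of items as a list of lists (2**n results)."""
    if not items:
        return [[]]  # the empty set has exactly one subset: itself
    rest = subsets(items[:-1])  # subsets that exclude the last element
    # Each subset either omits the last element or includes it.
    return rest + [s + [items[-1]] for s in rest]


print(subsets([1, 2, 3]))
# [[], [1], [2], [1, 2], [3], [1, 3], [2, 3], [1, 2, 3]]
```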

I want to compute the following two regressions using R: library("dynlm") zooX = zoo(test[, -1]) lmx <- dynlm(d(Euribor3)~d(Ois3)+d(CDS)+d(Vstoxx)+d(log(omo))+d(L(Euribor3, 1)), data=zooX[1:16]) summary(lmx) zooX = zoo(test[, -1]) lmx <- dynlm(d(Euribor3)~d(Ois3)+d(CDS)+d(Vstoxx)+d(log(omo))+d(L(Euribor3, 1)), data=zooX[17:31]) summary(lmx) The only difference between those two models is the subset (the first[1:16] and the second [17:31]). Now these two...

I have a large data set of the following format: First column is type, and the subsequent columns are different times that 'type' happens. I want to calculate the slope of each row (~7000 rows) for subset T0-T2 and then t0-t2 and output that information, then get the average of...

I am wondering how to perform some basic data manipulation in R. What I want to do is the following. I have a data table with the following pattern: V1 V2 V3 ABC X 24 ABC Y 30 EFG X 4 EFG Y 28 HIJ P 40 HIJ Y...

This question already has an answer here: R - Remove all unique rows 3 answers My data looks like this: A B 1 2 1A 2 1A 2 2 3 2 4 2 4 3A 0 3A 0 4A 1 4A 1 5 5 I want to subset the...

I've been trying for hours with this one. I have a dataset with two columns, let's call them V1 and V2. I also have a list of important V1 values, Vx. I managed to acquire a subset of V1 with the intersect function, so: intersect <- intersect(df$V1,Vx) Now I am desperately...

I am trying to subset a data frame and use a column value as the weighting factor. For example, let's say we have these data. set.seed(123) Data <- data.frame(x1 = sample(c(0,1),100, replace = T), x2 = round(runif(100, min=0, max=100),0), Prob = round(runif(100),2)) head(Data) > head(Data) x1 x2 Prob 1 0...

I have a text file consisting of 6 columns as shown below. The measurements were taken every 30 minutes over several years (2001-2013), although on certain days the interval is 32 or 39 minutes. I want to extract a certain range from this data. To read the file: LR=read.table("C:\\Users\\dat.txt", sep...

I have this dataframe that I'd like to subset (if possible, with dplyr or base R functions): df <- data.frame(x = c(1,1,1,2,2,2), y = c(30,10,8,10,18,5)) x y 1 30 1 10 1 8 2 10 2 18 2 5 Assuming x are factors (so 2 conditions/levels), how can I subset/filter...

I would like to loop over various regressions referencing different data subsets, however I'm unable to appropriately call different subsets. For example: dat <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10) ) x.list <- list(dat$x1,dat$x2,dat$x3) dat1 <- dat[-9,] fit <- list() for(i in 1:length(x.list)){ fit[[i]]...

I am hoping someone can help with the following problem I am having while creating subsets of my data: I have a data set titled 'LakeK_all'. One of the columns is titled 'Lake' and contains a list of lake names (S001-Out, S002-Out, Y001-Out, Y002-Out,...). I would like to pull out...

I have a data frame a with 4 identifying columns: A, B, C, D. A second data frame b, created with ddply(), contains a summary of all the values for different Ds for every set of A,B,C. A third data frame c contains a subset of b with bad values...

Given the following input: 10 4 3 5 5 7, where 10 = Total Score 4 = 4 players 3 = Score by player 1 5 = Score by player 2 5 = Score by player 3 7 = Score by player 4 I am to print players whose combine...
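
The question is cut off, but it appears to ask for the groups of players whose combined score equals the total. Under that assumption, a brute-force sketch in Python; the function name `players_matching_total` is hypothetical, and the approach tries every combination, so it is only practical for a small number of players:

```python
from itertools import combinations


def players_matching_total(total, scores):
    """Return lists of 1-based player numbers whose scores add up to total.

    Assumption: the truncated question asks for subsets of players whose
    combined score equals the total. Exhaustive search, O(2**n) subsets.
    """
    matches = []
    for size in range(1, len(scores) + 1):
        for combo in combinations(range(len(scores)), size):
            if sum(scores[i] for i in combo) == total:
                matches.append([i + 1 for i in combo])
    return matches


print(players_matching_total(10, [3, 5, 5, 7]))  # [[1, 4], [2, 3]]
```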

Suppose I had the following matrix: matrix(c(1,1,2,1,2,3,2,1,3,2,2,1),ncol=3) Result: [,1] [,2] [,3] [1,] 1 2 3 [2,] 1 3 2 [3,] 2 2 2 [4,] 1 1 1 How can I filter/subset this matrix by whether or not each row has duplicate values? For example, in this case, I would only...

I want to create numerous subsets of data based on date sequences defined from a separate dataframe. For example, one dataframe will have dates and daily recorded values across multiple years. I have created a hypothetical dataframe below. I want to conduct various subsets from this dataframe based on start...

I want to compile only rows containing odd numbers in one of the columns. An example of my data frame is below: V1 V2 V3 V4 V5 V6 V7 V8 14221 USDJPY 20030507 20:00:00 116.33 116.19 116.47 116.25 14222 USDJPY 20030507 21:00:00 116.24 116.24 116.42 116.32 14223 USDJPY 20030507 22:00:00...

I have the following dataframe which is already a subset of a much larger dataframe: Time X.N2O._ppm 1 15/05/2015 13:30:07.291 0.03941801 2 15/05/2015 13:30:08.307 0.01014003 3 15/05/2015 13:30:09.323 0.02577801 4 15/05/2015 13:30:10.338 0.02554231 5 15/05/2015 13:30:11.354 0.02489800 6 15/05/2015 13:30:12.370 0.02417584 7 15/05/2015 13:30:13.386 0.02489115 8 15/05/2015 13:30:14.402 0.02524912 9...

Here is a doc I have: var docIHave = { _id: "someId", things: [ { name: "thing1", stuff: [1,2,3,4,5,6,7,8,9] }, { name: "thing2", stuff: [4,5,6,7,8,9,10,11,12,13,14] }, { name: "thing3", stuff: [1,4,6,8,11,21,23,30] } ] } This is the doc I want: var docIWant = { _id: "someId", things: [ { name:...

Given the table below: TABLE : USER_ASSETS USER_ID | ASSET_ID ------------------- 1 | 1 ------------------- 1 | 2 ------------------- 1 | 3 ------------------- 2 | 2 ------------------- 2 | 3 ------------------- If I search for the USER_ID with ASSET_ID equal to 1 and 2, it should return USER_ID 1 as...

The question relates to iris data set: library(datasets) data(iris) How to extract column 'Sepal.Width' for the species virginica?...

I'm a Python newbie but have some R experience. In R if I'd like to subset a data.frame I can use a variable to do something like this: # Columns # Assign column names to variable colsToUse <- c('col1','col2','col3') # Use variable to subset df2 <- df1[,colsToUse] # Rows #...

I've the following data.table structure(list(val1 = c(1, 2, 1, 3, 4, 5, 3), val2 = c(4, 5, 6, 4, 2, 4, 5)), .Names = c("val1", "val2"), row.names = c(NA, -7L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0xedae28>) What I would like to do is filter the rows in it...

This question already has an answer here: subset() a factor by its number of observation 1 answer I have a longitudinal dataset structured as 1 row per visit. A numerical patient ID number indicates unique patients. How can I remove all patients with less than 2 observations from my...

I have this data frame, let's call it my_df. It looks like this: my_df <- data.frame(rnorm(n = 30,sd=.5),rep(c("a","b","c"),each=10)) names(my_df) <- c("num","let") head(my_df) num let 1 0.01202600 a 2 1.09025768 a 3 -0.08656178 a 4 -0.04847073 a 5 -0.63750258 a 6 0.58846135 a What I want to do is select all...

I have a complicated question that I will try to simplify by simplifying my dataset. Say I have 5 variables: df$Id <- c(1:12) df$Date <- c(NA,NA,a,a,b,NA,NA,b,c,c,b,a) df$va <- c(1.1, 1.4, 2.5, ...) #12 random values df$vb <- c(5.9, 2.3, 4.7, ...) #12 other random values df$vc <- c(3.0, 3.3, 3.7,...

I have 12 columns of data in a table called df. The first column contains several thousand strings such as AA150502-01, AA150502-02, BB150502-01, BB150502-03, etc. I want to filter the table so that I only see the rows ending with the suffix -01. How do I go about doing this? I so far...

I thought that selecting values from a subquery in SQL would only yield values from that subset until I found a very nasty bug in code. Here is an example of my problem. I'm selecting the rows that contain the latest(max) function by date. This correctly returns 4 rows with...

I have a dataset with 45 columns and >8000 observations. One of the variables in the columns is city-name. I want to remove all observations that are located in cities that begin with the letter "S". How would I do this? I'm pretty new to R, so sorry if this...

I have a list that contains 2 data sets. a = data.frame(c(1,1,1,1,1,2,2,2,2,2), c("a","b", "c", "d","e","e","f", "g", "h","i")) colnames(a) = c("Numbers","Letters") c = data.frame(c(3,3,3,3,3,4,4,4,4,4), c("q","r", "s", "t","u","u","v", "w", "x","y")) colnames(c) = c("Numbers","Letters") my.list = list(a,c) my.list I am interested in returning only the letters that are found in common between the...

I have a very large list comprised of data frames, every element of the list is a different data frame, where each column is comprised of different types of variables, and data frames of different lengths. I want to subset the data frames in this list, and keep only those...

I have a data frame of values composed of 5 variables (class in brackets) 1) DateTime (as.POSIXct), 2) ID (character), 3) Sensor 1 (numeric), 4) Sensor 2 (numeric), 5) Sensor 3 (numeric) This data was collected from 5 tagged fish. Each fish has one tag with 3 sensors on it,...

I have a pandas dataframe with a column that marks interesting points of data in another column (e.g. the locations of peaks and troughs). I often need to do some computation on the values between each marker. Is there a neat way to slice the dataframe using the markers as...

I would like to subset news (below) to create news2 (further below) which will only include the rows/columns where the abs(value) in each element of news > 0.01. Below is the code that I have tried: gr <- data.frame(which(abs(news[, 1:ncol(news), with = FALSE]) > 0.01, arr.ind = TRUE)) news2a <-...

I plotted my data as a boxplot (using geom_boxplot from ggplot) so that the outliers became visible. Afterwards I wanted to remove them from my data. That is why I used "ggplot_build" to get all the information about the plot and saved it under a new name. Outlier_boxplot<-ggplot_build(boxplot) Now...

I have a large dataset that looks something like this with a few hundred thousand more entries, saved as data: Group1 dtm_Flight_Date Departure Arrival str_Fare_Category_Ident 1 8P104 06/11/2010 9:05 YYJ YVR B 2 8P104 06/11/2010 9:05 YYJ YVR K 3 8P104 06/11/2010 9:05 YYJ YVR L 4 8P104 06/11/2010 9:05...

I have a text file with two years of data from which I want to extract two different periods (July to September of each year). Read the file: wg=read.table("C:\\Users\\ERIE.txt", sep ='' , header =TRUE) head(wg) Year day hour mint valu1 valu2 date 105169 2008 1 7 30 0.045 0.014 2008-01-01...

I have the following data structure: Group Count Value 1 1 1000 1 10 2000 2 6 1000 2 7 2000 There are some groups that have a count value and a data value. Now I only want those rows where count > 0.25 * sum(count of group). For example group 1 has...

I would like to choose 4 assets whose BETAdn values sum to 0. The large matrix is called ALPHABETA, and I run the following to get a subset of assets that satisfy the sum-to-zero condition. AB <- subset(data.frame((ALPHABETA)), BETAdn+BETAdn >= -0.01 & BETAdn+BETAdn <= .01) The...

I keep banging my head against the wall trying to solve the following problem (I'm using the new c# 2.0 driver): The idea is to return all docs where the nested array is equal or a subset of a fixed array. Example: Fixed array: [ "A", "B", "C" ] container...

I have a custom class of objects with an assortment of various attributes of different types. I would like to remove duplicates from a list of these objects based on one of these attributes. Something like this, but actually get a list of the objects rather than a list of...

I am looking to split my dataframe into subsets according to the column "Height" with each subset having one row with a value and 0-Inf rows with NAs. This is so that I can apply functions to the subsets afterwards, specifically to order the rows according to their "Diameter" value,...

I am trying to subset a matrix based on a specific value in a column, but I want my subsets in a number of separate matrices. For example, say I have a matrix ccc, which is aaa=c(1,1,1,2,5,1,2,1,1,3,1,1,1,1,1,1,4) bbb=c(4,4,4,4,3,3,3,3,2,2,2,2,3,4,5,6,7) ccc=cbind(aaa,bbb) I want to subset using a condition which is ccc[,1]==1 and...

Suppose I have the following data. x<- c(1,2, 3,4,5,1,3,8,2) y<- c(4,2, 5,6,7,6,7,8,9) data<-cbind(x,y) x y 1 1 4 2 2 2 3 3 5 4 4 6 5 5 7 6 1 6 7 3 7 8 8 8 9 2 9 Now, if I subset this data to select...

I have a large data frame marking occurrences of trigrams in a string, where the strings are the rows, the trigrams are the columns, and the values mark whether a trigram occurs in a string. Something like this: strs <- c('this', 'that', 'chat', 'chin') thi <- c(1, 0, 0,...

I'm trying to convert a character variable into a logical expression in order to use it later inside the subset argument of the subset() function, and all of this is inside a bigger function called early_prep() I created. The problem is when I execute early_prep(file_name = "n44.txt", keep_rows = "block...

I have a data set like this a <- data.frame(var1 = c("patientA", "patientA", "patientA", "patientB", "patientB", "patientB", "patientB"), var2 = as.Date(c("2015-01-02","2015-01-04","2015-02-02","2015-02-06","2015-01-02","2015-01-07","2015-04-02")), var3 = c(F, T, F, F, F, T, F) ) sequ <- rle(as.character(a$var1)) a$sequ <- sequence(sequ$lengths) producing > a var1 var2 var3 sequ 1 patientA 2015-01-02 FALSE 1 2...

So following the example from the Matching package and in particular the GenMatch example Link to package description pp11. We have the following code library(Matching) data(lalonde) attach(lalonde) lalonde$ID <- 1:length(lalonde$age) X = cbind(age, educ, black, hisp, married, nodegr, u74, u75, re75, re74) BalanceMat <- cbind(age, educ, black, hisp, married, nodegr,...

I have a data.frame with 19 columns and 2000+ rows. Column 1 is the dependent variable, and columns V1:V17 independent variables. I would like to keep only the rows where the value for EVERY independent variable listed is between 0 and 0.30. However, each row has a varying number of...

I am currently programming a CPLEX/OPL model using IBM ILOG CPLEX Optimization Studio. I have a problem with using sums or indexes which contain a subset and depend on another parameter/variable, e.g. check the following constraints: NB 2,3,4,8). Can anyone help me with incorporating these constraints properly? Please find the...

I would like to subset the rows that have a concrete sequence of characters in one variable. For example I would like to subset the rows that have at least three consecutive 1 ("111"; e.g. "01110", "11111", "01111") in the variable history. Here is some example data: id <- c(1,2,3,4,5,6,7,8,9,10)...

I am trying to clean my stocks' df and I need to get rid of the ones that have less than 10 observations per month. Already checked these 2 threads: subsetting-based-on-observations-in-a-month and ddply-for-sum-by-group-in-r But I'm a noob and I cannot figure it out yet. In short: Please, help me out...

I want to subtract y from x, which means remove one "A", three "B" and one "E" from x, so xNew will be c("A", "C", "A","B","D"). It also means length(xNew)=length(x) - length(y) x <- c("A","A","C","A","B","B","B","B","D","E") y <- c("A","B","B","B","E") setdiff doesn't work because xNew <- setdiff(x,y) xNew [1] "C" "D" match...
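
What is described here is a multiset (bag) difference: each element of y cancels exactly one matching element of x. As a language-neutral illustration in Python (not R), a minimal sketch that removes the earliest matching occurrences, which reproduces the xNew given in the question; the function name `multiset_subtract` is hypothetical:

```python
from collections import Counter


def multiset_subtract(x, y):
    """Remove one occurrence from x for every element of y, keeping x's order.

    The earliest matching occurrences are the ones removed.
    """
    to_remove = Counter(y)  # how many copies of each value still need removing
    result = []
    for item in x:
        if to_remove[item] > 0:
            to_remove[item] -= 1  # cancel this occurrence against y
        else:
            result.append(item)
    return result


x = ["A", "A", "C", "A", "B", "B", "B", "B", "D", "E"]
y = ["A", "B", "B", "B", "E"]
print(multiset_subtract(x, y))  # ['A', 'C', 'A', 'B', 'D']
```

The result has length(x) - length(y) elements only when every element of y actually occurs often enough in x.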

I have two series of constraints S and S', they describe possibly infinitely large sets. Say for example S: x <= 10 and y <= x and S': x <= 20 and y <= 20. Now I want to know if S is a subset of S'? I thought I...

I am trying to subset a data frame based on a range of time. Someone has asked this question in the past and the answer was to use R CMD INSTALL lubridate_1.3.1.tar.gz (see link: subset rows according to a range of time). The issue with this answer is that I...

I have two data frames in R. one <- data.frame( x = letters[1:10] , y = 1:10, z = rnorm(10)) two <- data.frame( x = letters[1:20] , y = 1:20, z = one$z) I want to "un-merge" these data frames based on the variable x... What I mean is that......

I have the following covariance matrix in R: AB-2000 AB-2600 AB-3500 AC-0100 AD-0100 AF-0200 AB-2000 6.5 NA -1.8 3.65 -17.96 -26.5 AB-2600 NA 7.18 NA NA NA NA AB-3500 -1.79 NA 5.4 NA -4.63 NA AC-0100 3.65 NA NA 4.22 9.8 NA AD-0100 -17.96 NA -4.63 9.8 5.9 NA AF-0200...

Given the following example: set.seed(1) tmp.data<-data.frame(group=rep(c("x","y","z"),8), year=rep(c(2000:2003),6), value=runif(24, 1, 100)) I can create a simple boxplot with group affiliations: boxplot.example<-ggplot(data=tmp.data) boxplot.example.simple<-boxplot.example + geom_boxplot(aes(x=group,y=value)) # plot boxplot.example.simple However I would like to create separate boxplots for each group and year in the same graphic. I tried it with the group function...

This question is asking which command to use given the following situation: Objective: Calculate mean of iris$Sepal.Length. Constraint: Do not include the iris$Species 'setosa'. My Work: data(iris) levels(iris$Species) output: setosa, versicolor, and virginica mean(iris$Sepal.Length, which(iris$Species != 'setosa')) output: error message 'incompatible dimensions' --- This demo is a stand-in for my...

The task I am trying to accomplish is essentially filtering one dataset by the entries in an "id" column of another dataset. The data sets I am working with are quite large, with tens of thousands of entries and 30 or so variables. I have made toy datasets...

I am working with several temperature datasets and trying to pull out when the temperature meets or exceeds a threshold value. Ideally, I want to know how many times (count) that value is met/ exceeded each year for ~100 yrs of data AND when (what date) that value is first...

My goal is to calculate a group % column by zip. I created the % column by zip, but keep losing my group ('cgrp') variable. How can I keep this in my end results? My data table script is giving me the below results: zip V1 1: 12007 19.35484 2:...

Given an array {1,3,5,7}, its subparts are defined as {1357,135,137,157,357,13,15,17,35,37,57,1,3,5,7}. I have to find the sum of all these numbers in the new array. In this case sum comes out to be 2333. Please help me find a solution in O(n). My O(n^2) solution times out. link to the problem...
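
One way to reach linear time, assuming every array element is a single decimal digit (the linked problem statement is not available here, so this is an assumption): the element at index i can be preceded by any of the 2^i subsets of earlier elements, and if k of the m = n-1-i later elements are kept it sits at place value 10^k, which sums to (1+10)^m = 11^m over all choices. The total is therefore the sum of a_i * 2^i * 11^(n-1-i). A minimal Python sketch; the function name `sum_of_subparts` is hypothetical:

```python
def sum_of_subparts(digits):
    """Sum all numbers formed by non-empty subsequences of single digits.

    Uses total = sum(a_i * 2**i * 11**(n-1-i)), which needs O(n)
    arithmetic operations. Assumes each element is one decimal digit.
    """
    n = len(digits)
    # pow11[k] = 11**k, built incrementally to avoid repeated exponentiation.
    pow11 = [1] * n
    for k in range(1, n):
        pow11[k] = pow11[k - 1] * 11
    total = 0
    pow2 = 1  # 2**i, updated as i advances
    for i, d in enumerate(digits):
        total += d * pow2 * pow11[n - 1 - i]
        pow2 *= 2
    return total


print(sum_of_subparts([1, 3, 5, 7]))  # 2333
```

For {1,3,5,7} the four terms are 1331 + 726 + 220 + 56 = 2333, matching the value stated in the question. If the real problem asks for the answer modulo some number, the same loop works with a reduction after each multiplication.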

I have a bash script that is basically a series of commands to download a bunch of climate files. Among much other information in the script, lines 28 to 1027 determine the actual files that should be downloaded. See my file: # point to file file <- 'https://dl.dropboxusercontent.com/u/27700634/wget-ESG-files.sh' # read...

I have a list of five dataframes. Each dataframe contains one dimension column and 4 value columns. I would like to subset each dataframe in the list based on the contents of a vector. df <- data.frame(x = 1:100, y2 = runif(100, 0, 100), y3 = runif(100, 0, 100), y4...

I am a newbie to R. I have a dataset ITEproduction_2014.2015 and I only want to see datapoints between 4 and 39 days. Currently I use 2 separate lines to create a subset. Can I do this in 1 line? Something like Date.Difference >3 and < 40? ITEproduction_2014.2015 <- subset(ITEproduction_2014.2015,Date.Difference>3)...

This question already has an answer here: subset() a factor by its number of observation 1 answer I am trying to solve a data formatting problem. I have a data frame where the variables are leveled into schools and students. For example: Schools Students SchoolA Student1 SchoolA Student2 SchoolA...

It seems this question has been asked a couple of times in different forms, but I couldn't find the right solution. I have a SpatialPoint object with several Polygons and would like to subset and plot one polygon using the slot "ID". Using the example from this question: Sr1 =...

I made a matrix a with character names "0", ..., "10". Now I make a subset list of column names, S. I want to subset the matrix a so that I won't have the columns with names in S. I am trying to do the following but it gives an error....

I want to subset my dataset (using SAS) every time the count variable equals 1. For example the following dataset would split into two datasets: Over Ball Bowling Runs_scored Count 39 1 Ali 1 1 39 2 Ali 1 2 39 3 Ali 2 3 39 4 Ali 1...

I have a data frame with the following format: manufacturers pricegroup leads harley <2500 # honda <5000 # ... ... .. I am using the aggregate function to pull out data in the following way: aggregate( leads ~ manufacturer + pricegroup, data=leaddata, FUN=sum, subset=(manufacturer==c("honda","harley"))) I noticed this is...

I would like to subset a dataframe by referring to a column with a string and select the values of that column that fulfill a condition. From the following code employee <- c('John Doe','Peter Gynn','Jolie Hope') salary <- c(21000, 23400, 26800) startdate <- as.Date(c('2010-11-1','2008-3-25','2007-3-14')) employ.data <- data.frame(employee, salary, startdate) salary_string...

I've posted a sample of the data I'm working with here. "Parcel.." is the main indexing variable and there are a good number of duplicates. The duplicates are not consistent in all of the other columns. My goal is to aggregate the data set so that there is only one observation...

I am having trouble mutating a subset of rows in dplyr. I am using the chaining command %>% to say: data <- data %>% filter(ColA == "ABC") %>% mutate(ColB = "XXXX") This works fine but the problem is that I want to be able to select the entire original table...

I have the following data frame which I imported into R using read.table() (I incorporated read.table() within read_data(), which is a function I created that also throws messages in case the file name is not written appropriately): > raw_data <- read_data("n44.txt") [1] #### Reading txt file #### > head(raw_data) subject...

I have quite a simple dataset: ID Value Time 1 censored 1 1 censored 2 1 uncensored 3 1 uncensored 4 1 censored 5 1 censored 6 2 censored 1 2 uncensored 2 2 uncensored 3 2 uncensored 4 2 censored 5 I want to keep the first uncensored occurrence,...

I am conducting an analysis where I am choosing between variables in two steps. Step 1: choose the best variables and combinations of variables from each of two sets of variables (e.g., intrinsic & extrinsic variables). Step 2: take the best variable combinations of each subset and create new set...

I'm trying to create a filter to remove lines from a dataset using grep and subset together. Sample dataset: id <- 1:10 problem <- c("a" , "b", "c", "d", "a","b","c","a", "b", "a") solution1 <- c("eat", "sleep", "drink", "play", "sleep", "play", "play", "drink", "play", "eat") solution2 <- c("read", "read", "eat", "drink",...

Basically I have a matrix with 24028 rows and I want to extract a subset of this matrix that meets a certain condition. I use: Sin <- actulab[actulab[,"Atteint_Limite"] == "0",] Here's what I get when I use tail(Sin) INDEX Atteint_Limite Limite Sev_cen FRANC ANNEE MOISSIN MONTBATI 24019 24019 0 50000...

I am new to this and I am stuck. I have a list of data frames that have information about pressure, temperature and salinity. I want to subset all of them and keep only the values of temperature and salinity when the pressure is equal to 5. Below this is...

I am trying to calculate and plot % yield of some data based on user definable inputs. I am using rmarkdown and shiny to do this. I keep getting stuck when passing a reactive subset through ddply to count the number of rows in the subset.."invalid (null) left side of...

I want to find an algorithm that given a set A to find all groups of subsets that satisfy the following condition: x ∪ y ∪ .... z = A, where x, y, ... z ∈ Group and ∀ x,y ∈ Group: x ⊆ A, y ⊆ A, x ∩...

I am new to R and a bit overwhelmed by an assignment. I am asked to create a new dataframe out of an existing one (the diamonds data that come preinstalled with ggplot2). The dataframe should look as follows: mean_price median_price min_price max_price n All sorted by clarity where...

I'm trying to find a faster way to filter my list of ranges, so that any range that can be covered completely by a larger range will be excluded. For example, #all ranges have width >1, which means no such case like xx=[1,1] in my list #each range itself is...
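
A common way to do this in O(n log n) is a sort-and-sweep: sort by start ascending (ties broken by end descending), then keep a range only if its end extends past the largest end seen so far. A minimal sketch in Python, assuming ranges are (start, end) pairs with inclusive ends and that exact duplicates should collapse to one; the function name `drop_covered_ranges` is hypothetical and the truncated constraints in the question are not taken into account:

```python
def drop_covered_ranges(ranges):
    """Return the ranges that are not completely covered by another range.

    Assumptions: each range is a (start, end) tuple with start <= end,
    and an exact duplicate counts as covered by its twin.
    """
    # After this sort, any range that could cover r appears before r.
    ordered = sorted(ranges, key=lambda r: (r[0], -r[1]))
    kept = []
    max_end = float("-inf")
    for start, end in ordered:
        if end > max_end:
            # No earlier range reaches this far, so nothing covers it.
            kept.append((start, end))
            max_end = end
    return kept


print(drop_covered_ranges([(1, 10), (2, 5), (3, 12), (11, 12)]))
# [(1, 10), (3, 12)]
```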