I am hoping someone can help with the following problem I am having while creating subsets of my data: I have a data set titled 'LakeK_all'. One of the columns is titled 'Lake' and contains a list of lake names (S001-Out, S002-Out, Y001-Out, Y002-Out,...). I would like to pull out...

I want to subset my dataset (using sas) every time the count variable equals to 1. For example the following dataset would split into two datasets: Over Ball Bowling Runs_scored Count 39 1 Ali 1 1 39 2 Ali 1 2 39 3 Ali 2 3 39 4 Ali 1...

I am trying to clean my stocks' df and I need to get rid of the ones that have less than 10 observations per month. Already checked these 2 threads: subsetting-based-on-observations-in-a-month and ddply-for-sum-by-group-in-r But I'm a noob and I cannot figure it out yet. In short: Please, help me out...

I thought that selecting values from a subquery in SQL would only yield values from that subset until I found a very nasty bug in code. Here is an example of my problem. I'm selecting the rows that contain the latest(max) function by date. This correctly returns 4 rows with...

How can I create all possible pair subsets from a list in Common Lisp? For example, the list A contains four elements, A = ("A" "B" "C" "D"), and the expected output is as follows: (("A","B"),("A","C"), ("A","D"),("B","C"),("B","D"), ("C","D")) Could someone please help me generate these subsets? Thanks a lot...

I'm very new to R - but have been developing SAS-programs (and VBA) for some years. Well, the thing is that I have 4 lines of R-code (scripts?) that I would like to repeat 44 times. Two times for each of 22 different train stations, indicating whether the train is...

So following the example from the Matching package and in particular the GenMatch example Link to package description pp11. We have the following code library(Matching) data(lalonde) attach(lalonde) lalonde$ID <- 1:length(lalonde$age) X = cbind(age, educ, black, hisp, married, nodegr, u74, u75, re75, re74) BalanceMat <- cbind(age, educ, black, hisp, married, nodegr,...

I am trying to calculate a % yield of some data based on a subset: # example data set set.seed(10) Measurement <- rnorm(1000, 5, 2) ID <- rep(c(1:100), each=10) Batch <- rep(c(1:10), each=100) df <- data.frame(Batch, ID, Measurement) df$ID <- factor(df$ID) df$Batch <- factor(df$Batch) # Subset data based on measurement...

It seems this question has been asked a couple of times in different forms, but I couldn't find the right solution. I have a SpatialPoint object with several Polygons and would like to subset and plot one polygon using the slot "ID". Using the example from this question: Sr1 =...

I made a matrix a with character names "0", ..., "10". Now I make a subset list of column names, S. I want to subset the matrix a so that I won't have the columns with names in S. I am trying to do the following but it's giving an error....
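
Since the failing code is cut off above, here is only a minimal sketch of one common approach, assuming `a` has column names "0" to "10" and `S` is a character vector of names to drop (the values below are made up for illustration). Negative indexing only works with numeric positions, so a logical mask built with `%in%` is used instead:

```r
# Hypothetical stand-in for the matrix in the question: columns named "0".."10".
a <- matrix(1:22, nrow = 2, dimnames = list(NULL, as.character(0:10)))
# Hypothetical set of column names to exclude.
S <- c("1", "4", "10")
# Keep the columns whose names are NOT in S; drop = FALSE keeps the result a matrix
# even if only one column survives.
a_sub <- a[, !(colnames(a) %in% S), drop = FALSE]
colnames(a_sub)
```

The same mask works for rows by swapping `colnames` for `rownames` and moving it to the first index.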

I have this data frame, lets call it my_df. It looks like this: my_df <- data.frame(rnorm(n = 30,sd=.5),rep(c("a","b","c"),each=10)) names(my_df) <- c("num","let") head(my_df) num let 1 0.01202600 a 2 1.09025768 a 3 -0.08656178 a 4 -0.04847073 a 5 -0.63750258 a 6 0.58846135 a What I want to do is select all...

I am new to this and I am stuck. I have a list of data frames that have information about pressure, temperature and salinity. I want to subset all of them and keep only the values of temperature and salinity when the pressure is equal to 5. Below this is...

Given an array {1,3,5,7}, its subparts are defined as {1357,135,137,157,357,13,15,17,35,37,57,1,3,5,7}. I have to find the sum of all these numbers in the new array. In this case sum comes out to be 2333. Please help me find a solution in O(n). My O(n^2) solution times out. link to the problem...
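
A possible O(n) idea, sketched under the assumption that every array element is a single digit (the linked problem statement is not visible here, so any modulus or multi-digit handling it requires is not covered). An element at 0-based position i can have any subset of the i elements to its left, giving 2^i choices, and each element to its right is either absent or shifts it one decimal place, giving a factor of (1 + 10)^(n - 1 - i). The function name `subpart_sum` is made up for this sketch:

```r
# Sum of all non-empty subsequences of a, each read as a decimal number.
# Assumes every element of a is a single digit (0-9).
subpart_sum <- function(a) {
  n <- length(a)
  i <- seq_len(n) - 1  # 0-based position of each element
  # 2^i choices for the elements on the left, 11^(n - 1 - i) for the place
  # value contributed by the choices on the right.
  sum(a * 2^i * 11^(n - 1 - i))
}

subpart_sum(c(1, 3, 5, 7))  # 2333, matching the example in the question
```

For long arrays the powers overflow double precision, so a real submission would likely need modular arithmetic.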

I am conducting an analysis where I am choosing between variables in two steps. Step 1: choose the best variables and combinations of variables from each of two sets of variables (e.g., intrinsic & extrinsic variables). Step 2: take the best variable combinations of each subset and create new set...

I have a complicated question that I will try to simplify by simplifying my dataset. Say I have 5 variables: df$Id <- c(1:12) df$Date <- c(NA,NA,a,a,b,NA,NA,b,c,c,b,a) df$va <- c(1.1, 1.4, 2.5, ...) #12 randoms values df$vb <- c(5.9, 2.3, 4.7, ...) #12 other random values df$vc <- c(3.0, 3.3, 3.7,...

I have quite a simple dataset: ID Value Time 1 censored 1 1 censored 2 1 uncensored 3 1 uncensored 4 1 censored 5 1 censored 6 2 censored 1 2 uncensored 2 2 uncensored 3 2 uncensored 4 2 censored 5 I want to keep the first uncensored occurrence,...

Date Check 201006 1649.515 201007 1825.828 201008 1878.926 201009 1637.491 201010 1664.938 201011 1973.294 201012 2714.054 201013 24086.797 201101 2888.64 201102 2452.403 201103 2230.493 201104 1825.023 201105 1667.396 201106 1657.334 201107 1890.515 201108 1891.783 201109 1655.634 201110 1744.454 201111 2031.872 201112 2541.878 201113 24477.425 I have a dataset. All data...

This question is asking which command to use given the following situation: Objective: Calculate mean of iris$Sepal.Length. Constraint: Do not include the iris$Species 'setosa'. My Work: data(iris) levels(iris$Species) output: setosa, versicolor, and virginica mean(iris$Sepal.Length, which(iris$Species != 'setosa')) output: error message 'incompatible dimensions' --- This demo is a stand-in for my...
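
One way this could be done, as a minimal sketch: the second positional argument of `mean()` is `trim`, not an index, so the usual approach is to subset the vector first and then take the mean. The variable names `keep` and `mean_no_setosa` are made up here:

```r
data(iris)
# Logical mask: TRUE for every row that is not setosa.
keep <- iris$Species != "setosa"
# Subset the vector first, then average what is left.
mean_no_setosa <- mean(iris$Sepal.Length[keep])
mean_no_setosa
```

The same pattern carries over to other data: build the condition on one column and use it to index the column being summarised.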

I've the following data.table structure(list(val1 = c(1, 2, 1, 3, 4, 5, 3), val2 = c(4, 5, 6, 4, 2, 4, 5)), .Names = c("val1", "val2"), row.names = c(NA, -7L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0xedae28>) What I would like to do is filter the rows in it...

I have a data frame that contains with the following format: manufacturers pricegroup leads harley <2500 # honda <5000 # ... ... .. I am using the aggregate function to pull out data in the following way: aggregate( leads ~ manufacturer + pricegroup, data=leaddata, FUN=sum, subset=(manufacturer==c("honda","harley"))) I noticed this is...

I have a pandas dataframe with a column that marks interesting points of data in another column (e.g. the locations of peaks and troughs). I often need to do some computation on the values between each marker. Is there a neat way to slice the dataframe using the markers as...

I have a list of five dataframes. Each dataframe contains one dimension column and 4 value columns. I would like to subset each dataframe in the list based on the contents of a vector. df <- data.frame(x = 1:100, y2 = runif(100, 0, 100), y3 = runif(100, 0, 100), y4...

I have a data set like this a <- data.frame(var1 = c("patientA", "patientA", "patientA", "patientB", "patientB", "patientB", "patientB"), var2 = as.Date(c("2015-01-02","2015-01-04","2015-02-02","2015-02-06","2015-01-02","2015-01-07","2015-04-02")), var3 = c(F, T, F, F, F, T, F) ) sequ <- rle(as.character(a$var1)) a$sequ <- sequence(sequ$lengths) producing > a var1 var2 var3 sequ 1 patientA 2015-01-02 FALSE 1 2...

I have two data frames in R. one <- data.frame( x = letters[1:10] , y = 1:10, z = rnorm(10)) two <- data.frame( x = letters[1:20] , y = 1:20, z = one$z) I want to "un-merge" these data frames based on the variable x... What I mean is that......

I have a text file with two years of data, from which I want to extract two different periods (July to September of each year). Read the file: wg=read.table("C:\\Users\\ERIE.txt", sep ='' , header =TRUE) head(wg) Year day hour mint valu1 valu2 date 105169 2008 1 7 30 0.045 0.014 2008-01-01...

I've posted a sample of the data I'm working with here. "Parcel.." is the main indexing variable and there are good amount of duplicates. The duplicates are not consistent in all of the other columns. My goal is to aggregate the data set so that there is only one observation...

I have a very large list comprised of data frames, every element of the list is a different data frame, where each column is comprised of different types of variables, and data frames of different lengths. I want to subset the data frames in this list, and keep only those...

This is probably a basic question in R, but I am trying to loop data within subsets of a larger data frame. I have added the 'data=sub240' line within the 'while' command, but this leads to a brackets error, which I suspect is indicative of a larger problem. Can anyone...

I have a large dataset that looks something like this with a few hundred thousand more entries, saved as data: Group1 dtm_Flight_Date Departure Arrival str_Fare_Category_Ident 1 8P104 06/11/2010 9:05 YYJ YVR B 2 8P104 06/11/2010 9:05 YYJ YVR K 3 8P104 06/11/2010 9:05 YYJ YVR L 4 8P104 06/11/2010 9:05...

Lets say I have a soccer team of 10 players (players) from which I should make two subteams of 5 players each and then compute the overall score for each team. players <- read.table(text= "paul 3 ringo 3 george 5 john 5 mick 1 ron 2 charlie 3 ozzy 5...

dfA <- data.frame(Efficiency=c(7,2,8,9), Value=c(3, 4, 7, 8)) dfB <- data.frame(Efficiency=c(7,2,4,2,8,9), Value=c(3, 4, 4, 1, 7, 8)) dfC <- data.frame(Efficiency=c(7,9), Value=c(3, 8)) I want to get the common rows of dfA and dfB. From the resulting data.frame I want to remove the rows which have the same values as dfC....
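
A minimal base R sketch of one possible approach, assuming neither data frame contains repeated rows (with duplicates, `merge()` would multiply them). `merge()` with no `by` argument joins on all shared columns, which yields the common rows; the rows also present in dfC are then dropped by comparing pasted keys. The helper `key` is made up for this sketch:

```r
dfA <- data.frame(Efficiency = c(7, 2, 8, 9), Value = c(3, 4, 7, 8))
dfB <- data.frame(Efficiency = c(7, 2, 4, 2, 8, 9), Value = c(3, 4, 4, 1, 7, 8))
dfC <- data.frame(Efficiency = c(7, 9), Value = c(3, 8))

# Rows present in both dfA and dfB (join on Efficiency and Value).
common <- merge(dfA, dfB)

# Build one string key per row so whole rows can be compared with %in%.
key <- function(d) paste(d$Efficiency, d$Value, sep = "|")

# Drop the common rows that also appear in dfC.
result <- common[!(key(common) %in% key(dfC)), ]
result
```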

I am new to R and a bit overwhelmed by an assignment. I am asked to create a new dataframe out of an existing one (the diamonds data that come preinstalled with ggplot2). The dataframe should look as follows: mean_price median_price min_price max_price n All sorted by clarity where...

I have a bash script that is basically a series of commands to download a bunch of climate files. Among many other information on the script, lines 28 to 1027 determine the actual files that should be downloaded. See my file: # point to file file <- 'https://dl.dropboxusercontent.com/u/27700634/wget-ESG-files.sh' # read...

I've been trying for hours with this one. I have a dataset with two columns, let's call them V1 and V2. I also have a list of important V1 values - Vx. I managed to acquire a subset of V1 with the intersect function, so: intersect <- intersect(df$V1,Vx) Now I am desperately...

I'm a python newbie but have some R experience. In R if I'd like to subset a data.frame I can use a variable to do something like this: # Columns # Assign column names to variable colsToUse <- c('col1','col2','col3') # Use variable to subset df2 <- df1[,colsToUse] # Rows #...

I have a data.frame with 19 columns and 2000+ rows. Column 1 is the dependent variable, and columns V1:V17 independent variables. I would like to keep only the rows where the value for EVERY independent variable listed is between 0 and 0.30. However, each row has a varying number of...

I want to compute the following two regressions using R: library("dynlm") zooX = zoo(test[, -1]) lmx <- dynlm(d(Euribor3)~d(Ois3)+d(CDS)+d(Vstoxx)+d(log(omo))+d(L(Euribor3, 1)), data=zooX[1:16]) summary(lmx) zooX = zoo(test[, -1]) lmx <- dynlm(d(Euribor3)~d(Ois3)+d(CDS)+d(Vstoxx)+d(log(omo))+d(L(Euribor3, 1)), data=zooX[17:31]) summary(lmx) The only difference between those two models is the subset (the first[1:16] and the second [17:31]). Now these two...

This question already has an answer here: subset() a factor by its number of observation 1 answer I am trying to solve a data formatting problem. I have a data frame where the variables are leveled into schools and students. For example: Schools Students SchoolA Student1 SchoolA Student2 SchoolA...

I'm trying to find a faster way to filter my list of ranges, so that any range that can be covered completely by a larger range will be excluded. For example, #all ranges have width >1, which means no such case like xx=[1,1] in my list #each range itself is...

Given the table below: TABLE : USER_ASSETS USER_ID | ASSET_ID ------------------- 1 | 1 ------------------- 1 | 2 ------------------- 1 | 3 ------------------- 2 | 2 ------------------- 2 | 3 ------------------- If I search for the USER_ID with ASSET_ID equal to 1 and 2, it should return USER_ID 1 as...

I am trying to calculate and plot % yield of some data based on user definable inputs. I am using rmarkdown and shiny to do this. I keep getting stuck when passing a reactive subset through ddply to count the number of rows in the subset.."invalid (null) left side of...

This question already has an answer here: subset() a factor by its number of observation 1 answer I have a longitudinal dataset structured as 1 row per visit. A numerical patient ID number indicates unique patients. How can I remove all patients with less than 2 observations from my...
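
A minimal sketch of one way to do this in base R, using a made-up data frame `visits` with a hypothetical `patient_id` column, since the real data is not shown. `ave()` attaches the per-patient row count to every row, which then serves as a filter:

```r
# Hypothetical longitudinal data: one row per visit.
visits <- data.frame(
  patient_id = c(101, 101, 102, 103, 103, 103),
  visit      = c(1, 2, 1, 1, 2, 3)
)
# Number of rows for each row's patient, repeated on every row of that patient.
n_obs <- ave(seq_along(visits$patient_id), visits$patient_id, FUN = length)
# Keep only patients with at least 2 observations.
kept <- visits[n_obs >= 2, ]
kept
```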

I am trying to subset a data frame and use a column value as the weighting factor. For example, lets say we have these data. set.seed(123) Data <- data.frame(x1 = sample(c(0,1),100, replace = T), x2 = round(runif(100, min=0, max=100),0), Prob = round(runif(100),2)) head(Data) > head(Data) x1 x2 Prob 1 0...

I have the following covariance matrix in R: AB-2000 AB-2600 AB-3500 AC-0100 AD-0100 AF-0200 AB-2000 6.5 NA -1.8 3.65 -17.96 -26.5 AB-2600 NA 7.18 NA NA NA NA AB-3500 -1.79 NA 5.4 NA -4.63 NA AC-0100 3.65 NA NA 4.22 9.8 NA AD-0100 -17.96 NA -4.63 9.8 5.9 NA AF-0200...

I have a text file consisting of 6 columns as shown below. The measurements are taken every 30 minutes for several years (2001-2013), although on certain days the interval is 32 or 39. I want to extract a certain range from this data. To read the file: LR=read.table("C:\\Users\\dat.txt", sep...

This question already has an answer here: R - Remove all unique rows 3 answers My data looks like this: A B 1 2 1A 2 1A 2 2 3 2 4 2 4 3A 0 3A 0 4A 1 4A 1 5 5 I want to subset the...

I have this dataframe that I'd like to subset (if possible, with dplyr or base R functions): df <- data.frame(x = c(1,1,1,2,2,2), y = c(30,10,8,10,18,5)) x y 1 30 1 10 1 8 2 10 2 18 2 5 Assuming x are factors (so 2 conditions/levels), how can I subset/filter...

I want to subtract y from x, which means remove one "A", three "B" and one "E" from x, so xNew will be c("A", "C", "A","B","D"). It also means length(xNew)=length(x) - length(y) x <- c("A","A","C","A","B","B","B","B","D","E") y <- c("A","B","B","B","E") setdiff doesn't work because xNew <- setdiff(x,y) xNew [1] "C" "D" match...
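
One possible approach, as a sketch: label every element with its occurrence number ("A" first time, "A" second time, and so on) so that repeated values become distinct, then remove the labels of x that also occur in y. This removes the earliest occurrences, which matches the expected xNew above. The helper `occ` is made up for this sketch and assumes the values do not already end in an underscore followed by a number:

```r
x <- c("A", "A", "C", "A", "B", "B", "B", "B", "D", "E")
y <- c("A", "B", "B", "B", "E")

# Tag each element with its running count within its own value, e.g. "B_3".
occ <- function(v) paste(v, ave(seq_along(v), v, FUN = seq_along), sep = "_")

# Drop the elements of x whose tag also appears among the tags of y.
xNew <- x[!(occ(x) %in% occ(y))]
xNew
```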

I am trying to implement a method to get all subsets of a set. I understand the logic of doing that, i.e. Subset(n) = n + Subset(n-1), but the code I wrote keeps printing out the wrong answers. Here is my code: void subset(vector<int> &input, vector<int> output, int current) {...

I want to compile only rows containing odd numbers in one of the columns. An example of my data frame is below: V1 V2 V3 V4 V5 V6 V7 V8 14221 USDJPY 20030507 20:00:00 116.33 116.19 116.47 116.25 14222 USDJPY 20030507 21:00:00 116.24 116.24 116.42 116.32 14223 USDJPY 20030507 22:00:00...

I have a list that contains 2 data sets. a = data.frame(c(1,1,1,1,1,2,2,2,2,2), c("a","b", "c", "d","e","e","f", "g", "h","i")) colnames(a) = c("Numbers","Letters") c = data.frame(c(3,3,3,3,3,4,4,4,4,4), c("q","r", "s", "t","u","u","v", "w", "x","y")) colnames(c) = c("Numbers","Letters") my.list = list(a,c) my.list I am interested in returning only the letters that are found in common between the...

I keep banging my head against the wall trying to solve the following problem (I'm using the new c# 2.0 driver): The idea is to return all docs where the nested array is equal or a subset of a fixed array. Example: Fixed array: [ "A", "B", "C" ] container...

Here is a doc I have: var docIHave = { _id: "someId", things: [ { name: "thing1", stuff: [1,2,3,4,5,6,7,8,9] }, { name: "thing2", stuff: [4,5,6,7,8,9,10,11,12,13,14] }, { name: "thing3", stuff: [1,4,6,8,11,21,23,30] } ] } This is the doc I want: var docIWant = { _id: "someId", things: [ { name:...

I would like to subset a dataframe by referring to a column with a string and select the values of that column that fulfill a condition. From the following code employee <- c('John Doe','Peter Gynn','Jolie Hope') salary <- c(21000, 23400, 26800) startdate <- as.Date(c('2010-11-1','2008-3-25','2007-3-14')) employ.data <- data.frame(employee, salary, startdate) salary_string...

Given the following Input 10 4 3 5 5 7 Where 10 = Total Score 4 = 4 players 3 = Score by player 1 5 = Score by player 2 5 = Score by player 3 7 = Score by player 4 I am to print players whose combine...

I want to create numerous subsets of data based on date sequences defined from a separate dataframe. For example, one dataframe will have dates and daily recorded values across multiple years. I have created a hypothetical dataframe below. I want to conduct various subsets from this dataframe based on start...

I have two tables that are made by intervals of bp, the Table1 has large intervals and the second has short intervals (just 2bp). I want to make a new table that contains only the Table 1 ranges that have at least one element of table 2 contained in their...

The task I am trying to accomplish is essentially filtering one dataset by the entries in an "id" column of another dataset. The data sets I am working with are quite large, having tens of thousands of entries and 30 or so variables. I have made toy datasets...

I would like to subset news (below) to create news2 (further below) which will only include the rows/columns where the abs(value) in each element of news > 0.01. Below is the code that I have tried: gr <- data.frame(which(abs(news[, 1:ncol(news), with = FALSE]) > 0.01, arr.ind = TRUE)) news2a <-...

I have a data frame of values composed of 5 variables (class in brackets) 1) DateTime (as.POSIXct), 2) ID (character), 3) Sensor 1 (numeric), 4) Sensor 2 (numeric), 5) Sensor 3 (numeric) This data was collected from 5 tagged fish. Each fish has one tag with 3 sensors on it,...

I am trying to subset a data frame based on a range of time. Someone has asked this question in the past and the answer was to use R CMD INSTALL lubridate_1.3.1.tar.gz (see link: subset rows according to a range of time). The issue with this answer is that I...

I have a very large csv file I have imported into R and need to make a subset of data. The csv looks something like this: Julian_Day Id Year 52 1 1901 56 5 1901 200 1 1968 etc., where year is 1901-2010, Id 1-58 and Julian_Day 1-200 for about...

I am a newbie to R. I have a dataset ITEproduction_2014.2015 and I only want to see datapoints between 4 and 39 days. Currently I use 2 separate lines to create a subset. Can I do this in 1 line? Something like Date.Difference >3 and < 40? ITEproduction_2014.2015 <- subset(ITEproduction_2014.2015,Date.Difference>3)...
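
A minimal sketch of the one-line version: the two conditions can be combined with `&` inside a single `subset()` call. The data frame below is a made-up stand-in with only the `Date.Difference` column, since the real dataset is not shown:

```r
# Hypothetical stand-in for ITEproduction_2014.2015.
ITEproduction <- data.frame(Date.Difference = c(1, 3, 4, 20, 39, 40, 55))
# Both bounds in one call: strictly greater than 3 AND strictly less than 40.
in_range <- subset(ITEproduction, Date.Difference > 3 & Date.Difference < 40)
in_range
```

Note that `&` is the vectorised operator; `&&` would not work as a row filter here.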

I would like to subset the rows that have a concrete sequence of characters in one variable. For example I would like to subset the rows that have at least three consecutive 1 ("111"; e.g. "01110", "11111", "01111") in the variable history. Here is some example data: id <- c(1,2,3,4,5,6,7,8,9,10)...

I have a custom class of objects with an assortment of various attributes of different types. I would like to remove duplicates from a list of these objects based on one of these attributes. Something like this, but actually get a list of the objects rather than a list of...

I have a data frame a with 4 identifying columns: A, B, C, D. A second data frame b, created with ddply(), contains a summary of all the values for different Ds for every set of A,B,C. A third data frame c contains a subset of b with bad values...

Basically I have a matrix with 24028 rows and I want to extract a subset of this matrix that meets a certain condition. I use: Sin <- actulab[actulab[,"Atteint_Limite"] == "0",] Here's what I get when I use tail(Sin) INDEX Atteint_Limite Limite Sev_cen FRANC ANNEE MOISSIN MONTBATI 24019 24019 0 50000...

This question already has an answer here: How to subset matrix to one column, maintain matrix data type, maintain row/column names? 1 answer When I try to subset a 1-column matrix by its row names, the subsetting works but a numeric vector is returned. Can you somehow prevent that...
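
A short sketch of the usual remedy: `[` drops dimensions of length one by default, and passing `drop = FALSE` keeps the matrix structure along with its row and column names. The matrix `m` below is made up for illustration:

```r
# Hypothetical 1-column matrix with row names.
m <- matrix(c(1.5, 2.5, 3.5), ncol = 1,
            dimnames = list(c("r1", "r2", "r3"), "value"))
# Default behaviour: the result collapses to a named numeric vector.
sub_vec <- m[c("r1", "r3"), ]
# With drop = FALSE the result stays a 2 x 1 matrix.
sub_mat <- m[c("r1", "r3"), , drop = FALSE]
dim(sub_mat)
```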

I'm trying to convert a character variable into a logical expression in order to use it later inside the subset argument of the subset() function, and all of this is inside a bigger function called early_prep() I created. The problem is when I execute early_prep(file_name = "n44.txt", keep_rows = "block...

Hi this is a sample of data.frame / list with two columns containing X and Y. And my problem is when I call subset it will cut decimal part. Can you help me figure why? (row.names | X | Y) > var ... 9150 4246838.57 5785639.07 9152 4462019.15 5756344.11 9153...

I have 2 arrays. arr1=[1,8,1,3,2] arr2=[3,8,1] I want to put the subset of elements [8,1] into arr3. How can I do this using JavaScript? I used the following code, but it doesn't seem to be working. function subsetFind() { var arr1 = [1,8,1,3,2] var arr2 = [3,8,1] var arr3 = []; var arr1length =...

I would like to subset specific rows with a vector of identifiers. Here is my data data = rbind(c('B11008Z', 'Men', '13'), c('B11040Z', 'Women', '14'), c('B11040E', 'Women', '12') ) colnames(data) <- c('id', 'sex', 'age') data = as.data.frame(data) When I enter the personal id one by one, there is no problem. data[data$id...

I have the following dataframe which is already a subset of a much larger dataframe: Time X.N2O._ppm 1 15/05/2015 13:30:07.291 0.03941801 2 15/05/2015 13:30:08.307 0.01014003 3 15/05/2015 13:30:09.323 0.02577801 4 15/05/2015 13:30:10.338 0.02554231 5 15/05/2015 13:30:11.354 0.02489800 6 15/05/2015 13:30:12.370 0.02417584 7 15/05/2015 13:30:13.386 0.02489115 8 15/05/2015 13:30:14.402 0.02524912 9...

I have two series of constraints S and S', they describe possibly infinitely large sets. Say for example S: x <= 10 and y <= x and S': x <= 20 and y <= 20. Now I want to know if S is a subset of S'? I thought I...

I have 12 columns of data in a table called df. The first column contains several thousand strings such as AA150502-01,AA150502-02,BB150502-01,BB150502-03 etc. I want to filter the table so that I only see the rows ending with the suffix -01. How do I go about doing this? I so far...
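
One way this might be approached, as a minimal sketch: `grepl()` with a pattern anchored at the end of the string (`$`) gives a logical mask for the rows. The column names `id` and `value` are assumptions, and only 2 of the 12 columns are mocked up:

```r
# Hypothetical stand-in for df, using the example strings from the question.
df <- data.frame(
  id    = c("AA150502-01", "AA150502-02", "BB150502-01", "BB150502-03"),
  value = 1:4,
  stringsAsFactors = FALSE
)
# "-01$" matches only strings that END with -01.
ends_01 <- df[grepl("-01$", df$id), ]
ends_01
```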

Suppose I have the following data. x<- c(1,2, 3,4,5,1,3,8,2) y<- c(4,2, 5,6,7,6,7,8,9) data<-cbind(x,y) x y 1 1 4 2 2 2 3 3 5 4 4 6 5 5 7 6 1 6 7 3 7 8 8 8 9 2 9 Now, if I subset this data to select...

I am looking to split my dataframe into subsets according to the column "Height" with each subset having one row with a value and 0-Inf rows with NAs. This is, to be able to apply functions to the subsets afterwards, specifically order the rows according to their "Diameter" value,...

I am working with several temperature datasets and trying to pull out when the temperature meets or exceeds a threshold value. Ideally, I want to know how many times (count) that value is met/ exceeded each year for ~100 yrs of data AND when (what date) that value is first...

I am having trouble mutating a subset of rows in dplyr. I am using the chaining command: %>% to say: data <- data %>% filter(ColA == "ABC") %>% mutate(ColB = "XXXX") This works fine but the problem is that I want to be able to select the entire original table...

I am wondering how to perform some basic data manipulation in R. What I want to do is the following. I have a data table with the following pattern : V1 V2 V3 ABC X 24 ABC Y 30 EFG X 4 EFG Y 28 HIJ P 40 HIJ Y...

I would like to choose 4 assets whose BETAdn column sums to 0. The large matrix is called ALPHABETA and I run the following to get a subset of assets that satisfy the sum-to-zero condition. AB <- subset(data.frame((ALPHABETA)), BETAdn+BETAdn >= -0.01 & BETAdn+BETAdn <= .01) The...

I am trying to subset a matrix based on a specific value in a column. But I want my subsets in a number of separate matrices. For eg, say I have a matrix ccc which is aaa=c(1,1,1,2,5,1,2,1,1,3,1,1,1,1,1,1,4) bbb=c(4,4,4,4,3,3,3,3,2,2,2,2,3,4,5,6,7) ccc=cbind(aaa,bbb) I want to subset using a condition which is ccc[,1]==1 and...

I would like to loop over various regressions referencing different data subsets, however I'm unable to appropriately call different subsets. For example: dat <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10) ) x.list <- list(dat$x1,dat$x2,dat$x3) dat1 <- dat[-9,] fit <- list() for(i in 1:length(x.list)){ fit[[i]]...

df1 and df2 have columns a,b. I want to subset data from df1 such that each entry in df1$a along with df1$b is in df2$a along with df2$b. df1 a b c 1 m df1 2 f df1 3 f df1 4 m df1 5 f df1 6 m df1...

I am currently programming a CPLEX/OPL model using IBM ILOG CPLEX Optimization Studio. I have a problem with using sums or indexes which contain a subset and depend on another parameter/variable, e.g. check the following constraints: NB 2,3,4,8). Can anyone help me with incorporating these constraints properly? Please find the...

My goal is to calculate a group % column by zip. I created the % column by zip, but keep losing my group ('cgrp') variable. How can I keep this in my end results? My data table script is giving me the below results: zip V1 1: 12007 19.35484 2:...

Given the following example: set.seed(1) tmp.data<-data.frame(group=rep(c("x","y","z"),8), year=rep(c(2000:2003),6), value=runif(24, 1, 100)) I can create a simple boxplot with group affiliations: boxplot.example<-ggplot(data=tmp.data) boxplot.example.simple<-boxplot.example + geom_boxplot(aes(x=group,y=value)) # plot boxplot.example.simple However I would like to create separate boxplots for each group and year in the same graphic. I tried it with the group function...

I have following data structure: Group Count Value 1 1 1000 1 10 2000 2 6 1000 2 7 2000 Some groups that have a count value and a data value. Now I only want those rows where count > 0.25 * sum(count of group). For example group 1 has...
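
A minimal base R sketch of one possible approach, using the four rows shown in the question: `ave()` repeats each group's total count on every row of that group, so the threshold can be compared row by row. The names `d`, `group_total` and `kept` are made up here:

```r
d <- data.frame(
  Group = c(1, 1, 2, 2),
  Count = c(1, 10, 6, 7),
  Value = c(1000, 2000, 1000, 2000)
)
# Per-group sum of Count, aligned with the original rows.
group_total <- ave(d$Count, d$Group, FUN = sum)
# Keep rows whose count exceeds a quarter of their group's total.
kept <- d[d$Count > 0.25 * group_total, ]
kept
```

With this data, group 1 has a total of 11 (threshold 2.75) and group 2 a total of 13 (threshold 3.25), so only the row with count 1 is dropped.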

I transformed my data into a boxplot (used geom_boxplot of ggplot), so that the outliers became visible. Afterwards I wanted to remove them from my data. That is why I used "ggplot_build" to get all the information about the plot and saved it with a new name. Outlier_boxplot<-ggplot_build(boxplot) Now...

I have the following data frame which I imported into R using read.table() (I incorporated read.table() within read_data() which is a function I created that also throw messages in case the file name is not written appropriately): > raw_data <- read_data("n44.txt") [1] #### Reading txt file #### > head(raw_data) subject...

I'm trying to create a filter to remove lines from a dataset using grep and subset together. Sample dataset: id <- 1:10 problem <- c("a" , "b", "c", "d", "a","b","c","a", "b", "a") solution1 <- c("eat", "sleep", "drink", "play", "sleep", "play", "play", "drink", "play", "eat") solution2 <- c("read", "read", "eat", "drink",...

I have a large data set of the following format: First column is type, and the subsequent columns are different times that 'type' happens. I want to calculate the slope of each row (~7000 rows) for subset T0-T2 and then t0-t2 and output that information, then get the average of...

I have a dataset with 45 columns and >8000 observations. One of the variables in the columns is city-name. I want to remove all observations that are located in cities that begin with the letter "S". How would I do this? I'm pretty new to R, so sorry if this...
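
One way this could be sketched, assuming the city column is called `city` (a made-up name, as is the toy data): `grepl("^S", ...)` flags names that start with a capital S, and negating the mask keeps everything else. `grepl()` also accepts factor columns, since it converts them to character internally:

```r
# Hypothetical stand-in for the 45-column dataset.
df <- data.frame(
  city = c("Seattle", "Boston", "Salem", "Austin", "Denver"),
  pop  = 1:5,
  stringsAsFactors = FALSE
)
# "^S" anchors the match at the start of the string; ! inverts the mask.
no_s <- df[!grepl("^S", df$city), ]
no_s
```

The match is case-sensitive; adding `ignore.case = TRUE` would also drop names starting with a lowercase s.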

I have a large data frame marking occurrences of trigrams in a string, where the strings are the rows, the trigrams are the columns, and the values mark whether a trigram occurs in a string. So something like this: strs <- c('this', 'that', 'chat', 'chin') thi <- c(1, 0, 0,...

Suppose I had the following matrix: matrix(c(1,1,2,1,2,3,2,1,3,2,2,1),ncol=3) Result: [,1] [,2] [,3] [1,] 1 2 3 [2,] 1 3 2 [3,] 2 2 2 [4,] 1 1 1 How can I filter/subset this matrix by whether or not each row has duplicate values? For example, in this case, I would only...
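
Because the question is cut off before it says which rows should be kept, this sketch only builds the logical flag and shows both selections. `anyDuplicated()` returns 0 for a vector with no repeats, so applying it row by row and testing `> 0` marks the rows that contain duplicate values:

```r
m <- matrix(c(1, 1, 2, 1, 2, 3, 2, 1, 3, 2, 2, 1), ncol = 3)
# TRUE for rows that contain at least one repeated value.
has_dup <- apply(m, 1, anyDuplicated) > 0
# Rows with duplicates, and rows whose values are all distinct.
rows_with_dups    <- m[has_dup, , drop = FALSE]
rows_all_distinct <- m[!has_dup, , drop = FALSE]
rows_all_distinct
```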

I want to find an algorithm that given a set A to find all groups of subsets that satisfy the following condition: x ∪ y ∪ .... z = A, where x, y, ... z ∈ Group and ∀ x,y ∈ Group: x ⊆ A, y ⊆ A, x ∩...

The question relates to iris data set: library(datasets) data(iris) How to extract column 'Sepal.Width' for the species virginica?...
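
Two common ways to do this, as a minimal sketch: logical indexing on the column returns a plain numeric vector, while `subset()` with `select` returns a one-column data frame. The result names are made up for illustration:

```r
library(datasets)
data(iris)
# As a numeric vector: index the column by a condition on Species.
virginica_width <- iris$Sepal.Width[iris$Species == "virginica"]
# As a one-column data frame: subset() filters rows and selects columns in one call.
virginica_df <- subset(iris, Species == "virginica", select = Sepal.Width)
head(virginica_width)
```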