FAQ Database Discussion Community


How to aggregate in R with a custom function that uses two columns

r,aggregate,distance
Is it possible to aggregate with a custom function that uses two columns to return one column? Say I have a dataframe: x <- c(2,4,3,1,5,7) y <- c(3,2,6,3,4,6) group <- c("A","A","A","A","B","B") data <- data.frame(group, x, y) data # group x y # 1 A 2 3 # 2 A 4...

SQL find references with single occurences

sql,group-by,subquery,aggregate
I am trying to find a pure SQL solution for the following problem: If I sell paint and paintbrushes, and keep a record of which colors are sold like this: select OrderNumber, Product, Product_Type, Qty from Sales +-------------+------------+--------------+-----+ | OrderNumber | Product | Product_Type | Qty | +-------------+------------+--------------+-----+ | 0001...

Use loop inside aggregate but draw from different groups?

r,loops,aggregate
I don't know how to think my way out of this one. Here is my data: group <- c(1, 1, 1, 2, 2, 2, 2, 2, 3, 1, 1, 1, 2, 2) gdp <- c(3.5, 4.2, 5, 4, 4.2, 5, 5.5, 6, 3.5, 3.4, 4.0, 4.1, 4.3, 4.7) year <-...

java8 stream grouping and sorting on aggregate sum

java-8,grouping,aggregate,java-stream
Given a java class Something class Something { private int parentKey; private String parentName; private int childKey; private int noThings; public Something(int parentKey, String parentName, int childKey, int noThings) { this.parentKey = parentKey; this.parentName = parentName; this.childKey = childKey; this.noThings = noThings; } public int getParentKey() { return this.parentKey; }...

R: Roll up column values containing NA's by sum while grouping by ID's

r,data.table,aggregate,plyr,dplyr
I have a data frame that I got from ID <- c("A","A","A","A","B","B","B","B") Type <- c(45,45,46,46,45,45,46,46) Point_A <- c(10,NA,30,40,NA,80,NA,100) Point_B <- c(NA,32,43,NA,65,11,NA,53) df <- data.frame(ID,Type,Point_A,Point_B) ID Type Point_A Point_B 1 A 45 10 NA 2 A 45 NA 32 3 A 46 30 43 4 A 46 40 NA 5 B...

Nesting aggregate within apply to aggregate multiple columns by multiple variables in R

r,aggregate,nested-loops,apply,summary
I have a dataframe with sets of scores, and sets of grouping variables, something like: s1 s2 s3 g1 g2 g3 4 3 7 F F T 6 2 2 T T T 2 4 9 G G F 1 3 1 T F G I want to run an...

finding percentage frequency of outcomes over groups in R

r,aggregate,group-summaries
I have a very large data frame, representing time series data from an agent-based model, that looks like this: Each row in this dataset represents a single cycle of the model, which can run for an arbitrary length of time and terminate in one of three endings: "unity," "stability," or...

trouble with a user-defined function in aggregate

r,aggregate
I am trying to graduate from for loops by writing a function to plug into aggregate. It is not going well. The sample data is: group <- c(1, 1, 1, 2, 2, 2, 2, 2, 3, 1, 1, 1, 2, 2) gdp <- c(3.5, 4.2, 5, 4, 4.2, 5, 5.5,...

R - aggregate results in disparity between names(data) and columns in head(data)

r,aggregate,head,names
I have a dataset of long format with 3 factors (strain, genotype, region) and 1 value (volume). This dataset is called individualData. Basically what I'm trying to do is calculate the mean and standard deviation of volume for every combination of strain * genotype * region, with the exception of...

Design a email daily digest feature with spring integration

spring,aggregate,spring-batch,spring-integration
I currently have a SI project that listen on a JMS queue do some processing and depending on the messages send emails, write files, ... Now I want to add a daily digest feature on the sending emails part. I want email messages to be stored somewhere and that once...

Using aggregate() on data.frame objects

r,aggregate
Why aggregate() doesn't work here? > aggregate(cbind(var1 = 1:10, var2 = 101:110), by=list(range=cut(1:10, breaks=c(2,4,8,10))), FUN = function(x) { c(obs=length(x[, "var2"]), avg=mean(x[, "var2"]), sd=dev(x[, "var2"])) }) Error in x[, "var2"] (from #1) : incorrect number of dimensions > cbind(var1 = 1:10, var2 = 101:110)[, "var2"] [1] 101 102 103 104 105...

Aggregate count by several weeks after field data in PostgreSQL

postgresql,aggregate
I have a query returns something like that: registered_at - date of user registration; action_at - date of some kind of action. | registered_at | user_id | action_at | ------------------------------------------------------- | 2015-05-01 12:00:00 | 1 | 2015-05-04 12:00:00 | | 2015-05-01 12:00:00 | 1 | 2015-05-10 12:00:00 | | 2015-05-01...

Subtract subdocuments from collection in aggregate pipeline

mongodb,mongoose,aggregate,subdocument
I'm trying to retrieve a document from a mongodb and remove some objects from an array based on a condition (deleted != null). Here is a simplified example of the item I'm targeting: { "_id" : ObjectId("54ec9cac83a214491d2110f4"), "name" : "my_images", "images" : [ { "ext" : "jpeg", "type" : "image/jpeg",...

Scala parallel frequency calculation using aggregate doesn't work

multithreading,scala,concurrency,parallel-processing,aggregate
I'm learning Scala by working the exercises from the book "Scala for the Impatient". Please see the following question and my answer and code. I'd like to know if my answer is correct. Also the code doesn't work (all frequencies are 1). Where's the bug? Q10: Harry Hacker reads a...

R stats on data from multiple rows and columns by name

r,aggregate
I have a large dataframe with 12 rows corresponding to each unique ID. I want to calculate the mean for values from 8 columns by ID. In other words, I want single value means for all values in a 12x8 block (some have many NAs) Here's a simpler version with...

How to reference groupby index when using apply, transform, agg - Python Pandas?

python,pandas,group-by,dataframes,aggregate
To be concrete, say we have two DataFrames: df1: date A 0 12/1/14 3 1 12/1/14 1 2 12/3/14 2 3 12/3/14 3 4 12/3/14 4 5 12/6/14 5 df2: B 12/1/14 10 12/2/14 20 12/3/14 10 12/4/14 30 12/5/14 10 12/6/14 20 Now I want to groupby date in...

Aggregating monthly column values into quarterly values

r,aggregate
Hello Everybody I am pretty much completely new to R and any help is greatly appreciated. I have the following data (called "depressionaggregate") from 2004 until 2013 for each month: Month Year DepressionCount 1 01 2004 285 2 02 2004 323 3 03 2004 267 4 04 2004 276 5...

Aggregate() - R - Is is possible to mask one of the aggregated columns from displaying and adding another column [duplicate]

r,aggregate
This question already has an answer here: Aggregate a dataframe on a given column and display another column 7 answers I have a "emp" dataset with name, grade and value(based on manager's feedback) name grade value 1 Ram R 2.1 2 Sam R 2.4 3 Jam R 5.3 4...

Aggregate time column on hourly interval in R

r,aggregate
I have a df in R in the following format. What could be the easiest way to aggregate this on an hourly time interval basis, currently it is every minute theTime24 Amount 988 2015-02-04 23:53:00 2 989 2015-02-04 23:55:00 1 990 2015-02-04 23:56:00 3 991 2015-02-04 23:57:00 2 992 2015-02-04...

Trouble publishing/subscribing to a Mongo aggregate query

mongodb,meteor,aggregate,publish-subscribe,iron-router
Using Meteor and Iron-Router, I'm trying to publish the following mongo query (in the server folder): Meteor.publish("getTestList", function() { return Tests.aggregate( [{ $project : { "name" : 1, "description" : 1, "testNumber" : 1 } }, { $sort : { "testNumber" : 1 } } ]) }); Note that I...

Fastest way to count the occurrences of each unique column in a matrix in R

r,performance,matrix,aggregate
I'm new to R (and to stackoverflow) and I would appreciate your help. I would like to count the number of occurences of each unique column in a matrix. I have written the following code, but it is extremely slow : frequencyofequalcolumnsinmatrix = function(matrixM){ # returns a matrix columnswithfrequencyofmtxM that...

Is Spark Appropriate for Analyzing (Without Redistributing) Logs from Many Machines?

apache-spark,aggregate,analytics,bigdata
I've got a number of logs spread across a number of machines, and I'd like to collect / aggregate some information about them. Maybe first I want to count the number of lines which contain the string "Message", then later I'll add up the numbers in the fifth column of...

R dataframe select distinct list of value per group

r,group-by,aggregate,dataframes
I have a dataframe consists of 2 columns: > head(session_item_pairs) session_id item_id 53 811 111502665618 78 841 111502665618 127 901 111502665618 151 931 111502665618 477 1321 111502665618 501 1321 115641561585 I need to group the dataframe by session_id and for each session_id return the list of unique items in ascending...

Exclude field in array of subdocument in mongodb

node.js,mongodb,aggregate
I have two document image document: {_id:123,user:{user_sub_docuemnt},thumb:'dqwdqwdqwdw'} post document: {_id:444, user:{user_sub_document}, attach:[{_id:123, user:{user_sub_docuemnt}, thumb:'dqwdqwdqwdw'} ] } the user_sub_document contain password field, so I need to exclude that field. This is what i have so far: Post.aggregate([ {$match: {'user._id': {$in:idArr}}}, {$project:{content:1,attach:1,pub_date:1,'user.avatar':1}}, ],function(err,posts){ if(err){ throw err }else{ res.send(posts) } }) this will...

Receive a new formatted table via aggregation and group by

sql-server,group-by,aggregate
I am having a big issue with a SQL Server query here and I really don't know how to go on with it. The aim is to receive a table differentiated by different time-intervals going from 00:00 - 00:29 to 23:30 - 23:59. In each of these intervals I want...

Aggregate big dataset for a list of columns and using different FUN

r,data.table,aggregate,plyr,summary
I have a big dataset, and I need to summarise most of the columns by one single factor (CODE_PLOT). This is the list of columns I need to aggregate: > names(soil)[4:30] [1] "PH" "CONDUCTIVITY" "K" "CA" "MG" "N_NO3" [7] "S_SO4" "ALKALINITY" "AL" "DOC" "WATER_CONTENT" "Na" [13] "AL_LABILE" "FE" "MN" "P"...

Aggregate for one entity

entity,domain-driven-design,aggregate,ddd-repositories,aggregateroot
In Domain-driven design if I want to use a repository I need to have an aggregate for it - as I understand. So I have a User, that has id, login, email, and password. A user is a domain Entity with unique Id. When i want to add a User...

MongoDb Pipeline Aggregation $unwind sub sub documents

mongodb,mongoose,aggregate
I'm trying to list unwind a nested array using MongoDb. A product contains tasks, each task can have zero, one or more subTasks. Here is a sample product document: db.products.find({_id: ObjectId("554a13d4b692088a38f01f3b")}) Result: { "_id" : ObjectId("554a13d4b692088a38f01f3b"), "title" : "product title", "order" : 3, "description" : "Description here ", "status" :...

Add URL to aggregated variable SQL

sql,postgresql,url,aggregate
I join two SQL tables (applications, areas_application1) and group they by area. In the first table I have columns url_title and url_link. I want to display for eah area a list of url_title 's and add href link to each title. I tried to aggregate url_title by space, but I...

Filter by count in elasticsearch

elasticsearch,aggregate-functions,aggregate
Consider the following data POST programming/languages/1 { "name":"java", "type":"general_purpose" } POST programming/languages/2 { "name":"javascript", "type":"scripting" } POST programming/languages/3 { "name":"c", "type":"general_purpose" } GET programming/languages/_search { "query": { "match": { "type":"general_purpose" } } } If I need to find the docs which has more than one match of the field type,...

Date difference for each customer and product in R

r,aggregate
custid <- c(1,2,2,2) prod <- c("books", "highlighters", "books", "pens" ) qdate <- c(20130401, 20130403, 20130403, 20130404) tdate <- c(20130405, 20130804, 20130405, 20130405) data <- data.frame(custid, prod, qdate, tdate) data$qdate <- as.Date(as.character(data$qdate), "%Y%m%d") data$tdate <- as.Date(as.character(data$tdate), "%Y%m%d") (data2 <- difftime(data$tdate, data$qdate, data$custid, units="days")) #works data2 <- aggregate(cbind(data$tdate=format(date, '%Y-%m-%d'))~cbind(data$qdate=format(date,...

Summing rows by month in R

r,data.table,aggregate,dplyr,lubridate
So I have a data frame that has a date column, an hour column and a series of other numerical columns. Each row in the data frame is 1 hour of 1 day for an entire year. The data frame looks like this: Date Hour Melbourne Southern Flagstaff 1 2009-05-01...

Mongodb how can I query a subset of an array for a specific document

node.js,mongodb,aggregate
I want to run a query to select specific documents. Then on each document, open up an array of sub documents and run a query to filter those sub docs. Example: { "_id" : ObjectID(23412351346435), "list" : [ {date: ISODate(2015-01-12T00:00:00.000Z), name: "Jan 12"}, {date: ISODate(2015-01-13T00:00:00.000Z), name: "Jan 13"}, {date: ISODate(2015-01-14T00:00:00.000Z),...

R aggregate dataframe on two variables and apply function

r,aggregate
I have a dataframe which I'd like to aggregate over two variables applying the function mean on each meassurement. Here the head of the dataframe: Subject Activity meassureA meassureB meassureC meassureD 1 1 running 0.2820216 -0.037696218 -0.13489730 -0.3282802 2 1 running 0.2558408 -0.064550029 -0.09518634 -0.2292069 3 1 walking 0.2548672 0.003814723...

How to calculate the frequency of each value in a column corresponding to each value in another column in R?

r,aggregate
I have a dataset as follows: col1 col2 A 1 A 2 A 2 B 1 B 1 C 1 C 1 C 2 I want the output as: col1 col2 Frequency A 1 1 A 2 2 B 1 2 C 1 2 C 2 1 I tried using...

How can I operate on elements of a data.frame in r, that creates a new column? [duplicate]

r,merge,data.frame,aggregate
This question already has an answer here: Idiomatic R code for partitioning a vector by an index and performing an operation on that partition 3 answers Suppose I have a data.frame, df. a b d 1 2 4 1 2 5 1 2 6 2 1 5 2 3...

MongoDb Pipeline Aggregation sorting sub sub documents

mongodb,mongoose,aggregate
I've having a slight issue when trying to sort by a nested array in Mongodb using MongooseJs. a) A product contains tasks, each task has subTasks. b) The task has a order & so does each subTask (task.order & task.subtask.order). Here is a sample product document: db.products.find({_id: ObjectId("554a13d4b692088a38f01f3b")}) Result: {...

Mysql Agregate function to select maximum and then select minimum price within that group

mysql,sql,group-by,aggregate
I am trying to get the maximum value out of a aggregate function, and then also get the min value out of a Price column which comes back in results. id | discount | price 1 | 60 | 656 2 | 60 | 454 3 | 60 | 222...

average an unknown number of responses per respondent; R [duplicate]

r,split,aggregate
This question already has an answer here: Mean per group in a data.frame 4 answers Scenario: I have a df, "scores" of multiple users attempt(s) at passing a test. Each observation is an attempt with the userID, and score. Some users may pass on their first attempt, some might...

Mongo: get hashmap in aggregate

javascript,mongodb,hashmap,aggregate,mongodb-query
I have collection db.problems. Each document contains nested area and category objects. Document's example: { _id: 1, text: 'Hello', area: {_id: 23, title: 'Area 61'}, category: {_id: 12, title: 'Just a category'} } I'm trying to count problems in every area by category and get something like this: [ {...

MySQL - help writing group query

mysql,sql,subquery,group,aggregate
Here are my tables: Teacher: Student: Test a teacher has students, a student takes tests. SELECT t.name as teacherName, s.name as studentName, t.id as teacherID, s.id as studentID, MIN(tt.grade) as grade FROM teacher t JOIN student s ON s.`teacher_id` = t.id JOIN test tt ON tt.student_id = s.id GROUP BY...

Counting occurrences based on criterion

python,csv,pandas,aggregate
I'm trying to count how many total times a color shows up in a dataframe, but I only want it to select them on a selected criteria. for example I have: imageName color1 color2 color3 color4 shape img1 Red Red Red Red circle img2 Blue Green Red Blue circle img3...

aggregate by two columns R [duplicate]

r,aggregate
This question already has an answer here: Aggregating by unique identifier and concatenating related values into a string 4 answers I have a dataset with three column Date1 StudentId Status 08/04/2014 155261 Yes 08/04/2014 155261 No 08/25/2014 236991 Yes 08/27/2014 236991 Yes 08/29/2014 236991 Yes I am trying to...

Aggregate by specific year in R

r,aggregate
Apologies if this question has already been dealt with already on SO, but I cannot seem to find a quick solution as of yet. I am trying to aggregate a dataset by a specific year. My data frame consists of hourly climate data over a period of 10 years. head(df)...

MySQL number of rows per time

mysql,sql,aggregate
I'm trying to build a query that shows the total number of users for every month/year. The query I have right now shows me the users created in any particular month/year point. But what I want is the sum of users till that point, not just the ones created in...

exception: can't convert from BSON type EOO to Date

mongodb,date,mongoose,aggregate,bson
I am getting an issue for running the following aggregate query: db.snippets.aggregate([ { '$project': { month: { '$month': '$created_at' }} } ]) The error message for the same is: assert: command failed: { "errmsg" : "exception: can't convert from BSON type EOO to Date", "code" : 16006, "ok" : 0...

Taking column mean over a list of data frames in R

r,list,data.frame,aggregate,group-summaries
Here's what I'm trying to do. My data frame has a factor variable, "country", and I want to split the data frame based on country. Then, I want to take the column mean over every variable for every country's data frame. Data here: https://github.com/pourque/country-data I've done this so far... myList...

SUM query, between dates, grouped by employee

sql-server-2012,sum,aggregate
I have 4 tables, structured as follows: More verbosely, a project consists of a set of tasks. An employee can add one or more notes to the database, specifying the amount of time spent on that task on any given date. I would like to find for each employee, how...

RethinkDB - How to group on aggregated time intervals

aggregate,rethinkdb
I'd like to group and average on dynamic time intervals. Similar question to: Grouping into interval of 5 minutes within a time range. Right now, I am using the query: r.db('windMachine').table('stats3') .between(r.time(2015, 5, 7, 18, 0, 0, "Z"), r.time(2015, 5, 7, 24, 0, 0, "Z"), {index: "date"}) .group([r.row("date").hours(), r.row("date").minutes()]) .avg("TMP0")...

using MongoDB aggregate count subdata

mongodb,aggregate
here is my data in mongodb: { "data": { "order_goods": [{ "category": 235 }, { "category": 666 }] } }, { "data": { order_goods: [{ "category": 235 }] } } here is my expected output: {"category":235, "total":2} {"category":666, "total":1} I have try many ways about aggregate such as $group, but...

Sum of values from 3rd previous month

sql,postgresql,aggregate,date-arithmetic,aggregate-filter
I'm having difficulty grabbing rows from December (anything from the 3rd previous month). I'm attempting to count the amount of products sold within a certain time period. This is my current query: SELECT a.id, a.default_code, ( SELECT SUM(product_uom_qty) AS "Total Sold" FROM sale_order_line c WHERE c.product_id = a.id ), (...

Does aggregate() guarantee that the result will be ordered by the grouping columns?

r,aggregate
I've noticed that aggregate() appears to return its result ordered by the grouping column(s). Is this a guarantee? Can this be relied upon in surrounding logic? A couple of examples: set.seed(1); df <- data.frame(group=sample(letters[1:3],10,replace=T),value=1:10); aggregate(value~group,df,sum); ## group value ## 1 a 16 ## 2 b 22 ## 3 c 17...

Group and Match aggregrate MongoDB

mongodb,aggregate,aggregation-framework
I have a documents like this: { u '_id': ObjectId('5534cd32e4b0d5f14e6aa27d'), u 'geoip': { u 'coordinates': [-96.8353, 32.9299 ], u 'region_name': u 'TX', u 'latitude': 32.9299, u 'ip': u '173.193.154.240', u 'area_code': 972, u 'continent_code': u 'NA', u 'country_code3': u 'USA', u 'country_code2': u 'US', u 'city_name': u 'Dallas', u 'longitude':...

Using aggregate with a complex FUN argument

r,aggregate,circle
I am interested in calculating hourly averages. In the past I have had success using the aggregate function like so: Time<-c("1:00","1:20","1:40","2:00","2:20","2:40","3:00","3:20","3:40","4:00","4:20","4:40") Hour<-c(1,1,1,2,2,2,3,3,3,4,4,4) Bearing<-c(10,20,30,350,3,20,170,195,200,300,20,50) DF<-data.frame(Time,Hour,Bearing) ByHR<-aggregate(.~Hour,data=DF,mean) In this way, I get an hourly average of my Bearing data. But now I want to use the circ.mean() function to calculate means of...

How do you return multiple results from a mongoDB aggregate() query?

performance,mongodb,aggregate,mongodb-query
Here is a query which runs a match, and then returns the number of unique entries in a specific field: db.sequences.aggregate([{$match: {id: 5}}, {$group : {_id : "$field"} }, {$group: {_id:1, count: {$sum : 1 }}}]) However, what if I wanted to return multiple values, such as a count of...

Pandas aggregating average while excluding current row

python,pandas,aggregate
How to aggregate in the way to get the average of b for group a, while excluding the current row (the target result is in c)? a b c 1 1 0.5 # (avg of 0 & 1, excluding 1) 1 1 0.5 # (avg of 0 & 1, excluding...

Allow grouping with NA in aggregate function

r,aggregate
Here is dummy data temp.df <- data.frame(count = rep(1,6), x = c(1,1,NA,NA,3,10), y=c("A","A","A","A","B","B")) When I apply aggregate as given below: aggregate(count ~ x + y, data=temp.df, FUN=sum, na.rm=FALSE, na.action=na.pass) I get: x y count 1 1 A 2 2 3 B 1 3 10 B 1 However, I would like...

MongoDB - Aggregate max/min/average for multiple variables at once

mongodb,aggregate,multiple-columns
I am logging data into MongoDB in the following format: { "_id" : ObjectId("54f2393f80b72b00079d1a53"), "outT" : 10.88, "inT3" : 22.3, "light" : 336, "humidity" : 41.4, "pressure" : 990.31, "inT1" : 22.81, "logtime" : ISODate("2015-02-28T21:55:11.838Z"), "inT2" : 21.5 } { "_id" : ObjectId("54f2394580b72b00079d1a54"), "outT" : 10.88, "inT3" : 22.3, "light"...

Implementing the All functionality in SQL

c#,sql,linq,aggregate
I'm trying to write an SQL query where all of a certain group meets a condition. Update The Simplifed Table Structure would look like this ID, TitleID, BlockFromSale --------------------------- 1 | 1 | true 2 | 1 | true 3 | 1 | true 4 | 2 | false 5...

Summing Only Visible Rows in SSRS

ssrs-2008,aggregate,visibility,ssrs-tablix
I'm trying to sum only the visible rows for a report and I know the format is: =Sum( iif( <use the condition of the Visibility.Hidden expression>, 0, Fields!A.Value)) In my report, I've set row visbility to: =IIF(CInt(Fields!EM_ET.Value)=1 Or CInt(Fields!EM_ET.Value)= 2,True,False) Not exactly sure what I'm missing, but when I use...

R - Aggregate data in dataframe by creating groups “1:10” and “11 or more”

r,aggregate,dataframes
I am adding an auxiliary column to lately group a serie of data in different categories: for (i in 1:nrow(A) ) { if (A$B[i] < 11) { A$C[i] <- A$B[i] } else { A$C[i] <- "11 or more" } } B is a serie of numeric values [1,2,3,4,5...n]. After the...

mysql aggregate function for varchar column

mysql,sum,aggregate,varchar2
I need to calculate the sum of one column(col2) , but the column has both numbers and text. How do I exclude the text alone before I use sum()? The table has around 1 million rows, so is there any way other than replacing the text first? My query will...

Matrix Transformation in R - from aggregate output to outer-like matrix

r,matrix,aggregate,transformation,outer-join
I need to transform the output of an aggregate (mean) into a matrix outer-like style. data(mtcars) aggregate(disp ~ cyl + gear, data = mtcars, FUN = mean ) cyl gear disp 4 3 120.1000 6 3 241.5000 8 3 357.6167 4 4 102.6250 6 4 163.8000 4 5 107.7000 6...

R: “Binning” categorical variables

r,if-statement,aggregate,level
working in R, I have a data.frame which has 13 columns with factos. One of the columns contains credit rating data and has 54 different values: levels(TR_factor$crclscod) [1] "A" "A2" "AA" "B" "B2" "BA" "C" "C2" "C5" "CA" "CC" "CY" "D" [14] "D2" "D4" "D5" "DA" "E" "E2" "E4" "EA"...

SQL: Column Sum

sql,select,aggregate
Lets have following sample table: Person Quantity A 1 B 2 C 3 D 4 E 5 Result should be: PersonAggregate 1 (0+Quantity of PersonA)=sumA 3 (sumA+Quantity of PersonB)=sumB 6 (sumB+Quantity of PersonC)=sumC 10 (sumC+Quantity of PersonD)=sumD 15 (sumD+Quantity of PersonE) Is it possible to get this result in onq...

R data.table: reuse an aggregation

r,expression,data.table,aggregate,quoting
I want to apply the same aggregation to multiple data tables, without rewriting the aggregation scheme. Consider dt1 <- data.table(id = c(1,2), a = rnorm(10), b = rnorm(10), c = rnorm(10)) dt2 <- data.table(id = c(1,2), a = rnorm(10), b = rnorm(10), c = rnorm(10)) dt1_aggregates <- dt1[, .(mean_a=mean(a), sd_a=sd(a),...

Return the proportionate share of the same type

mysql,sql,aggregate
just for example lets say I have this table sales(ID,type,price). Now, for each entry I have to return its proportion of its type sellings. For instance, if my total revenue from selling bikes is 1000, and I sold 1 pair of bike for 100 so it's proportion will be 0.1....

OrientDB average over collection

sql,aggregate,average,orient-db
Im trying orientdb with some simple apps. In this app I have People (persona) and places to eat (sitio), people go to those places (I store how many times they do) and score them. Using this query: SELECT *, inE('puntua').comida as comida, inE('puntua').servicio as servicio, inE('puntua').extras as extras from Sitio...

R aggregate by large number of columns

r,aggregate
I have a data frame (df) that has about 40 columns, and I want to aggregate using a sum on 4 of the columns. Outside of the 4 I want to sum, each unique value in column 1 corresponds to identical values across the rest of the columns, and I...

Writing my own aggregate function in postgresql

sql,postgresql,aggregate
I've never wriiten my own aggreagtes, only store procedures and I need some advice. I want to write a custom aggregate which is going to return maximum value of the integer rows and incerement it by 10. How can I do that? I tried this: CREATE AGGREGATE incremented_max ( v...

Sort files in a folder and its subfolders by type and size

bash,file,size,aggregate,file-extension
I have to count a huge number of files in a directory and subdirectories recursively by extension and get the total size for each extension. I do not want to see individual files with their type and size next to them. For example, this would be one potential output: JPEG...

Aggregate variables separetly in R [lapply + aggregate]

r,aggregate,lapply
I have a data.frame with a set of records and as variables different measurements. I would like to create a new data.frame containing the amount of records having a specific measurement value for each measurement. Basically what I am trying to do is: record <- c("r1", "r2", "r3") firstMeasurement <-...

Aggregating connected sets of nodes / edges

postgresql,aggregate,common-table-expression,recursive-query,window-functions
I have a connected set of edges with unique nodes. They are connected using a parent node. Consider the following example code and illustration: CREATE TABLE network ( node integer PRIMARY KEY, parent integer REFERENCES network(node), length numeric NOT NULL ); CREATE INDEX ON network (parent); INSERT INTO network (node,...

MongoDB: find documents with a given array of subdocuments

arrays,mongodb,find,aggregate,subdocument
I want to find documents which contain given subdocuments, let's say I have the following documents in my commits collection: // Document 1 { "commit": 1, "authors" : [ {"name" : "Joe", "lastname" : "Doe"}, {"name" : "Joe", "lastname" : "Doe"} ] } // Document 2 { "commit": 2, "authors"...

Neo4j Aggregate Multiple Lines into a Map

neo4j,cypher,aggregate
I have the following Cypher script: MATCH (sy:SchoolYear)<-[:TERM_OF*]-()<-[:DAY_OF]-(d:Day) WHERE sy.year = 2015 OPTIONAL MATCH (d)<-[:START]-(e:Enrollment)-[:AT]->(s:School) RETURN d.date, s.abbreviation, count(e) ORDER BY d.date This gives me all of the dates in the range that I want and returns number of students that have enrolled for each school for that date, or...

How to combine rows based on unique values in R? [duplicate]

r,aggregate
This question already has an answer here: Collapse text by group in data frame 2 answers I'm a pretty beginner at R. I've a CSV file where data is as follows, for example: ID Values 820 D1,D2,FE 730 D1,D2,D3,PC,Io,He,Bt,Te,AR,PG 730 DV,GTH,LYT 567 EDR,TYU,EOP,OMN 567 FGH,KIH,IOP I want to remove...

I need to join 3 tables avoiding duplicates and aggregating data from 2 of the tables

sql-server,join,aggregate
I am working with some Email data and have 3 files: sent, opens (a subset of sent), and clicks (a subset of opens). Basically i want to join opens and clicks to the Send file by SubID (a unique identifier). Also, there have been 3 email deployments (JobID); I want...

R aggregating on date then character

r,aggregate
I have a table that looks like the following: Year Country Variable 1 Variable 2 1970 UK 1 3 1970 USA 1 3 1971 UK 2 5 1971 UK 2 3 1971 UK 1 5 1971 USA 2 2 1972 USA 1 1 1972 USA 2 5 I'd be grateful...

R - aggregate create strange size of dataset after store

r,aggregate
When I try to make a statistics using aggregate with below data - as a example - strange size of output was stored group O18 D 1 st1 -6.89 -53.52709 2 st1 -7.16 -56.75802 3 st1 -7.17 -54.79471 4 st2 -7.09 -54.98376 5 st2 -7.02 -54.56050 6 st2 -6.99 -55.55472...

Computing MAD(mean absolute deviation) GroupBy Pandas

python,pandas,group-by,dataframes,aggregate
I have a dataframe: Type Name Cost A X 545 B Y 789 C Z 477 D X 640 C X 435 B Z 335 A X 850 B Y 152 I have all such combinations in my dataframe with Type ['A','B','C','D'] and Names ['X','Y','Z'] . I used the groupby...

MongoDB aggregate group multiple fields

mongodb,group,project,aggregate
Given the following dataset: { "_id" : 1, "city" : "Yuma", "cat": "roads", "Q1" : 0, "Q2" : 25, "Q3" : 0, "Q4" : 0 } { "_id" : 2, "city" : "Reno", "cat": "roads", "Q1" : 30, "Q2" : 0, "Q3" : 0, "Q4" : 60 } { "_id"...

Unexpected behavior in subsetting aggregate function in R

r,logic,aggregate,subset,subsetting
I have a data frame that contains with the following format: manufacturers pricegroup leads harley <2500 # honda <5000 # ... ... .. I am using the aggregate function to pull out data in the following way: aggregate( leads ~ manufacturer + pricegroup, data=leaddata, FUN=sum, subset=(manufacturer==c("honda","harley"))) I noticed this is...

monthly means with apply for multidimensional arrays

r,aggregate,apply
I want to compute the mean over the 3-D of a multidimensional array. As this dimension is supposed to be the time, I wanted to computed monthly means. For that, I tried to use apply, but I am not sure where the problem is. Let's say my data is as...

Why does post-increment fail in .Aggregate(…) but pre-increment succeeds?

c#,linq,aggregate,reduce
I was fiddling with one of my project euler answers to try and make it a bit shorter/cleaner/succinct. I came up with this: Sequences.FibonacciBig() // infinite fib sequence of type BigInteger .TakeWhile(f => f.ToString().Length < 1000) .Aggregate(1, (i, _) => i++); My test failed as the actual was 1, which...

sum column when creating table r

r,table,aggregate
I have the following data frame in r: origin destination amount 1 1 2 50 2 1 2 100 3 1 2 20 4 1 3 100 5 2 3 30 6 2 3 50 7 2 1 20 8 3 2 10 9 3 2 40 10 3 1...

Display mongodb multiple group by query results in a nested format

mongodb,group-by,nested,aggregate,multiple-columns
Given the following dataset: [ { "account_id" : "1111" "task_id" : "aaaa", "workweek" : "20", "hours": "18" }, { "account_id" : "1111" "task_id" : "aaaa", "workweek" : "20", "hours": "12" }, { "account_id" : "1111" "task_id" : "aaaa", "workweek" : "21", "hours": "10" }, { "account_id" : "1111" "task_id" :...

R summarize unique values across columns based on values from one column

r,unique,aggregate
I want to know the total number of unique values for each column based on the values of var_1. For example: Test <- data.frame(var_1 = c("a","a","a", "b", "b", "c", "c", "c", "c", "c"), var_2 = c("bl","bf","bl", "bl","bf","bl","bl","bf","bc", "bg" ), var_3 = c("cf","cf","eg", "cf","cf","eg","cf","dr","eg","fg")) The results I am looking for would...

How to calculate count by group, then keep only one per group

r,merge,data.table,aggregate
Say that I have this data.frame, data: data <- data.frame(val=c(rep(6,10), rep(7, 15), rep(8, 20), rep(9, 25), rep(10, 100), rep(11, 20), rep(12, 15), rep(13, 10))) data$plus <- data$val + 100 My goal is to create a new data.frame that has the frequencies of each val, and the associated plus value. My...

How to match two time columns and print value if they match?

r,datetime,match,data.table,aggregate
I am running my head against the wall here. Hope someone can help. I have an aggregated data frame (d1) in R with a time column and a column with binary values. The time column does not have a uniform time step. d1: Time Set 1: 2015-01-03 14:55:00 0 2:...

r aggregate dataframe: some columns unchanged, some columns aggregated

r,aggregate
I have tried to look at manuals for aggregate, apply, etc, but I can't find examples where some columns are skipped by the function being applied, while other columns are copied as-is. Example: > olddf = data.frame(code=c("one","one","two"), val1=c(1,2,3), val2=c(4,5,6), val3=c(7,8,9)) > olddf code val1 val2 val3 1 one 1 4...

Simple aggregation by id variables in R

r,aggregate
I'm having a problem with the aggregate function. My data.frame looks like this: **Region Sex SNI value** orebro Man 0 497 orebro Man 0 1 orebro Man 1 120 referens Man 96 3045 referens Man 96 41 referens Woman 0 2061 referens Woman 0 2450 But I want it to...

Aggregate NAs in R

r,aggregate,nan,na
I'm having trouble handling NAs while calculating aggregated means. Please see the following code: tab=data.frame(a=c(1:3,1:3), b=c(1,2,NA,3,NA,NA)) tab a b 1 1 1 2 2 2 3 3 NA 4 1 3 5 2 NA 6 3 NA attach(tab) aggregate(b, by=list(a), data=tab, FUN=mean, na.rm=TRUE) Group.1 x 1 1 2 2 2...

Create view with aggregated columns from three levels of nested tables

sql,postgresql,view,left-join,aggregate
I'm using Postgres 9.4 and I have 3 tables: truck , container and container_metadata. A truck can have many container and a container many container_metadata. I'll add more description later with the tables below (I did my best creating them), so here goes: A truck can contain many containers referred...

Aggregating while merging two dataframes in R

r,merge,data.table,aggregate,dplyr
The ultimate goal is to sum the total quantity(transact_data$qty) for each record in product_info where the transact_data$productId exists in product_info, and where transact_data$date is between product_info$beg_date and product_info$end_date. The dataframes are below: product_info <- data.frame(productId = c("A", "B", "A", "C","C","B"), old_price = c(0.5,0.10,0.11,0.12,0.3,0.4), new_price = c(0.7,0.11,0.12,0.11,0.2,0.3), beg_date = c("2014-05-01", "2014-06-01",...

Descriptive Statistic for Multilevel (clustered) Data

r,function,aggregate,plyr,multi-level
I am having trouble generating complex cross-sections of descriptive statistics for data that are multilevel in nature. I have tried to go at this from a couple of different angles, but to no avail. Below please find some code I used for a plyr solution that failed. The issue is...

Fastest way to PostgreSQL Distinct and Format

sql,postgresql,distinct,aggregate,postgresql-performance
I have a 3.5 million rows in table acs_objects and I need to retrieve column creation_date with year only format and distinct. My first attempt : 180~200 Sec (15 Rows Fetched) SELECT DISTINCT to_char(creation_date,'YYYY') FROM acs_objects My second attempt : 35~40 Sec (15 Rows Fetched) SELECT DISTINCT to_char(creation_date,'YYYY') FROM (SELECT...

Fetch object with aggregations on related table

postgresql,laravel,laravel-4,eloquent,aggregate
How to fetch object with aggregations on related table using eloquent laravel 4. sql structre: TABLE companies id PRIMARY KEY title TEXT ... TABLE ratings id PRIMATRY KEY company_id FK KEY some_rating_1 INTEGER -- for example rating for delivery some_rating_2 INTEGER -- for example rating for customer service some_rating_2 INTEGER...

MongoDB C# 2.0 Driver Multiple Unwinds

c#,mongodb,aggregate,mongodb-query
I'm trying out Mongo for the first time and am having an issue where I have: public class A { public int ID {get;set;} . . . public List<B> Bs { get; set; } } public class B { public int ID { get; set; } . . . public...

Create a new column by grouping on type, new column's value is entire group's column value

string,python-2.7,pandas,group-by,aggregate
I have a pandas dataframe that essentially looks like this: type item string 1 0 aa 1 1 bb 1 2 cc 2 0 dd 2 1 ee 2 2 ff I want to somehow create a new column 'newstring' based off of the group's 'string' column type item string...

MongoDB aggregate for Arithmetic Operations - Subdocument Field

javascript,node.js,mongodb,mongoose,aggregate
We have a aggregate query projecting few sub documents. We wanted to apply a few arithmetic operations like Sum and Product on these projected values. Aggregation Query -- Item.aggregate([ { $unwind: '$dummy'}, { $match: {'dummy.storage': {$gt: 0}} }, { $group: {_id: '$_id', dummy: {$push: '$dummy'}, original_y: { $first: "$original_y" },...