
Defining groups within an interval


Tag: group,stata

In Stata, I have the following data:

 ID      Date
 1      1/1/2010
 2      1/1/2010
 3      1/4/2010
 4      1/5/2010
 5      1/8/2010
 6      1/10/2010
 7      1/11/2010

I am trying to create a variable Dummyi, one for each focal ID i, that marks all of the IDs that occurred within three days (before or after) of that focal ID.

I first wanted to identify the IDs that fall within a three-day window of a given ID and then assign a unique number to all of them.

 qui forvalues i = 1/`=_N' {
     gen Dummy`i'
     replace Dummy`i' = `i' if Date <= (Date[`i']-3) & ID == `i'
 }

This approach was getting there, but it has several problems. Some ID values are missing, so the IDs are not neatly sequential, and the code does not take into account the IDs that occurred before the focal ID. Finally, some IDs fall into more than one group (e.g. ID==5), and I was not sure how to handle that without creating separate Dummy variables, which is fine.

The resulting data should look like the following.

 ID      Date      Dummy1   Dummy2  Dummy3  Dummy4   Dummy5  Dummy6  Dummy7
 1      1/1/2010     1        1       1        0       0       0       0
 2      1/1/2010     1        1       1        0       0       0       0
 3      1/4/2010     1        1       1        1       0       0       0
 4      1/5/2010     0        0       1        1       1       0       0
 5      1/8/2010     0        0       0        1       1       1       1
 6      1/10/2010    0        0       0        0       1       1       1
 7      1/11/2010    0        0       0        0       1       1       1

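set more off

*----- example data -----

input ///
 id str10 date
 1      "1/1/2010"
 2      "1/1/2010"
 3      "1/4/2010"
 4      "1/5/2010"
 5      "1/8/2010"
 6      "1/10/2010"
 7      "1/11/2010"
end

* convert the string date to a numeric daily date
gen date2 = date(date, "MDY")
format %td date2
drop date


*----- what you want -----

* id must uniquely identify observations; sort by id so that observation i
* lines up with the i-th value returned by levelsof (which sorts its output)
isid id
sort id
levelsof id, local(levid)

forvalues i = 1/`=_N' {
    local lid : word `i' of `levid'
    * 1 if the date of observation i is within 3 days of each observation's date
    gen ind`lid' = inrange(date2[`i'], date2 - 3, date2 + 3)
}

list, sep(0)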

I used levelsof in case id is an irregular sequence. Each indicator variable (you call it a dummy) is named after the corresponding id.
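
To illustrate the naming with an irregular id sequence, here is a minimal sketch. The three ids (10, 25, 31) and their dates are made up for the example and are not from the question; the loop is the same as the one in the answer's code.

```stata
clear

* hypothetical data: ids are not 1, 2, 3, ...
input id str10 date
10 "1/1/2010"
25 "1/4/2010"
31 "1/8/2010"
end

gen date2 = date(date, "MDY")
format %td date2
drop date

* id must be unique; sorting keeps observation i aligned with
* the i-th word of the sorted list that levelsof returns
isid id
sort id
levelsof id, local(levid)

forvalues i = 1/`=_N' {
    * pick the id that belongs to observation i
    local lid : word `i' of `levid'
    gen ind`lid' = inrange(date2[`i'], date2 - 3, date2 + 3)
}

* the indicators are named after the ids: ind10, ind25, ind31
describe ind*
list, sep(0)
```

Because the variable names come from the id values rather than from the loop counter, gaps in the id sequence do not matter.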

See help extended_fcn if you don't have experience with extended macro functions (local lid : word ...).
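
As a small illustration of the extended macro functions used above, the sketch below walks through a list stored in a local macro. The list 3 7 12 is a made-up stand-in for what levelsof would store in levid. Run the lines together (for example, from a do-file) so that the local macros stay defined.

```stata
* hypothetical list of ids, standing in for the result of levelsof
local levid 3 7 12

* "word count" returns the number of words in the list
local n : word count `levid'

forvalues i = 1/`n' {
    * "word # of" extracts the i-th word of the list
    local lid : word `i' of `levid'
    display "element `i' of the list is `lid'"
}
```

This is the same mechanism the loop uses to match the i-th observation with the i-th id.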

