## create quantile category variables using defined cut-points in Stata

category,stata,quantile
I am trying to create indicator variables using different quantile levels. I am creating a variable that contains categories corresponding to quantiles. For one variable, the code I am using is xtile PH_scale = PH, nq(4) tab PH_scale, gen(PH_scale_) Also, I know that if I want to use my own...

## Stata: order a dataset using a custom sorting order

sorting,stata
I have a dataset where numeric variable VARSORT takes only 3 values: 10, 20 and 30 (there are no missings). I would like to sort observations based on VARSORT but where the custom sort order would be the following : 20 first, then 10, then 30. Is it possible to...
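
A minimal sketch of one possible approach, assuming a temporary helper variable is acceptable. The name `sortkey` and the toy data are hypothetical; only VARSORT and its three values come from the question. Each value is mapped to its desired rank, the data are sorted on that rank, and the helper is dropped.

```stata
clear
input VARSORT
10
30
20
20
10
end

* sortkey is a hypothetical helper: 20 -> 1, 10 -> 2, anything else (here 30) -> 3
generate byte sortkey = cond(VARSORT == 20, 1, cond(VARSORT == 10, 2, 3))
sort sortkey
drop sortkey
list VARSORT
```

Nesting `cond()` keeps the mapping in one line; with more values, a `recode` into the helper variable may read better.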

## Drop random effects parameters from output table in Stata

stata,mixed-models
I want to create a regression table (using esttab) from a mixed-effects regression estimated via xtmixed in Stata, but I want the output without the random effects parameters. How can I drop the random effects parameters from the output table? E.g., in the case of two variables... xtmixed (Dependent Variable)...

## How can I change the value stored in local macro from string to numeric in Stata?

types,macros,global,local,stata
I want to generate a variable with a lagged year depending on the year stored in the "$S_DATE" macro. I have stored the year in the local macro date: . local date substr("$S_DATE",8,.) . display `date' 2015 And I want to generate the new variable with gen start_year = `date' - y_passed where y_passed is a...
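
A hedged sketch of one way this could work. Without `=`, `local` stores the expression text rather than its result, so the sketch evaluates the expression and converts it with `real()`. It assumes `c(current_date)` holds the same "dd Mon yyyy" string as $S_DATE, and it picks the year with `word()` rather than a fixed character position. The values of y_passed are made up.

```stata
* Evaluate the expression with "=" so the macro holds the number itself
local year = real(word(c(current_date), 3))
display `year'

* y_passed is the variable named in the question; these values are illustrative
clear
input y_passed
1
3
5
end
generate start_year = `year' - y_passed
list
```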

## Align bars in ciplot

plot,stata
I'm working with the ciplot graphing module for Stata and am encountering a problem with the alignment of bars when I use the by() option. Here's a trivial example demonstrating the issue: webuse citytemp, clear ciplot heatdd cooldd, by(region) horizontal recast(conn) So, the graph shows means and confidence intervals for...

## Translating Stata to R: collapse

r,data.table,stata,code-translation
Just came across a .do file that I need to translate into R because I don't have a Stata license; my Stata is rusty, so can someone confirm that the code is doing what I think it is? Here's the Stata code: collapse (min) MinPctCollected = PctCollected /// (mean) AvgPctCollected...

## Stata overwrite all observations in cross section except last 20 non NA

panel,stata
I have a large strongly unbalanced panel in Stata, where each cross section only has a few observations, and the rest is NA (.). I want to overwrite all non NA observations that are not the last 20 non NA observations, in each cross section. I'm not sure how to...

## Fixed Compositional Weighting in Stata

stata
I'm looking at the Current Population Survey in Stata, although this question could apply to any survey with individual weights. It's straightforward to generate a table showing the mean of a variable -- say wages -- over time given individual weights: table qtr [aw=pworwgt], contents(mean wage) What I'd like to...

## Stata: estimating monthly weighted mean for portfolio

stata,weighted-average
I have been struggling to write optimal code to estimate the monthly weighted mean of portfolio returns. I have the following variables: firm stock returns (ret); month1, year1 and date; portfolio (port1), which defines the portfolio of the firm stock returns; market capitalisation (mcap), to estimate weights (by month1 year1 port1). I want...

## Stata: refer to categorical fields by their labels instead of their numbers

stata
I am trying to use categorical variables more efficiently. Suppose I have a categorical variable phone, which has the following values: ---------------------- phone | Freq. ----------+----------- Landline | 223 Mobile | 49,297 Both | 1,308 I want to run a command something like this: sum x if phone == Mobile...

## Retain the cluster number for each member of a cluster within an id variable

stata
I would like to label how many unique clusters of data are in a longitudinal dataset and have each member of the cluster carry the cluster count. Distinct clusters are those sharing a set of dates within an id. The order of those distinct clusters relative to previous (earlier) clusters...

## Stata: Keep only observations with minimum, maximum and median value of a given variable

select,stata,vlookup
In Stata, I have a dataset with two variables: id and var, and say 1000 observations. The variable var is of type float and takes distinct values for all observations. I would like to keep only the three observations where var is either the minimum of var, the maximum of...

## Stata: Saving to a specific folder

stata
I would like to save my outputs to a folder with the date of the analysis. My code looks like this: local d = c(current_date) cd "c\RA-outputs" mkdir "`d'" twoway bar weeksum week graph export "c\RA-outputs\`d'\out1.png", as(png) replace However, when I run this code I get an error saying the...

## Stata - panel data from column data

panel,stata,reshape
Panel data newbie here! I have data in Stata in the following format: Name Company1 Company2 Company3 Company4 Company5 Company6 1985 6.0781 2.4766 1.4258 2.6508 13.2083 1986 6.4844 3.0938 2.1953 3.1351 15.7917 1987 10.1563 .2769 5.7109 3.6406 4.4058 15.5833 1988 10.4688 .4219 5.125 3.75 3.6767 8.1667 1989 11.0625 .4289 5.4453...

## R equivalent of Stata *

r,stata
In Stata, if I have these variables: var1, var2, var3, var4, var5, and var6, I can select all of them with the command var*. Does R have a similar functionality?

## Wrap local macro in double quotes that will be visible to esttab's cells() option

stata
I am trying to lazily create a table of means and standard errors for a longish list of variables. It seems that the estout package from SSC and tabstat are the best tools, but I can't get the local macros to work properly to specify esttab's cells() option. sysuse auto,...

## Cartesian product in Mata

set,stata,cartesian-product
To construct a set of vectors, I'll need to take the Cartesian product of sets C[1]..C[d], D := {x : x[i] ϵ C[i], i = 1..d} Example: If *C[1]=(5,6,7)';*C[2]=(3,5,6)';*C[3]=(1,3,5)', then some elements of D are (5,3,1), (5,3,3) ... I would like to know: What is the best way to take...

## Export average value of one variable over 2 other variables

stata
I can use tabout var1 var2...style(csv) to export a matrix-like object with var1 on one axis, var2 on the other, and the frequency with which each combination of the values of var1 and var2 occurs. Is it possible to use tabout or some other command (preferably native to Stata) to do something...

## Twoway tabulations in Stata with all possible responses to each variable

stata
I am trying to create many tables of cross-tabs in the style of tab (twoway) or tabout in Stata. However, I want to include response options for both variables, even if every cell for that variable in the twoway tabulation would be zero. For instance, if we alter the auto...

## Using “input” command in forvalues; Stata acts like I hit break

loops,input,stata
I am trying to create empty observations for a number of fruits and a number of years. I thought it would be very straightforward, but Stata acts like I hit the "Break" key after the first time it enters the loop. What am I doing wrong? clear all gen fruit...

## How to add Stata programs in subdirectories?

stata
I'd like to add a directory containing .ado files to the ado-path. This directory contains several subdirectories, corresponding to different projects. The .ado files are in these subdirectories. However, when I type adopath + directory, commands in the .ado files are not recognized by Stata. I need to enter adopath...

## Stata — predict after regression by group_id

regression,stata,predict
I have to run regressions by group_id and then generate the predictions. It doesn't seem like predict allows the "by" option. Is there a way I can predict after running regressions by group_id? The data are stacked by group_id. The regression command I am thinking of using is as follows:...
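
One possible pattern, sketched under assumptions: loop over the group values, fit the model on each group, and fill a single prediction variable piece by piece. The auto dataset, the model `price weight`, and the names `yhat` and `tmp` are stand-ins, with `foreign` playing the role of group_id.

```stata
sysuse auto, clear

* yhat collects the fitted values from every group-specific regression
generate double yhat = .
levelsof foreign, local(groups)
foreach g of local groups {
    regress price weight if foreign == `g'
    predict double tmp if foreign == `g'
    replace yhat = tmp if foreign == `g'
    drop tmp
}
summarize yhat
```

Restricting `predict` with the same `if` keeps each group's predictions tied to that group's own coefficients.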

## Stata: looping and appending

stata
I have a situation where I need to import all the files in a directory and append them. My code is this: local files : dir "C:\Users\xx" files "*.xls" local n: word count `files' tokenize ``files'' cd "C:\Users\xx" forval k =1/`n'{ foreach file in `files' { import excel "`file'", sheet("Time...

## Stata- Stopping at the variable before a specified variable in a varlist

macros,stata
I'm stuck on a tricky data management question, which I need to do in Stata. I'm using Stata 13.1. I have 40+ datasets I need to work on using a subset of variables that is different in each dataset. I can't include the data or specific analysis I'm doing for...

## Create a variable by dividing a variable by IQR in Stata

stata
How could I create a variable by dividing it by an IQR? I have done it the long way, as follows. Sample data and code: use http://www.ats.ucla.edu/stat/stata/notes/hsb2, clear foreach var of varlist read-socst { egen `var'75 = pctile(`var'), p(75) egen `var'25 = pctile(`var'), p(25) gen `var'q...

## ASCII Code for Uppercase/Capital R with a Tilde Character Above

ascii,stata
I am trying to get the equivalent of LaTeX's $\tilde R$ in a Stata graph axis label. I don't think there's a SMCL way of doing that, but it's possible to use ASCII characters. However, there does not seem to be an ASCII code for an uppercase/capital R with a...

## Stata bar charts with dates

stata
I'm trying to create a bar chart with a time axis. This is the code that I'm using: twoway (bar weeksum week) The week variable is a time variable and has the format %td. However, when I create the bar chart, the X axis does not follow the format specified...

## How to calculate PRESS and $R^2_{predicted}$ in Stata automatically

stata,prediction
So I have two models and I want to calculate these statistics. Is there any package to calculate them in Stata? PRESS statistic (wiki) and, if I am not mistaken, $$R^2_{predicted} = 1 - \frac{PRESS}{TSS}$$ ...

## Replace loop in Stata

replace,stata
I have two variables: patient id and date. Many patients on my database are duplicated. I want to keep the duplication, but apply to each patient the earliest appearing date. Ex: ID Date 1 8/9/07 1 6/3/07 1 11/15/08 2 8/6/06 2 8/6/06 2 11/5/09 would become ID Date 1...
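
A minimal sketch of one way to do this without an explicit loop, assuming the dates arrive as strings like those in the example and that every year is 20xx. The variable names `id`, `sdate`, `date` and `first_date` are illustrative. If the date is already a numeric Stata date, the conversion step can be skipped.

```stata
clear
input id str8 sdate
1 "8/9/07"
1 "6/3/07"
1 "11/15/08"
2 "8/6/06"
2 "8/6/06"
2 "11/5/09"
end

* Convert the strings to daily dates; "MD20Y" assumes all years are 20xx
generate date = date(sdate, "MD20Y")
format date %td

* Earliest date within each patient, copied to every row of that patient
bysort id: egen first_date = min(date)
format first_date %td
list id date first_date, sepby(id)
```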

## How to substitute two variables in one loop in Stata

stata
I want to be able to carry out the following idea in Stata. I have a bunch of paired names. For instance Ryan and King is a pair. In a pseudo code keep if product_name == "i" | product_name == "j" where Ryan should substitute the i and King sub...

## Replace values within group

stata
I have the following data where I am trying to replace income for years 1980 and 1981 with that of year 1979 (no change for year 1978) [for each state]. state year size income 1 1978 1 1000 1 1978 1.5 100 1 1978 2 5000 1 1979 1 3779.736...

## Run a regression of countries by quartiles for a specific year

stata
I am exploring an effect that I think will vary by GDP levels, from a data set that has, vertically, country and year (1960 to 2015), so each country label is on 55 rows. I ran sort year by year: egen yrank = xtile(rgdp), nquantiles(4) which tags every year row...

## Stata putexcel summary statistics by group to MS Excel

excel,stata
I'm trying to get the Stata command putexcel to give me summary statistics for a continuous variable, grouped by a categorical variable, one after another, in the same worksheet. This should be repeated/looped through a number of years, where each year has its own sheet. This poses two problems: using...

## Find social network components in Stata

grouping,social-networking,stata
[I copied part of the below example from a separate post and changed it to suit my specific needs] pos_1 pos_2 2 4 2 5 1 2 3 9 4 2 9 3 The above is read as person_2 is connected to person_4,...,person_4 is connected to person_2, and person_9 is...

## How do I save the relative row or column frequencies after tab twoway in a matrix?

stata
If I run this code: sysuse auto, clear tab rep78 foreign, nofreq row matcell(freqs) matrix list freqs it's clear that tab only saved the actual counts in each cell, not the frequencies that were calculated and displayed with the nofreq row options. How do I save these relative frequencies in...

## Calculating difference in survival functions at time t in Stata

stata
I am estimating a Cox model in Stata using stcox. I estimate the model at stcox treat x1 x2 x3 I can then use the stcurve command to plot the survival function for treatment and control groups, with the x1, x2 and x3 variables set at their means by doing...

## Stata read date variable from MS SQL

sql-server,date,format,stata
I connected Stata via ODBC to a SQL Database. My problem is Stata reads date variables as Strings. In SQL they have a date format. How can I import a date variable in SQL as a date in Stata?...

## Recode the same value pattern for all variables in Stata

binary,stata,recode
In my dataset, I have a bunch of Yes/No type variables. For some reason, "Yes" is coded as 1 and "No" is coded as 2 instead of 0. Now I want to recode 2 to 0 based on the value label "No". How can I do it without having to...
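
A hedged sketch, assuming all the Yes/No variables share one value label (hypothetically named `yesno` here; the variables `smoker` and `insured` are made up). `ds` with `has(vallabel ...)` collects the variables carrying that label, `recode` changes them in one pass, and the label is then redefined so that 0 maps to "No".

```stata
clear
input byte(smoker insured)
1 2
2 2
1 1
end
label define yesno 1 "Yes" 2 "No"
label values smoker insured yesno

* Find every variable that uses the yesno value label
ds, has(vallabel yesno)
local yn `r(varlist)'

* recode accepts a varlist, so all of them are changed at once
recode `yn' (2 = 0)

* Redefine the shared label so that 0 now carries "No"
label define yesno 0 "No" 1 "Yes", replace
tabulate smoker insured
```

If the variables use several differently named labels, the same steps would need to be repeated per label.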

## Reuse lines of code in Stata, Similar to JavaScript function?

stata
I have some lines of code in a Stata do-file that I would like to reuse/execute at different points in the do-file. Similar to a JavaScript function... however I do not necessarily need an input variable. For example, I have some code: *code to reuse foreach x in test1...

## read Excel cells into Stata global as variables

excel,variables,import,stata
There are several panel datasets I'd like to join. The observations in these datasets are identified by an id variable and a variable identifying the time the observation was made. All datasets include some variables I need, some I don't need and never the same variables (excluding the id and...

## Defining groups within an interval

group,stata
In Stata with the following data ID Date 1 1/1/2010 2 1/1/2010 3 1/4/2010 4 1/5/2010 5 1/8/2010 6 1/10/2010 7 1/11/2010 I am trying to create a variable Dummyi that gives a unique variable to all of the IDs that occurred within three days (before or after) of the...

## When was a file used by another program

osx,performance,terminal,stata
I have a folder with a series of 750 csv files. I wrote a Stata program that runs through each of these files and performs a single task. The files are quite big, and my program has been running for more than 12 hours. Is there a way to know...

## Dynamic forecasting (arima) with multiple regressors in Stata

dynamic,stata,forecasting
I have a small time series dataset, a sample of which is below: year AvgU5MR AvgPov AvgEnrol 2000 126.9307 41.0109 67.11833 2001 123.4138 39.9748 68.66798 2002 119.93 45.85194 65.82739 2003 116.4923 55.3706 69.17756 2004 113.1362 32.63662 70.83884 2005 109.9008 41.08603 75.35649 2006 106.816 43.45722 75.98755 2007 103.8878 19.19114 76.86299 2008...

## Stata — create rows of repetitive data

data,rows,stata
I have to create the following data in Stata several times (say 50,000) -- one below the other. There need to be two variables: (1) a counter going from 1 to 500; and (2) a string variable that is A for the first 25 observations and then B for observations...

## How to reshape long to wide data in Stata?

stata,data-management
I have the following data: id tests testvalue 1 A 4 1 B 5 1 C 3 1 D 3 2 A 3 2 B 3 3 C 3 3 D 4 4 A 3 4 B 5 4 A 1 4 B 3 I would like to change the...

## Stata: How to name a variable with a value

stata
I would like to create a variable that takes its name from the value in a particular cell. For example, my data set looks like this: var1 count xx 1 xc 2 xv 3 xj 4 I would like to create 4 new variables that take names from the values of...

## How to export Stata xtcsd test results?

stata,panel-data
I would like to export results of cross section dependence tests for 12 panel data sets to a table in order to compare them with similar tests done with different software. Below is the regression and test instruction example from the xtcsd help page (unfortunately the example dataset is not...

## Why does accessing coefficients following estimation with nl require slightly different syntax than for other estimation commands?

syntax,stata
Following most estimation commands in Stata (e.g. reg, logit, probit, etc.) one may access the estimates using the _b[ParameterName] syntax (or the synonymous _coef[ParameterName]). For example: regress y x followed by di _b[x] will display the estimate of the coefficient of x. di _b[_cons] will display the coefficient of the...

## foreach using numlist of numbers with leading 0s

loops,for-loop,stata
In Stata, I am trying to use a foreach loop where I am looping over numbers from, say, 05-11. The problem is that I wish to keep the 0 as part of the value. I need to do this because the 0 appears in variable names. For example, I may...
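
One way this might be handled, sketched with a hypothetical `var` name stub: loop over plain integers and build a zero-padded copy of the counter with a display format.

```stata
forvalues i = 5/11 {
    * %02.0f pads single digits with a leading zero: 5 becomes 05
    local ii : display %02.0f `i'
    display "var`ii'"
}
```

Alternatively, `foreach i in 05 06 07 08 09 10 11` keeps the zeros because the list elements are handled as text.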

## Generate samples with big dataset

stata
I have a large dataset and since it's large I have to either split it or load one variable at a time. I have loaded the unique identifier id and I need to select at random 50 observations 100 times. I searched and I found sample and runiform to generate...

## Stata Regular expressions extracting numerical values

stata
I have some data that looks like this var1 h 01 .00 .0 abc d 1.0 .0 14.0abc 1,0.0 0.0 .0abc It should be noted that the last three alpha values are the same, and I am hoping to extract all the numerical values within the string. The code that...

## Stata: Aggregating by week

stata
I have a dataset that has a date variable with missing dates. var1 15sep2014 15sep2014 17sep2014 18sep2014 22sep2014 22sep2014 22sep2014 29sep2014 06oct2014 I aggregated the data using this command. gen week = week(var1) and the results look like this var 1 week 15sep2014 37 15sep2014 37 17sep2014 38 18sep2014 38...

## How to remove duplicate observations in Stata

duplicates,stata
Let's say I have the following data: id disease 1 0 1 1 1 0 2 0 2 1 3 0 4 0 4 0 I would like to remove the duplicate observations in Stata. For example id disease 1 1 2 1 3 0 4 0 For group id=1,...
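
A minimal sketch, assuming the rule implied by the example is "keep one row per id, preferring a row with disease equal to 1" (the excerpt is cut off, so the rule is inferred). Sorting by disease within id and keeping the last row does that.

```stata
clear
input id disease
1 0
1 1
1 0
2 0
2 1
3 0
4 0
4 0
end

* Within each id, rows are ordered by disease; the last row has the largest value
bysort id (disease): keep if _n == _N
list
```

Missing values of disease sort last, so they would need separate handling if they occur.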

## Separate country and year element of a variable

statistics,stata
I am attempting to use a dataset which has inconveniently merged country and year as the country variables. For example, for the US in 2006, the respective observation within the country variable would be US2006. Is there a way that I can separate the two and having done so, generate...
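
A hedged sketch, assuming the year is always the last four characters of the string. The new variable names `year` and `ccode` and the second example value are illustrative.

```stata
clear
input str10 country
"US2006"
"FRA2010"
end

* A negative start position in substr() counts from the end of the string
generate year = real(substr(country, -4, 4))

* Everything before the last four characters is the country code
generate ccode = substr(country, 1, strlen(country) - 4)
list
```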

## Stata: Retrieve variable label in a macro

variables,stata,labels
I'm generating graphs for several variables using a do-file, I would like to be able to retrieve a variable label (so that I could use it for the graph title). In my dreams, something along those lines: sysuse auto, replace local pricelabel = varlab(price) display "Label for price variable is...
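
For what it's worth, the macro extended function `variable label` seems to cover this case. A short sketch on the auto dataset (loaded with `clear`, since `sysuse` does not take a `replace` option):

```stata
sysuse auto, clear

* Copy the variable label of price into a local macro
local pricelabel : variable label price
display "Label for price variable is `pricelabel'"
```

The macro can then be passed on, for example as `title("`pricelabel'")` in a graph command.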

## Edit the x-axis ticks in Stata

graph,stata
Without using the graph editor, I would like to know if there is a way (code-wise) to customize the labels of the ticks in the output graph with string characters. Say, for instance, I have four ticks on my x-axis (the following years): 2010, 2011, 2012 and 2013. If...

## to create highest & lowest quartiles of a variable in Stata

stata
This is the Stata code I used to divide a Winsorised & centred variable (num_exp, denoting number of experienced managers) based on 4 quartiles & thereafter to generate the highest & lowest quartile dummies thereof: egen quartile_num_exp = xtile(WC_num_exp), n(4) gen high_quartile_numexp = 1 if quartile_num_exp==4 (1433 missing values generated);...

## Graph weighted averages in Stata

stata
I'm doing an analysis of the Current Population Survey. I have a wage variable (wage), a time-series variable (qtr), and an observational weight (pworwgt). Each quarter has thousands of observations. I can easily make a table showing the weighted average wage in each quarter: table qtr [iw=pworwgt], contents(mean wage) What...

## How to point Stata at the same help file for two different commands?

stata
Stata's swilk command performs the Shapiro-Wilk test for non-normality. Likewise, Stata's sfrancia performs the Shapiro-Francia test for non-normality. However, both commands share the same help file swilk.sthlp. How does Stata know that a single help file serves for more than one command? For example, how does Stata know that swilk.sthlp...

## logging in Stata batch mode

batch-file,stata
I use Windows batch files to execute various Stata do-files in a certain order. For example, when I do StataMP-64 /e do myDoFile1.do StataMP-64 /e do myDoFile2.do Stata executes myDoFile1.do and nicely routes its output to myDoFile1.log, and then executes myDoFile2.do and routes its output to myDoFile2.log. By the...

## How do I calculate the p-value of a one-tailed test in Stata?

stata
I have the following model: $$\ln(MPG_{i}) = \beta_{0} + \beta_{1}WEIGHT_{i} + \beta_{2}FOREIGN_{i} + \beta_{3}FOREIGN_{i} \times WEIGHT_{i} + \varepsilon_{i}$$ I want to use the test command to test whether the coefficient $\beta_{3} > 0.5$ in Stata. I have used the following code and obtained this result: test...

## most efficient I/O setup between Stata and Python (Pandas)

python,csv,pandas,io,stata
I am using Stata to process some data, export the data in a csv file and load it in Python using the pandas read_csv function. The problem is that everything is so slow. Exporting from Stata to a csv file takes ages (exporting in the dta Stata format is much...

## Mutually exclusive conditionals don't work in Stata

if-statement,stata
I have some data on Stata with some variables like logTA and class. I have more than a thousand observations and logTA doesn't have any missing values. Data looks like this: logTA class -------- -------- . . 21.26871 . Now, what I want to do is to assign values to...

## 150x150 crosstab in stata, showing timeseries movement between categories

time-series,stata,crosstab
I'm in a bit over my head here and I hope you can help me, or at least point me in the right direction. I have a massive dataset (5.8 million observations per year, over 14 years), which deals with individuals' occupations over time. I need to sum up the changes...

## Stata odbc SQL multiple CTE in one query

sql,stata,common-table-expression
I'm trying to conduct a multiple CTE expression within the "odbc load, exec("WITH..." statement. I confirmed the two CTEs extract the information needed. However, Stata doesn't appear to like the use of two CTEs. I tried to separate the two, like SQL states, with a semicolon, but it returns "Incorrect...

## Pairwise correlations over rolling periods ignoring double calculations

matrix,stata,correlation
I am trying to compute pairwise correlations over rolling windows for n= 40 variables where all rolled pairwise correlations for 2 given variables are saved in a new variable. My dataset has the following structure: Date V1 V2 V3 . . . 01/01/2009 0.3 0.6 0.5 02/01/2009 0.1 0.5 0.2...

## Post e(b) vector from a custom program in Stata

stata
I wrote a program that computes a weighted regression and now I want my estimation results to be stored as an e(b) vector so that the bootstrap command can easily access the results, but I keep getting an error. My program looks like: capture program drop mytest program mytest, eclass...

## Linear algebra on dataframes in Stata vs R/Python [closed]

python,r,stata
In R (and I think pandas in Python), datasets roughly correspond to a list of vectors. Before applying linear algebra to a set of numeric variables in a dataset, one first needs to convert them into a matrix (see for instance the code in R lm). This requires a deep...

## Stata: Storing only part of a FE regression output for graphing

stata
I am running a regression with two fixed-effects categories (country and year; the data are macroeconomic). Since I am using xtreg, one is hidden automatically, but the other is a variable: xtreg fiveyearyg taxratio i.year if taxratiocut == 1, i(wbcode1) fe cluster(wbcode1) estimates store yi I am running a number...

## Stata - Cohort Study - Display Crude Risk Ratio (like r(rr_crude))

stata
I'm starting to use Stata 14. I'm trying to do some basic risk ratio analysis, but I don't know how to extract single results. Given the following code: clear all webuse ugdp cs case exposed [fw=pop], by(age) we get an output with four risk ratios, for both age categories, a...

## How to find maximum distance apart of values within a variable

stata
I create a working example dataset: input /// group value 1 3 1 2 1 3 2 4 2 6 2 7 3 4 3 4 3 4 3 4 4 17 4 2 5 3 5 5 5 12 end My goal is to figure out the maximum distance...

## how to: Separate a continuous variable by % proportions?

statistics,stata
I have a continuous variable (in this case, fees spent). How do I determine % spending cutoffs? i.e. how do I know what dollar amount separates the bottom 50% from the top 50% (similarly for any other % I may be interested in). Thank you very much for any help
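
A minimal sketch of one option, using `_pctile` to return arbitrary percentile cutoffs. The auto dataset's `price` stands in for the fees variable, and the chosen percentiles and the `top_half` flag are illustrative.

```stata
sysuse auto, clear

* Ask for the 10th, 50th and 90th percentiles; results come back as r(r1), r(r2), r(r3)
_pctile price, percentiles(10 50 90)
local p10 = r(r1)
local p50 = r(r2)
local p90 = r(r3)
display "10th: `p10'  50th: `p50'  90th: `p90'"

* Flag observations above the median cutoff
generate byte top_half = price > `p50' if !missing(price)
tabulate top_half
```

`summarize price, detail` reports a fixed set of percentiles as well, which may be enough when only the common ones are needed.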

## Create flexible ggplot2 theme that 1) makes the legend and titles larger, 2) will look good irrespective of the final dimensions

r,ggplot2,stata
I am currently one of the few R users in my company, which consists predominantly of Stata users. One problem I've had with making plots using ggplot2 is that the default (theme_grey()) settings have a much smaller axis font and a smaller legend than what is found in Stata. Moreover, in...

## Dropping observations in Stata based on length?

stata
I have a string variable in Stata called Cod. I want to drop the observations where Cod has fewer than 16 characters. Any suggestions?
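
A short sketch of what this could look like, with made-up values for Cod: `strlen()` returns the length of a string, so it can be used directly in the `drop if` condition.

```stata
clear
input str20 Cod
"1234567890123456"
"12345"
"12345678901234567"
end

* Drop rows whose Cod is shorter than 16 characters
drop if strlen(Cod) < 16
list
```

If Cod can contain non-ASCII characters, a character-based length function may be preferable to a byte-based one.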

## Marginal effect of interaction variable in probit regression using Stata

stata
I am running a probit regression with an interaction between one continuous and one dummy variable. The coefficient is displayed in the regression output but when I look at the marginal effects the interaction is missing. How can I get the marginal effect of the interaction variable? probit move_right c.real_income_change_percent##i.gender...

## Is there a shortkey or command for the “Clear results” function

stata,shortcut
There's no shortcut key listed next to the Clear results menu item in the Edit menu. I would like to be able to clear the main screen without using the mouse. I'm using Stata 13 on Windows 2008 Server.