I have two models and want to calculate these statistics for each of them. Is there a package that calculates them in Stata?

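And, if I am not mistaken, $$ R^2_{\text{predicted}} = 1 - \frac{\text{PRESS}}{\text{TSS}} $$ where PRESS is the predicted residual sum of squares and TSS is the total sum of squares.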

Answer:

```
clear all

program define press, rclass
    // Leave-one-out (jackknife) predicted residual sum of squares for an OLS fit
    syntax varlist(fv) [if] [in] ///
        [fweight aweight pweight iweight] ///
        [, noDOTs]
    gettoken y x : varlist
    marksample touse
    preserve
    quietly keep if `touse'

    // Pass any weights through to regress
    if "`weight'" != "" {
        local wgt "[`weight'`exp']"
    }

    tempvar pred temp prs
    quietly gen double `pred' = .
    if "`dots'" == "" _dots 0, title(Jackknife replications) reps(`=_N')

    // Refit without observation i, then predict observation i out of sample
    forvalues i = 1/`=_N' {
        capture {
            reg `y' `x' `wgt' if _n != `i'
            predict double `temp'
            replace `pred' = `temp' in `i'
            drop `temp'
        }
        if "`dots'" == "" _dots `i' `=_rc > 0'
    }

    // Sum the squared out-of-sample prediction errors
    quietly gen double `prs' = (`y' - `pred')^2
    sum `prs', meanonly
    if "`dots'" == "" di _n _n
    di as txt "The predicted residual sum of squares is " as result r(sum)
    return scalar press = r(sum)
    restore
end

sysuse auto
press price mpg i.foreign
```
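
The program returns PRESS only, so the predicted R-squared still has to be computed from it. Here is a minimal sketch, assuming an unweighted OLS model with a constant, so that `e(mss) + e(rss)` after `regress` is the total sum of squares. It also uses the standard leverage identity for OLS, under which the leave-one-out residual equals `e_i / (1 - h_ii)`, as a loop-free cross-check. The names `resid`, `lev`, `loo2`, `tss` and `press_hat` are made up for this illustration.

```
sysuse auto, clear

* Fit the full-sample model once; e(mss) + e(rss) is the total sum of squares
quietly regress price mpg i.foreign
scalar tss = e(mss) + e(rss)

* Leverage shortcut: leave-one-out residual = e_i / (1 - h_ii)
predict double resid, residuals
predict double lev, leverage
generate double loo2 = (resid / (1 - lev))^2
summarize loo2, meanonly
scalar press_hat = r(sum)

display "PRESS (leverage shortcut) = " press_hat
display "Predicted R-squared       = " 1 - press_hat / tss

* Cross-check against the leave-one-out loop in the press program above
press price mpg i.foreign
display "Predicted R-squared (loop) = " 1 - r(press) / tss
```

The two PRESS values should agree up to rounding. The shortcut is only valid for linear regression, so for other estimators the explicit loop is the safer route.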
