I'm using Weka to do some text mining and I'm a little confused, so I'm here to ask: given a set of comments that are already classified in some way (as notes, status of work, non-conformity, or warning), how can I predict whether a new comment belongs to a...

Long-time lurker, first-time poster. I have data that roughly follows a y=sin(time) pattern, but also depends on variables other than time. In terms of correlations, since the target y-variable oscillates, there is almost zero statistical correlation with time, even though y obviously depends very strongly on time. The goal...
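
A worked sketch of why the linear correlation nearly vanishes, under assumptions that are not in the question: time is sampled uniformly over k full periods, t in [0, 2πk], and y = sin(t) exactly, with no other variables involved.

$$ \operatorname{Cov}(t, y) = \frac{1}{2\pi k}\int_0^{2\pi k} t \sin t \, dt - (\pi k)(0) = -1, \qquad \operatorname{Var}(t) = \frac{(2\pi k)^2}{12}, \qquad \operatorname{Var}(y) = \frac{1}{2} $$

$$ \rho(t, y) = \frac{-1}{\frac{\pi k}{\sqrt{3}} \cdot \frac{1}{\sqrt{2}}} = -\frac{\sqrt{6}}{\pi k} $$

Under these assumptions the correlation is about -0.78 over a single period but only about -0.08 over ten periods, so it shrinks toward zero as more cycles are observed, even though y is a deterministic function of t.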

So I have two models and I want to calculate these statistics. Is there any package to calculate them in Stata? The PRESS statistic (wiki) and, if I am not mistaken, $$ R^2_{predicted} = 1 - \frac{PRESS}{TSS} $$ ...
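
For reference, the usual textbook definitions can be written out as follows. This is a sketch for ordinary least squares, not a Stata recipe. The symbols are assumptions introduced here: e_i is the ordinary residual, h_ii the i-th leverage (diagonal of the hat matrix), and ŷ_(i) the prediction for observation i from the model fitted without that observation.

$$ \mathrm{PRESS} = \sum_{i=1}^{n} \left( y_i - \hat{y}_{(i)} \right)^2 = \sum_{i=1}^{n} \left( \frac{e_i}{1 - h_{ii}} \right)^2 $$

$$ R^2_{predicted} = 1 - \frac{\mathrm{PRESS}}{\sum_{i=1}^{n} (y_i - \bar{y})^2} $$

The denominator is the total sum of squares of y around its mean, so the statistic can be computed from residuals and leverages of a single fit, without refitting the model n times.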

I have been working on machine learning and prediction for about a month. I have tried IBM Watson with Bluemix, Amazon Machine Learning, and PredictionIO. What I want to do is predict a text field based on other fields. My CSV file has four text fields named Question, Summary, Description, Answer and about...

Quick question on prediction. The value I'm trying to predict is either 0 or 1 (it is set as numeric, not as a factor), so when I run my random forest: fit <- randomForest(PredictValue ~ <variables>, data=trainData, ntree=50) and predict: pred <- predict(fit, testData), all my predictions are between 0 and 1...

I have a CSV file which contains the user ID, the books he/she has read, and the rating for each book. I want to use LensKit to predict a book rating for a user. For example, user A has read 3 books, A, B, and C; I want to predict the rating for...

New to R. Looking to limit the range of values that can be predicted. df.Train <- data.frame(S=c(1,2,2,2,1),L=c(1,2,3,3,1),M=c(400,450,400,700,795),V=c(423,400,555,600,800),G=c(4,3.2,2,2.7,3.4), stringsAsFactors=FALSE) m.Train <- lm(G~S+L+M+V,data=df.Train) df.Test <- data.frame(S=c(1,2,1,2,1),L=c(1,2,3,1,1),M=c(400,450,500,800,795),V=c(423,475,555,600,555), stringsAsFactors=FALSE) round(predict(m.Train, df.Test, type="response"),digits=1) #seq(0,4,.1) #Predicted values should fall in this range I've experimented with the...

I've written a GA to model a handful of stocks (4) over a period of time (5 years). It's impressive how quickly the GA can find an optimal solution to the training data, but I am also aware that this is mainly due to its tendency to over-fit in the...

I work on calibration of probabilities. I'm using a probability mapping approach called generalized additive models. The algorithm I wrote is: probMapping = function(x, y, datax, datay) { if(length(x) < length(y))stop("train smaller than test") if(length(datax) < length(datay))stop("train smaller than test") datax$prob = x # trainset: data and raw probabilities datay$prob...

I hope I have come to the right forum. I'm an ecologist making species distribution models using the maxent (version 3.3.3, http://www.cs.princeton.edu/~schapire/maxent/) function in R, through the dismo package. I have used the argument "replicates = 5" which tells maxent to do a 5-fold cross-validation. When running maxent from the...

With my data (2 variables, Xt and Yt), I fitted a linear model in R Commander, named LinearModel.1. Then I wanted to predict the values that Yt would take for different values of Xt, along with their 95% confidence limits. After the linear model was...