I am currently trying to interpret a set of results gleaned from running SimpleKMeans clustering on the Diabetes.arff data set. http://i.stack.imgur.com/T4eho.jpg - link to clustered instances (figure 1) So far I can understand that the clustered instances (figure 1) show that 500 variables have been classified as tested negative and...

I am currently working on an existing system that recommends items that are similar to previous items that the user has liked. It uses Alternating least squares Collaborative Filtering to find feature vectors of users and items. It then uses the feature vectors of the items and uses the cosine...
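
The cosine step described above can be sketched in plain Python. This is only an illustration: the item names and factor values are made up, and `most_similar` is a hypothetical helper, not part of the system in the question.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


# Hypothetical item feature vectors, standing in for the ALS item factors.
item_factors = {
    "item_a": [0.9, 0.1, 0.3],
    "item_b": [0.8, 0.2, 0.4],
    "item_c": [-0.5, 0.9, 0.0],
}


def most_similar(item_id, factors):
    """Rank every other item by cosine similarity to item_id, best first."""
    target = factors[item_id]
    scores = [
        (other, cosine_similarity(target, vec))
        for other, vec in factors.items()
        if other != item_id
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranking = most_similar("item_a", item_factors)
```

Because cosine similarity ignores vector length, two items with the same direction in factor space score 1.0 regardless of how large their factors are.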

I'm a newbie to machine learning and would like to understand which algorithm (a classification algorithm or a correlation algorithm?) to use in order to understand the relationship between one or more attributes. For example, consider that I have the following set of attributes: Bill No, Bill Amount, Tip amount, Waiter Name and...

I'm using the decision tree classifier from the scikit-learn package in Python 3.4, and I want to get the corresponding leaf node id for each of my input data points. For example, my input might look like this: array([[ 5.1, 3.5, 1.4, 0.2], [ 4.9, 3. , 1.4, 0.2], [ 4.7,...

I'm trying to implement basic XOR NN with 1 hidden layer in Python. I'm not understanding the backprop algo specifically, so I've been stuck on getting delta2 and updating the weights...help? import numpy as np def sigmoid(x): return 1.0 / (1.0 + np.exp(-x)) vec_sigmoid = np.vectorize(sigmoid) theta1 = np.matrix(np.random.rand(3,3)) theta2...

I have three inputs: x=(A, B, C); and an output y. It needs to be the case that A+B+C=1 and 0<=A<=1, 0<=B<=1, 0<=C<=1. I want to find the x that maximizes y. My approach is to use a regression routine in scikit-learn to train a model f on my inputs...
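
One simple way to respect the constraints A+B+C=1 and 0<=A,B,C<=1 is to evaluate the trained model only at points that satisfy them. The sketch below does a brute-force grid search over that simplex. `toy_model` is a hypothetical stand-in for the fitted model f, and the grid spacing is an arbitrary choice.

```python
def simplex_grid(steps):
    """Yield every (A, B, C) with A + B + C = 1 on a grid of spacing 1/steps."""
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            k = steps - i - j  # the third coordinate is fixed by the other two
            yield (i / steps, j / steps, k / steps)


def argmax_on_simplex(f, steps=100):
    """Return the grid point that maximizes f, together with f at that point."""
    best_x = max(simplex_grid(steps), key=f)
    return best_x, f(best_x)


def toy_model(x):
    # Hypothetical stand-in for a trained regressor's prediction at x.
    a, b, c = x
    return -((a - 0.2) ** 2) - (b - 0.5) ** 2 - (c - 0.3) ** 2


best_x, best_y = argmax_on_simplex(toy_model, steps=10)
```

The grid has (steps+1)(steps+2)/2 points, so this only scales to a handful of inputs; it also only finds the maximum of the fitted model, which may differ from the maximum of the true response.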

I am trying to use an RBFNN for point-cloud-to-surface reconstruction, but I can't work out what my feature vectors would be in the RBFNN. Can anyone please help me understand this? The goal is to get to this: From inputs like this: ...

I am using DeepLearnToolbox to do CNN (Convolutional Neural Networks). I have computed my network successfully and I've seen my accuracy, but my question is: how can I query one single image into the network in order to get the label predicted? The final result that I want to get...

Usually, to show that our results are not due to chance, we use a significance test such as the t-test. But when we use 10-fold cross-validation, we train and test our models over chunks of the dataset. I'm wondering whether we still need a t-test when we have used 10-fold cross-validation. To be more precise, I mean...

I'm trying to understand brain.js. This is my code; it does not work. (Explanation of what I expect it to do below) <script src="https://cdn.rawgit.com/harthur/brain/gh-pages/brain-0.6.3.min.js"> <script> var net = new brain.NeuralNetwork(); net.train([{input: [0, 0], output: [0]}, {input: [0, 1], output: [1]}, {input: [1, 0], output: [1]}, {input: [1, 1], output: [0]}]);...

So, I have a set of texts I'd like to do some clustering analysis on. I've taken a Normalized Compression Distance between every text, and now I have basically built a complete graph with weighted edges that looks something like this: text1, text2, 0.539 text2, text3, 0.675 I'm having tremendous...
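
For reference, the Normalized Compression Distance is usually defined as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is a compressed size. The sketch below builds the same kind of weighted edge list, assuming zlib as the compressor (the question does not say which one was used). The text names and contents are made up.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed data (compression level 9)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance between two strings."""
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def to_edge_list(texts):
    """Return (name_a, name_b, distance) for every unordered pair of texts."""
    names = sorted(texts)
    return [
        (a, b, ncd(texts[a], texts[b]))
        for i, a in enumerate(names)
        for b in names[i + 1:]
    ]


# Made-up example texts; real inputs would be the documents being clustered.
texts = {
    "text1": "the quick brown fox jumps over the lazy dog " * 20,
    "text2": "the quick brown fox jumps over the lazy cat " * 20,
    "text3": "pack my box with five dozen liquor jugs " * 20,
}
edges = to_edge_list(texts)
```

An edge list like this can be turned into a distance matrix and passed to any clustering method that accepts precomputed distances, such as hierarchical clustering.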

I would like to know what the default setting is for SVM in the Weka library. As far as I know, Weka wraps LIBSVM, and the default kernel for LIBSVM is the RBF kernel. Does this hold true for Weka?

Hi everyone. Recently, I have been using the extraTrees model in the caret package. However, I noticed that the probability function for the extraTrees model is set to NULL, as shown by the following script: extratrees_para <- getModelInfo('extraTrees', regex = F)[[1]] extratrees_para$prob I noticed that in the original extraTrees package, it can be used...

Please help me understand the concept of a unit in neural networks. From the book, I understood that a unit in the input layer represents an attribute of a training tuple. However, it is left unclear how exactly it does so. Here is the diagram: There are two "thinking paths" about the input units. The...

This is my implementation of CostFunctionJ: function J = CostFunctionJ(X,y,theta) m = size(X,1); predictions = X*theta; sqrErrors =(predictions - y).^2; J = 1/(2*m)* sum(sqrErrors); But when I try to enter the command in MATLAB as: >> X = [1 1; 1 2; 1 3]; >> y = [1; 2; 3];...
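
For comparison, here is a plain-Python sketch of the same squared-error cost, J = 1/(2m) * sum((X*theta - y)^2). X and y are the values from the question. The theta values are assumptions, because the excerpt is cut off before theta is given.

```python
def cost_function_j(X, y, theta):
    """Squared-error cost J = 1/(2m) * sum((X @ theta - y)^2) for list inputs."""
    m = len(X)  # number of training examples
    predictions = [sum(x_j * t_j for x_j, t_j in zip(row, theta)) for row in X]
    sq_errors = [(p - y_i) ** 2 for p, y_i in zip(predictions, y)]
    return sum(sq_errors) / (2 * m)


X = [[1, 1], [1, 2], [1, 3]]
y = [1, 2, 3]

# Assumed theta values, chosen only for illustration.
j_perfect_fit = cost_function_j(X, y, [0, 1])  # predictions equal y exactly
j_all_zero = cost_function_j(X, y, [0, 0])     # (1 + 4 + 9) / 6
```

With theta = [0, 1] the predictions reproduce y, so the cost is 0, which is a handy sanity check for any implementation.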

I have a feature matrix with missing values (NaNs), so I need to impute those missing values first. However, the last line fails with the following error: Expected sequence or array-like, got Imputer(axis=0, copy=True, missing_values='NaN', strategy='mean', verbose=0). I checked, and it seems the reason is that train_fea_imputed...

Suppose I have a training set of (x,y)s, where x is the input example and y is the output tag, and y is a value (1....k) (k is the number of classes). When calculating the likelihood of the training set, should it be calculated for the whole training set (all...

I am working on a word representation algorithm, similar to Word2Vec and GloVe. I have been asked to make it more dynamic, such that new words could be added to the vocabulary, and new documents could be submitted to the program even after the representations (vectors) have been created. The problem is,...

I am trying to build predictive models from text data. I built a document-term matrix from the text data (unigrams and bigrams) and built different types of models on it (SVM, random forest, nearest neighbor, etc.). All the techniques gave decent results, but I want to improve the results. I...

Yesterday I posted a question about the first piece of the backpropagation algorithm. Today I'm working to understand the hidden layer. Sorry for all the questions; I've read several websites and papers on the subject, but no matter how much I read, I still have a hard time...

My dataset looks something like this ['', 'ABCDH', '', '', 'H', 'HHIH', '', '', '', '', '', '', '', '', '', '', '', 'FECABDAI', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'FABHJJFFFFEEFGEE', 'FFFF', '', '', '', '', '', '', '',...

I used a Random Forest Classifier in Python and MATLAB. With 10 trees in the ensemble, I got ~80% accuracy in Python and barely 30% in MATLAB. This difference persisted even when MATLAB's random forests were grown with 100 or 200 trees. What could be the possible reason for this...

I am trying to load a CSV file into a NumPy array and use the array in LogisticRegression etc. I am now struggling with the error shown below: import numpy as np import pandas as pd from sklearn import preprocessing from sklearn.linear_model import LogisticRegression dataset = pd.read_csv('../Bookie_test.csv').values X = dataset[1:, 32:34] y = dataset[1:,...

I have a sentiment analysis task, and for it I'm using this corpus. The opinions have 5 classes (very neg, neg, neu, pos, very pos), from 1 to 5. So I do the classification as follows: from sklearn.feature_extraction.text import TfidfVectorizer import numpy as np tfidf_vect= TfidfVectorizer(use_idf=True, smooth_idf=True, sublinear_tf=False, ngram_range=(2,2)) from sklearn.cross_validation...

I'm training a convolutional neural network using the pylearn2 library, and during all the epochs my validation error is consistently higher than the testing error. Is that possible? If so, in what kind of situations?

Using the R Kohonen package, I have obtained a "codes" plot which shows the codebook vectors. I would like to ask, shouldn't the codebook vectors of neighbouring nodes be similar? Why are the top 2 nodes on the left so different? Is there a way to organise it in a...

If it has one feature it's easy. Just graph it. One of the records there looks like (18, 15). Simple. But if we have multiple features that adds more dimensions to the graph, right? So how can you visualize your data set and determine whether or not linear regression is...

I am currently working on the MNIST handwritten digits classification. I built a single FeedForward network with the following structure: Inputs: 28x28 = 784 inputs Hidden Layers: A single hidden layer with 1000 neurons Output Layer: 10 neurons All the neurons have Sigmoid activation function. The reported class is the...

I'm trying to implement the backpropagation algorithm in my own net. I understand the idea of the backprop algorithm; however, I'm not strong with math. I'm just working on the first half of the backprop algorithm, computing the output layer (not worrying about partial derivatives in the hidden layer(s) yet)....

I have written a sample code below import numpy as np import pandas as pd import csv from sklearn.feature_extraction.text import CountVectorizer from sklearn.feature_extraction.text import TfidfTransformer from sklearn.naive_bayes import MultinomialNB text = ["this is dog" , "this is bull dog" , "this is jack"] countVector = CountVectorizer() countmatrix = countVector.fit_transform(text) print...

I am thinking of training word2vec on a large-scale web crawl dump of more than 10 TB. I personally trained the C implementation on the GoogleNews-2012 dump (1.5 GB) on my iMac; it took about 3 hours to train and generate vectors (I was impressed with the speed). I did not try Python...

Using the below, I'm able to get both the raw predictions and the final predictions as a file: cat train.vw.txt | vw -c -k --passes 30 --ngram 5 -b 28 --l1 0.00000001 --l2 0.0000001 --loss_function=logistic -f model.vw --compressed --oaa 3 cat test.vw.txt | vw -t -i model.vw --link=logistic -r raw.txt...

I have a database from UCI Machine Learning (Abalone Database), and I need to separate the first column, which is a character, from the other columns, which are doubles. The second part I already have with this code: abaloneData = csvread('abalone.data',0,1); I tried a lot to gather the first part to...

I am a beginner in Python and the Scikit-learn library. I currently need to work on an NLP project which first needs to represent a large corpus using one-hot encoding. I have read Scikit-learn's documentation on preprocessing.OneHotEncoder; however, it does not seem to match my understanding of the term. Basically,...

I am trying to run this code for the Kaggle Titanic competition as an exercise. It's free and aimed at beginners. I am using the neuralnet package in R. This is the training data from the website: train <- read.csv("train.csv") m <- model.matrix( ~ Survived + Pclass...

I'm trying to do simple neural network modelling, but the NNet result gives me poor result. It is simply ' output = 0.5 x input ' model that I want nnet model to learn, but the prediction shows all '1' as a result. What is wrong? library(neuralnet) traininginput <- as.data.frame(runif(50,min=1,max=100))...

Does there already exist a class in weka that takes care of voting/averaging different models, or do I have to come up with my own scheme? I already looked for that kind of functionality on the web, but I couldn't find any specific information....

I'm trying to analyse the paper ''Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis''. One component of the system described therein that I'm currently grappling with is the difference between Latent and Explicit Semantic Analysis. I've been writing up a document to encapsulate my understanding but it's somewhat, "cobbled together",...

I have a folder that contains many .txt documents of tourism reviews. I want to use the bag-of-words approach to convert them to some kind of numeric representation for machine learning (Latent Dirichlet Allocation, LDA) in C++, to train the system to recognize the topic for...

I wanted to try an example of the ALS machine learning algorithm. My code works fine; however, I do not understand the rank parameter used in the algorithm. I have the following code in Java: // Build the recommendation model using ALS int rank = 10; int numIterations = 10; MatrixFactorizationModel model =...

I'm taking Andrew Ng's ML class on Coursera and am a bit confused on gradient descent. The screenshot of the formula I'm confused by is here: In his second formula, why does he multiply by the value of the ith training example? I thought when you updated you were just...
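
Assuming the formula in question is the usual batch update for linear regression, theta_j := theta_j - alpha * (1/m) * sum_i (h(x_i) - y_i) * x_ij, the factor x_ij comes from the chain rule: the derivative of h(x_i) = sum_j theta_j * x_ij with respect to theta_j is x_ij. The sketch below uses made-up data and an arbitrary learning rate to show that update in plain Python.

```python
def predict(theta, x):
    """Linear hypothesis h(x) = sum_j theta_j * x_j."""
    return sum(t * x_j for t, x_j in zip(theta, x))


def gradient_step(theta, X, y, alpha):
    """One batch gradient-descent update of every theta_j at the same time."""
    m = len(X)
    errors = [predict(theta, x) - y_i for x, y_i in zip(X, y)]
    new_theta = []
    for j in range(len(theta)):
        # Each error is weighted by the j-th feature of its own example.
        grad_j = sum(e * x[j] for e, x in zip(errors, X)) / m
        new_theta.append(theta[j] - alpha * grad_j)
    return new_theta


# Made-up data lying exactly on y = x; the first column is the bias feature.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 2.0, 3.0]

first_step = gradient_step([0.0, 0.0], X, y, alpha=0.1)

theta = [0.0, 0.0]
for _ in range(2000):
    theta = gradient_step(theta, X, y, alpha=0.1)
```

For the bias term the feature is always 1, which is why the first formula in such derivations appears to have no extra factor.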

I would like to implement a bag-of-words representation for my project. I computed the codebook of visual words of images by using their features and descriptors. Then, I obtained cluster centers using k-means. For the bag-of-words representation part, it is asked that you should use manually labeled segments...

Let me simplify this question. If I run OpenCV MLP train and classify consecutively on the same data, I get different results. That is, if I put training a new MLP on the same training data and classifying on the same test data in a for loop, each iteration will give...

I have a categorical dataset on which I am performing spectral clustering, but I do not get very good output. I choose the eigenvectors corresponding to the largest eigenvalues as my centroids for k-means. Please find below the process I follow: 1. Create a symmetric similarity matrix (m*m) using...

I have been working on machine learning and prediction for about a month. I have tried IBM Watson with Bluemix, Amazon Machine Learning and PredictionIO. What I want to do is predict a text field based on other fields. My CSV file has four text fields named Question, Summary, Description, Answer and about...

My project takes a very long time to run. I created threads and distributed the data and processing across my processor cores, but it still takes a long time. I have optimized the code as much as I can. How can I distribute the computation across multiple laptops?

I'm using the pretrained ImageNet model provided with the Caffe (CNN) library ('bvlc_reference_caffenet.caffemodel'). I can output a 1000-dimensional vector of object scores for any image using this model. However, I don't know what the actual object categories are. Has anyone found a file where the corresponding object categories are...

I am working on a project in which we have to extract patterns (user behavior) from device log data. The device log contains different device actions with a timestamp, such as when the devices were switched on or when they were switched off. For example: When a person enters a room....

I have a dataset with 158 rows and 10 columns. I am trying to build a multiple linear regression model and predict future values. I used GridSearchCV for tuning parameters. Here is my GridSearchCV and Regression function: def GridSearch(data): X_train, X_test, y_train, y_test = cross_validation.train_test_split(data, ground_truth_data, test_size=0.3, random_state =...

I have a number of CSV files with columns such as gender, age, diagnosis, etc. Currently, they are coded as such: ID, gender, age, diagnosis 1, male, 42, asthma 1, male, 42, anxiety 2, male, 19, asthma 3, female, 23, diabetes 4, female, 61, diabetes 4, female, 61, copd The...

Is it possible to do the following in scikit-learn? We train an estimator A using the given mapping from features to targets, then we use the same data (or mapping) to train another estimator B, then we use outputs of the two trained estimators (A and B) as inputs for...

How do you usually get precision, recall and f-measure from a model created in Vowpal Wabbit on a classification problem? Are there any available scripts or programs that are commonly used for this with vw's output? To make a minimal example using the following data in playtennis.txt : 2 |...
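
Once the true labels and the predicted labels are available as two lists, the three measures follow directly from the counts of true positives, false positives and false negatives. The sketch below assumes the predictions have already been read from vw's prediction file and mapped to class labels; that parsing step is not shown. The label values are made up.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up labels: 1 is the positive class, -1 the negative class.
y_true = [1, 1, 1, -1, -1, -1]
y_pred = [1, 1, -1, 1, -1, -1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred, positive=1)
```

Here two of the three predicted positives are correct and two of the three actual positives are found, so all three measures equal 2/3.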

I have a 500x1000 feature vector and principal component analysis says that over 99% of total variance is covered by the first component. So I replace each 1000-dimensional point with a 1-dimensional point, giving a 500x1 feature vector (using Matlab's pca function). But my classifier accuracy, which was initially around 80% with...

The paperBoat format claims to provide a better dataset representation for machine learning routines. I'd like to understand the nature of its optimization. I understand that using an integer representation for model attributes means faster processing of the data set, but what are the other improvements? Also, how to tune...

This has become quite a frustrating question, but I've asked in the Coursera discussions and they won't help. Below is the question: I've gotten it wrong 6 times now. How do I normalize the feature? Hints are all I'm asking for. I'm assuming x_2^(2) is the value 5184, unless I...

Is there any way to reduce the dimension of the following features from 2D coordinate (x,y) to one dimension? ...

I have tried the following code. img=imread("test1.jpg"); gimg=rgb2gray(img); imshow(gimg); bw = gimg < 255; L = bwlabel(bw); imshow(label2rgb(L, @jet, [.7 .7 .7])) s = regionprops(L, 'PixelIdxList', 'PixelList'); s(1).PixelList(1:4, :) idx = s(1).PixelIdxList; sum_region1 = sum(gimg(idx)); x = s(1).PixelList(:, 1); y = s(1).PixelList(:, 2); xbar = sum(x .* double(gimg(idx))) / sum_region1...

I am doing a project on writer identification. I want to extract HOG features from line images of Arabic handwriting and then use a Gaussian Mixture Model for classification. The link to the database containing the line images is: http://khatt.ideas2serve.net/ So my questions are as follows: There are three folders...

I am trying to classify text data, with Scikit Learn, with the method shown here. (http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html) except I am loading my own dataset. I'm getting results, but I want to find the accuracy of the classification results. from sklearn.datasets import load_files text_data = load_files("C:/Users/USERNAME/projects/machine_learning/my_project/train", description=None, categories=None, load_content=True, shuffle=True, encoding='latin-1', decode_error='ignore',...

Which machine learning classifiers provide a weight vector after the learning phase? I know about SVM, logistic regression, perceptron and LDA. Are there more? My goal is to use these weight vectors to draw an importance map....

I've implemented a back-propagating neural network and trained it on my data. The data alternates between sentences in English & Afrikaans. The neural network is supposed to identify the language of the input. The structure of the network is 27 * 16 * 2. The input layer has 26 inputs for...

I want to configure the QN-Minimizer from the Stanford Core NLP library to get optimization results similar to those of the scipy optimize L-BFGS-B implementation, or to get a standard L-BFGS configuration that is suitable for most things. I set the standard parameters as follows: The Python example I want to copy: scipy.optimize.minimize(neuralNetworkCost,...

Here's my code: from scipy.io import wavfile fName = 'file.wav' fs, signal = wavfile.read(fName) signal = signal / max(abs(signal)) # scale signal assert min(signal) >= -1 and max(signal) <= 1 And the error is: Traceback (most recent call last): File = "vad.py", line 10, in <module> signal = signal /...

I have created an experiment and successfully published a web service which requires inputs. When I schedule this web service as a HTTPS POST JOB it shows this error Http Action - Response from host 'ussouthcentral.services.azureml.net': 'BadRequest' Response Headers: x-ms-request-id: 51fb1d34-5bc7-4832-ad9f-b19826468ea0 Date: Mon, 11 May 2015 11:02:01 GMT Server: Microsoft-HTTPAPI/2.0...

I would like to predict multiple dependent variables using multiple predictors. If I understood correctly, in principle one could make a bunch of linear regression models that each predict one dependent variable, but if the dependent variables are correlated, it makes more sense to use multivariate regression. I would like...

I'm using RandomForest from Weka 3.7.11 which in turn is bagging Weka's RandomTree. My input attributes are numerical and the output attribute(label) is also numerical. When training the RandomTree, K attributes are chosen at random for each node of the tree. Several splits based on those attributes are attempted and...

I am having difficulty understanding how both classifiers work under the hood. So far I have deduced that NaiveBayes predicts an outcome by 'uncoupling' multiple pieces of evidence and treating each piece of evidence as independent. But when compared to another classification algorithm like J48 or RandomTree, how exactly is...

I'm using the scikit-learn machine learning library (Python) for a machine learning project. One of the algorithms I'm using is the Gaussian Naive Bayes implementation. One of the attributes of the GaussianNB() function is the following: class_prior_ : array, shape (n_classes,) I want to alter the class prior manually since...

I am doing text classification. I have around 32K (spam & ham ) files. import numpy as np import pandas as pd import sklearn.datasets as dataset from sklearn.feature_extraction.text import CountVectorizer from sklearn.feature_extraction.text import TfidfTransformer from sklearn.naive_bayes import BernoulliNB from sklearn.preprocessing import LabelEncoder import re from sklearn.feature_selection import SelectKBest from sklearn.feature_selection...

I have a classification task with 4 classes which I solve with machine learning classifiers (SVM etc.). Which statistical measures can be used for 4 classes? I will for sure use p-value (with permutation test) but I need some more. Some interesting measures are true positive rate, true negative rate,...
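
With more than two classes, the true positive rate and true negative rate are usually computed per class in a one-vs-rest fashion: each class in turn is treated as "positive" and all other classes as "negative". The sketch below shows this with made-up labels for 4 classes. The function name is hypothetical, and it assumes every label in the data appears in `labels`.

```python
from collections import Counter


def per_class_rates(y_true, y_pred, labels):
    """Return {label: (TPR, TNR)} using one-vs-rest counts for each label."""
    counts = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    total = len(y_true)
    rates = {}
    for c in labels:
        tp = counts[(c, c)]
        fn = sum(counts[(c, p)] for p in labels if p != c)
        fp = sum(counts[(t, c)] for t in labels if t != c)
        tn = total - tp - fn - fp
        tpr = tp / (tp + fn) if tp + fn else 0.0
        tnr = tn / (tn + fp) if tn + fp else 0.0
        rates[c] = (tpr, tnr)
    return rates


# Made-up true and predicted labels for 4 classes.
labels = [0, 1, 2, 3]
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 3, 3, 3]
rates = per_class_rates(y_true, y_pred, labels)
```

The per-class values can be reported individually or averaged (macro-averaging) to give a single summary number.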

Having this: text = word_tokenize("The quick brown fox jumps over the lazy dog") And running: nltk.pos_tag(text) I get: [('The', 'DT'), ('quick', 'NN'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'NNS'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'NN'), ('dog', 'NN')] This is incorrect. The tags for quick brown lazy in the sentence should be:...

On the Batch execution API help page of Azure Machine Learning there are three different URIs: Submit Job (response is a Job ID); Start Job (we need to use the above Job ID in this URI); Get Status or Result (we need to use the above Job ID in this URI)...

How flexible or supportive is the Amazon Machine Learning platform for sentiment analysis and text analytics?

I'm currently working on a neural network that should take N parameters as input. Each parameter can have M different (discrete) values, let's say {A,B,C,…,M}. It also has a discrete number of outputs. How can I create my inputs from this situation? Should I have N×M inputs (having 0 or 1 as value), or should I think of a different...
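
The N×M option mentioned above is commonly called one-hot encoding: each parameter gets M input units, exactly one of which is 1. The sketch below shows it for a made-up sample with N = 4 parameters and M = 3 values. The category names and the function name are illustrative.

```python
def one_hot_encode(sample, categories):
    """Turn N discrete values into a flat list of N * M zeros and ones."""
    encoded = []
    for value in sample:
        if value not in categories:
            raise ValueError("unknown category: %r" % (value,))
        # One block of M inputs per parameter, with a single 1.0 in it.
        encoded.extend(1.0 if value == c else 0.0 for c in categories)
    return encoded


categories = ["A", "B", "C"]      # M = 3 possible values (illustrative)
sample = ["B", "A", "C", "C"]     # N = 4 parameters (illustrative)
inputs = one_hot_encode(sample, categories)
```

The main design consideration is that this encoding does not impose an artificial order on the values, which mapping A, B, C to 1, 2, 3 on a single input would do.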

In the Matlab help section, there's a very helpful example to solve classification problems under "Digit Classification Using HOG Features". You can easily execute the full script by clicking on 'Open this example'. However, I'm wondering if there's a way to store the output of "fitcecoc" in a database so you...

I want to compare the ROCK clustering algorithm to a distance-based algorithm. Let's say we have (m) training examples and (n) features. ROCK: From what I understand, ROCK does the following: 1. It calculates a similarity matrix (m*m) using Jaccard coefficients. 2. Then a threshold value is provided by...

I am very new to scikit and have a usecase which I am trying to solve through scikit python library. I have CSV file like this: Label , userId , message , user_like,user_dislike 1 , 1, "this is good message", 4,5 0, 1, "This is bad message",3,4 1, 2, "this...

I am using sklearn's nearest neighbor for a classification problem. My features are patches of the shape (3600, 2, 5). For example: a = [[5,5,5,5,5], [5,5,5,5,5]] b = [[5,5,5,5,5], [5,5,5,5,5]] features = [] for i in xrange(len(a)): features.append([a[i], b[i]]) #I have 3600 of these in reality. neigh = KNeighborsClassifier() neigh.fit(train_features,...

I have read the following sentence: Functional MRI data are high dimensional compared to the number of samples (usually 50000 voxels for 1000 samples). In this setting, machine learning algorithm can perform poorly. However, a simple statistical test can help reducing the number of voxels. The Student’s t-test (scipy.stats.ttest_ind) performs...

I have a Convolutional Neural Network model described in YAML. When I run pylearn2's train.py, I see that only one core of four is used. Is there a way to run training multi-threaded? Maybe this is really a Theano question. I followed this http://deeplearning.net/software/theano/tutorial/multi_cores.html Theano tutorial about multi-core support,...

What type of learning is Andrew Ng using in his neural network exercise on Coursera? Is it stochastic gradient descent or batch learning? I'm a little confused right now...

I have multiple curves from different sensors, all attached to the same moving object. Now I want to extract features from them. Let's say I have cut 0-10 as window1, so in window1 I have 5 graphs, each graph represents one sensor in a particular position, each...

What's the best way to use nominal value as opposed to real or boolean ones for being included in a subset of feature vector for machine learning? Should I map each nominal value to real value? For example, if I want to make my program to learn a predictive model...

I recently made my first neural network simulation which also uses a genetic evolution algorithm. It's simple software that just simulates simple organisms collecting food, and they evolve, as one would expect, from organisms with random and sporadic movements into organisms with controlled, food-seeking movements. Since this kind of organism...

I have the following dataset in Python: import pandas as pd bcw = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data', header=None) Lines like 24 have missing values: 1057013,8,4,5,1,2,?,7,3,1,4 On column 7, there is a '?', and I want to drop this line. How can I achieve this? ...
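
The idea of dropping every row that contains the '?' marker can be sketched with only the standard library. The middle row is the one quoted in the question; the other two rows are made-up values added for illustration, and `drop_rows_with_missing` is a hypothetical helper name.

```python
import csv
import io

# Small in-memory stand-in for the downloaded file.
raw = (
    "1000001,5,1,1,1,2,1,3,1,1,2\n"   # made-up row
    "1057013,8,4,5,1,2,?,7,3,1,4\n"   # row quoted in the question
    "1000002,4,1,1,3,2,1,3,1,1,2\n"   # made-up row
)


def drop_rows_with_missing(lines, marker="?"):
    """Parse CSV lines and keep only rows where no field equals marker."""
    reader = csv.reader(lines)
    return [row for row in reader if marker not in row]


rows = drop_rows_with_missing(io.StringIO(raw))
```

In pandas the same effect is typically obtained by passing `na_values='?'` to `pd.read_csv` and then calling `dropna()` on the resulting DataFrame.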

I wanted to train a new HMM using Poisson observations, which are the only thing I know. I'm using the mhsmm package for R. The first thing that bugs me is the initialization of the model, which in the examples is: J<-3 initial <- rep(1/J,J) P <- matrix(1/J,...

When we are training our model, we usually use MLE to estimate it. I know this means that the most probable data for such a learned model is our training set. But I'm wondering whether its probability equals 1 exactly or not?
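
A tiny worked example may help here. For three made-up coin flips (two heads, one tail) under a Bernoulli model, the maximum-likelihood estimate is p = 2/3, and the likelihood of the data at that estimate is (2/3)^2 * (1/3) = 4/27. That is the largest value the likelihood can take for this model, yet it is far below 1.

```python
def bernoulli_likelihood(p, flips):
    """Probability of the observed 0/1 sequence if each flip is 1 with prob p."""
    likelihood = 1.0
    for flip in flips:
        likelihood *= p if flip == 1 else (1.0 - p)
    return likelihood


flips = [1, 1, 0]                    # made-up data: heads, heads, tails
p_mle = sum(flips) / len(flips)      # MLE for a Bernoulli parameter
likelihood_at_mle = bernoulli_likelihood(p_mle, flips)

# No other p on a fine grid gives the data a higher probability.
grid = [i / 1000 for i in range(1001)]
best_on_grid = max(bernoulli_likelihood(p, flips) for p in grid)
```

MLE picks the parameters under which the training data is most probable relative to other parameter values; it does not make the data certain.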

I'm using Scikit-learn to apply machine learning algorithms to my datasets. Sometimes I need the probabilities of labels/classes instead of the labels/classes themselves. Instead of having Spam/Not Spam as labels of emails, I wish to have, for example, a 0.78 probability that a given email is Spam. For such...

I want to predict whether a user clicks on a link or not. I use logistic regression. I have a lot of data to start with. With 23 examples I don't get this exception, but if I try 3 million examples, I do. The following is my code, adapted from...

I have a set of items that are each described by 10 precise numbers n1, .., n10. I would like to learn the coefficients k1, .., k10 that should be associated with those numbers to rank them according to my criteria. For that purpose I created a web application (in...

I have user profiles with the following attributes. U={age,sex,country,race} What is the best way to find the similarity between two users? For example, I have the following 2 users. u1={25,M,USA,White} u2={30,M,UK,black} I have searched and found that cosine similarity is mentioned a lot. Is it good for my problem, or are there any other suggestions....

I'm looking for someone who knows whether it is possible to train a neural network to tell if a provided image lives up to the trained expectation. Let's say we have a neural network which is trained to read an 800x800 pixel color image. Therefore, I will have 1,920,000 inputs and...

I am trying to implement multiclass classification in WEKA. I have a lot of rows, say bank transactions, and each one is tagged as Food, Medicine, Rent, etc. I want to develop a classifier which can be trained with the previous data I have and predict the class that future transactions belong to....

I followed the tutorial here in order to implement Logistic Regression using theano. The aforementioned tutorial uses SciPy's fmin_cg optimisation procedure. Among the important arguments to this function are: f, the objective/cost function to be minimised; x0, a user-supplied initial guess of the parameters; fprime, a function which...

Determining the type of a given day (working day / holiday) and the length of the working day (in some countries it can differ depending on the day of the week and official holidays) is a real problem for software that interacts with banks / state institutions. It can also be very useful in any kind of...

I am using decision stumps with a BaggingClassifier to classify some data: def fit_ensemble(attributes,class_val,n_estimators): # max depth is 1 decisionStump = DecisionTreeClassifier(criterion = 'entropy', max_depth = 1) ensemble = BaggingClassifier(base_estimator = decisionStump, n_estimators = n_estimators, verbose = 3) return ensemble.fit(attributes,class_val) def predict_all(fitted_classifier, instances): for i, instance in enumerate(instances): instances[i] =...

I have a problem understanding the difference between MLP and SLP. I know that in the first case the MLP has more than one layer (the hidden layers) and that the neurons have a non-linear activation function, like the logistic function (needed for gradient descent). But I...

How do I find the model type for all models at once? I know how to access this info if I know the algo name, e.g.: library('caret') tail(names(getModelInfo())) [1] "widekernelpls" "WM" "wsrf" "xgbLinear" "xgbTree" [6] "xyf" getModelInfo()$xyf$type [1] "Classification" "Regression" How do I see the $type for all the algos...

NAME      PRICE   SALES  VIEWS  AVG_RATING  VOTES  COMMENTS
Module 1  $12.00  69     12048  5           3      26
Module 2  $24.99  12     52858  5           1      14
Module 3  $10.00  1      1381   -1          0      0
Module 4  $22.99  46     57841  5           8      24
.................
So, let's say I have statistics of sales. I...

I want to use the resolution time in minutes and the client description of the tickets on Zendesk to predict the resolution time of the next tickets based on their description. I will use only these two values, but the description is a large text. I looked into hashing the feature...