FAQ Database Discussion Community


What are units in neural network (backpropagation algorithm)

machine-learning,artificial-intelligence,neural-network,classification,backpropagation
Please help me understand the concept of units in neural networks. From the book, I understood that a unit in the input layer represents an attribute of a training tuple. However, it is left unclear how exactly it does so. Here is the diagram: There are two "thinking paths" about the input units. The...

How to cluster a set of strings?

machine-learning,cluster-analysis,k-means,hierarchical-clustering
My dataset looks something like this ['', 'ABCDH', '', '', 'H', 'HHIH', '', '', '', '', '', '', '', '', '', '', '', 'FECABDAI', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'FABHJJFFFFEEFGEE', 'FFFF', '', '', '', '', '', '', '',...

Cluster centroids on simplekmeans clustering

machine-learning,cluster-analysis,weka
I am currently trying to interpret a set of results gleaned from running SimpleKMeans clustering on the Diabetes.arff data set. Link to the clustered instances (figure 1): http://i.stack.imgur.com/T4eho.jpg. So far, I understand that the clustered instances (figure 1) show that 500 variables have been classified as tested negative and...

Can the validation error of a dataset be higher than the test error during the whole process of training a neural network?

machine-learning,computer-vision,neural-network,deep-learning,pylearn
I'm training a convolutional neural network using the pylearn2 library, and during all the epochs my validation error is consistently higher than the testing error. Is this possible? If so, in what kinds of situations?

HOG Feature Extraction of Arabic Line Images

image-processing,machine-learning,computer-vision
I am doing a project on writer identification. I want to extract HOG features from line images of Arabic handwriting and then use a Gaussian Mixture Model for classification. The link to the database containing the line images is: http://khatt.ideas2serve.net/ So my questions are as follows: There are three folders...

Scikit: How to resolve this use case

python,machine-learning,scikit-learn
I am very new to scikit-learn and have a use case that I am trying to solve with the scikit-learn Python library. I have a CSV file like this: Label , userId , message , user_like,user_dislike 1 , 1, "this is good message", 4,5 0, 1, "This is bad message",3,4 1, 2, "this...

Which Spark MLlib algorithm to use?

machine-learning,apache-spark
I'm a newbie to machine learning and would like to understand which algorithm (a classification algorithm or a correlation algorithm?) to use to understand the relationship between one or more attributes. For example, consider that I have the following set of attributes: Bill No, Bill Amount, Tip amount, Waiter Name and...

Machine learning predict text fields based on text fields

machine-learning,amazon,prediction,ibm-watson,predictionio
I have been working on machine learning and prediction for about a month. I have tried IBM Watson with Bluemix, Amazon Machine Learning and PredictionIO. What I want to do is to predict a text field based on other fields. My CSV file has four text fields named Question, Summary, Description, Answer and about...

How to configure Stanford QNMinimizer to get similar results as scipy.optimize.minimize L-BFGS-B

java,optimization,machine-learning,scipy,stanford-nlp
I want to configure the QNMinimizer from the Stanford CoreNLP library to get nearly the same optimization results as SciPy's L-BFGS-B implementation, or to get a standard L-BFGS configuration that is suitable for most problems. I set the standard parameters as follows: The Python example I want to copy: scipy.optimize.minimize(neuralNetworkCost,...

How to find the algorithm type (regression, classification) in caret in R for all algorithms at once?

r,machine-learning,classification,regression,caret
How do I find the model type for all models at once? I know how to access this info if I know the algorithm name, e.g.: library('caret') tail(names(getModelInfo())) [1] "widekernelpls" "WM" "wsrf" "xgbLinear" "xgbTree" [6] "xyf" getModelInfo()$xyf$type [1] "Classification" "Regression" How do I see the $type for all the algos...

T-test for multiple classes (>2)

matlab,machine-learning,feature-extraction,p-value
I have read the following sentence: Functional MRI data are high dimensional compared to the number of samples (usually 50000 voxels for 1000 samples). In this setting, machine learning algorithm can perform poorly. However, a simple statistical test can help reducing the number of voxels. The Student’s t-test (scipy.stats.ttest_ind) performs...
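
For more than two classes, the usual generalization of the two-sample t-test is the one-way ANOVA F-test (SciPy provides it as `scipy.stats.f_oneway`, and scikit-learn's `f_classif` computes it feature by feature). A minimal pure-Python sketch of the F statistic for a single feature (voxel), using made-up values for three classes:

```python
def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA for one feature split into class groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between-class sum of squares: distance of each class mean from the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-class sum of squares: spread of the values around their own class mean.
    ss_within = sum(sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical values of one voxel for three classes.
f_value = one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f_value)  # 27.0
```

A larger F (or a smaller p-value from the F distribution) suggests the feature separates the classes, so voxels can be ranked by it and only the top ones kept.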

How to select only complete rows in a pandas DataFrame

python,machine-learning,data.frame
I have the following dataset in Python: import pandas as pd bcw = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data', header=None) Lines like line 24 have missing values: 1057013,8,4,5,1,2,?,7,3,1,4 In column 7 there is a '?', and I want to drop this line. How can I achieve this? ...
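
One way to do this, sketched under the assumption that '?' is the only missing-value marker in the file: tell `read_csv` to parse '?' as NaN with `na_values`, then keep complete rows with `dropna()`. To keep the example self-contained, it reads two illustrative rows from a string instead of the UCI URL:

```python
import io

import pandas as pd

# Two illustrative rows in the same format as breast-cancer-wisconsin.data;
# the second is the row with a missing value quoted in the question.
raw = "1000025,5,1,1,1,2,1,3,1,1,2\n1057013,8,4,5,1,2,?,7,3,1,4\n"

# na_values="?" makes pandas parse '?' as NaN instead of a string,
# so the affected column stays numeric.
bcw = pd.read_csv(io.StringIO(raw), header=None, na_values="?")

# Keep only complete rows (rows without any NaN).
complete = bcw.dropna()
print(len(bcw), len(complete))  # 2 1
```

With the real file, the same two arguments apply: `pd.read_csv(url, header=None, na_values="?").dropna()`.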

Clustering a categorical dataset with a distance-based approach

python,machine-learning,cluster-analysis,k-means
I want to compare the ROCK clustering algorithm to a distance-based algorithm. Let's say we have (m) training examples and (n) features. ROCK: From what I understand, ROCK does the following: 1. It calculates a similarity matrix (m*m) using Jaccard coefficients. 2. Then a threshold value is provided by...

Supervised machine learning for several coefficients

machine-learning,neural-network
I have a set of items that are each described by 10 precise numbers n1, .., n10. I would like to learn the coefficients k1, .., k10 that should be associated with those numbers to rank them according to my criteria. For that purpose, I created a web application (in...

What is the advantage of the paperboat format in performance optimization of ML?

optimization,machine-learning,dataset
The paperBoat format claims to provide a better dataset representation for machine learning routines. I'd like to understand the nature of its optimization. I understand that using an integer representation for model attributes means faster processing of the data set; what are the other improvements? Also, how to tune...

Normalize a feature in this table

machine-learning,normalization
This has become quite a frustrating question, but I've asked in the Coursera discussions and they won't help. Below is the question: I've gotten it wrong 6 times now. How do I normalize the feature? Hints are all I'm asking for. I'm assuming x_2^(2) is the value 5184, unless I...
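
As a hint rather than a full answer: mean normalization, as presented in that course, replaces each value x of a feature with (x - μ) / s, where μ is the feature's mean over the training examples and s is its range (max - min); the standard deviation is also commonly used for s. A minimal sketch with hypothetical values (not the ones from the quiz):

```python
def mean_normalize(values):
    """Mean normalization of one feature: (x - mean) / range, range = max - min."""
    mu = sum(values) / len(values)
    s = max(values) - min(values)
    return [(x - mu) / s for x in values]


# Hypothetical feature column (all training examples for one feature).
feature = [100, 200, 300, 400]
result = mean_normalize(feature)
print(result)
```

Note that μ and s are computed over the whole column, so normalizing a single entry requires all values of that feature.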

How do you know if a data set is right for linear regression if it has multiple features?

machine-learning,statistics,linear-regression
If it has one feature it's easy. Just graph it. One of the records there looks like (18, 15). Simple. But if we have multiple features that adds more dimensions to the graph, right? So how can you visualize your data set and determine whether or not linear regression is...

NNet simple modeling

r,machine-learning,nnet
I'm trying to do simple neural network modelling, but the NNet result is poor. It is simply an 'output = 0.5 x input' model that I want the nnet model to learn, but the prediction shows all '1' as a result. What is wrong? library(neuralnet) traininginput <- as.data.frame(runif(50,min=1,max=100))...

Object categories of pretrained imagenet model in caffe

machine-learning,neural-network,deep-learning,caffe,matcaffe
I'm using the pretrained ImageNet model provided with the Caffe (CNN) library ('bvlc_reference_caffenet.caffemodel'). I can output a 1000-dimensional vector of object scores for any image using this model. However, I don't know what the actual object categories are. Did someone find a file where the corresponding object categories are...

Amazon Machine Learning for sentiment analysis

amazon-web-services,machine-learning,nlp,sentiment-analysis
How flexible or supportive is the Amazon Machine Learning platform for sentiment analysis and text analytics?

Self organising map visualisation result interpretation

r,machine-learning,cluster-analysis,som,unsupervised-learning
Using the R Kohonen package, I have obtained a "codes" plot which shows the codebook vectors. I would like to ask, shouldn't the codebook vectors of neighbouring nodes be similar? Why are the top 2 nodes on the left so different? Is there a way to organise it in a...

What does maximum likelihood estimation mean exactly?

machine-learning,mle
When we train our model, we usually use MLE to estimate its parameters. I know it means that, for such a learned model, our training set is the most probable data. But I'm wondering: is its probability exactly 1 or not?
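
A toy example (a coin-flip model, not any particular learned model) illustrates the point: MLE picks the parameter value under which the observed training set is more probable than under any other parameter value, but that maximum probability is generally well below 1. The likelihood is computed over the whole training set as a product over examples:

```python
def bernoulli_likelihood(p, data):
    """Probability of the whole data set (product over examples) under Bernoulli(p)."""
    likelihood = 1.0
    for x in data:
        likelihood *= p if x == 1 else (1 - p)
    return likelihood


data = [1, 1, 0]               # toy training set: two heads, one tail
p_mle = sum(data) / len(data)  # closed-form MLE of a Bernoulli parameter: 2/3
print(p_mle, bernoulli_likelihood(p_mle, data))  # about 0.667 and 0.148, not 1
```

Any other value of p gives this data set an even lower probability, which is what "maximum" refers to.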

Why can't I calculate CostFunction J

matlab,machine-learning
This is my implementation of CostFunctionJ: function J = CostFunctionJ(X,y,theta) m = size(X,1); predictions = X*theta; sqrErrors =(predictions - y).^2; J = 1/(2*m)* sum(sqrErrors); But when I try to enter the command in MATLAB as: >> X = [1 1; 1 2; 1 3]; >> y = [1; 2; 3];...
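
For comparison, here is the same squared-error cost J = 1/(2m) · Σ(Xθ - y)² written in Python with plain lists. It mirrors the MATLAB function step by step; the theta values used below are assumptions, since the command in the question is cut off:

```python
def cost_function_j(X, y, theta):
    """Squared-error cost J = 1/(2m) * sum((X*theta - y)^2), mirroring CostFunctionJ."""
    m = len(X)
    # predictions = X * theta (matrix-vector product)
    predictions = [sum(xi * ti for xi, ti in zip(row, theta)) for row in X]
    sqr_errors = [(p - yi) ** 2 for p, yi in zip(predictions, y)]
    return sum(sqr_errors) / (2 * m)


X = [[1, 1], [1, 2], [1, 3]]
y = [1, 2, 3]
print(cost_function_j(X, y, [0, 1]))  # 0.0: theta = [0, 1] fits y = x exactly
print(cost_function_j(X, y, [0, 0]))  # (1 + 4 + 9) / 6
```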

How do I plug distance data into scipy's agglomerative clustering methods?

numpy,machine-learning,scipy,hierarchical-clustering
So, I have a set of texts I'd like to do some clustering analysis on. I've taken a Normalized Compression Distance between every text, and now I have basically built a complete graph with weighted edges that looks something like this: text1, text2, 0.539 text2, text3, 0.675 I'm having tremendous...
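
SciPy's hierarchical clustering accepts precomputed distances as a 1-D "condensed" vector: the upper triangle of the distance matrix, in row-major order. A minimal sketch that turns (text, text, distance) triples into that vector; the first two distances come from the question, while the text1/text3 value is hypothetical:

```python
from itertools import combinations

pairs = [("text1", "text2", 0.539), ("text2", "text3", 0.675), ("text1", "text3", 0.712)]

names = sorted({n for a, b, _ in pairs for n in (a, b)})
lookup = {frozenset((a, b)): d for a, b, d in pairs}

# The condensed form lists d(i, j) for i < j in the order
# (0,1), (0,2), ..., (0,n-1), (1,2), ...; combinations() yields exactly that order.
condensed = [lookup[frozenset((names[i], names[j]))]
             for i, j in combinations(range(len(names)), 2)]
print(names, condensed)

# With SciPy installed, the vector can then be passed directly, e.g.:
#   from scipy.cluster.hierarchy import linkage
#   Z = linkage(condensed, method="average")
```

The sketch assumes every pair of texts has a distance; a missing pair raises a KeyError rather than silently producing a wrong matrix.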

Train neural network to determine color image quality [closed]

machine-learning,artificial-intelligence,neural-network
I'm looking for someone who knows whether it is possible to train a neural network to tell whether a provided image lives up to the trained expectation. Let's say we have a neural network trained to read an 800x800 pixel color image. Therefore, I will have 1,920,000 inputs (800 x 800 pixels x 3 color channels) and...

Python NLTK pos_tag not returning the correct part-of-speech tag

python,machine-learning,nlp,nltk,pos-tagger
Having this: text = word_tokenize("The quick brown fox jumps over the lazy dog") And running: nltk.pos_tag(text) I get: [('The', 'DT'), ('quick', 'NN'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'NNS'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'NN'), ('dog', 'NN')] This is incorrect. The tags for quick brown lazy in the sentence should be:...

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any()

python-2.7,numpy,machine-learning,speech
Here's my code: from scipy.io import wavfile fName = 'file.wav' fs, signal = wavfile.read(fName) signal = signal / max(abs(signal)) # scale signal assert min(signal) >= -1 and max(signal) <= 1 And the error is: Traceback (most recent call last): File "vad.py", line 10, in <module> signal = signal /...
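
A likely cause, assuming the WAV file is stereo: `wavfile.read` then returns a 2-D array of shape (samples, channels), and Python's built-in `max` compares whole rows with each other, which raises exactly this ValueError. NumPy's own reductions operate element-wise instead. A sketch using a small synthetic array in place of a real file:

```python
import numpy as np


def scale_signal(signal):
    """Scale a mono (1-D) or multichannel (2-D) signal into [-1, 1]."""
    # Convert to float first: under Python 2, dividing integer arrays truncates,
    # and abs() of the minimum int16 value overflows.
    signal = np.asarray(signal, dtype=np.float64)
    peak = np.abs(signal).max()  # NumPy reduction over all elements, not rows
    return signal / peak if peak > 0 else signal


# Synthetic stereo data: 3 samples x 2 channels of int16 PCM.
stereo = np.array([[100, -200], [50, 400], [-300, 0]], dtype=np.int16)
scaled = scale_signal(stereo)
assert scaled.min() >= -1 and scaled.max() <= 1
```

If only one channel is wanted, selecting it first (e.g. `signal[:, 0]`) also makes the original built-in `max`/`min` calls work on a 1-D array.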

Classification with scikit-learn KNN using multi-dimensional features (input dimension error)

python,machine-learning,scikit-learn
I am using sklearn's nearest neighbor for a classification problem. My features are patches of the shape (3600, 2, 5). For example: a = [[5,5,5,5,5], [5,5,5,5,5]] b = [[5,5,5,5,5], [5,5,5,5,5]] features = [] for i in xrange(len(a)): features.append([a[i], b[i]]) #I have 3600 of these in reality. neigh = KNeighborsClassifier() neigh.fit(train_features,...
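
scikit-learn estimators expect a 2-D feature array of shape (n_samples, n_features), so a 3-D array of patches typically triggers this kind of dimension error. A minimal sketch, assuming each 2 x 5 patch can simply be flattened into 10 features (the constant patch values are placeholders):

```python
import numpy as np

# Placeholder data: 3600 patches, each 2 x 5.
n_samples = 3600
patches = np.full((n_samples, 2, 5), 5.0)

# Flatten each 2 x 5 patch into a row of 10 features.
train_features = patches.reshape(n_samples, -1)
print(train_features.shape)  # (3600, 10)

# The result can then be passed to, e.g.,
# KNeighborsClassifier().fit(train_features, labels) with 3600 labels;
# test data must be flattened the same way before predict().
```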

Nominal valued dataset in machine learning

machine-learning,data-mining
What's the best way to use nominal values, as opposed to real or boolean ones, as part of a feature vector for machine learning? Should I map each nominal value to a real value? For example, if I want my program to learn a predictive model...
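
A common choice is one-hot (dummy) encoding rather than mapping each category to an arbitrary real number, because a single numeric code implies an ordering and distances that nominal values do not have. A minimal sketch with a hypothetical attribute:

```python
def one_hot(values):
    """Map each nominal value to a 0/1 indicator vector (one column per category)."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    vectors = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        vectors.append(row)
    return categories, vectors


# Hypothetical nominal attribute.
cats, encoded = one_hot(["red", "green", "red", "blue"])
print(cats)     # ['blue', 'green', 'red']
print(encoded)  # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

The category list must be fixed on the training data and reused for new examples so that columns line up.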

Programming the Back Propagation Algorithm

java,machine-learning,neural-network
I'm trying to implement the backpropagation algorithm in my own net. I understand the idea of the backprop algorithm; however, I'm not strong with math. I'm just working on the first half of the backprop algorithm, computing the output layer (not worrying about partial derivatives in the hidden layer(s) yet)....

SciPy Conjugate Gradient Optimisation not invoking callback method after each iteration

python,optimization,machine-learning,scipy,theano
I followed the tutorial here in order to implement logistic regression using Theano. The aforementioned tutorial uses SciPy's fmin_cg optimisation procedure. Among the important arguments to the aforementioned function are: f the objective/cost function to be minimised, x0 a user-supplied initial guess of the parameters, fprime a function which...

Output the subset of instances used to train each base_estimator of a BaggingClassifier

python,pandas,machine-learning,scikit-learn
I am using decision stumps with a BaggingClassifier to classify some data: def fit_ensemble(attributes,class_val,n_estimators): # max depth is 1 decisionStump = DecisionTreeClassifier(criterion = 'entropy', max_depth = 1) ensemble = BaggingClassifier(base_estimator = decisionStump, n_estimators = n_estimators, verbose = 3) return ensemble.fit(attributes,class_val) def predict_all(fitted_classifier, instances): for i, instance in enumerate(instances): instances[i] =...

Spectral clustering with a similarity matrix constructed by the Jaccard coefficient

machine-learning,cluster-analysis,pca,eigenvalue,eigenvector
I have a categorical dataset, and I am performing spectral clustering on it, but I do not get very good output. I choose the eigenvectors corresponding to the largest eigenvalues as my centroids for k-means. Please find below the process I follow: 1. Create a symmetric similarity matrix (m*m) using...

Do we need a significance test when we use 10-fold cross-validation?

machine-learning,cross-validation
Usually, to show that our results are not due to chance, we use a significance test like the t-test. But when we use 10-fold cross-validation, we train and test our models on chunks of the dataset. I'm wondering: do we need a t-test when we have used 10-fold cross-validation? To be more precise, I mean...

Does Andrew Ng's ANN from Coursera use SGD or batch learning?

machine-learning,neural-network
What type of learning is Andrew Ng using in his neural network exercise on Coursera? Is it stochastic gradient descent or batch learning? I'm a little confused right now......

How to specify the prior probability for scikit-learn's Naive Bayes

python,syntax,machine-learning,scikit-learn
I'm using the scikit-learn machine learning library (Python) for a machine learning project. One of the algorithms I'm using is the Gaussian Naive Bayes implementation. One of the attributes of the GaussianNB() function is the following: class_prior_ : array, shape (n_classes,) I want to alter the class prior manually since...

Is it item-based or content-based collaborative filtering?

machine-learning,recommendation-engine,collaborative-filtering,predictionio,content-based-retrieval
I am currently working on an existing system that recommends items that are similar to previous items that the user has liked. It uses Alternating least squares Collaborative Filtering to find feature vectors of users and items. It then uses the feature vectors of the items and uses the cosine...

Extract Patterns from the device log data

machine-learning,pattern-recognition,bayesian-networks
I am working on a project in which we have to extract patterns (user behavior) from device log data. The device log contains different device actions with a timestamp, such as when the devices were switched on or off. For example: When a person enters a room....

Does scikit-learn perform “real” multivariate regression (multiple dependent variables)?

python,machine-learning,scikit-learn,linear-regression,multivariate-testing
I would like to predict multiple dependent variables using multiple predictors. If I understood correctly, in principle one could make a bunch of linear regression models that each predict one dependent variable, but if the dependent variables are correlated, it makes more sense to use multivariate regression. I would like...
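
One point that may help: for ordinary least squares, fitting all dependent variables jointly yields the same coefficients as fitting each one separately; correlation between the outputs affects the error covariance and joint inference, not the point estimates (methods such as reduced-rank regression or PLS are what actually share structure across outputs). scikit-learn's `LinearRegression` accepts a 2-D `y`. A NumPy sketch on synthetic data showing the equivalence:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 50 samples, intercept + 3 predictors, 2 correlated outputs.
X = np.column_stack([np.ones(50), rng.normal(size=(50, 3))])
true_B = np.array([[1.0, -1.0], [2.0, 1.5], [0.0, 3.0], [-1.0, 0.5]])
noise = rng.normal(scale=0.1, size=(50, 1))  # shared noise makes the outputs correlated
Y = X @ true_B + noise

# Joint fit: one least-squares solve with a 2-column target matrix.
B_joint, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Separate fits: one least-squares solve per dependent variable.
B_separate = np.column_stack(
    [np.linalg.lstsq(X, Y[:, k], rcond=None)[0] for k in range(Y.shape[1])]
)

print(np.allclose(B_joint, B_separate))  # True
```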

OpenCV MLP: Same Data, Different Results

c++,opencv,machine-learning,neural-network,weight
Let me simplify this question. If I run OpenCV MLP training and classification consecutively on the same data, I get different results. That is, if I put training a new MLP on the same training data and classifying the same test data inside a for loop, each iteration will give...

Global database of “Business days” for several years and countries

java,scala,calendar,machine-learning
Determining the type of a given day (working day or holiday) and the length of the working day (in some countries it differs depending on the day of the week and official holidays) is a real problem for software that interacts with banks and state institutions. It can also be very useful in any kind of...

How to train Word2vec on very large datasets?

python,c,machine-learning,word2vec
I am thinking of training word2vec on very large-scale data, more than 10 TB in size, from a web crawl dump. I personally trained the C implementation on the GoogleNews-2012 dump (1.5 GB) on my iMac; it took about 3 hours to train and generate vectors (I was impressed with the speed). I did not try the Python...

I'm not sure how to interpret accuracy of this classification with Scikit Learn

python,machine-learning,scikit-learn,classification,text-classification
I am trying to classify text data, with Scikit Learn, with the method shown here. (http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html) except I am loading my own dataset. I'm getting results, but I want to find the accuracy of the classification results. from sklearn.datasets import load_files text_data = load_files("C:/Users/USERNAME/projects/machine_learning/my_project/train", description=None, categories=None, load_content=True, shuffle=True, encoding='latin-1', decode_error='ignore',...

Enforcing that inputs sum to 1 and are contained in the unit interval in scikit-learn

python,numpy,encoding,machine-learning,scikit-learn
I have three inputs: x=(A, B, C); and an output y. It needs to be the case that A+B+C=1 and 0<=A<=1, 0<=B<=1, 0<=C<=1. I want to find the x that maximizes y. My approach is to use a regression routine in scikit-learn to train a model f on my inputs...

Why is there only one hidden layer in a neural network?

machine-learning,neural-network,genetic-algorithm,evolutionary-algorithm
I recently made my first neural network simulation which also uses a genetic evolution algorithm. It's simple software that just simulates simple organisms collecting food, and they evolve, as one would expect, from organisms with random and sporadic movements into organisms with controlled, food-seeking movements. Since this kind of organism...

NaiveBayes, J48 and RandomTree in layman's terms

machine-learning,weka
I am having difficulty understanding how these classifiers work under the hood. So far, I have deduced that NaiveBayes predicts an outcome by 'uncoupling' multiple pieces of evidence and treating each piece of evidence as independent. But when compared to another classification algorithm like J48 or RandomTree, how exactly is...

Theano/Pylearn2. How to parallelize training?

python,multithreading,machine-learning,theano
I have a convolutional neural network model described in YAML. When I run pylearn2's train.py, I see that only one of four cores is used. Is there a way to run the training multi-threaded? Maybe it's more of a Theano question. I followed this http://deeplearning.net/software/theano/tutorial/multi_cores.html Theano tutorial about multi-core support,...

What splitting criterion does Random Tree in Weka 3.7.11 use for numerical attributes?

machine-learning,weka,random-forest,decision-tree
I'm using RandomForest from Weka 3.7.11, which in turn bags Weka's RandomTree. My input attributes are numerical and the output attribute (label) is also numerical. When training the RandomTree, K attributes are chosen at random for each node of the tree. Several splits based on those attributes are attempted and...

Dimension Reduction of Feature in Machine Learning

machine-learning
Is there any way to reduce the dimension of the following features from 2D coordinate (x,y) to one dimension? ...

Finding a corresponding leaf node for each data point in a decision tree (scikit-learn)

python,machine-learning,scikit-learn,decision-tree
I'm using decision tree classifier from the scikit-learn package in python 3.4, and I want to get the corresponding leaf node id for each of my input data point. For example, my input might look like this: array([[ 5.1, 3.5, 1.4, 0.2], [ 4.9, 3. , 1.4, 0.2], [ 4.7,...

FeedForward Neural Network: Using a single Network with multiple output neurons for many classes

machine-learning,neural-network,backpropagation,feed-forward
I am currently working on the MNIST handwritten digits classification. I built a single FeedForward network with the following structure: Inputs: 28x28 = 784 inputs Hidden Layers: A single hidden layer with 1000 neurons Output Layer: 10 neurons All the neurons have Sigmoid activation function. The reported class is the...

bag-of-words approach / tools / library for C++?

c++,machine-learning,text-processing,text-extraction,lda
I have a folder that contains many .txt documents of tourism reviews. I want to use the bag-of-words approach to convert them into some kind of numeric representation for machine learning (Latent Dirichlet Allocation, LDA) in C++ to train the system to recognize the topic for...

How to use R's neuralnet package in a Kaggle competition about Titanic

r,machine-learning,neural-network
I am trying to run this code for the Kaggle Titanic competition as an exercise. It's free and a beginner case. I am using the neuralnet package in R. This is the training data from the website: train <- read.csv("train.csv") m <- model.matrix( ~ Survived + Pclass...

brain.js: XOR example does not work

javascript,machine-learning,neural-network
I'm trying to understand brain.js. This is my code; it does not work. (Explanation of what I expect it to do below) <script src="https://cdn.rawgit.com/harthur/brain/gh-pages/brain-0.6.3.min.js"></script> <script> var net = new brain.NeuralNetwork(); net.train([{input: [0, 0], output: [0]}, {input: [0, 1], output: [1]}, {input: [1, 0], output: [1]}, {input: [1, 1], output: [0]}]);...

(Java) Partial Derivatives for Back Propagation of Hidden Layer

java,machine-learning,artificial-intelligence,neural-network
Yesterday I posted a question about the first piece of the backpropagation algorithm. Today I'm working to understand the hidden layer. Sorry for all the questions; I've read several websites and papers on the subject, but no matter how much I read, I still have a hard time...

How do I get the raw predictions (-r) from Vowpal Wabbit when running in daemon mode?

machine-learning,vowpalwabbit
Using the below, I'm able to get both the raw predictions and the final predictions as a file: cat train.vw.txt | vw -c -k --passes 30 --ngram 5 -b 28 --l1 0.00000001 --l2 0.0000001 --loss_function=logistic -f model.vw --compressed --oaa 3 cat test.vw.txt | vw -t -i model.vw --link=logistic -r raw.txt...

Which statistical measures for 4 class classification?

machine-learning,statistics,classification,multilabel-classification
I have a classification task with 4 classes which I solve with machine learning classifiers (SVM etc.). Which statistical measures can be used for 4 classes? I will for sure use p-value (with permutation test) but I need some more. Some interesting measures are true positive rate, true negative rate,...
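
For more than two classes, a common approach is to compute the binary measures one-vs-rest for each class (and macro-average them if a single number is needed). A minimal sketch computing per-class true positive and true negative rates from a hypothetical 4 x 4 confusion matrix:

```python
def per_class_rates(confusion):
    """One-vs-rest TPR (sensitivity) and TNR (specificity) for each class.

    confusion[i][j] = number of samples with true class i predicted as class j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    rates = []
    for c in range(k):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # class c predicted as another class
        fp = sum(confusion[r][c] for r in range(k)) - tp  # other classes predicted as c
        tn = total - tp - fn - fp
        rates.append((tp / (tp + fn), tn / (tn + fp)))
    return rates


# Hypothetical 4-class confusion matrix (rows: true class, columns: predicted class).
cm = [[8, 1, 1, 0],
      [2, 7, 0, 1],
      [0, 1, 9, 0],
      [1, 0, 0, 9]]
rates = per_class_rates(cm)
for c, (tpr, tnr) in enumerate(rates):
    print(f"class {c}: TPR={tpr:.2f} TNR={tnr:.2f}")
```

Precision and F1 per class follow from the same tp/fp/fn counts.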

What is rank in the ALS machine learning algorithm in Apache Spark MLlib?

algorithm,machine-learning,apache-spark,mllib
I wanted to try an example of the ALS machine learning algorithm, and my code works fine. However, I do not understand the rank parameter used in the algorithm. I have the following code in Java: // Build the recommendation model using ALS int rank = 10; int numIterations = 10; MatrixFactorizationModel model =...
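
In matrix-factorization ALS, `rank` is the number of latent factors: every user and every product is represented by a vector of length `rank`, and a predicted rating is the dot product of the two. The NumPy sketch below runs ALS on a small, fully observed ratings matrix to show where `rank` enters; it is a simplification, since Spark's implementation additionally handles missing ratings, implicit feedback and distributed data:

```python
import numpy as np


def als(R, rank, lam=1e-6, iterations=10, seed=0):
    """Factor R (users x items) as U @ V.T with U: users x rank, V: items x rank."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(size=(n_users, rank))
    V = rng.normal(size=(n_items, rank))
    reg = lam * np.eye(rank)
    for _ in range(iterations):
        # Fix V and solve a regularized least-squares problem for all user factors.
        U = np.linalg.solve(V.T @ V + reg, V.T @ R.T).T
        # Fix U and solve for all item factors.
        V = np.linalg.solve(U.T @ U + reg, U.T @ R).T
    return U, V


# A hypothetical 4-user x 3-item ratings matrix that happens to have rank 2.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
B = np.array([[5.0, 1.0], [4.0, 2.0], [1.0, 5.0]])
R = A @ B.T
U, V = als(R, rank=2)
print(U.shape, V.shape)           # (4, 2) (3, 2)
print(np.abs(R - U @ V.T).max())  # close to 0 because the rank matches
```

A larger rank lets the model capture more structure at the cost of more parameters and a higher risk of overfitting; it is usually tuned on held-out ratings.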

Scikit: Remove feature row if present in all documents

python,machine-learning,scikit-learn
I am doing text classification. I have around 32K (spam and ham) files. import numpy as np import pandas as pd import sklearn.datasets as dataset from sklearn.feature_extraction.text import CountVectorizer from sklearn.feature_extraction.text import TfidfTransformer from sklearn.naive_bayes import BernoulliNB from sklearn.preprocessing import LabelEncoder import re from sklearn.feature_selection import SelectKBest from sklearn.feature_selection...
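
One way to drop terms that occur in every document is to count document frequencies and remove the terms whose count equals the number of documents. In scikit-learn, `CountVectorizer`'s `max_df` parameter applies this kind of cut-off (terms with a document frequency strictly above it are ignored), so an integer `max_df` of n_documents - 1 removes exactly those terms. A dependency-free sketch of the idea, with made-up documents:

```python
from collections import Counter

docs = [
    "win a free prize now",
    "meeting notes for a project now",
    "a free lunch is now available",
]

tokenized = [set(d.split()) for d in docs]
# Document frequency: in how many documents each term appears (at most once per document).
doc_freq = Counter(term for tokens in tokenized for term in tokens)

# Terms present in every document cannot help distinguish documents (e.g. for BernoulliNB).
in_all_docs = {t for t, df in doc_freq.items() if df == len(docs)}
vocabulary = sorted(t for t in doc_freq if t not in in_all_docs)
print(sorted(in_all_docs))  # ['a', 'now']
```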

sklearn Imputer(): returned features do not fit in the fit function

python,machine-learning,scikit-learn
I have a feature matrix with missing values (NaNs), so I need to fill in those missing values first. However, the last line fails with the following error: Expected sequence or array-like, got Imputer(axis=0, copy=True, missing_values='NaN', strategy='mean', verbose=0). I checked, and it seems the reason is that train_fea_imputed...

Using Python to find correlation pairs

python,pandas,machine-learning,data-mining
NAME      PRICE   SALES  VIEWS  AVG_RATING  VOTES  COMMENTS
Module 1  $12.00  69     12048  5           3      26
Module 2  $24.99  12     52858  5           1      14
Module 3  $10.00  1      1381   -1          0      0
Module 4  $22.99  46     57841  5           8      24
.................
So, let's say I have statistics of sales. I...
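
A minimal sketch that computes the Pearson correlation for every pair of numeric columns and sorts the pairs by strength, using the four sample rows shown above (far too few rows for the values to mean much; this only illustrates the computation). With pandas, `DataFrame.corr()` returns the same pairwise matrix:

```python
from itertools import combinations
from math import sqrt

# Numeric columns from the four sample rows (PRICE without the "$").
# AVG_RATING = -1 looks like a "no rating" placeholder; treating it as missing may be better.
columns = {
    "PRICE": [12.00, 24.99, 10.00, 22.99],
    "SALES": [69, 12, 1, 46],
    "VIEWS": [12048, 52858, 1381, 57841],
    "AVG_RATING": [5, 5, -1, 5],
    "VOTES": [3, 1, 0, 8],
    "COMMENTS": [26, 14, 0, 24],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


pairs = sorted(
    ((a, b, pearson(columns[a], columns[b])) for a, b in combinations(columns, 2)),
    key=lambda t: abs(t[2]),
    reverse=True,
)
for a, b, r in pairs:
    print(f"{a:>10} ~ {b:<10} r = {r:+.2f}")
```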

Which classifiers provide weight vector?

machine-learning,classification,multilabel-classification
Which machine learning classifiers provide a weight vector after the learning phase? I know about SVM, logistic regression, perceptron and LDA. Are there more? My goal is to use these weight vectors to draw an importance map....

How to interpret scikit-learn's confusion matrix and classification report?

machine-learning,nlp,scikit-learn,svm,confusion-matrix
I have a sentiment analysis task; for this I'm using this corpus. The opinions have 5 classes (very neg, neg, neu, pos, very pos), from 1 to 5. So I do the classification as follows: from sklearn.feature_extraction.text import TfidfVectorizer import numpy as np tfidf_vect= TfidfVectorizer(use_idf=True, smooth_idf=True, sublinear_tf=False, ngram_range=(2,2)) from sklearn.cross_validation...

Is likelihood calculated over the whole training set or a single example?

machine-learning,probability,mle,language-model
Suppose I have a training set of (x,y)s, where x is the input example and y is the output tag, and y is a value (1....k) (k is the number of classes). When calculating the likelihood of the training set, should it be calculated for the whole training set (all...

What is the default setting for SVM in weka?

machine-learning,weka,svm,libsvm
I would like to know the default setting for SVM in the Weka library. As far as I know, Weka wraps LIBSVM, and the default kernel for LIBSVM is RBF. Does this hold true for Weka?

Octave: Kmeans clustering not working on an image matrix

matlab,image-processing,machine-learning,octave,k-means
I have tried the following code. img=imread("test1.jpg"); gimg=rgb2gray(img); imshow(gimg); bw = gimg < 255; L = bwlabel(bw); imshow(label2rgb(L, @jet, [.7 .7 .7])) s = regionprops(L, 'PixelIdxList', 'PixelList'); s(1).PixelList(1:4, :) idx = s(1).PixelIdxList; sum_region1 = sum(gimg(idx)); x = s(1).PixelList(:, 1); y = s(1).PixelList(:, 2); xbar = sum(x .* double(gimg(idx))) / sum_region1...

How to schedule Batch Execution Service of Azure Machine Learning in Azure Scheduler

azure,machine-learning,batch-processing,azure-scheduler,azure-machine-learning
On the Batch Execution API help page of Azure Machine Learning, there are three different URIs: Submit Job (the response is the Job ID), Start Job (we need to use the above Job ID in this URI), Get Status or Result (we need to use the above Job ID in this URI)...

How to programmatically create ensembles in Weka?

java,machine-learning,weka
Does there already exist a class in weka that takes care of voting/averaging different models, or do I have to come up with my own scheme? I already looked for that kind of functionality on the web, but I couldn't find any specific information....

Neural Network Error oscillating with each training example

machine-learning,artificial-intelligence,neural-network,backpropagation
I've implemented a back-propagating neural network and trained it on my data. The data alternates between sentences in English and Afrikaans. The neural network is supposed to identify the language of the input. The structure of the network is 27*16*2. The input layer has 26 inputs for...

Distribute computing on multiple devices

java,machine-learning,bigdata,distributed-computing
My project takes a very long time to run. I created threads and distributed the data and processing across my processor cores, but it still takes a long time. I have optimized the code as much as I can. How can I distribute the computation across multiple laptops?

Classification fit gets ValueError: setting an array element with a sequence

python,machine-learning
I want to predict whether a user clicks on a link or not. I use logistic regression. I have a lot of data to start with. With 23 examples I don't get this exception, but if I try 3 million examples, I get this exception. The following is my code, adapted from...

How to write the body for an HTTPS POST job in Azure Scheduler without Azure Blob

azure,machine-learning,azure-scheduler,azure-machine-learning
I have created an experiment and successfully published a web service which requires inputs. When I schedule this web service as an HTTPS POST job, it shows this error: Http Action - Response from host 'ussouthcentral.services.azureml.net': 'BadRequest' Response Headers: x-ms-request-id: 51fb1d34-5bc7-4832-ad9f-b19826468ea0 Date: Mon, 11 May 2015 11:02:01 GMT Server: Microsoft-HTTPAPI/2.0...

scikit : Wrong prediction for this case

python,machine-learning,scikit-learn
I have written a sample code below import numpy as np import pandas as pd import csv from sklearn.feature_extraction.text import CountVectorizer from sklearn.feature_extraction.text import TfidfTransformer from sklearn.naive_bayes import MultinomialNB text = ["this is dog" , "this is bull dog" , "this is jack"] countVector = CountVectorizer() countmatrix = countVector.fit_transform(text) print...

Vowpal Wabbit - precision recall f-measure

machine-learning,vowpalwabbit,precision-recall
How do you usually get precision, recall and f-measure from a model created in Vowpal Wabbit on a classification problem? Are there any available scripts or programs that are commonly used for this with vw's output? To make a minimal example using the following data in playtennis.txt : 2 |...
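
Whatever produces the predictions (for example a test run with `-p predictions.txt`), precision, recall and F1 only require the true labels and the predicted labels side by side. A minimal sketch with hypothetical label lists, treating one label as the positive class:

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for the given positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical true labels and predictions for a 2-class problem (labels 1 and 2).
y_true = [1, 2, 2, 1, 2, 1]
y_pred = [1, 2, 1, 1, 2, 2]
print(precision_recall_f1(y_true, y_pred, positive=2))
```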

How to avoid error “TypeError: invalid data type for einsum” in Python

python,python-2.7,numpy,pandas,machine-learning
I am trying to load a CSV file into a NumPy array and use the array in LogisticRegression, etc. Now I am struggling with the error shown below: import numpy as np import pandas as pd from sklearn import preprocessing from sklearn.linear_model import LogisticRegression dataset = pd.read_csv('../Bookie_test.csv').values X = dataset[1:, 32:34] y = dataset[1:,...

HMM training with multiple observations and mhsmm package in R

r,machine-learning,hidden-markov-models
I want to train a new HMM using Poisson observations, which are the only thing I know. I'm using the mhsmm package for R. The first thing that bugs me is the initialization of the model; in the examples it is: J<-3 initial <- rep(1/J,J) P <- matrix(1/J,...

Finding similarity between two user profiles

machine-learning,recommendation-engine,user-profile,cosine-similarity
I have user profiles with the following attributes: U={age,sex,country,race}. What is the best way to find the similarity between two users? For example, I have the following 2 users: u1={25,M,USA,White} u2={30,M,UK,black}. I have searched and found cosine similarity mentioned a lot. Is it good for my problem, or are there other suggestions....
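
Cosine similarity needs numeric vectors, so the categorical attributes would first have to be encoded (e.g. one-hot). An alternative for mixed data is a Gower-style similarity: numeric attributes contribute 1 - |difference| / range, categorical attributes contribute 1 on a match and 0 otherwise, and the contributions are averaged. A sketch; the age range of 50 years is an assumption, and categorical values are compared case-insensitively:

```python
def gower_similarity(u, v, numeric_ranges):
    """Average per-attribute similarity for mixed numeric/categorical profiles.

    numeric_ranges maps the index of each numeric attribute to its range
    (max - min over the data set); all other attributes are treated as categorical.
    """
    scores = []
    for i, (a, b) in enumerate(zip(u, v)):
        if i in numeric_ranges:
            scores.append(1 - abs(a - b) / numeric_ranges[i])
        else:
            scores.append(1.0 if str(a).lower() == str(b).lower() else 0.0)
    return sum(scores) / len(scores)


u1 = (25, "M", "USA", "White")
u2 = (30, "M", "UK", "black")
# Assumption: ages in the data set span 50 years, so the age attribute has range 50.
print(gower_similarity(u1, u2, numeric_ranges={0: 50}))  # (0.9 + 1 + 0 + 0) / 4
```

Per-attribute weights can be added if, for example, country should matter more than sex.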

Prediction based on large texts using Vowpal Wabbit

machine-learning,vowpalwabbit
I want to use the resolution time in minutes and the client description of the tickets on Zendesk to predict the resolution time of future tickets based on their description. I will use only these two values, but the description is a large text. I looked into hashing the feature...

Text analysis: what comes after the term-document matrix? [closed]

r,machine-learning,nlp,svm,text-mining
I am trying to build predictive models from text data. I built a document-term matrix from the text data (unigrams and bigrams) and built different types of models on it (SVM, random forest, nearest neighbor, etc.). All the techniques gave decent results, but I want to improve the results. I...
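One common next step (an option to try, not a guaranteed improvement) is to reweight the raw counts with TF-IDF, so that terms appearing in most documents count for less than distinctive ones. A minimal pure-Python sketch with hypothetical tokenized documents:

```python
import math
from collections import Counter


def tf_idf(docs):
    """Reweight term counts: tf = count / doc length, idf = log(N / df)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (c / len(doc)) * math.log(n_docs / df[term])
            for term, c in counts.items()
        })
    return weights


docs = [["good", "service"], ["good", "price"], ["bad", "service"]]
w = tf_idf(docs)
```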

Feature Vectors in Radial Basis Function Network

machine-learning,neural-network,point-clouds
I am trying to use an RBFNN for point-cloud-to-surface reconstruction, but I don't understand what my feature vectors in the RBFNN would be. Can anyone please help me understand this? The goal is to get to this (image not included) from inputs like this: ...
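One common formulation, sketched here under assumptions, treats reconstruction as fitting an implicit function: each training sample's feature vector is simply its (x, y, z) coordinate, the target is 0 for points on the surface and +d or -d for points offset outward or inward along the surface normal, and the reconstructed surface is where the fitted RBF function equals 0. The toy point cloud, the known normals, the Gaussian kernel and its width `eps` are all hypothetical choices; extracting a mesh from the zero level set (for example with marching cubes) is not shown.

```python
import numpy as np


def gaussian_kernel(a, b, eps):
    # Pairwise exp(-eps * ||a_i - b_j||^2) between two sets of 3-D points.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-eps * d2)


def fit_rbf(centers, values, eps=5.0):
    """Weights w such that sum_j w_j * phi(||c_i - c_j||) = values_i."""
    return np.linalg.solve(gaussian_kernel(centers, centers, eps), values)


def eval_rbf(points, centers, weights, eps=5.0):
    return gaussian_kernel(points, centers, eps) @ weights


# Toy point cloud: six points on the unit sphere, with outward normals n = p.
surface = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                    [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
normals = surface.copy()
d = 0.1

# Feature vector = (x, y, z); target = signed offset from the surface.
centers = np.vstack([surface, surface + d * normals, surface - d * normals])
values = np.concatenate([np.zeros(6), np.full(6, d), np.full(6, -d)])

weights = fit_rbf(centers, values)
```

Evaluating `eval_rbf` on a 3-D grid and keeping the points where it changes sign gives an approximation of the surface.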

Multi-Class Classification in WEKA

machine-learning,scikit-learn,classification,weka,libsvm
I am trying to implement multiclass classification in WEKA. I have a lot of rows, say bank transactions, and each one is tagged as Food, Medicine, Rent, etc. I want to develop a classifier that can be trained on my existing data and predict which class future transactions belong to....

XOR neural network backprop

python,machine-learning,neural-network
I'm trying to implement a basic XOR neural network with 1 hidden layer in Python. I don't understand the backprop algorithm in particular, so I've been stuck on computing delta2 and updating the weights... help? import numpy as np def sigmoid(x): return 1.0 / (1.0 + np.exp(-x)) vec_sigmoid = np.vectorize(sigmoid) theta1 = np.matrix(np.random.rand(3,3)) theta2...
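A minimal NumPy sketch of one way to compute `delta2` and the weight updates, assuming a squared-error loss, sigmoid units everywhere, and a bias column prepended to each layer's input (conventions that may differ from yours). It uses plain arrays with `@` instead of `np.matrix`, and compares the backprop gradients with a finite-difference estimate, which is a practical way to check a hand-written `delta2`.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def add_bias(a):
    # Prepend a column of ones so the first weight in each row acts as a bias.
    return np.hstack([np.ones((a.shape[0], 1)), a])


def forward(theta1, theta2, X):
    a1 = add_bias(X)                        # (m, 3): bias + 2 inputs
    a2 = add_bias(sigmoid(a1 @ theta1.T))   # (m, h+1): bias + hidden units
    a3 = sigmoid(a2 @ theta2.T)             # (m, 1): network output
    return a1, a2, a3


def loss(theta1, theta2, X, y):
    a3 = forward(theta1, theta2, X)[2]
    return 0.5 * np.sum((a3 - y) ** 2) / X.shape[0]


def gradients(theta1, theta2, X, y):
    m = X.shape[0]
    a1, a2, a3 = forward(theta1, theta2, X)
    # Output-layer error: dLoss/dz3 (the 1/m factor is applied below).
    delta3 = (a3 - y) * a3 * (1 - a3)
    # Hidden-layer error ("delta2"): push delta3 back through theta2, drop the
    # bias column, and multiply by the sigmoid derivative of the hidden units.
    delta2 = (delta3 @ theta2)[:, 1:] * a2[:, 1:] * (1 - a2[:, 1:])
    grad2 = delta3.T @ a2 / m
    grad1 = delta2.T @ a1 / m
    return grad1, grad2


def numerical_gradient(f, theta, h=1e-5):
    # Central differences; used only to sanity-check the backprop gradients.
    grad = np.zeros_like(theta)
    for idx in np.ndindex(theta.shape):
        old = theta[idx]
        theta[idx] = old + h
        plus = f()
        theta[idx] = old - h
        minus = f()
        theta[idx] = old
        grad[idx] = (plus - minus) / (2 * h)
    return grad


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
theta1 = rng.normal(size=(3, 3))  # 3 hidden units x (bias + 2 inputs)
theta2 = rng.normal(size=(1, 4))  # 1 output x (bias + 3 hidden units)

g1, g2 = gradients(theta1, theta2, X, y)
n1 = numerical_gradient(lambda: loss(theta1, theta2, X, y), theta1)
n2 = numerical_gradient(lambda: loss(theta1, theta2, X, y), theta2)

# Plain batch gradient descent on the four XOR examples.
initial_loss = loss(theta1, theta2, X, y)
lr = 1.0
for _ in range(10000):
    step1, step2 = gradients(theta1, theta2, X, y)
    theta1 -= lr * step1
    theta2 -= lr * step2
final_loss = loss(theta1, theta2, X, y)
```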

R: Converting Column Values into Their Own Binary Encoded Columns

r,machine-learning,sparse-matrix,reshape2
I have a number of CSV files with columns such as gender, age, diagnosis, etc. Currently, they are coded as such:
ID, gender, age, diagnosis
1, male, 42, asthma
1, male, 42, anxiety
2, male, 19, asthma
3, female, 23, diabetes
4, female, 61, diabetes
4, female, 61, copd
The...
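The long-to-wide reshaping (one row per ID, one 0/1 column per diagnosis) can be sketched in Python on the sample rows above; in R this is the kind of reshape that reshape2's `dcast` is built for. Keeping gender and age from the first row of each ID is an assumption that holds for this sample because those values repeat within an ID.

```python
import csv
import io

raw = """ID, gender, age, diagnosis
1, male, 42, asthma
1, male, 42, anxiety
2, male, 19, asthma
3, female, 23, diabetes
4, female, 61, diabetes
4, female, 61, copd
"""

# skipinitialspace strips the blank after each comma, including in the header.
rows = list(csv.DictReader(io.StringIO(raw), skipinitialspace=True))
diagnoses = sorted({r["diagnosis"] for r in rows})

# One output record per ID; each diagnosis becomes its own 0/1 column.
wide = {}
for r in rows:
    rec = wide.setdefault(
        r["ID"],
        {"gender": r["gender"], "age": int(r["age"]), **{d: 0 for d in diagnoses}},
    )
    rec[r["diagnosis"]] = 1
```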

Multilayer Perceptron replaced with Single Layer Perceptron

math,machine-learning,neural-network,linear-algebra,perceptron
I have a problem understanding the difference between MLP and SLP. I know that an MLP has more than one layer (the hidden layers) and that its neurons have a nonlinear activation function, like the logistic function (needed for gradient descent). But I...
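Why the nonlinearity matters can be checked numerically: with a linear (identity) activation, two stacked layers compute W2(W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2), which is just one linear layer. (What gradient descent itself needs is a differentiable activation; the nonlinearity is what stops the layers from collapsing.) A minimal NumPy sketch with random hypothetical weights:

```python
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)  # hidden: 3 inputs -> 4 units
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)  # output: 4 units -> 2 outputs
x = rng.normal(size=3)

# Two layers with identity activation in the hidden layer...
two_linear_layers = W2 @ (W1 @ x + b1) + b2
# ...equal one layer with weights W2 @ W1 and bias W2 @ b1 + b2.
one_linear_layer = (W2 @ W1) @ x + (W2 @ b1 + b2)

# With a logistic hidden activation no such single-matrix rewrite exists.
logistic_hidden = W2 @ (1 / (1 + np.exp(-(W1 @ x + b1)))) + b2
```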

Feature extraction from multiple curves

machine-learning,svm,feature-extraction,feature-selection
I have multiple curves from different sensors, all attached to the same moving object. Now I want to extract features from them. Let's say I have cut 0-10 as window1, so in window1 I have 5 graphs, each representing one sensor in a particular position, each...
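One simple way to turn each window into a single fixed-length row for an SVM is to compute a few summary statistics per sensor and concatenate them. A minimal sketch; the chosen statistics and the sample values are hypothetical:

```python
import statistics


def window_features(window):
    """Concatenate simple per-sensor statistics for one time window.

    `window` holds one list of samples per sensor.
    """
    features = []
    for samples in window:
        features += [
            statistics.mean(samples),
            statistics.pstdev(samples),
            min(samples),
            max(samples),
        ]
    return features


# Hypothetical window1: 5 sensors, three samples each.
window1 = [
    [0.0, 1.0, 2.0],
    [1.0, 1.0, 1.0],
    [3.0, 2.0, 1.0],
    [0.5, 0.5, 2.0],
    [4.0, 0.0, 2.0],
]
fv = window_features(window1)  # 5 sensors x 4 statistics = 20 features
```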

Why doesn't GridSearchCV give the best score? - scikit-learn

python,r,machine-learning,scikit-learn,regression
I have a dataset with 158 rows and 10 columns. I am trying to build a multiple linear regression model to predict future values. I used GridSearchCV to tune the parameters. Here are my GridSearchCV and regression functions: def GridSearch(data): X_train, X_test, y_train, y_test = cross_validation.train_test_split(data, ground_truth_data, test_size=0.3, random_state =...

NLP - Word Representations

machine-learning,nlp,artificial-intelligence
I am working on a word representation algorithm, similar to Word2Vec and GloVe. I have been asked to make it more dynamic, such that new words could be added to the vocabulary, and new documents could be submitted to the program even after the representations (vectors) have been created. The problem is,...

One Hot Encoding for representing corpus sentences in python

python,machine-learning,nlp,scikit-learn,one-hot
I am a beginner with Python and the scikit-learn library. I currently need to work on an NLP project that first needs to represent a large corpus with one-hot encoding. I have read scikit-learn's documentation on preprocessing.OneHotEncoder; however, it does not seem to match my understanding of the term. Basically,...
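scikit-learn's `OneHotEncoder` works on categorical feature columns of a table rather than on tokens in running text. Word-level one-hot encoding usually means building a vocabulary and giving each word a vector with a single 1 at its index. A minimal pure-Python sketch with a hypothetical two-sentence corpus; for a large corpus you would store indices or a sparse matrix instead of dense lists, and `CountVectorizer(binary=True)` gives a per-sentence binary bag of words.

```python
def build_vocab(sentences):
    """Map every distinct token in the corpus to a column index."""
    vocab = {}
    for sentence in sentences:
        for token in sentence.split():
            vocab.setdefault(token, len(vocab))
    return vocab


def one_hot(token, vocab):
    vec = [0] * len(vocab)
    vec[vocab[token]] = 1
    return vec


def encode_sentence(sentence, vocab):
    # One one-hot row per token: a (sentence length x vocabulary size) matrix.
    return [one_hot(tok, vocab) for tok in sentence.split()]


corpus = ["the cat sat", "the dog sat down"]
vocab = build_vocab(corpus)
encoded = encode_sentence("the dog sat", vocab)
```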

Bag of Words Representation [closed]

matlab,image-processing,machine-learning,computer-vision,image-segmentation
I would like to implement a bag-of-words representation for my project. I computed the codebook of visual words from the images' features and descriptors. Then I obtained cluster centers using k-means. For the bag-of-words representation part, the task asks that you use manually labeled segments...
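Once the codebook (the k-means cluster centers) exists, a bag-of-words vector for a region is typically built by assigning each descriptor in that region to its nearest center and counting the assignments. A minimal NumPy sketch, assuming the descriptors belonging to one manually labeled segment have already been collected; the codebook and descriptors are hypothetical toy values:

```python
import numpy as np


def bovw_histogram(descriptors, codebook):
    """Normalized visual-word histogram for one image or segment."""
    # Squared distance from every descriptor to every cluster center.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    words = d2.argmin(axis=1)  # nearest visual word per descriptor
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()


codebook = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])  # k = 3 centers
descriptors = np.array([[1.0, 0.5], [9.0, 11.0], [0.2, -0.3], [1.0, 9.0]])
h = bovw_histogram(descriptors, codebook)
```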

Cannot specify probability function for extraTrees model in caret package

r,machine-learning,r-caret
Hi everyone, recently I have been using the extraTrees model in the caret package. However, I noticed that the probability function for the extraTrees model is set to NULL, which I checked with the following script: extratrees_para <- getModelInfo('extraTrees', regex = F)[[1]] extratrees_para$prob I noticed that in the original extraTrees package, it can be used...

MATLAB - Need to make a cell array from a text column file

matlab,machine-learning,text-files
I have a dataset from the UCI Machine Learning Repository (the Abalone database), and I need to separate the first column, which is a character, from the other columns, which are doubles. I already have the second part with this code: abaloneData = csvread('abalone.data',0,1); I have tried a lot to gather the first part to...

Random Forest Classifier: MATLAB vs. Python

python,matlab,machine-learning,statistics,random-forest
I used a Random Forest Classifier in Python and MATLAB. With 10 trees in the ensemble, I got ~80% accuracy in Python and barely 30% in MATLAB. This difference persisted even when MATLAB's random forests were grown with 100 or 200 trees. What could be the reason for this...

Difference between Latent and Explicit Semantic Analysis

machine-learning,nlp
I'm trying to analyse the paper "Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis". One component of the system described therein that I'm currently grappling with is the difference between Latent and Explicit Semantic Analysis. I've been writing up a document to encapsulate my understanding, but it's somewhat "cobbled together",...

Can one train estimators in a scikit-learn pipeline simultaneously?

python,machine-learning,scipy,scikit-learn,pipeline
Is it possible to do the following in scikit-learn? We train an estimator A using the given mapping from features to targets, then we use the same data (or mapping) to train another estimator B, then we use outputs of the two trained estimators (A and B) as inputs for...

Using the predict_proba() function of RandomForestClassifier in the safe and right way

python,machine-learning,scikit-learn,random-forest
I'm using scikit-learn to apply machine learning algorithms to my datasets. Sometimes I need the probabilities of labels/classes instead of the labels/classes themselves. Instead of having Spam/Not Spam as labels for emails, I want, for example, only: a 0.78 probability that a given email is Spam. For such...

Having trouble creating my Neural Network inputs

machine-learning,artificial-intelligence,neural-network
I'm currently working on a neural network that should have N input parameters. Each parameter can take M different (discrete) values, let's say {A,B,C,…,M}. It also has a discrete number of outputs. How can I create my inputs in this situation? Should I have N×M inputs (each 0 or 1), or should I think of a different...

Predict label of one single image using DeepLearnToolbox

matlab,machine-learning,computer-vision,deep-learning,conv-neural-network
I am using DeepLearnToolbox to build a CNN (convolutional neural network). I have trained my network successfully and checked its accuracy, but my question is: how can I feed one single image into the network to get the predicted label? The final result that I want to get...

Matlab: How can I store the output of “fitcecoc” in a database

matlab,machine-learning,computer-vision,classification,matlab-cvst
In the MATLAB help, there's a very helpful example for solving classification problems, "Digit Classification Using HOG Features". You can easily run the full script by clicking 'Open this example'. However, I'm wondering if there's a way to store the output of "fitcecoc" in a database so you...

Why does classifier accuracy drop after PCA, even though 99% of the total variance is covered?

matlab,machine-learning,pca
I have a 500x1000 feature matrix, and principal component analysis says that over 99% of the total variance is covered by the first component. So I replace each 1000-dimensional point with a 1-dimensional point, giving a 500x1 feature vector (using MATLAB's pca function). But my classifier accuracy, which was initially around 80% with...
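PCA ranks directions by variance, not by how well they separate the classes, so a component carrying 99% of the variance can still leave out the low-variance direction the classifier relied on. Scale also matters: if the features are on very different scales, the largest-scale feature tends to dominate the first component unless the data is standardized first. A minimal NumPy sketch with hypothetical 2-D data built to show the effect:

```python
import numpy as np

# Hypothetical data: feature 0 has a large spread but no class information,
# feature 1 has a tiny spread but fully determines the class.
X, labels = [], []
for x0 in (-100, -50, 0, 50, 100):
    for cls in (-1, 1):
        X.append([x0, cls])
        labels.append(cls)
X = np.array(X, dtype=float)
labels = np.array(labels)

Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
explained = eigvals[-1] / eigvals.sum()  # eigh sorts eigenvalues ascending
pc1 = Xc @ eigvecs[:, -1]                # keep only the first component

# Both classes now map onto exactly the same set of 1-D values.
```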

Basic Machine Learning: Linear Regression and Gradient Descent

machine-learning,gradient-descent
I'm taking Andrew Ng's ML class on Coursera and am a bit confused about gradient descent. (The screenshot of the formula I'm confused by is not included.) In his second formula, why does he multiply by the value of the ith training example? I thought when you updated you were just...
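Assuming the screenshot shows the standard univariate cost and update rules from that course, the factor x^(i) comes from the chain rule: the derivative of h_θ(x^(i)) = θ_0 + θ_1 x^(i) with respect to θ_1 is x^(i), while with respect to θ_0 it is 1, which is why the θ_0 update has no such factor.

```latex
J(\theta_0,\theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2,
\qquad h_\theta(x) = \theta_0 + \theta_1 x

\frac{\partial J}{\partial \theta_1}
  = \frac{1}{2m}\sum_{i=1}^{m} 2\left(h_\theta(x^{(i)}) - y^{(i)}\right)
    \frac{\partial h_\theta(x^{(i)})}{\partial \theta_1}
  = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x^{(i)}

\frac{\partial J}{\partial \theta_0}
  = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)

\theta_1 := \theta_1 - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x^{(i)}
```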