Pandas Ordinal Variable Treatment in Similarity Calculation

I have a Pandas version 0.15.2 dataframe as below with an ordinal column rate, represented initially as strings. My end goal is to find the similarities of different rows in the df (in reality I have a lot more rows and more ordinal variables). Currently, to factorize() while enforcing the...

Algorithm to compute similarity of two strings in javascript [closed]

Is there any text similarity algorithm in JavaScript? I want to compare two essays to determine how similar they are. I was thinking about edit distance, but I don't know how to translate it into a percentage.

Trying to understand LSH through sample Python code

The concise Python code I am studying is here. Question A: at line 8, I do not really understand what the syntax "res = res << 1" means for the purpose of "get_signature". Question B: at line 49 (solved by myself through another Q&A), "xor = r1^r2" does not really...
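For readers puzzling over the same line, here is a minimal sketch of how "res = res << 1" is typically used when building a bit signature. It assumes the code in question is random-projection LSH, which the excerpt does not confirm; the plane vectors and the hamming_bits helper are hypothetical names chosen for illustration.

```python
def get_signature(vector, planes):
    """Pack one bit per hyperplane into a single integer signature."""
    res = 0
    for plane in planes:
        # Shift the bits collected so far one place left,
        # which frees the lowest bit for the current plane.
        res = res << 1
        dot = sum(v * p for v, p in zip(vector, plane))
        if dot >= 0:
            # The vector lies on the non-negative side of this plane.
            res |= 1
    return res


def hamming_bits(sig1, sig2):
    """Count differing bits between two signatures (XOR marks them)."""
    return bin(sig1 ^ sig2).count("1")


# Hypothetical fixed planes so the result is reproducible.
planes = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
sig_a = get_signature([2.0, 3.0], planes)   # bits 1, 1, 0
sig_b = get_signature([2.0, -3.0], planes)  # bits 1, 0, 1
print(bin(sig_a), bin(sig_b), hamming_bits(sig_a, sig_b))
```

The shift is what turns a sequence of yes/no answers into one integer, and the XOR in Question B then flags exactly the positions where two signatures disagree.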

How to group sets by similarity in contained elements

I am using Python 2.7. I have routes which are composed of arrays of nodes that connect to each other. The nodes are identified by a string key, but for ease I will use numbers: sample_route = [1,2,3,4,7] #obviously over-simplified; real things would be about 20-40 elements long I will...

Measuring the distance between two relative frequency vectors

I am having a problem choosing an adequate distance function to measure the similarity (dissimilarity) between two relative frequency vectors. More specifically, I am using shape feature vectors that contain data about the basic shapes (circle, triangle, square) present in an image. Thus the vectors are in the form...

R: pairwise comparison of matrix columns ignoring empty values

I have an array for which I would like to obtain a measure of the similarity between values in each column. By which I mean I wish to compare the rows between pairwise columns of the array and increment a measure when their values match. The resulting measure would then...

calculating similarity between two profiles for number of common features

I am working on a clustering problem for social network profiles, where each profile document is represented by the number of times each term of interest occurs in the profile description. To cluster effectively, I am trying to find the correct similarity measure (or distance function) between two of the...

Find matches and similarity between two files in R

I have two large files whose contents look like this: df1 df2 dput of df1 structure(list(X00.00.location.long. = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,...

Check for similar values VBA Excel

I am developing a macro in Excel VBA that loops through a column of client names sorted from A to Z, checks which ones are similar, and assigns them the same client ID in the adjacent column. I am using the Like...

Finding similar strings in array

I need to harness similar_text() for an array of values that look something like this: $strings = ["lawyer" => 3, "business" => 3, "lawyers" => 1, "a" => 3]; What I'm trying to do is find the words that are practically the same, i.e. lawyer and lawyers in the above...

Which prediction function applies to item-based CF recommendations that use the Tanimoto coefficient?

I'm building a recommender system that uses item-based collaborative filtering, but I have a problem with the prediction function: I don't know which function can be used when the similarities between different items (movies) are calculated with the Tanimoto coefficient (Jaccard similarity coefficient). The following example explains my problem. Let us...

Implementing Jaccard similarity in C#

I am trying to understand "Jaccard similarity" between two arrays of type double whose values are greater than zero and less than one. So far I have searched many websites, but what I found is that both arrays should be of the same size (number of elements in array 1...
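One generalization often used for non-negative real-valued arrays of equal length is the weighted (Ruzicka) Jaccard similarity: the sum of element-wise minima divided by the sum of element-wise maxima. The sketch below is in Python rather than C#, and the function name weighted_jaccard is a hypothetical choice; whether this variant fits the asker's data is an assumption.

```python
def weighted_jaccard(a, b):
    """Weighted (Ruzicka) Jaccard: sum(min) / sum(max) over paired elements.

    Assumes non-negative values and equal-length inputs.
    """
    if len(a) != len(b):
        raise ValueError("both arrays must have the same number of elements")
    numerator = sum(min(x, y) for x, y in zip(a, b))
    denominator = sum(max(x, y) for x, y in zip(a, b))
    # Two all-zero arrays are treated as identical by convention here.
    return numerator / denominator if denominator else 1.0


score = weighted_jaccard([0.2, 0.5, 0.3], [0.4, 0.5, 0.1])
print(round(score, 4))
```

For 0/1 arrays this reduces to the ordinary set-based Jaccard coefficient, which is why it is a natural extension to values between zero and one.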

PostgreSQL multiple pg_trgm similarity score sub-query

I'm fairly new to SQL and I'm struggling with a sub query. I've got a table that looks like this:

 sss              | mm | sid
------------------+----+-----
 IBM LTD          |    | 003
 I.B.M.           |    | 003
 A.BM LTD         |    | 004
 IMB LTD          |    | 004
 IMB UK           |    | 005...

Similarity algorithm advice, using two dimensional associative array

The main goal of this algorithm is to find similar titles of news articles from different web sources and group them, let's say above 55.55% similarity. My current approach consists of the following steps: Feed data from a MySQL database into a two-dimensional array, e.g. $arrayOne. Make another...

javascript text similarity algorithm in percentage based on edit distance [closed]

I already know many edit-distance algorithm implementations in JavaScript, but I want to calculate text similarity as a percentage based on edit distance. Does anyone know how to implement it?
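One common way to turn an edit distance into a percentage is to divide it by the length of the longer string and subtract the result from one. The sketch below uses Python rather than JavaScript, with Levenshtein distance as the assumed edit distance; the function names are hypothetical, and other normalizations (for example, dividing by the sum of both lengths) are equally defensible.

```python
def levenshtein(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity_percent(a, b):
    """Map edit distance to 0-100, normalizing by the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty strings are treated as identical
    return (1 - levenshtein(a, b) / longest) * 100


print(levenshtein("kitten", "sitting"))
print(round(similarity_percent("kitten", "sitting"), 2))
```

Because the distance can never exceed the longer length, this normalization always stays between 0 and 100.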

How to calculate Jaccard similarity between two data frames in R

I have two data frames, both assumed to be binary (0, 1), and I couldn't find any method that computes the Jaccard similarity coefficient between two data frames. I have seen methods that compute it between the columns of a single data frame. Let's say "DF1" is DF1 <- data.frame(a=c(0, 0, 1, 0), b=c(1,0,1,0), c=c(1,1,1,1)) and DF2 is DF2 <- data.frame(a=c(0,0,0,0), b=c(1,0,1,0),...