The gotcha with this question is "arbitrary metric". If you don't know what that is, it's just the way to measure distance between points. (In the "real" world, the 1-dimensinal distance is just the absolute magnitude of the difference between the two points).

Enough of the pre-lims. I'm trying to find a fast k nearest neighbor algorithm with these properties:

- works on an arbitrary metric
- somewhat easy to implement
- optimized for finding the distance of a set of points to another set of points

Wikipedia gives a list of algorithms and approaches but nothing on implementation.

UPDATE: the metric is the cosine similarity, which does *not* satisfy the triangle inquality. However, it seems that I can use the "angular similarity" (as per Wikipedia).

UPDATE: the use case is natural language processing. "Vectors" are the "context" of a given word, represented by binary properties (ex: the title of the document). So while there may be only a few properties (right now I'm just using 3), each vector has arbitrarily large dimension (in the title example, each title in the database would correspond to a dimension in the vector).

UPDATE: For the curious, I'm implementing this algorithm:

http://josquin.cs.depaul.edu/~mramezani/papers/IEEEIS.pdf

UPDATE: The algorithm will need to find nearest neighbors for about a dozen points from about 100s of points. The average dimension will probably be very large, say 50, (I really don't know yet). And yes, I'm interested in an algorithm, not a library. And yes, estimates are probably good enough.

Answer:

I would advice you to go for Locality-sensitive hashing (LSH), which is in trend right now. It reduces the dimensionality of high-dimensional data, but I am not sure if your dimension will go well with that algorithm. See the Wikipedia page for more.

You can use your own metric, but in general you can do that in many algorithms. Hope this helps.

You could go for RKD trees, a forest of them, but maybe this is too much now.

algorithm,big-o

I was doing some reading on logarithms and the rate of growth of the running time of algorithms. I have, however, a problem understanding the Big-Ω (Big-Omega) notation. I know that we use it for 'asymptotic lower bounds', and that we can express the idea that an algorithm takes at...

math,cordic

How can we use cordic to tanh(x+1)/tanh(x) I can't get a idea about how to apply cordic to above function. In other word, which point on the above function, can we apply cordic?...

algorithm,graph

I am looking for an algorithm that finds minimal subset of vertices such that by removing this subset (and edges connecting these vertices) from graph all other vertices become unconnected (i.e. the graph won't have any edges). Is there such algorithm? If not: Could you recommend some kind of heuristics...

c++,algorithm,math,recursion

I'm trying to find all possible solutions to the 3X3 magic square. There should be exactly 8 solutions. My code gets them all but there are a lot of repeats. I'm having a hard time tracking the recursive steps to see why I'm getting all the repeats. // This program...

algorithm,numerical-methods,numerical,numerical-integration

Consider the following differential equation f(x) = g'(x) I have a build a code that spits out values of the function f(x) for the variable x, where x goes from 0 to very large. Now, I'm looking for a scheme that will analyse these values of f(x) in order to...

python,math,rounding

This question already has an answer here: How do you round UP a number in Python? 9 answers Python round up integer to next hundred 6 answers so i'm making a pizza program in Python 3.3 that takes input from the user and prints the amount of pizza's needed....

c++,algorithm,graph

Someone can help me to think in the better way to adapt Dijkstra's Algorithm in these conditions? All I thought didn't was good. Example of input: GP4578 MADRID 01:00 PORTO 02:00 IK6587 PORTO 03:00 VALENCIA 05:00 05:30 TENERIFE 08:00 AB5874 VALENCIA 05:40 BERLIM 10:00 "VALENCIA 05:00 05:30" This is a...

python,regex,algorithm,python-2.7,datetime

If I knew the format in which a string represents date-time information, then I can easily use datetime.datetime.strptime(s, fmt). However, without knowing the format of the string beforehand, would it be possible to determine whether a given string contains something that could be parsed as a datetime object with the...

algorithm,data-structures

In an exam I see a question: Which one of the following is true? For a binary search, the best-case occurs when the target item is in the beginning of the search list. For a binary search, the best-case occurs when the target is at the end of the search...

arrays,matlab,math,for-loop,while-loop

This is my one dimensional array A, containing 10 numbers: A = [-8.92100000000000 10.6100000000000 1.33300000000000 ... -2.57400000000000 -4.52700000000000 9.63300000000000 ... 4.26200000000000 16.9580000000000 8.16900000000000 4.75100000000000]; I want the loop to go through like this; (calculating mean interval wise) - Interval length of 2,4,8 (a(1)+a(2))/2 - value stored in one block of...

string,algorithm,palindrome

Given length L find the shortest string >= L formed only of as & bs such that adding some character (Either a or b) doesn't produce a new palindrome substring (never seen before palindrome) For example for L = 1 there is the string aabbaba, adding "a" to it to...

c,string,algorithm,data-structures

int myStrCmp (const char *s1, const char *s2) { const unsigned char *p1 = (const unsigned char *)s1; const unsigned char *p2 = (const unsigned char *)s2; while (*p1 != '\0') { if (*p2 == '\0') return 1; if (*p2 > *p1) return -1; if (*p1 > *p2) return 1;...

algorithm,bit-manipulation

Given a 64 bit number, I need to extract every other bit from it, and convert it into a number: decimal: 357 binary: 0000 0001 0110 0101 odd bits: 0 0 0 1 1 0 1 1 decimal: 27 Any idea of a good algorithmic way to do it? And...

algorithm,recursion,permutation

Assuming I have a list of elements [1,2,3,4,] and a number of bins (let's assume 2 bins), I want to come up with a list of all combinations of splitting up items 1-4 into the 2 bins. Solution should look something like this [{{1}, {2,3,4}}, {{2}, {1,3,4}}, {{3}, {1,2,4}}, {{4},...

algorithm,math,statistics,variance,standard-deviation

I'm successfully using Welford's method to compute running variance and standard deviation as described many times on Stack Overflow and John D Cook's excellent blog post. However in the stream of samples, sometimes I encounter a "rollback", or "remove sample" order, meaning that a previous sample is no longer valid...

math,javafx,geometry,coordinates

I have a line which has points (x1,y1) and (x2,y2). I wanted to attach a Button to it, it should align with the line by rotating based on the line segment points. I need some help in calculating the rotation angle for the Button.

algorithm,go

I am trying to solve the following problem from project euler (please take a look at description and the example in the link, but here is the short explanation). in the matrix, find the minimal path sum from the top left to the bottom right, by moving left, right, up,...

c++,algorithm

The project Euler problem 5 is stated as : "2520 is the smallest number that can be divided by each of the numbers from 1 to 10 without any remainder. What is the smallest positive number that is evenly divisible by all of the numbers from 1 to 20?" Here's...

ruby,algorithm,search,recursion

I'm working through a toy problem in Ruby: how to produce all possible 10-digit phone numbers where each successive number is adjacent to the last on the keypad. I've represented the adjacent relationships between numbers, and have a recursive function, but my method isn't iterating through the whole solution space....

javascript,algorithm,recursion

While working through some Coderbyte challenges, I was able to solve the following problem recursively, but was hoping to get some feedback on how I can improve it. Have the function AdditivePersistence(num) take the num parameter being passed which will always be a positive integer and return its additive persistence...

java,math,set,linkedhashset

I am trying to do a homework in math which is find a subset of collection {1,2,..,n} where n is a number given in the code, I cannot get it done with the sublist so I need to get your help with a math programming. For example for n =...

php,algorithm,digits

I'm following the programming test, and there are questions like this From this consecutive digits: 123456789101112131415161718192021.... For example The 10th digit is 1 The 11th digit is 0 and so on What is the 1,000,000th digit? What is the 1,000,000,000th digit? What is the 1,000,000,000,000th digit? Your solution should run...

c++,algorithm,parallel-processing,c++14

Proposal N3554 (A Parallel Algorithms Library) for C++14, proposes (among other things), what seem to be parallel versions of the current std::partial_sum, e.g.: template< class ExecutionPolicy, class InputIterator, class OutputIterator, class BinaryOperation> OutputIterator inclusive_scan( ExecutionPolicy &&exec, InputIterator first, InputIterator last, OutputIterator result, BinaryOperation binary_op); With the explanation Effects: For each...

python,algorithm

I am trying to calculate least completion time of a project schedule, which has n number of tasks and each task can have a dependency task. example Task ID Duration Days Dependent ID 1 10 0 2 3 0 3 6 1 4 5 2 5 10 1 Task with...

algorithm,coordinates,coordinate-systems,coordinate

I'm writing a program for visualizing crystals. As a part of the program, I have to generate all different basic points in a lattice structure. For those that aren't familiar with crystallography, you can find the most general cases of these structures here: https://en.wikipedia.org/wiki/Hermann%E2%80%93Mauguin_notation#Lattice_types The problem was that I wanted...

algorithm,graph,cycle,dfs

I came around one of those questions in my exams: Topologocial sorting using Kahn's Algorithm requires the graph to be DAG (Directed Acyclic Graph). How can we determine if a graph contains no cycles without using DFS/BFS first? I am trying to answer that for too long now and I...

math,vector,3d,cube

I'm trying to figure out the math to find a random point inside a cube. I have something small but it can't take into account the rotation of the cube. Here are some images of my results. Here you can see the cube is rotated to some degree but when...

c++,arrays,algorithm,recursion

I'm trying to write a recursive algorithm that returns true if at least one array[i] == i. And false if there is no array[i] = i. Also, the parameters needed are (int * arr, int start, int end). So i'll be traversing the array with a pointer. For example: int...

c++,algorithm,inheritance,time-complexity

Here I have 800 derived classes of Base and a list of 8000000 objects of these types, which can be of any order. The goal is to separate the list into the 800 types as efficiently as possible. Here I have written two functions to do that. The first is...

python,algorithm,sorting,math

I am looking for a basic algorithm that gives more weigh to the recent reviews. So, the output value of the algorithm is mutable. For example, two reviews with exactly the same score, will have a different ranking based on the timestamp of the creation. Review_1 Score 10 creation 10/5/2014...

python,algorithm,time-complexity,longest-substring

Someone asked me a question Find the longest alphabetically increasing or equal string composed of those letters. Note that you are allowed to drop unused characters. So ghaaawxyzijbbbklccc returns aaabbbccc. Is an O(n) solution possible? and I implemented it code [in python] s = 'ghaaawxyzijbbbklccc' lst = [[] for i...

algorithm

I attended a programming competition(it has ended now). I don't know why my solution is giving WA, I read the editorial, saw other people solution but unable to find a flaw in my solution. Obviously I am missing somewhere. Please help! Question Link My Solution My approach If the pattern...

c,algorithm,security,math,encryption

I'm trying to reverse the following code in order to provide a function which takes the buffer and decrypts it. void crypt_buffer(unsigned char *buffer, size_t size, char *key) { size_t i; int j; j = 0; for(i = 0; i < size; i++) { if(j >= KEY_SIZE) j = 0;...

algorithm,discrete-mathematics

I'm doing an online course and i'm stuck on this question. I know there are similar questions but they don't help me. What is the order of growth of the worst case running time of the following code fragment as a function of N? int sum = 0; for (int...

c++,algorithm,sorting,c++11

I have implemented three different sorting algorithms and now I want to confirm that my approach of counting the total number of comparisons is correct. In my mind, the number of comparisons shouldn't be tied to the conditional branches because if the condition isn't met, the comparison was still made...

python,math,combinations,itertools

Other than doing this: from itertools import combinations def brute_force(x): for l in range (1,len(x)+1): for f in list(combinations(range(0,len(x)),l)): yield f x = range(1,18) len(list(brute_force(x))) [out]: 131071 How could I mathematically calculate the number of all possible combinations? Is there a way to do it computationally without enumerating the possible...

c#,regex,algorithm

I'm making a script to try and hack into an account whose login password is at least 8 characters long and includes at least 1 number, 1 special character and 1 capital letter. I will use brute force. Is there a compact, elegant and efficient way to iterate through every...

javascript,arrays,math

i was playing around in code academy and for some reason when i change the last slaying i the else statement to true instead of making the person invincible it just crashes my browser any ideas why thank you in advance var slaying = true; var youHit = Math.floor(Math.random()*2); var...

java,math

I did a code like this: public static double myPow(double x, int n){ if(n==0) return 1; double t = myPow(x, n/2); if(n % 2 != 0){ if(n < 0){ return (1 / t * t * x); } else { return t * t * x; } } else {...

python,algorithm,permutation

I'm trying to write a function that would input a list (of pre-determined length) and output the permutation of that list according to my choice of re-ordering. For example, suppose I have the list [1,'a',12,'b','poppy'] and I have the permutation (3,4,5,2,1) Then the function output would be a list with...

javascript,algorithm

I accidentally got undefined == true and undefined == false that both of them returns false. But !undefined returns true. And this is the question: What's the algorithmic difference(s) between !someVariable and someVariable == false? If i want to explain it more, type undefined == false ? 't' : 'f'...

algorithm

You are managing a software project that involves building a computer-assisted instrument for medical surgery. The exact placement of the surgical knife is dependent on a number of different parameters, usually at least 25, sometimes more. Your programmer has developed two algorithms for positioning the cutting tool, and is seeking...

c++,algorithm,matrix

I am trying to print the right hemisphere of a matrix. If we draw the main and the secondary diagonals in a matrix we see that we got 4 equal parts, the right part is called (in my algorithms textbook) the right hemisphere of a square matrix. For example, in...

algorithm,optimization,dynamic-programming,frequency

I have the following problem and I only have a slight idea about it: Consider a tape storage problem. Given n files of length l1,...,ln and frequencies with which they are accessed f1,...,fn, where sum of all frequencies is 1 and 0<fi<1. "Optimal" means to minimize the average retrieval time...

algorithm,dynamic-programming,knapsack-problem

I am familiar with the 0-1 knapsack problem and when you are given a certain number of copies from each item but I can figure out how to solve it when you are given infinite copies of each item using dynamic programming. I am trying to solve it by hand...

python,math

I'm writing a python script to find out some things about randomization. I have the following code: from random import randint one = 0 two = 0 olddiff = 0 diff = 0 sumdiff = 0 avgdiff = 0 headcounter = 0 counter = 0 while (headcounter < 500000): while...