What indexer do I use to find the list in the collection that is most similar to my list?


Tag: search,indexing,solr,levenshtein-distance

Let's say I have my list of ingredients: {'potato','rice','carrot','corn'}

and I want to return lists from a database that are most similar to mine:

{'beans','potato','oranges','lettuce'}, {'carrot','rice','corn','apple'}, {'onion','garlic','radish','eggs'}

My query would return this first: {'carrot','rice','corn','apple'}
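To make "most similar" concrete, here is a brute-force sketch of the ranking I have in mind. It assumes that similarity simply means the number of shared items, which is my own simplification, and `rank_by_overlap` is a made-up helper name. It loops over every list, so it only illustrates the expected result and would not scale to a million lists.

```python
def rank_by_overlap(query, candidates):
    """Sort candidate lists by how many items they share with the query (most first)."""
    query_set = set(query)
    # Assumption: similarity = size of the intersection with the query list.
    return sorted(candidates, key=lambda c: len(query_set & set(c)), reverse=True)


mine = ['potato', 'rice', 'carrot', 'corn']
stored = [
    ['beans', 'potato', 'oranges', 'lettuce'],
    ['carrot', 'rice', 'corn', 'apple'],
    ['onion', 'garlic', 'radish', 'eggs'],
]

ranked = rank_by_overlap(mine, stored)
print(ranked[0])  # ['carrot', 'rice', 'corn', 'apple']
```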

I've used Solr, and have looked at CloudSearch, ElasticSearch, Algolia, Searchify and Swiftype. These engines only seem to let me put in one query string and then filter by other facets.

In a real scenario my search list will be about 200 items long and will be matching against about a million lists in my database.

What technology should I use to accomplish what I want to do?

Should I look away from search indexers and more towards database-like tools such as Mongo, MapReduce or Hadoop? All I know are the names of these technologies, and I just need someone to point me towards the technology path I should be exploring for this.

With so much data I can't really loop through it; I need to query everything at once.


I wonder what keeps you from trying it with Solr, as Solr provides much of what you need. You can declare the field as type="string" multiValued="true" and save each list item as a value. Then, when querying, you specify each of the items in your list as a search term for that field, and Solr will, by default, return the closest match. If you need exact control over what will be regarded as a match (e.g. at least 40% of the terms from the search list have to be in a matching list), you can use the mm parameter of EDisMax; cf. the Solr Wiki.
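As a rough sketch of what such a request could look like, the snippet below only builds the query URL and does not contact a server. The host, the core name `recipes` and the field name `ingredients` are assumptions made up for this example; `defType`, `q`, `qf`, `mm`, `rows` and `wt` are the standard Solr request parameters.

```python
from urllib.parse import urlencode


def build_solr_query(items, field="ingredients", min_match="40%", rows=10):
    """Build a Solr select URL that searches one multi-valued field for all items."""
    params = {
        "defType": "edismax",   # use the EDisMax query parser
        "q": " ".join(items),   # every list item becomes one search term
        "qf": field,            # hypothetical multi-valued string field
        "mm": min_match,        # minimum share of terms that must match
        "rows": rows,
        "wt": "json",
    }
    # Assumed local Solr instance and a hypothetical core called "recipes".
    return "http://localhost:8983/solr/recipes/select?" + urlencode(params)


url = build_solr_query(['potato', 'rice', 'carrot', 'corn'])
print(url)
```

Items that contain spaces would need to be quoted as phrases before being joined, which this sketch leaves out.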

Having said that, I must add that I've never searched for 200 query terms (do I understand correctly that the list you search with will contain about 200 items?) and do not know how well that performs. But I guess that setting up a test core and filling it with random lists using a script should not take more than a few hours, so it should be possible to evaluate the performance of this approach without investing too much time.
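A script for generating such test data might look roughly like the following. The vocabulary, the field names `id` and `ingredients`, and the helper name `random_lists` are all assumptions for illustration; only the sizes follow the numbers in the question.

```python
import json
import random


def random_lists(n_docs, vocabulary, list_size, seed=0):
    """Create n_docs documents, each holding list_size distinct random items."""
    rng = random.Random(seed)  # fixed seed so test runs are repeatable
    return [
        {"id": str(i), "ingredients": rng.sample(vocabulary, list_size)}
        for i in range(n_docs)
    ]


# Hypothetical vocabulary of 1000 item names; real data would use real items.
vocab = [f"item{i}" for i in range(1000)]
docs = random_lists(n_docs=5, vocabulary=vocab, list_size=200)

# JSON array of documents, ready to be sent to the test core in batches.
payload = json.dumps(docs)
print(len(docs), len(docs[0]["ingredients"]))  # 5 200
```

The resulting JSON array can then be posted to the test core's update handler in batches. For a realistic test you would raise `n_docs` towards a million and time queries that carry about 200 terms.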

