
Understanding Apache Lucene's scoring algorithm


Tag: search,solr,lucene,full-text-search,hibernate-search

I've been working with Hibernate Search for months now, but I still can't make sense of the relevance ranking it produces. I'm satisfied overall with the results it returns, but even the simplest test does not meet my expectations.

My first test used term frequency (tf). Data:

Results I get:

  1. word
  2. word word word word
  3. word word word word word
  4. word word word word word word
  5. word word
  6. word word word

I'm really confused by this scoring. My query is quite complex, but since this test did not involve any other field, it can be simplified as below: booleanjunction.should(phraseQuery).should(keywordQuery).should(fuzzyQuery)

My analyzers are as below:

 SnowballPorterFilterFactory for English

My Explanation object


Score calculation is quite complex. You have to start from the basic equation:

score(q,d) = coord(q,d) · queryNorm(q) · ∑ ( tf(t in d) · idf(t)² · t.getBoost() · norm(t,d) ), summed over each term t in q

As you said, you have tf, which stands for term frequency; its value is the square root of the frequency of the term in the field.
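To make the tf and idf factors concrete, here is a minimal sketch. It assumes Lucene's classic TF-IDF similarity, where tf is the square root of the frequency and idf is 1 + ln(maxDocs / (docFreq + 1)). The class and method names are made up for illustration and are not part of the Lucene API. The inputs are the ones that appear in the Explanation output in this answer.

```java
// Illustrative sketch only: ClassicTfIdf is a hypothetical helper, not a Lucene class.
// It assumes the classic TF-IDF formulas for the tf and idf factors.
final class ClassicTfIdf {

    // tf grows with the square root of the raw term frequency,
    // so repeating a term has diminishing returns.
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // idf is higher for terms that occur in fewer documents.
    static float idf(long docFreq, long maxDocs) {
        return (float) (Math.log(maxDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        System.out.println("tf(freq=1) = " + tf(1.0f));
        System.out.println("tf(freq=5) = " + tf(5.0f));
        System.out.println("idf(docFreq=6, maxDocs=208) = " + idf(6, 208));
    }
}
```

Note that five occurrences of a term give a tf of only about 2.24, not 5, which is why term frequency alone cannot outweigh a large difference in fieldNorm.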

But as you can see in your explanation, you also have norm (aka fieldNorm), which is used in the fieldWeight calculation. Let's take your example:

eklavya eklavya eklavya eklavya eklavya

4.296241 = fieldWeight in 177, product of:
  2.236068 = tf(freq=5.0), with freq of:
    5.0 = termFreq=5.0
  4.391628 = idf(docFreq=6, maxDocs=208)
  0.4375 = fieldNorm(doc=177)


4.391628 = fieldWeight in 170, product of:
  1.0 = tf(freq=1.0), with freq of:
    1.0 = termFreq=1.0
  4.391628 = idf(docFreq=6, maxDocs=208)
  1.0 = fieldNorm(doc=170)

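Here, the single-term document eklavya (doc 170, fieldWeight 4.391628) scores higher than the five-term document (doc 177, fieldWeight 4.296241) because fieldWeight is the product of tf, idf and fieldNorm. The last factor is higher for the single-term document because its field contains only one term.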
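The arithmetic can be checked by hand. The sketch below simply multiplies the three factors copied from the two Explanation outputs above; the class and method names are hypothetical and only serve the illustration.

```java
// Illustrative sketch only: FieldWeightDemo is a hypothetical helper, not a Lucene class.
// It recomputes fieldWeight = tf * idf * fieldNorm from the Explanation values.
final class FieldWeightDemo {

    static float fieldWeight(float tf, float idf, float fieldNorm) {
        return tf * idf * fieldNorm;
    }

    public static void main(String[] args) {
        float idf = 4.391628f; // same term, so idf is identical for both documents

        // doc 177: "eklavya" repeated five times, fieldNorm 0.4375
        float fiveTerms = fieldWeight(2.236068f, idf, 0.4375f);

        // doc 170: a single "eklavya", fieldNorm 1.0
        float oneTerm = fieldWeight(1.0f, idf, 1.0f);

        System.out.println("doc 177 fieldWeight = " + fiveTerms);
        System.out.println("doc 170 fieldWeight = " + oneTerm);
        System.out.println("single-term document wins: " + (oneTerm > fiveTerms));
    }
}
```

The higher tf of doc 177 (about 2.24) is more than cancelled by its fieldNorm of 0.4375, since 2.236068 · 0.4375 is slightly below 1.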

As the documentation for the above equation says:

lengthNorm - computed when the document is added to the index in accordance with the number of tokens of this field in the document, so that shorter fields contribute more to the score.

The more terms you have in a field, the lower fieldNorm will be. Be careful with the value of this factor.
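A rough sketch of how field length could produce the fieldNorm values seen above. It rests on two assumptions that the explanation output does not spell out: that the length norm is 1/sqrt(number of terms) with no field boost, and that the norm loses precision because it is stored in a single byte. The quantize method below is a simplified emulation of that precision loss (it truncates the float to its two highest mantissa bits); it is not Lucene's actual encoding code, and all names are hypothetical.

```java
// Illustrative sketch only: LengthNormDemo is a hypothetical helper, not a Lucene class.
final class LengthNormDemo {

    // Assumed length norm: shorter fields get a higher value.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Simplified emulation of the one-byte storage: keep the sign, the exponent
    // and the two highest mantissa bits, and drop the remaining 21 bits.
    static float quantize(float norm) {
        int bits = Float.floatToIntBits(norm);
        bits &= ~((1 << 21) - 1);
        return Float.intBitsToFloat(bits);
    }

    // tf * fieldNorm for a field that consists of the same term repeated numTerms times.
    static float tfTimesNorm(int numTerms) {
        float tf = (float) Math.sqrt(numTerms);
        return tf * quantize(lengthNorm(numTerms));
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 6; n++) {
            System.out.println(n + " term(s): fieldNorm=" + quantize(lengthNorm(n))
                    + ", tf*fieldNorm=" + tfTimesNorm(n));
        }
    }
}
```

Under these assumptions, five terms give 1/sqrt(5) ≈ 0.447, which is stored as 0.4375, the fieldNorm shown for doc 177. Without the precision loss, tf · fieldNorm would be exactly 1 for every field made of one repeated term; with it, the products rank fields of 1 and 4 terms first (tied), then 5, 6, 2 and 3 terms, which is consistent with the result order reported in the question. The sketch ignores idf (identical here, since all documents share one term), coord, queryNorm and the additional phrase and fuzzy clauses of the real query.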

So, to conclude, this is a good example showing that the score is computed not only from the term frequency but also from the number of terms in the field.

