

Understanding Apache Lucene's scoring algorithm

Question:

Tag: search,solr,lucene,full-text-search,hibernate-search

I've been working with Hibernate Search for months now, but I still can't make sense of how it ranks results by relevance. Overall I'm satisfied with the results it returns, but even the simplest test does not match my expectations.

The first test used term frequency (tf). Data:

Results I get:

  1. word
  2. word word word word
  3. word word word word word
  4. word word word word word word
  5. word word
  6. word word word

This ordering really confuses me. My query is quite complex, but since this test did not involve any other field, it can be simplified to: booleanjunction.should(phraseQuery).should(keywordQuery).should(fuzzyQuery)

My analyzer uses the following filters:

 StandardFilterFactory
 LowerCaseFilterFactory
 StopFilterFactory
 SnowballPorterFilterFactory for english

My Explanation object: https://jsfiddle.net/o51kh3og/


Answer:

Score calculation is fairly complex. Start with the basic equation:

score(q,d) = coord(q,d) · queryNorm(q) · ∑_{t in q} ( tf(t in d) · idf(t)^2 · t.getBoost() · norm(t,d) )

As you said, tf is the term frequency factor. Its value is the square root of the number of times the term occurs in the field, so freq=5 gives tf = sqrt(5) ≈ 2.236068.

But as you can see in your explanation, there is also norm (shown as fieldNorm), which is used in the fieldWeight calculation. Take two of your documents:

eklavya eklavya eklavya eklavya eklavya

4.296241 = fieldWeight in 177, product of:
  2.236068 = tf(freq=5.0), with freq of:
    5.0 = termFreq=5.0
  4.391628 = idf(docFreq=6, maxDocs=208)
  0.4375 = fieldNorm(doc=177)

eklavya

4.391628 = fieldWeight in 170, product of:
  1.0 = tf(freq=1.0), with freq of:
    1.0 = termFreq=1.0
  4.391628 = idf(docFreq=6, maxDocs=208)
  1.0 = fieldNorm(doc=170)

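Here, the single-term eklavya document scores higher than the five-term one because fieldWeight is the product of tf, idf and fieldNorm. idf is the same for both, and fieldNorm is much higher for the eklavya document (1.0 versus 0.4375) because its field contains only one term. That outweighs the larger tf of the five-term document: 2.236068 × 0.4375 ≈ 0.978, which is less than 1.0 × 1.0.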
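To check the arithmetic, here is a minimal Java sketch that recomputes both fieldWeight values from the numbers in the Explanation output. It assumes the classic TF-IDF similarity is in use, with tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the idf formula is not quoted in the question, but it reproduces the 4.391628 shown there. The class and method names are made up for this example and are not part of Lucene or Hibernate Search.

```java
// Sketch only: recomputes fieldWeight = tf * idf * fieldNorm from the explain output.
// Assumes classic TF-IDF: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)).
// FieldWeightSketch and its methods are hypothetical names, not a Lucene API.
class FieldWeightSketch {

    // Term frequency factor: square root of the raw term count in the field.
    static double tf(double freq) {
        return Math.sqrt(freq);
    }

    // Inverse document frequency: rarer terms get a larger value.
    static double idf(long docFreq, long maxDocs) {
        return 1.0 + Math.log((double) maxDocs / (docFreq + 1));
    }

    // fieldNorm is taken as an input here, copied from the explain output.
    static double fieldWeight(double freq, long docFreq, long maxDocs, double fieldNorm) {
        return tf(freq) * idf(docFreq, maxDocs) * fieldNorm;
    }

    public static void main(String[] args) {
        // Values copied from the two explanations (docFreq=6, maxDocs=208).
        System.out.println("doc 177 (5 terms): " + fieldWeight(5.0, 6, 208, 0.4375));
        System.out.println("doc 170 (1 term):  " + fieldWeight(1.0, 6, 208, 1.0));
    }
}
```

Both results should agree with the explain output (4.296241 and 4.391628) up to float rounding, which confirms that the only thing pulling the five-term document down is its fieldNorm.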

As the Lucene documentation for this formula says:

lengthNorm - computed when the document is added to the index in accordance with the number of tokens of this field in the document, so that shorter fields contribute more to the score.

The more terms a field contains, the lower its fieldNorm will be, so keep an eye on this value when you compare scores.

So, to conclude, your test is a good illustration that the score depends not only on term frequency but also on the number of terms in the field.
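The effect of field length on the whole test can be sketched in Java as well. This rests on assumptions the question does not confirm: that the classic similarity computes lengthNorm as 1 / sqrt(numTerms) with no index-time boost, and that the norm is stored in a single byte, which is emulated here in simplified form by keeping only the top two mantissa bits of the float. It also assumes each field contains nothing but the repeated query term, as in the test data. LengthNormSketch and its methods are hypothetical names.

```java
// Sketch only: fieldNorm as a function of field length, and its product with tf.
// Assumptions: lengthNorm = 1 / sqrt(numTerms), no boost, and a lossy one-byte
// encoding emulated by truncating the float to its top two mantissa bits.
// LengthNormSketch is a hypothetical helper, not a Lucene class.
class LengthNormSketch {

    // Length norm after the (emulated) lossy encoding.
    static float quantizedLengthNorm(int numTerms) {
        float raw = (float) (1.0 / Math.sqrt(numTerms));
        // Keep sign, exponent and the two highest mantissa bits; drop the rest.
        int truncated = Float.floatToIntBits(raw) & 0xFFE00000;
        return Float.intBitsToFloat(truncated);
    }

    // tf * fieldNorm for a field that holds the query term numTerms times.
    // idf is identical for every document, so this product decides the order.
    static float tfTimesNorm(int numTerms) {
        return (float) Math.sqrt(numTerms) * quantizedLengthNorm(numTerms);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 6; n++) {
            System.out.println(n + " term(s): fieldNorm=" + quantizedLengthNorm(n)
                    + ", tf*fieldNorm=" + tfTimesNorm(n));
        }
    }
}
```

Under these assumptions the five-term field gets fieldNorm 0.4375, as in the explanation above, rather than the exact 1 / sqrt(5) ≈ 0.447. The product tf × fieldNorm is highest for the 1-term and 4-term fields (both 1.0), followed by 5, 6, 2 and 3 terms, which is the same order as the results listed in the question.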

