Custom field class in SOLR


Tag: solr

Solr allows us to define a field type using settings such as:

<fieldType name="text_th" class="solr.TextField" />

Here the class is a built-in Solr class. Are we allowed to define our own class and use it in a fieldType definition? I would like to have a class such as MyCompany.MyClass { String name, int age, float salary }.

This is so that the value I store in the Solr document is a composite made up of three components. If this is allowed, are there any guidelines or established practices for such classes, for example on how to develop a tokenizer for a custom class?




Yes, you can create your own FieldType, but you should have a very specific reason to do so. Solr is built on Lucene, so it becomes your responsibility to marshal content between the Solr representation and the backing Lucene implementation.

Possible starting points could be the implementation of StrField in Solr (which is just a simple field in Lucene as well), or far more advanced examples such as LatLonType and PointField.
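For illustration, a custom type is referenced in the schema the same way as a built-in one, by its class name. This is only a sketch: `com.mycompany.solr.PersonFieldType` and the field name `person_info` are hypothetical, and the class is assumed to extend Solr's FieldType (for example by subclassing StrField) and to be packaged in a jar that Solr can load.

```xml
<!-- Hypothetical custom type: com.mycompany.solr.PersonFieldType is an assumed
     class name for illustration, not something shipped with Solr. -->
<fieldType name="person" class="com.mycompany.solr.PersonFieldType" />

<!-- A field that uses the custom type; "person_info" is an illustrative name. -->
<field name="person_info" type="person" indexed="true" stored="true" />
```

The schema entry is the easy part; the work is in the class itself, which has to define how the composite value is written to and read back from Lucene.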

Be aware that the reason to create a new field type should be to express something that isn't possible with the currently available field types (or to greatly simplify handling of such values). Remember that you may also have to decide how sorting and filtering should work for the type, and what syntax queries against it will use.

Usually you're far better off, and end up with a solution that is actually maintainable, by creating a separate collection or by indexing your content in more than one way. A custom field type (on the Lucene / Solr level) is almost never the answer.
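As a sketch of the "index it in more than one way" route, the composite from the question could be stored as three ordinary fields instead of one custom type. The field names below are illustrative, and the numeric classes assume a Solr version that ships the point field types; older releases would use solr.TrieIntField and solr.TrieFloatField instead.

```xml
<!-- Built-in types only; no custom Java code is needed for this variant. -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />
<fieldType name="pint" class="solr.IntPointField" docValues="true" />
<fieldType name="pfloat" class="solr.FloatPointField" docValues="true" />

<!-- One field per component of the composite value (names are assumptions). -->
<field name="person_name" type="string" indexed="true" stored="true" />
<field name="person_age" type="pint" indexed="true" stored="true" />
<field name="person_salary" type="pfloat" indexed="true" stored="true" />
```

With this layout each component can be filtered and sorted with the standard syntax, for example `fq=person_age:[30 TO 40]&sort=person_salary desc`, which is exactly the behaviour a custom field type would have to reimplement.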


Django-Haystack with Solr: Searching by page description meta tags

I've been digging around and can't seem to find a way to create a search index for the page description meta tags using Haystack and Solr. Does anyone have experience with this, or any tips? I have looked at the page model in cms, but can't figure out how to...

DSE Cassandra Solr doesn't return _uniqueKey in response

I'm using DataStax 4.6. My Solr client queries data by using _uniqueKey. From version 4.6 the limitation about using a simple primary key is removed. How can I configure Solr or create the table in Cassandra so that the Solr response includes the synthetic key _uniqueKey? There is no problem...

Solr : stemming in a live cluster (reindexing issues)

I have a live Solr cluster where stemming was not enabled and my schema.xml looks like this: .. <field name="Searchable_Text" type="text_general" indexed="true" stored="true" multiValued="false"/> .. <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/> .. <copyField source="Searchable_Text" dest="text" maxChars="3000"/> .. <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer...

Solr 5.1.0: How to set the unique key via Schema API

In Solr 5.1.0, is it possible to set the unique key via the REST schema API? I created a collection with the data-driven schema. Solr would guess the field type and create the field based on the data I upload. I can still define fields beforehand by sending...

Subentity SolrEntityProcessor stops working since Solr 5.x

I use a data import like this <dataConfig> <document name="products"> <entity name="outer" dataSource="my_datasource" pk="id" query="..." deltaQuery="..." deltaImportQuery="..." > <entity name="solr" processor="SolrEntityProcessor" url="${}" query="Xid:${outer.Xid}" rows="1" fl="Id,FieldA,FieldB" wt="javabin" /> </entity> </document> </dataConfig> The interesting part is the sub entity, which uses SolrEntityProcessor. Up to and including Solr 4.10 everything...

Solr 5.1.0 - Apache TikaEntityProcessor Cannot Find My Files

Solr, more specifically Tika, is having some problems finding my file whose filepath is retrieved from a database. Whenever I go to index it logs errors saying that this can't find the file. I'm basically doing what this guy is doing here, which is taking a file path from a...

Can Solr find all of the terms of a field of a document?

Solr uses an inverted index to find documents from the indexed "terms". What I wonder is: is there any approach to list all of the terms that refer to a specific document? Thanks...

Solr splits a field containing a URL when copying from destination to a copyfield

I'm using Solr 4.5.1 and I have these two fields indexed in Solr (schema.xml): <field name="event_id" type="custom_string" indexed="true" stored="true" /> <field name="text" type="text_fr" indexed="true" multiValued="true" stored="true"/> <copyField source="event_id" dest="text"/> <fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <!-- normalisation des accents,...

How do I combine Facet and FilterQueries using Spring data Solr?

Is it possible to combine a facet and field query in spring data solr? Something that would build a query like this: > http://localhost:8983/solr/myCore/select?q=lastName%3AHarris*&fq=filterQueryField%3Ared&wt=json&indent=true&facet=true&facet.field=state In other words, how do I add FilterParameters to a SimpleFacetQuery? Any/all replies welcome, thanks in advance, -- Griff...

Still seeing old shard after calling SPLITSHARD

I called splitshard, and now this is what I see even after posting a commit: I thought splitshard was supposed to get rid of the original shard, shard1, in this case. Am I missing something? I was expecting the only two remaining shards to be shard1_0 and shard1_1. The REST...

Understanding Apache Lucene's scoring algorithm

I'm working with Hibernate Search for months now, but still I'm not able to digest the relevance it brings. I'm overall satisfied with the results it returns, but even simplest test does not satisfy my expectation. First test was using the term frequency(tf). Data: word word word word word word...

Heap memory Solr and Elasticsearch

I'm just reading the book Mastering Apache Solr and the writer recommends to set the minimum heap size (-Xms) to 2GB and the maximum heap size (-Xmx) to 12GB. Is 2GB necessary? I just use a 512MB server (which is low, I know) for Solr and I found it already...

Lucene vs Solr, indexing speed for same data

I have worked with Lucene before and am now moving to Solr. The problem is that I am not able to index in Solr as fast as Lucene can. My Lucene code: public class LuceneIndexer { public static void main(String[] args) { String indexDir = "/home/demo/indexes/index1/"; IndexWriterConfig indexWriterConfig =...

Solr boost direct match over fuzzy match

Let's say I have a query like this: text_data:(Apple OR Apple~2) How do I know what boost value to provide to give the direct match a clear priority over the fuzzy match?...

Fields in apache solr response are multivalued when they should be singular

I'm experiencing a problem with Apache Solr where I'm receiving fields wrapped in lists in JSON responses but they should be singular. Here is an excerpt from schema.xml; two example fields giving me a problem are django_ct and django_id: <fields> <!-- general --> <field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>...

Partially indexing Cassandra table with SOLR

One of the tables inside our Cassandra (DSE 4.7) Cluster contains south of 15 billion records. With the number of servers we have - it would be impossible to index them all with Solr. So, is it possible to somehow index the data partially/sample and/or start indexing and then "pause"...

SOLR QueryElevationComponent for Multi-tenant Support

Newbie question so please be nice. :) Basically we need to implement editorial boosting for a multi-tenant SOLR environment wherein a pre-defined query from a user would always bring a certain set of documents at the top of the results. A couple of challenges we have include: Given a single...

Solr 4.10.2 MySQL import fails with

I'm trying to migrate a server with Solr 4.7.2 on it. I have a Solr 4.10.2 with 4 cores running which is the new machine. I have an importer running on the old machine that poses no problem. However, when trying to run the importer on the new machine, I...

How to store the file path of an indexed document in Apache Solr 5.1.0

I'm trying to store the file path of a locally stored indexed document in Apache Solr so I can then update the index with metadata that is stored in a MySQL DB. That file path is how I'm going to relate the document to its corresponding metadata I already...

What indexer do I use to find the list in the collection that is most similar to my list?

Let's say I have my list of ingredients: {'potato','rice','carrot','corn'} and I want to return lists from a database that are most similar to mine: {'beans','potato','oranges','lettuce'}, {'carrot','rice','corn','apple'} {'onion','garlic','radish','eggs'} My query would return this first: {'carrot','rice','corn','apple'} I've used Solr, and have looked at CloudSearch, ElasticSearch, Algolia, Searchify and Swiftype. These engines only...

Solr Indexing in Storm topology vs Hbase NG Indexer

I am working on designing the data indexing feature for Solr. We are using a Storm topology and have an HBase bolt that adds data to HBase. The requirement is that whatever data we add to HBase needs to be indexed as well. The following are the options:...

Using schema.xml with Solr

I am trying to use schema.xml with the latest version of Solr (5.1.0). It seems that by default Solr 5.1.0 uses managed schema, but I would like to use schema.xml for a specific collection. So I create a new collection (using solr create -c my_collection on windows and copy schema.xml...

Connection refused when trying to access SOLR instance running in boot2docker on windows

I pulled this SOLR docker image and then followed the instructions to run it. docker run -d -p 8983:8983 -t makuk66/docker-solr Typing in docker ps yielded 1197d246f0e3 makuk66/docker-solr:latest "/bin/bash -c '/opt/ 50 minutes ago Up 50 minutes>8983/tcp suspicious_sinoussi So I know it's running. In order to connect to it...

Developing a search and tag heavy website

I'm in the planning phase of developing a very tag heavy website. Everything will essentially be associated with tags and the entire site would be based on searching these tags. Now, I've been thinking a lot about going the nosql route here, since from what I read and understand, it...

Solr custom UpdateRequestProcessorFactory fails with “Error Instantiating UpdateRequestProcessorFactory”

I have a custom class extending UpdateRequestProcessorFactory doing some work on a document when it gets added to the index. This was working fine in v4.10.3 in standalone Solr. I moved to SolrCloud v5.2 and it throws this error when adding the Collection (node): ERROR - 2015-06-14 12:25:11.071; [ docs_shard1_replica1]...

How does ReversedWildcardFilterFactory speed up wildcard searches?

The Solr docs say: solr.ReversedWildcardFilterFactory A filter that reverses tokens to provide faster leading wildcard and prefix queries. Add this filter to the index analyzer, but not the query analyzer. The standard Solr query parser will use this to reverse wildcard and prefix queries to improve performance... How does it...

Fuzzy search not working with dismax query parser

There is a field in my schema 'fullText' which is of the 'text_en' type, and multivalued. The term 'tests' is in the fullText field in one document. In Solr, when I try to search using the word 'test', with the standard Lucene parser with minimal distance 1, it's returning the...

Is it possible to index views in Apache Solr

Let me first give you an example. I have two tables -table1 and table2. table1 has a field id_table2, which is a foreign key and references one of the fields in table2. So, when I want to scan table1, I make a query like: SELECT t1.attr_1_, t1.attr_2_, t2.attr_3_ FROM table1...

Solr date variable resolver is not working with MySql

I have used Solr 3.3 with the Data Import Handler (DIH) and Oracle. It's working fine for me. Now I am trying the same with MySQL. With the change in database, I have changed the query used in data-config.xml for MySQL. The query has variables which are passed in the URL over HTTP....

How to index documents with their metadata in a DB using Solr 5.1.0

I'm using Apache Solr to index documents for a search engine. These documents are stored locally on my file system. In order to do a faceted search I also have to include these documents' metadata, which is stored in a MySQL DB. Is there a way to simultaneously index these...

Solrcloud multicore configuration

I have a standalone Solr instance with 4 different cores working fine using the embedded Jetty server. I configured the cores for v4.10.3, and since I moved to v5.1 all seems to work fine without any changes. Before going into production, I need to set it up as a...

How to use all the cores of Solr in solrj

I have downloaded solr 5.2.0 and have started using $solr_home/bin/solr start The Logs stated: Waiting to see Solr listening on port 8983 [/] Started Solr server on port 8983 (pid=17330). Happy searching! Then I visited http://localhost:8983/solr and created a new core using Core Admin / new Core as Core1 (...

Heap size issue on migrating from Solr 5.0.0 to Solr 5.1.0

I have Solr 5.0.0 in production with a custom heap size like this: SOLR_JAVA_MEM="-Xms2g -Xmx2g". When I tried to migrate to Solr 5.1.0 with the same configuration and start the server, it returned an OutOfMemoryError. Looking at the Solr API I saw that the heap size was set to...

Rails4 + sunspot search

I am trying to use sunspot solr for searching with Rails 4 and mysql. I defined a searchable block in my model(eg XYZ): searchable do text :name, :stored => true string :id, :stored => true end I just want to search in "name". The "id" is the primary key. There...

Solr Cloud Managed Resources

I am implementing Solr Cloud for the first time. I've worked with normal Solr and have that down pretty well, but I'm not finding a lot on what you can and can't do with Solr Cloud. So my question is about Managed Resources. I know you can CRUD stop words...

Solr - highlighting searched text: is this possible?

I'm beginning with Solr, so please don't flame me if this question is stupid. I was reading the Solr documentation and found out that there is something called "highlight". I have a really simple query: /select?q=text:test&wt=json&indent=true text is a field in my index and I'm trying to highlight...

solrException. XML parser doesn't support XInclude option

After configuring solr4.7.2 with tomcat 7, got the error in solrAdmin page stating SolrCore Initialization Failures fran92:org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: XML parser doesn't support XInclude option My solr.xml file contains one core <?xml version="1.0" encoding="UTF-8" ?> <solr persistent="true"> <cores host="${host:}" adminPath="/admin/cores" hostContext="${hostContext:solr}"> <core config="solrconfig.xml" name="fran92" instanceDir="generic" schema="schema.xml"...

Apache Solr Exception

Hello I am trying to run Solr on a Tomcat and have an exception like org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: directory '/var/lib/solr/data/index' does not exist Maybe anyone has some trouble like I do?...

SOLR to ignore some terms of a phrase

Is there a way to tell Solr to search for (for example) 80% of the phrase "term1 term2 term3 term4", so that it will yield documents with at least 3 terms? Extra question - if such logic exists - will it work with proximity: "term1 term2 term3 term4"~15 specifically, tried to do...

Modelling a leaderboard/ranking list in Riak

I am currently researching databases for a scalable game backend. Riak looks very nice from an operational point of view. I can easily see how to model user and game data and statistics in Riak. But I have trouble with leaderboards/ranking lists. Assuming we have millions of players and the...

How to add multiple suggester definitions in Solr search components

I am using Solr 5.1. I am trying to configure multiple suggester definitions in a Solr search component according to the Apache Solr wiki. I have configured a single suggester and it works perfectly, but whenever I try to configure multiple suggesters it gives me the following errors: java.lang.NullPointerException at org.apache.solr.handler.component.SearchHandler.handleRequestBody( at org.apache.solr.handler.RequestHandlerBase.handleRequest(

How can I sort by realtime score in Solr?

I have a Solr collection named question. It has these fields: id, answer_count, created_at, updated_at. My sort rule is: score = answer_count * 100 - (hours from created_at to now) * 5, and I need to sort by that score in descending order. How can I do that because of...

DSpace error with oai import

After configuring my DSpace server, it's working correctly, but when I look at the OAI identify page ( so we can be harvested, it says that the repository is localhost instead of my URL. I investigated and found out that to update this, I have to run this command: dspace/bin/dspace...

Getting application/json back from a Solr query

I'm calling the Solr REST api using a Jersey client: final ClientResponse resp = client().path(queryPath()) .queryParam("q", query.getQuery()) .queryParam("wt", "json") .accept(MediaType.APPLICATION_JSON_TYPE) .get(ClientResponse.class); resp.getEntity(HttpResponse.class) and when I run it I get: A message body reader for Java class challenger.HttpResponse, and Java type class challenger.HttpResponse, and MIME media type text/plain; charset=UTF-8 was not...

TYPO3 Solr extension and facets

I have a small question about TYPO3 Solr facets. At present my website has 6 different indexing configurations. Two of them are custom extension tables, one is tt_news, and the remaining 3 are the pages table with some custom conditions. I managed to add this using additionalWhereClause...

SolrCloud - choosing cores for update and search requests

I have a SolrCloud with one collection configured with compositeId and numShards=3 and replicationFactor=2. There will be about 200K inserts a day and about as many searches. From the SolrCloud documentation: "If the machine is a replica, the document is forwarded to the leader for processing." Does this mean that...

Solr: Retrieve non-stored fields from external data source

I'm currently working on a project on which I would like to index several data sources (Oracle and HBase) into Solr for full text search. Additionally, I want to be able to visualize the data I index into Solr. I'm still evaluating on whether to use Banana or Hue for...

solr bin/post - specify a document ID

I am quite new to Solr, and have set up everything as per the example, and it all works fine. However, I have one nagging issue for which I cannot seem to find a solution. So, normally, I do the following using the SimplePostTool and it...

Data import in solr from multiple entities

Currently I have a Solr core which is importing data from multiple entities, i.e. 2 different MySQL tables. I have to import data into the same core through a 3rd entity, which is another core in the same Solr database. I found documentation on many different sites which were guiding...

How to add individual objects to django haystack?

I have a search index that I have created using Solr. I want to add individual django objects to the search index. To remove objects from the solr database we use remove_object. some = SomFooModel.objects.get(pk=1) foo = FooIndex() foo.remove_object(some) #This works To add it, is there something like add_object or...