Subentity SolrEntityProcessor stops working since Solr 5.x

Tag: solr,dataimporthandler,solr5

I use a data import like this

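    <dataConfig>
      <document name="products">
        <entity name="outer" dataSource="my_datasource" pk="id"
                query="..." deltaQuery="..." deltaImportQuery="...">
          <entity name="solr" processor="SolrEntityProcessor"
                  url="${}"
                  query="Xid:${outer.Xid}"
                  rows="1" fl="Id,FieldA,FieldB" wt="javabin" />
        </entity>
      </document>
    </dataConfig>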

The interesting part is the sub-entity, which uses SolrEntityProcessor. Up to and including Solr 4.10 everything worked fine, but since 5.1 it no longer works. It doesn't fail in the sense of reporting an error; instead, the import "successfully" stops while importing the second document.

The following exception appears in the logs. It looks like DIH intentionally closes the SolrEntityProcessor's connection and then crashes as soon as it tries to fetch the sub-entity for the second document.

java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.IllegalStateException: Connection pool shut down
    at org.apache.solr.handler.dataimport.DocBuilder.execute(
    at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(
    at org.apache.solr.handler.dataimport.DataImporter$
Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.IllegalStateException: Connection pool shut down
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(
    at org.apache.solr.handler.dataimport.DocBuilder.doDelta(
    at org.apache.solr.handler.dataimport.DocBuilder.execute(
    ... 3 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.IllegalStateException: Connection pool shut down
    at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(
    ... 5 more
Caused by: java.lang.IllegalStateException: Connection pool shut down
    at org.apache.http.util.Asserts.check(
    at org.apache.http.impl.conn.PoolingClientConnectionManager.requestConnection(
    at org.apache.http.impl.client.DefaultRequestDirector.execute(
    at org.apache.http.impl.client.AbstractHttpClient.doExecute(
    at org.apache.http.impl.client.CloseableHttpClient.execute(
    at org.apache.http.impl.client.CloseableHttpClient.execute(
    at org.apache.http.impl.client.CloseableHttpClient.execute(
    at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(
    at org.apache.solr.client.solrj.SolrRequest.process(
    at org.apache.solr.client.solrj.SolrClient.query(
    at org.apache.solr.client.solrj.SolrClient.query(
    at org.apache.solr.handler.dataimport.SolrEntityProcessor.doQuery(
    at org.apache.solr.handler.dataimport.SolrEntityProcessor.buildIterator(
    at org.apache.solr.handler.dataimport.SolrEntityProcessor.nextRow(
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(
    ... 8 more
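
The failure mode that the trace above suggests can be modelled in a few lines of plain Java. This is only a toy sketch, not Solr's actual code: `PooledClient`, `SubEntityProcessor`, `FragileProcessor`, `ReinitializingProcessor` and `runImport` are hypothetical names, and the assumption that the sub-entity's client is closed after each parent row is inferred from the stack trace, not confirmed against the DIH source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SubEntityLifecycleSketch {

    /** Hypothetical stand-in for an HTTP client backed by a connection pool. */
    static class PooledClient {
        private boolean shutDown = false;

        String query(String q) {
            if (shutDown) {
                // Same message as in the trace; the real check lives in HttpClient.
                throw new IllegalStateException("Connection pool shut down");
            }
            return "row for " + q;
        }

        void close() {
            shutDown = true;
        }
    }

    /** Common shape of the two hypothetical sub-entity processors below. */
    interface SubEntityProcessor {
        String fetch(String parentId);
        void destroy();
    }

    /** Creates its client once and closes it on destroy(); never reopens it. */
    static class FragileProcessor implements SubEntityProcessor {
        private final PooledClient client = new PooledClient();

        public String fetch(String parentId) {
            return client.query("Xid:" + parentId);
        }

        public void destroy() {
            client.close();
        }
    }

    /** Recreates the client lazily, so it survives a destroy() between parents. */
    static class ReinitializingProcessor implements SubEntityProcessor {
        private PooledClient client;

        public String fetch(String parentId) {
            if (client == null) {
                client = new PooledClient();
            }
            return client.query("Xid:" + parentId);
        }

        public void destroy() {
            if (client != null) {
                client.close();
                client = null;
            }
        }
    }

    /** Assumed import loop: one sub-entity fetch per parent row, then destroy(). */
    static List<String> runImport(SubEntityProcessor sub, List<String> parentIds) {
        List<String> rows = new ArrayList<>();
        for (String id : parentIds) {
            rows.add(sub.fetch(id));
            sub.destroy();
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> parents = Arrays.asList("1", "2");
        System.out.println(runImport(new ReinitializingProcessor(), parents));
        try {
            runImport(new FragileProcessor(), parents);
        } catch (IllegalStateException e) {
            System.out.println("second parent failed: " + e.getMessage());
        }
    }
}
```

Under these assumptions the first parent row is fetched normally and the second one fails on a client that has already been shut down, which matches the "stops during the second document" symptom. Whether the real processor behaves this way would need to be checked in the Solr 5.x source.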


This is a known bug in Solr; I have run into it as well. I am posting this as an answer because it is a bug and there is no solution from the author. We downgraded Solr in order to get rid of it. I am not sure if this helps, but that is what we did on our end.

Update: This may or may not help you, but we resolved the issue by downgrading.


How does ReversedWildcardFilterFactory speed up wildcard searches?

The Solr docs say: solr.ReversedWildcardFilterFactory A filter that reverses tokens to provide faster leading wildcard and prefix queries. Add this filter to the index analyzer, but not the query analyzer. The standard Solr query parser will use this to reverse wildcard and prefix queries to improve performance... How does it...

Fuzzy search not working with dismax query parser

There is a field in my schema 'fullText' which is of the 'text_en' type, and multivalued. The term 'tests' is in the fullText field in one document. In Solr, when I try to search using the word 'test' with the standard Lucene parser and a minimal distance of 1, it's returning the...

Solr Cloud Managed Resources

I am implementing Solr Cloud for the first time. I've worked with normal Solr and have that down pretty well, but I'm not finding a lot on what you can and can't do with Solr Cloud. So my question is about Managed Resources. I know you can CRUD stop words...

can solr find all of the terms of a field of a document?

Solr uses an inverted index to find documents from the indexed "terms". What I wonder is: is there any approach to find all of the terms that refer to a specific document? Thanks...

How to use all the cores of Solr in solrj

I have downloaded solr 5.2.0 and have started using $solr_home/bin/solr start The Logs stated: Waiting to see Solr listening on port 8983 [/] Started Solr server on port 8983 (pid=17330). Happy searching! Then I visited http://localhost:8983/solr and created a new core using Core Admin / new Core as Core1 (...

solrcloud - choosing cores for update and search requests

I have a SolrCloud with one collection configured with compositeId and numShards=3 and replicationFactor=2. There will be about 200K inserts a day and about as many searches. From the SolrCloud documentation: "If the machine is a replica, the document is forwarded to the leader for processing." Does this mean that...

How do I combine Facet and FilterQueries using Spring data Solr?

Is it possible to combine a facet and field query in spring data solr? Something that would build a query like this: > http://localhost:8983/solr/myCore/select?q=lastName%3AHarris*&fq=filterQueryField%3Ared&wt=json&indent=true&facet=true&facet.field=state In other words, how do I add FilterParameters to a SimpleFacetQuery? Any/all replies welcome, thanks in advance, -- Griff...

Data import in solr from multiple entities

Currently I have a Solr core which imports data from multiple entities, i.e. 2 different MySQL tables. I have to import data into the same core through a 3rd entity, which is another core in the same Solr database. I found documentation on many different sites which was guiding...

SOLR - highlight searching text ? Is this possible

I'm beginning with SOLR so please don't flame me if this question is stupid or something like this. I was reading solr documentation and found out that there is something called "highlight". I have really simple query: /select?q=text:test&wt=json&indent=true text is a field in my index and I'm trying to highlight...

SOLR to ignore some terms of a phrase

Is there a way to tell SOLR to match (for example) 80% of a phrase, so that "term1 term2 term3 term4" will yield documents with at least 3 of the terms? Extra question: if such logic exists, will it work with proximity, e.g. "term1 term2 term3 term4"~15? Specifically, tried to do...

How to add multiple suggesters definition in solr search components

I am using Solr 5.1. I am trying to configure multiple suggester definitions in a Solr search component according to the Apache Solr wiki. I have configured a single suggester and it works perfectly, but whenever I try to configure multiple suggesters it gives me the following errors: java.lang.NullPointerException at org.apache.solr.handler.component.SearchHandler.handleRequestBody( at org.apache.solr.handler.RequestHandlerBase.handleRequest(

dse cassandra solr doesnt return _uniqueKey in response

I'm using Datastax 4.6. My Solr client queries data by using _uniqueKey. From version 4.6 the limitation about using a simple primary key is removed. How can I configure Solr or create the table in Cassandra so that I receive information about the synthetic key _uniqueKey in the Solr response? There is no problem...

Connection refused when trying to access SOLR instance running in boot2docker on windows

I pulled this SOLR docker image and then followed the instructions to run it. docker run -d -p 8983:8983 -t makuk66/docker-solr Typing in docker ps yielded 1197d246f0e3 makuk66/docker-solr:latest "/bin/bash -c '/opt/ 50 minutes ago Up 50 minutes>8983/tcp suspicious_sinoussi So I know it's running. In order to connect to it...

How to store the file path of an indexed document in Apache Solr 5.1.0

I'm trying to store the file path of a locally stored indexed document in Apache Solr so I can then update the index with metadata that is stored in a DB in MySQL. That file path is how I'm going to relate the document to its corresponding metadata I already...

How can I sort by realtime score in solr?

I have a Solr collection, question, with the fields id, answer_count, created_at and updated_at. My sort rule is: score = answer_count * 100 - (hours from created_at to now) * 5. I need to sort by that score in descending order. How can I do that, because of...

Solr 5.1.0: How to set the unique key via Schema API

In Solr 5.1.0, is it possible to set the unique key via the REST Schema API? I created a collection with the data-driven schema. Solr guesses the field type and creates the field based on the data I upload. I can still define fields beforehand by sending...

Rails4 + sunspot search

I am trying to use sunspot solr for searching with Rails 4 and mysql. I defined a searchable block in my model(eg XYZ): searchable do text :name, :stored => true string :id, :stored => true end I just want to search in "name". The "id" is the primary key. There...

Django-Haystack with Solr: Searching by page description meta tags

I've been digging around and can't seem to find a way to create a search index for the page description meta tags using Haystack and Solr. Does anyone have experience with this, or any tips? I have looked at the page model in cms, but can't figure out how to...

Getting application/json back from a Solr query

I'm calling the Solr REST api using a Jersey client: final ClientResponse resp = client().path(queryPath()) .queryParam("q", query.getQuery()) .queryParam("wt", "json") .accept(MediaType.APPLICATION_JSON_TYPE) .get(ClientResponse.class); resp.getEntity(HttpResponse.class) and when I run it I get: A message body reader for Java class challenger.HttpResponse, and Java type class challenger.HttpResponse, and MIME media type text/plain; charset=UTF-8 was not...

solrException. XML parser doesn't support XInclude option

After configuring solr4.7.2 with tomcat 7, got the error in solrAdmin page stating SolrCore Initialization Failures fran92:org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: XML parser doesn't support XInclude option My solr.xml file contains one core <?xml version="1.0" encoding="UTF-8" ?> <solr persistent="true"> <cores host="${host:}" adminPath="/admin/cores" hostContext="${hostContext:solr}"> <core config="solrconfig.xml" name="fran92" instanceDir="generic" schema="schema.xml"...

Heap size issue on migrating from Solr 5.0.0 to Solr 5.1.0

I have Solr 5.0.0 in production with a custom heap size like this: SOLR_JAVA_MEM="-Xms2g -Xmx2g". When I tried to migrate to Solr 5.1.0 with the same configuration and start the server, it returned an OutOfMemoryError. Looking at the Solr API I saw that the heap size was set to...

Developing a search and tag heavy website

I'm in the planning phase of developing a very tag heavy website. Everything will essentially be associated with tags and the entire site would be based on searching these tags. Now, I've been thinking a lot about going the nosql route here, since from what I read and understand, it...

Solrcloud multicore configuration

I have a standalone Solr instance with 4 different cores working fine using the embedded Jetty server. I configured the cores for v4.10.3, but I have since moved to v5.1 and all seems to work fine without any changes. Before going into production, I need to set it up as a...

DSpace error with oai import

After configuring my DSpace server, it's working correctly, but when I look at the OAI identify page ( so we can be harvested, it says that the repository is localhost instead of my URL. I investigated and found out that to update this, I have to run this command: dspace/bin/dspace...

How to add individual objects to django haystack?

I have a search index that I have created using Solr. I want to add individual django objects to the search index. To remove objects from the solr database we use remove_object. some = SomFooModel.objects.get(pk=1) foo = FooIndex() foo.remove_object(some) #This works To add it, is there something like add_object or...

Partially indexing Cassandra table with SOLR

One of the tables inside our Cassandra (DSE 4.7) Cluster contains south of 15 billion records. With the number of servers we have - it would be impossible to index them all with Solr. So, is it possible to somehow index the data partially/sample and/or start indexing and then "pause"...

Solr 4.10.2 MySQL import fails with

I'm trying to migrate a server with Solr 4.7.2 on it. I have a Solr 4.10.2 with 4 cores running which is the new machine. I have an importer running on the old machine that poses no problem. However, when trying to run the importer on the new machine, I...

Solr date variable resolver is not working with MySql

I have used the Data Import Handler (DIH) of Solr 3.3 with Oracle, and it works fine for me. Now I am trying the same with MySQL. With the change of database, I have changed the query used in data-config.xml for MySQL. The query has variables which are passed via the URL in HTTP...

Solr boost direct match over fuzzy match

Let's say I have a query like this: text_data:(Apple OR Apple~2) How do I know what boost value to provide to give the direct match a clear priority over the fuzzy match?...

Solr 5.1.0 - Apache TikaEntityProcessor Cannot Find My Files

Solr, more specifically Tika, is having some problems finding my file whose filepath is retrieved from a database. Whenever I go to index it logs errors saying that this can't find the file. I'm basically doing what this guy is doing here, which is taking a file path from a...

Solr: Retrieve non-stored fields from external data source

I'm currently working on a project on which I would like to index several data sources (Oracle and HBase) into Solr for full text search. Additionally, I want to be able to visualize the data I index into Solr. I'm still evaluating on whether to use Banana or Hue for...

SOLR QueryElevationComponent for Multi-tenant Support

Newbie question so please be nice. :) Basically we need to implement editorial boosting for a multi-tenant SOLR environment wherein a pre-defined query from a user would always bring a certain set of documents at the top of the results. A couple of challenges we have include: Given a single...

Solr custom UpdateRequestProcessorFactory fails with “Error Instantiating UpdateRequestProcessorFactory”

I have a custom class extending UpdateRequestProcessorFactory doing some work on a document when it gets added to the index. This was working fine in v4.10.3 in standalone Solr. I moved to SolrCloud v5.2 and it throws this error when adding the Collection (node): ERROR - 2015-06-14 12:25:11.071; [ docs_shard1_replica1]...

Heap memory Solr and Elasticsearch

I'm just reading the book Mastering Apache Solr and the writer recommends to set the minimum heap size (-Xms) to 2GB and the maximum heap size (-Xmx) to 12GB. Is 2GB necessary? I just use a 512MB server (which is low, I know) for Solr and I found it already...

Subentity SolrEntityProcessor stops working since SolR 5.x

I use a data import like this <dataConfig> <document name="products"> <entity name="outer" dataSource="my_datasource" pk="id" query="..." deltaQuery="..." deltaImportQuery="..." > <entity name="solr" processor="SolrEntityProcessor" url="${}" query="Xid:${outer.Xid}" rows="1" fl="Id,FieldA,FieldB" wt="javabin" /> </entity> </document> </dataConfig> The interesting part is the sub entity, which uses SolrEntityProcessor. Until (including) SoLR 4.10 everything...

How to index documents with their metadata in a DB using Solr 5.1.0

I'm using Apache Solr to index documents for a search engine. These documents are stored locally on my file system. In order to do a faceted search I also have to include these documents meta-data which is stored in a MySQL DB. Is there a way to simultaneously index these...

Lucene vs Solr, indexing speed for sample data

I have worked upon Lucene before and now moving towards Solr. The problem is that I am not able to do Indexing on Solr as fast as Lucene can do. My Lucene Code: public class LuceneIndexer { public static void main(String[] args) { String indexDir = "/home/demo/indexes/index1/"; IndexWriterConfig indexWriterConfig =...

Still seeing old shard after calling SPLITSHARD

I called splitshard, and now this is what I see even after posting a commit: I thought splitshard was supposed to get rid of the original shard, shard1, in this case. Am I missing something? I was expecting the only two remaining shards to be shard1_0 and shard1_1. The REST...

What indexer do I use to find the list in the collection that is most similar to my list?

Let's say I have my list of ingredients: {'potato','rice','carrot','corn'} and I want to return lists from a database that are most similar to mine: {'beans','potato','oranges','lettuce'}, {'carrot','rice','corn','apple'}, {'onion','garlic','radish','eggs'} My query would return this first: {'carrot','rice','corn','apple'} I've used Solr, and have looked at CloudSearch, ElasticSearch, Algolia, Searchify and Swiftype. These engines only...

Using schema.xml with Solr

I am trying to use schema.xml with the latest version of Solr (5.1.0). It seems that by default Solr 5.1.0 uses managed schema, but I would like to use schema.xml for a specific collection. So I create a new collection (using solr create -c my_collection on windows and copy schema.xml...

Solr : stemming in a live cluster (reindexing issues)

I have a live Solr cluster where stemming was not enabled and my schema.xml looks like this: .. <field name="Searchable_Text" type="text_general" indexed="true" stored="true" multiValued="false"/> .. <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/> .. <copyField source="Searchable_Text" dest="text" maxChars="3000"/> .. <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer...

solr bin/post - specify a document ID

I am quite new to Solr, and have set up everything as per the example, and it all works fine. However, I have one nagging issue for which I cannot seem to find a solution. So, normally, I do the following using the SimplePostTool and it...

Solr splits a field containing a URL when copying from destination to a copyfield

I'm using Solr 4.5.1 and I have these two fields indexed in Solr: schema.xml <field name="event_id" type="custom_string" indexed="true" stored="true" /> <field name="text" type="text_fr" indexed="true" multiValued="true" stored="true"/> <copyField source="event_id" dest="text"/> <fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <!-- accent normalization,...

Understanding Apache Lucene's scoring algorithm

I've been working with Hibernate Search for months now, but I'm still not able to digest the relevance it brings. I'm overall satisfied with the results it returns, but even the simplest test does not satisfy my expectations. The first test used the term frequency (tf). Data: word word word word word word...

Is it possible to index views in Apache Solr

Let me first give you an example. I have two tables -table1 and table2. table1 has a field id_table2, which is a foreign key and references one of the fields in table2. So, when I want to scan table1, I make a query like: SELECT t1.attr_1_, t1.attr_2_, t2.attr_3_ FROM table1...

Apache Solr Exception

Hello, I am trying to run Solr on Tomcat and get an exception like org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: directory '/var/lib/solr/data/index' does not exist. Has anyone had the same trouble?...

TYPO3 Solr extension and facets

I have a small question about TYPO3 Solr facets. At present my website has 6 different indexing configurations. Two of them are custom extension tables, one is tt_news, and the remaining 3 are the pages table with some custom conditions. I managed to add this using additionalWhereClause...