

Solr Indexing in Storm Topology vs HBase NG Indexer

Question:

Tag: indexing,solr,hbase,storm

I am designing a feature that indexes data into Solr. We use a Storm topology with an HBase bolt that writes data into HBase. The requirement is that whatever data we add to HBase must also be indexed in Solr.

The following are the options:

  1. Add the Solr indexing code to the HBase bolt itself.
  2. Create a new bolt that handles Solr indexing separately.
  3. Use the HBase NG Indexer, which ties Solr indexing to HBase row insertion.

The first two options resemble a transaction: a record goes into both HBase and Solr, or into neither. I am not sure we can achieve this, though, because we are dealing with data at a large scale.
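To make that concern concrete, here is a minimal Python sketch. It is not Storm, HBase, or Solr code. It assumes that both writes are keyed overwrites (so repeating one is harmless) and it models at-least-once replay of a failed tuple with a plain retry loop. `FlakyIndex`, `process`, and `run_with_replay` are hypothetical names used only for illustration. The point it tries to show: the two writes are not atomic, so when the index write fails the store already holds the record, and the two only agree again after a replay.

```python
class FlakyIndex:
    """Hypothetical stand-in for the search index: the first `failures`
    writes raise, later writes succeed."""

    def __init__(self, failures):
        self.failures = failures
        self.docs = {}

    def add(self, key, doc):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("index unavailable")
        self.docs[key] = doc  # keyed overwrite, so a repeated add is harmless


def process(record, store, index):
    """One bolt-style execution: write to the store, then to the index.
    Returns True (ack) or False (fail, so the record gets replayed)."""
    key, value = record
    store[key] = value  # put by row key: overwrite, safe to repeat
    try:
        index.add(key, {"id": key, "value": value})
    except ConnectionError:
        # No rollback here: the store has the row, the index does not.
        return False
    return True


def run_with_replay(record, store, index, max_attempts=5):
    """Models at-least-once delivery: replay the record until it is acked.
    Returns the number of attempts that were needed."""
    for attempt in range(1, max_attempts + 1):
        if process(record, store, index):
            return attempt
    raise RuntimeError("gave up after %d attempts" % max_attempts)


store = {}
index = FlakyIndex(failures=2)
attempts = run_with_replay(("row1", "hello"), store, index)
print(attempts, store, index.docs)
```

Under these assumptions the design gives eventual agreement rather than a true transaction: between the failed attempt and the successful replay, a reader can see the row in the store but not in the index.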

For the third option, the starting point is HBase, so all data is assumed to be there already. However, we would not have complete control over debugging, because we have to deploy the jar into the Indexer environment.
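The third option can be pictured as log-driven indexing. The sketch below is a toy model in Python. It assumes (this is my understanding of the HBase Indexer design, not something confirmed here) that the indexer consumes a stream of row changes from HBase instead of being called by the writer. `LoggedStore` and `LogIndexer` are hypothetical stand-ins, not real HBase or Solr APIs.

```python
class LoggedStore:
    """Hypothetical stand-in for the primary store: every put is applied
    to the table and also appended to a change log."""

    def __init__(self):
        self.rows = {}
        self.log = []

    def put(self, key, value):
        self.rows[key] = value
        self.log.append((key, value))


class LogIndexer:
    """Hypothetical stand-in for the indexer: it reads the change log from
    its own offset, so the writer never talks to the index directly."""

    def __init__(self, store):
        self.store = store
        self.offset = 0  # position in the log that has been indexed so far
        self.docs = {}

    def catch_up(self):
        """Index every log entry not seen yet; return how many were applied."""
        pending = self.store.log[self.offset:]
        for key, value in pending:
            self.docs[key] = {"id": key, "value": value}
        self.offset += len(pending)
        return len(pending)


store = LoggedStore()
indexer = LogIndexer(store)

store.put("row1", "a")
store.put("row2", "b")
lag_before = len(store.log) - indexer.offset  # writes not yet indexed

applied = indexer.catch_up()
store.put("row1", "c")  # an update arrives after the first catch-up
applied_again = indexer.catch_up()
print(lag_before, applied, applied_again, indexer.docs)
```

In this model the store is the single source of truth and the index lags behind it. That matches the trade-off described above: no coordination in the write path, but indexing happens in a separate process that is harder to debug from the topology.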

Please help me decide which design is preferable.


Answer:

After some analysis, we went ahead and implemented the design with the HBase NG Indexer. One argument is that we cannot guarantee the same data in HBase and Solr, because we cannot handle transactions at a large scale. We also have a similar design for streaming data, so we made use of that setup.


Related:


ElasticSearch- “No query registered for…”


search,indexing,elasticsearch
ElasticSearch returns me "No query registered for [likes_count]" error when trying to look up entries using the following query. The field likes_count is a new field of documents and does not exist in every document. The same query works without the sort part. Why does this error appear? Thanks {...

np.argmax on multidimensional arrays, keeping some indexes fixed


numpy,multidimensional-array,indexing,argmax
I have a collection of 2D narrays, depending on two integer indexes, say p1 and p2, with each matrix of the same shape. Then I need to find, for each pair (p1,p2), the maximum value of the matrix and the indexes of these maxima. A trivial, albeit slow, way to...

Solr custom UpdateRequestProcessorFactory fails with “Error Instantiating UpdateRequestProcessorFactory”


java,solr,lucene,config,solrcloud
I have a custom class extending UpdateRequestProcessorFactory doing some work on a document when it gets added to the index. This was working fine in v4.10.3 in standalone Solr. I moved to SolrCloud v5.2 and it throws this error when adding the Collection (node): ERROR - 2015-06-14 12:25:11.071; [ docs_shard1_replica1]...

Creating index while updating the documents


mongodb,indexing
I have a collection I am updating adding a new field. The document looks like: {"A": "P145", "B":"adf", "C":[{"df":"14", "color":"blue"},{"df":17}], "_id":ObjectID(....), "Synonyms":{"Synonym1": "value1", "Synonym2": ["value1", "value2"]}} In the update I am adding new elements to C I want to create a index on the field A and B. A and...

Pandas: break categorical column to multiple columns


python,indexing,pandas
Imagine a Pandas dataframe of the following format: id type v1 v2 1 A 6 9 1 B 4 2 2 A 3 7 2 B 3 6 I would like to convert this dataframe into the following format: id A_v1 A_v2 B_v1 B_v2 1 6 9 4 2 2...

How to use all the cores of Solr in solrj


java,indexing,solr,lucene,solrj
I have downloaded solr 5.2.0 and have started using $solr_home/bin/solr start The Logs stated: Waiting to see Solr listening on port 8983 [/] Started Solr server on port 8983 (pid=17330). Happy searching! Then I visited http://localhost:8983/solr and created a new core using Core Admin / new Core as Core1 (...

Decision to use KEY or UNIQUE KEY


mysql,indexing,unique-index
I understand that UNIQUE KEY is a unique index and KEY is a non-unique index. I have read that in case of unique index'es inserting data might result in some IO. If we don't have to rely on the DB for unique-ness and we still want fast lookup's using column...

Printing corresponding elements from multiple lists by using known element from a single list in Python


arrays,indexing,elements
I have 4 very large arrays, a, b, c, and d of the same size. I have selected specific elements in 'a', and I need the corresponding elements from b, c, and d (i.e. if I select an element from 'a' want the elements from the other arrays that have...

Neo4j: configure legacy index with cypher or property file


indexing,neo4j
I am using Neo4j-server and I am trying to find a way to configure a legacy index either with cypher or with a property. So far to enable the legacy indexing I just uncommented the related line in neo4j.properties file. How can I apply some of the configurations listed here...

Postgres Index-only-scan: can we ignore the visibility map or avoid heap fetches?


postgresql,indexing
Sorry, lots of context before the actual question as we've thoroughly researched this and I wanted to give you full context. Some context: postgres index-only-scans rely on the visibility map (VM). If a page is not marked as not-fully-visible in the visibility map, postgres fetches that page to ensure the...

php not executing in a get_file_contents document


php,indexing,footer,copyright
In my index.php file I have a command to retrieve the footer content. <?php echo file_get_contents('template/footer.php'); ?> In the footer.php document I have an additional script to get a automatic generated copyright date however when the footer.php document is rendered it isn't executing the php. <?php echo date("Y"); ?> Is...

Which tables do not have indexes in Oracle? [closed]


oracle,indexing,sqlplus
I think it's the wrong question, but it's a university project assignment. The question: I want to find tables without indexes in Oracle with a select statement in SqlPlus. Thanks for helping...

Getting application/json back from a Solr query


java,json,solr,jersey,jersey-client
I'm calling the Solr REST api using a Jersey client: final ClientResponse resp = client().path(queryPath()) .queryParam("q", query.getQuery()) .queryParam("wt", "json") .accept(MediaType.APPLICATION_JSON_TYPE) .get(ClientResponse.class); resp.getEntity(HttpResponse.class) and when I run it I get: A message body reader for Java class challenger.HttpResponse, and Java type class challenger.HttpResponse, and MIME media type text/plain; charset=UTF-8 was not...

IndexError obstructing code from working with larger csv file


python,csv,indexing,pandas
I have data that sorts a csv by using groupby and then plots the information. I used a small sample of information to create the code. It ran smoothly and so then I tried running it with the huge file of data. I am pretty new at Python and this...

Is it possible to index views in Apache Solr


sql,view,solr
Let me first give you an example. I have two tables -table1 and table2. table1 has a field id_table2, which is a foreign key and references one of the fields in table2. So, when I want to scan table1, I make a query like: SELECT t1.attr_1_, t1.attr_2_, t2.attr_3_ FROM table1...

Optimize the execution of select


sql-server,indexing,query-optimization,clustered-index,non-clustered-index
I want optimize this select: Select Dane1, Dane5, Dane6, Dane7 FROM Test INNER JOIN Test2 ON Test.Id=Test2.IdTest WHERE Dane5 > 199850 My database has 2 tables test, test2: test design: Id int ->PRIMARY KEY, Dane1 int, Dane2 int, Dane3 int, Dane4 int, Dane5 int, test2 design: Id int ->PRIMARY KEY,...

Creating NxM indices from a Python array


python,numpy,indexing
I would like to take an NxM matrix, for simplicity, we'll use x=np.arange(25).reshape((5,5)) And I would like to create a new matrix, A, in which I can store a node for each element in the first row, its N-direction index in the second row, its M-direction index in the third...

Optimizer using an index not present in the current schema


oracle,indexing,optimizer
CONNECT alll/all SELECT /*+ FIRST_ROWS(25) */ employee_id, department_id FROM hr.employees WHERE department_id > 50; Execution Plan Plan hash value: 2056577954 | Id | Operation | Name | Rows | Bytes | | 0 | SELECT STATEMENT | | 25 | 200 | 1 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES...

solrcloud - choosing cores for update and search requests


solr,solrcloud
I have a SolrCloud with one collection configured with compositeId and numShards=3 and replicationFactor=2. there will be about 200K inserts a day and about as many searches. from the SolrCloud documentation: "If the machine is a replica, the document is forwarded to the leader for processing." Does this means that...

Solr 5.1.0 - Apache TikaEntityProcessor Cannot Find My Files


mysql,solr,tika
Solr, more specifically Tika, is having some problems finding my file whose filepath is retrieved from a database. Whenever I go to index it logs errors saying that this can't find the file. I'm basically doing what this guy is doing here, which is taking a file path from a...

Creating Index in Elasticsearch using Java API giving NoClassFoundException


java,indexing,elasticsearch
I'm trying to create a node based client using Java API and index a JSON document. Here's the code : import java.util.Date; import java.util.HashMap; import java.util.Map; import org.elasticsearch.action.deletebyquery.DeleteByQueryResponse; import org.elasticsearch.client.Client; import org.elasticsearch.node.Node; import static org.elasticsearch.node.NodeBuilder.*; public class Els { public static void main (String args[]){ Els p = new Els();...

Lucene vs Solr, indexning speed for sampe data


java,indexing,solr,lucene,full-text-search
I have worked upon Lucene before and now moving towards Solr. The problem is that I am not able to do Indexing on Solr as fast as Lucene can do. My Lucene Code: public class LuceneIndexer { public static void main(String[] args) { String indexDir = "/home/demo/indexes/index1/"; IndexWriterConfig indexWriterConfig =...

Solr 4.10.2 MySQL import fails with java.io.EOFException


mysql,solr
I'm trying to migrate a server with Solr 4.7.2 on it. I have a Solr 4.10.2 with 4 cores running which is the new machine. I have an importer running on the old machine that poses no problem. However, when trying to run the importer on the new machine, I...

System.ArgumentOutOfRangeException in For loop due to entry deleted from database. Help fix it


database,vb.net,for-loop,indexing
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click Dim TimeNow As DateTime TimeNow = DateTime.Now With ReservationTableDataGridView For i As Integer = 0 To (.Rows.Count - 1) ListView1.BackColor = Color.Black Dim ToBeTested As DateTime ToBeTested = (.Rows(i).Cells(1).Value) # Console.WriteLine(ToBeTested) If ToBeTested > TimeNow.AddMinutes(45) Then Me.ReservationTableTableAdapter.Delete((.Rows(i).Cells(4).Value), (.Rows(i).Cells(1).Value), (.Rows(i).Cells(2).Value), (.Rows(i).Cells(3).Value))...

Postgres 9.4 jsonb array as table


sql,json,postgresql,indexing,jsonb
I have a json array with around 1000 elements of the structure "oid: aaa, instance:bbb, value:ccc". {"_id": 37637070 , "data": [{"oid": "11.5.15.1.4", "value": "1", "instance": "1.1.4"} , {"oid": "11.5.15.1.9", "value": "17", "instance": "1.1.4"} , {"oid": "12.5.15.1.5", "value": "0.0.0.0", "instance": "0"}]} oid and instance are unique per json array. If I...

Searching a TextField and IntField together seperated by an AND condition In Lucene


java,search,indexing,lucene
I have indexed my documents as: doc.add(new IntField("ID", id, Field.Store.YES)); doc.add(new TextField("First_Name", First_Name, Field.Store.YES)); doc.add(new TextField("Last_Name", Last_Name, Field.Store.YES)); doc.add(new TextField("Address", add, Field.Store.YES)); doc.add(new TextField("City", city, Field.Store.YES)); doc.add(new TextField("State", state, Field.Store.YES)); doc.add(new IntField("Zip_Code", zip, Field.Store.YES)); Where id, FirstName, city, add, state, zip are variables that store the values to be indexed....

Create a text index on MongoDB schema which references another schema


javascript,node.js,mongodb,indexing,mongoose
I am trying to add an index to a certain Schema with mongoose for text searches. If I add a text index to individual fields it works fine, also with compound indexes it is okay. For example the answer provided here is great: Full text search with weight in mongoose...

comparing index values in an array


ruby-on-rails,ruby,indexing,each
I have an array that looks like this ranking_array = ['NC', '40', '30/5', '30/4', '30/3', '30/2', '30/1', '30', '15/5', '15/4', '15/3', '15/2', '15/1', '15', '5/6', '4/6', '3/6', '2/6', '1/6', '0', '-2/6', '-4/6', '-15', '-30'] I have also a model user, my user has a ranking which is a value that...

Trying to create a new dataframe based on internal sums of a column from another dataframe using Python/pandas


python,indexing,pandas,sum,dataframes
Let's assume I have a pandas dataframe df as follow: df = DataFrame({'Col1':[1,2,3,4], 'Col2':[5,6,7,8]}) Col1 Col2 0 1 5 1 2 6 2 3 7 3 4 8 Is there a way for me to change a column into the sum of all the following elements in the column? For...

Fuzzy search not working with dismax query parser


solr,lucene
There is a field in my schema 'fullText' which is of the 'text_en' type, and multivalued. The term 'tests' is in the fullText field in one document. In solr, when I try to search using the word 'test', with the standard lucene parser with minimal distance 1, its returning the...

Rails4 + sunspot search


mysql,ruby-on-rails,solr,sunspot
I am trying to use sunspot solr for searching with Rails 4 and mysql. I defined a searchable block in my model(eg XYZ): searchable do text :name, :stored => true string :id, :stored => true end I just want to search in "name". The "id" is the primary key. There...

How do I combine Facet and FilterQueries using Spring data Solr?


spring,solr,filtering,facet
Is it possible to combine a facet and field query in spring data solr? Something that would build a query like this: > http://localhost:8983/solr/myCore/select?q=lastName%3AHarris*&fq=filterQueryField%3Ared&wt=json&indent=true&facet=true&facet.field=state In other words, how do I add FilterParameters to a SimpleFacetQuery? Any/all replies welcome, thanks in advance, -- Griff...

Retrieve index of newly added row - for loop in R


r,for-loop,indexing
I am trying to retrieve the index of a newly-added row, added via a for loop. Starting from the beginning, I have a list of matrices of p-values, each with a variable number of rows and columns. This is because not all groups have an adequate number of treated individuals...

Is there a way to associate data with a file in a folder hierarchy?


java,indexing,uniqueidentifier,tagging,subfolder
Using Java, I am creating a program that indexes a folder structure and allows a user to search for files and also tag a file with keywords and then search for files based off of those tags. I have been traversing through the folder hierarchy using the FileUtils listFiles method...

SQL - Avoid 2 columns in 1 record having the same value


mysql,indexing
I'm now working on a new MySQL database (interface PHPmyadmin) for my personal project and I don't know how to avoid two columns on a same table to have the same value. So, i want that we can't insert same values for 2 differents columns in one record. Example TABLE...

How can I access semi-sparse data efficiently in java?


java,arrays,indexing
So I'm working with a problem where I am parsing a large text file into data - each row of the file being represented by a Node object with several data fields. During program execution, these objects will be accessed many times according to their int id field (specified in...

Reduction of list dimensions in Python


python,list,indexing,nodes
I'm trying to assign classes to a list of nodes, and separate all nodes into separate lists based on class tag. For example, if we have the following code: #define number of classes MaxC=5 index=[4 4 5 1 4 1 4 5 4 4 3 1 3 3 1 1]...

How to get a mysql query to use a specific index?


mysql,database,table,indexing
SELECT * FROM orders WITH (INDEX(idx)); When I ran the above query, I got the MySQL error #1064 - You have an error in your SQL syntax. I created the index as follows: create index idx on orders(date,status); Can anybody tell me the correct syntax?...

Python Pandas: select rows based on comparison across rows


python,indexing,pandas
In the dataframe below, the first column is the index with occasional non-unique values. | | col1 | |---|------| | A | 120 | | A | 90 | | A | 80 | | B | 80 | | B | 50 | | C | 120 | |...

Using [] on pointers in C?


c,pointers,indexing
Right now I'm looking over some C code and they have some pointer syntax that I'm confused about. So first they declared a pointer like so: int32_t *p_tx_buf=NULL; Then later on they wrote: p_tx_buf = malloc(...math... ); The stuff in the middle is just math to calculate the size of...

Looping through numbers to create a large table


excel,vba,excel-vba,indexing
I have a code that works, but I want to add some more functionality to it. It currently does what it is supposed to do, and has sped up some processes, but now I think it can be sped up even more. I am using the solution that I marked...

Why can't I access the elements of my NSMutableArray?


objective-c,indexing,nsmutablearray
I declared arrays a,b,c and d as properties in my interface class and initialized them like this: [self setPatternA:[[NSMutableArray init] initWithArray:@[@0,@0,@1,@1,@2,@2]]]; [self setPatternB:[[NSMutableArray init] initWithArray:@[@3,@4,@4,@5,@5,@3]]]; [self setPatternC:[[NSMutableArray init] initWithArray:@[@3,@3,@4,@4,@5,@5]]]; [self setPatternD:[[NSMutableArray init] initWithArray:@[@0,@1,@1,@2,@2,@0]]]; and now I'm trying to access them like this: NSInteger a=[patternA objectAtIndex:3]; NSLog(@"pattern a: %ld", (long)a); but...

Does PostgreSQL quickly search for columns with arrays of strings?


arrays,database,performance,postgresql,indexing
According to Can PostgreSQL index array columns?, PostgreSQL can index array columns. Can it do searches on an array column as efficiently as it does for non array types? For example, suppose you have a row from a questions table (like SO): title: ... content:... tags: [ 'postgresql', 'indexing', 'arrays'...

Understanding Apache Lucene's scoring algorithm


search,solr,lucene,full-text-search,hibernate-search
I'm working with Hibernate Search for months now, but still I'm not able to digest the relevance it brings. I'm overall satisfied with the results it returns, but even simplest test does not satisfy my expectation. First test was using the term frequency(tf). Data: word word word word word word...

How can I sort by realtime score in solr?


solr
Now I have a solr collection: question question has some field: id answer_count created_at updated_at now I have the sort rule: score = answer_count * 100 - (the hours now to created_at) * 5 then I need to sort by the score desc. how can i do that because of...

Solr 5.1.0: How to set the unique key via Schema API


solr,schema,unique-key
In Solr 5.1.0, is it possible to set the unique key via the REST schema api? I created a collection with the data driven schema. Solr would guess what the field type and create the field based on the data I upload. I can still define fields beforehand by sending...

How to count the characters of each line in a string and convert the number to a hex


java,arrays,indexing,hex
I have this String here called message. Bruce Wayne,Batman,None,Gotham City,Robin,The Joker Oliver Queen,Green Arrow,None,Star City,Speedy,Deathstroke Clark Kent,Superman,Flight,Metropolis,None,Lex Luthor Bart Allen,The Flash,Speed,Central City,Kid Flash,Professor Zoom I need to count the number of characters in each line and print them in hex. First line should be (From Bruce to Joker) 2b Second...

Do you get the same performance using index prefixes?


performance,mongodb,indexing
Say I have a collection containing documents like the one below: { _id: ObjectId(), myValue: 123, otherValue: 456 } I then create like below: {myValue: 1, otherValue: 1} If I execute the following query: db.myCollection.find({myValue: 123}) will I get the same performance with my index as I would if I...