Strategy to persist the node's data for dynamic Elasticsearch clusters

Question:

Tag: elasticsearch,docker,mesos,marathon

I'm sorry that this is probably a rather broad question, but I haven't found a solution for this problem yet.

I'm trying to run an Elasticsearch cluster on Mesos through Marathon with Docker containers. To that end, I built a Docker image that can be started on Marathon and scaled dynamically via either the frontend or the API.

This works great for test setups, but the question remains how to persist the data so that, if the cluster is scaled down (I know this is also about the index configuration itself) or stopped, I can restart it later (or scale it up) with the same data.

The thing is that Marathon decides where (on which Mesos slave) the nodes run. So if I persist the data to the Docker hosts via Docker volumes, it is, from my point of view, not predictable whether all the data will be available to the "new" nodes upon restart.

The only things that come to my mind are:

Is there any other way to approach this? Are there any recommendations? Unfortunately, I didn't find a good resource on this kind of topic. Thanks a lot in advance.


Answer:

Elasticsearch and NFS are not the best of pals ;-). You don't want to run your cluster on NFS: it's much too slow, and Elasticsearch performs better the faster its storage is. Once you put the network into that equation, you'll get into trouble. I have no idea about Docker or Mesos, but I definitely recommend against NFS. Use snapshot/restore instead.

The first snapshot will take some time, but subsequent snapshots should take less space and less time, because snapshots are incremental. Note that "incremental" here means incremental at the file level, not at the document level.

The snapshot itself needs all the nodes that hold the primaries of the indices you want snapshotted. All of those nodes need access to a common location (the repository) that they can write to. This requirement of shared access to the same location is usually not that obvious, which is why I'm mentioning it.
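
To make the snapshot/restore suggestion a bit more concrete, here is a rough Python sketch of the three REST calls involved: registering a shared filesystem repository, taking a snapshot, and restoring it. The endpoint paths follow the Elasticsearch snapshot API; the cluster address, the repository name "cluster_backup" and the path "/mnt/es-backups" are assumptions made up for illustration, and that path would have to be mounted and writable on every node (recent versions also expect it to be listed under path.repo). Treat it as a starting point under those assumptions, not a tested setup.

```python
import json
import urllib.request

# Assumed values for illustration only; adjust to your environment.
ES_URL = "http://localhost:9200"   # assumed address of any node in the cluster
REPO = "cluster_backup"            # hypothetical repository name
SHARED_PATH = "/mnt/es-backups"    # hypothetical location mounted on ALL nodes


def register_repo_request(base=ES_URL, repo=REPO, location=SHARED_PATH):
    """Describe the call that registers a shared filesystem ("fs") repository."""
    body = {"type": "fs", "settings": {"location": location}}
    return "PUT", f"{base}/_snapshot/{repo}", body


def create_snapshot_request(name, base=ES_URL, repo=REPO):
    """Describe the call that snapshots all indices and waits for completion."""
    url = f"{base}/_snapshot/{repo}/{name}?wait_for_completion=true"
    return "PUT", url, None


def restore_snapshot_request(name, base=ES_URL, repo=REPO):
    """Describe the call that restores a previously taken snapshot."""
    return "POST", f"{base}/_snapshot/{repo}/{name}/_restore", None


def send(method, url, body=None):
    """Send one of the requests above to the cluster and return the parsed JSON."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A typical flow would be send(*register_repo_request()) once, send(*create_snapshot_request("before_scale_down")) before stopping or shrinking the cluster, and send(*restore_snapshot_request("before_scale_down")) after bringing a fresh cluster up. Keep in mind that a restore fails for indices that already exist and are open in the target cluster.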

