python,json,elasticsearch,elastic , Two JSON douments linked by a key


Two JSON douments linked by a key

Question:

Tag: python,json,elasticsearch,elastic

I have a python server listening to POST from an external server.I expect two JSON documents for every incident happening on the external server. One of the fields in the JSON documents is a unique_key which can be used to identify that these two documents belong together. Upon recieving the JSON documents, my python server sticks into elasticsearch. The two documents related to the incident will be indexed in the elastic search as follows.

/my_index/doc_type/doc_1

/my_index/doc_type/doc_2

i.e the documents belong to the same index and has the same document type. But I don't have an easy way to know that these two documents are related. I want to do some processing before inserting into ElasticSearch when I can use the unique_key on the two documents to link these two. What are your thoughts on doing some normalization across the two documents and merging them into a single JSON document. It has to be remembered that I will be recieving a large number of such documents per second. I need some temporary storage to store and process the JSON documents. Can some one give some suggestions for approaching this problem.

As updated I am adding the basic structure of the JSON files here.

json_1

{
    "msg": "0",
    "tdxy": "1",
    "data": {

        "Metric": "true",
        "Severity": "warn",

        "Message": {
            "Session": "None",
            "TransId": "myserver.com-14d9e013794",
            "TransName": "dashboard.action",
            "Time": 0,
            "Code": 0,
            "CPUs": 8,
            "Lang": "en-GB",
            "Event": "false",
        },
        "EventTimestamp": "1433192761097"
    },
    "Timestamp": "1433732801097",
    "Host": "myserver.myspace.com",
    "Group": "UndefinedGroup"
}

json_2

{
    "Message": "Hello World",
    "Session": "4B5ABE9B135B7EHD49343865C83AD9E079",
    "TransId": "myserver.com-14d9e013794",  
    "TransName": "dashboard.action"
    "points": [
        {
            "Name": "service.myserver.com:9065",
            "Host": "myserver.com",
            "Port": "9065",

        }
    ],
    "Points Operations": 1,
    "Points Exceeded": 0,
    "HEADER.connection": "Keep-Alive",
    "PARAMETER._": "1432875392706",
}

I have updated the code as per the suggestion.

      if rx_buffer:
           txid = json.loads(rx_buffer)['TransId']
            if `condition_1`: 
                res = es.index(index='its', doc_type='vents', id=txid, body=rx_buffer)
                print(res['created'])
            elif `condition_2`:
                res = es.update(index='its', doc_type='vents', id=txid, body={"f_vent":{"b_vent":rx_buffer}})                                                              

I get the following error.

 File "/usr/lib/python2.7/site-packages/elasticsearch/transport.py", line 307, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/usr/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 89, in perform_request
    self._raise_error(response.status, raw_data)
  File "/usr/lib/python2.7/site-packages/elasticsearch/connection/base.py", line 105, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
RequestError: TransportError(400, u'ActionRequestValidationException[Validation Failed: 1: script or doc is missing;]')

Answer:

The code below makes the assumption you're using the official elasticsearch-py library, but it's easy to transpose the code to another library.

We'd also probably need to create a specific mapping for your assembled document of type doc_type, but it heavily depends on how you want to query it later on.

Anyway, based on our discussion above, I would then index json1 first

from elasticsearch import Elasticsearch
es_client = Elasticsearch(hosts=[{"host": "localhost", "port": 9200}])

json1 = { ...JSON of the first document you've received... }

// extract the unique ID
// note: you might want to only take 14d9e013794 and ditch "myserver.com-" if that prefix is always constant
doc_id = json1['data']['Message']['TransID']

// index the first document
es_client.index(index="my_index", doc_type="doc_type", id=doc_id, body=json1)

At this point json1 is stored in Elasticsearch. Then, when you later get your second document json2 you can proceed like this:

json2 = { ...JSON of the first document you've received... }

// extract the unique ID
// note: same remark about keeping only the second part of the id
doc_id = json2['TransID']

// make a partial update of your first document
es_client.update(index="my_index", doc_type="doc_type", id=doc_id, body={"doc": {"SecondDoc": json2}})

Note that SecondDoc can be any name of your choosing here, it's simply a nested field that will contain your second document.

At this point you should have a single document having the id 14d9e013794 and the following content:

{
  "msg": "0",
  "tdxy": "1",
  "data": {
    "Metric": "true",
    "Severity": "warn",
    "Message": {
      "Session": "None",
      "TransId": "myserver.com-14d9e013794",
      "TransName": "dashboard.action",
      "Time": 0,
      "Code": 0,
      "CPUs": 8,
      "Lang": "en-GB",
      "Event": "false"
    },
    "EventTimestamp": "1433192761097"
  },
  "Timestamp": "1433732801097",
  "Host": "myserver.myspace.com",
  "Group": "UndefinedGroup",
  "SecondDoc": {
    "Message": "Hello World",
    "Session": "4B5ABE9B135B7EHD49343865C83AD9E079",
    "TransId": "myserver.com-14d9e013794",
    "TransName": "dashboard.action",
    "points": [
      {
        "Name": "service.myserver.com:9065",
        "Host": "myserver.com",
        "Port": "9065"
      }
    ],
    "Points Operations": 1,
    "Points Exceeded": 0,
    "HEADER.connection": "Keep-Alive",
    "PARAMETER._": "1432875392706"
  }
}

Of course, you can make any processing on json1 and json2 before indexing/updating them.


Related:


How to check for multiple attributes in a list


python,python-2.7
I am making a TBRPG game using Python 2.7, and i'm currently making a quest system. I wanted to make a function that checks all of the quests in a list, in this case (quests), and tells you if any of of the quests in the list have the same...

represent an index inside a list as x,y in python


python,list,numpy,multidimensional-array
I have a list which contains 1000 integers. The 1000 integers represent 20X50 elements of dimensional array which I read from a file into the list. I need to walk through the list with an indicator in order to find close elements to each other. I want that my indicator...

odoo v8 - Field(s) `arch` failed against a constraint: Invalid view definition


python,xml,view,odoo,add-on
I want to create a new view with a DB-view. When I try to install my app, DB-view was created then I get error: 2015-06-22 12:59:10,574 11988 ERROR odoo openerp.addons.base.ir.ir_ui_view: Das Feld `datum` existiert nicht Fehler Kontext: Ansicht `overview.tree.view` [view_id: 1532, xml_id: k. A., model: net.time.overview, parent_id: k. A.] 2015-06-22...

How to remove structure with python from this case?
python,python-2.7
How to remove "table" from HTML using python? I had case like this: paragraph = ''' <p>Lorem ipsum dolor sit amet, consectetur adipisicing elit. Quidem molestiae consequuntur officiis corporis sint.<br /><br /> <table> <tr> <td> text title </td> <td> text title 2 </td> </tr> </table> <p> lorem ipsum</p> ''' how...

Can't save json data to variable (or cache) with angularjs $http.get


json,angularjs,web-services,rest
I have weird angularjs problem. I'm trying to fetch data from Rest Webservice. It works fine, but I can't save json data to object. My code looks like: services.service('customerService', [ '$http', '$cacheFactory', function($http, $cacheFactory) { var cache = $cacheFactory('dataCache'); var result = cache.get('user'); this.getById = function(id){ $http.get(urlList.getCustomer + id).success(function(data, status,...

Response 200 OK but Jquery shows error?


javascript,jquery,json
I am stuck in kind of a weird problem where I am trying to make a JSONP request to the Open Weather history API to get the weather information for the previous years on the a particular date. fetchHistoryUrl = 'http://78.46.48.103/data/3.0/days?day=0622&mode=list&lat=39.77476949&lon=-100.1953125'; $.ajax({ method: 'get', url: fetchHistoryUrl, success: function(response){ console.log(response); },...

Calling function and passing arguments multiple times


python,function,loops
I want to call the function multiple time and use it's returned argument everytime when it's called. For example: def myfunction(first, second, third): return (first+1,second+1,third+1) 1st call: myfunction(1,2,3) 2nd call is going to be pass returned variables: myfunction(2,3,4) and loop it until defined times. How can I do such loop?...

Pandas Dataframe Complex Calculation


python,python-2.7,pandas,dataframes
I have the following dataframe,df: Year totalPubs ActualCitations 0 1994 71 191.002034 1 1995 77 2763.911781 2 1996 69 2022.374474 3 1997 78 3393.094951 I want to write code that would do the following: Citations of currentyear / Sum of totalPubs of the two previous years I want something to...

Link to another resource in a REST API: by its ID, or by its URL?


json,api,rest,api-design,hateoas
I am creating some APIs using apiary, so the language used is JSON. Let's assume I need to represent this resource: { "id" : 9, "name" : "test", "customer_id" : 12, "user_id" : 1, "store_id" : 3, "notes" : "Lorem ipsum example long text" } Is it correct to refer...

Replacing elements in an HTML file with JSON objects


javascript,json,replace
I am writing some Javascript to: Read in and Parse a JSON file Read in an HTML file (created as a template file) Replace elements in the HTML file with elements from the JSON file. It is only replacing the first element obj.verb. Can someone please help me figure out...

how to enable a entry by clicking a button in Tkinter?


python,tkinter
I need to activate many entries when button is clicked please do not write class based code, modify this code only because i need to change the whole code for the project as i did my whole project without classes from Tkinter import * import ttk x='disabled' def rakhi(): global...

sys.argv in a windows environment


python,windows,python-3.x
I'm attempting to learn python using the book 'a byte of python'. The code: import sys print('the command line arguments are:') for i in sys.argv: print(i) print('\n\nThe PYTHONPATH is', sys.path, '\n') outputs: the command line arguments are: C:/Users/user/PycharmProjects/helloWorld/module_using_sys.py The PYTHONPATH is ['C:\\Users\\user\\PycharmProjects\\helloWorld', 'C:\\Users\\user\\PycharmProjects\\helloWorld', 'C:\\Python34\\python34.zip', 'C:\\Python34\\DLLs', 'C:\\Python34\\lib', 'C:\\Python34', 'C:\\Python34\\lib\\site-packages']...

why i don't get return value javascript


javascript,jquery,html,json,html5
When i debug my code i can see i have value but i don't get value to createCheckBoxPlatform FN function createCheckBoxPlatform(data) { var platform = ""; $.each(data, function (i, item) { platform += '<label><input type="checkbox" name="' + item.PlatformName + ' value="' + item.PlatformSK + '">' + item.PlatformName + '</label>' +...

The event loop is already running


python,python-3.x,pyqt,pyqt4
I have the following 5 files: gui.py # -*- coding: utf-8 -*- from PyQt4 import QtCore, QtGui try: _fromUtf8 = QtCore.QString.fromUtf8 except AttributeError: def _fromUtf8(s): return s try: _encoding = QtGui.QApplication.UnicodeUTF8 def _translate(context, text, disambig): return QtGui.QApplication.translate(context, text, disambig, _encoding) except AttributeError: def _translate(context, text, disambig): return QtGui.QApplication.translate(context, text, disambig)...

SyntaxError: invalid syntax?


python,syntax
Good afternoon, I am developing a script in python and while I am trying to compile it from the terminator/terminal i always get this error, but I cannot understand where is the syntax error? File "_case1.py", line 128 print ('########################') ^ SyntaxError: invalid syntax Then I just change the position...

SQLAlchemy. 2 different relationships for 1 column


python,sqlalchemy
I have a simple many-to-many relationship with associated table: with following data: matches: users: users_mathces: ONE user can play MANY matches and ONE match can involve up to TWO users I want to realize proper relationships in both "Match" and "User" classes users_matches_table = Table('users_matches', Base.metadata, Column('match_id', Integer, ForeignKey('matches.id', onupdate="CASCADE",...

How to use template within Django template?


python,html,django,templates,django-1.4
I have the django template like below: <a href="https://example.com/url{{ mylist.0.id }}" target="_blank"><h1 class="title">{{ mylist.0.title }}</h1></a> <p> {{ mylist.0.text|truncatewords:50 }}<br> ... (the actual template is quite big) It should be used 10 times on the same page, but 'external' html elements are different: <div class="row"> <div class="col-md-12 col-lg-12 block block-color-1"> *django...

group indices of list in list of lists


python,list
I am looking for an elegant solution for the following problem. I have a list of ints and I want to create a list of lists where the indices with the same value are grouped together in the order of the occurrences of said list. [2, 0, 1, 1, 3,...

In sklearn, does a fitted pipeline reapply every transform?


python,scikit-learn,pipeline,feature-selection
Apologies if this is obvious but I couldn't find a clear answer to this: Say I've used a pretty typical pipeline: feat_sel = RandomizedLogisticRegression() clf = RandomForestClassifier() pl = Pipeline([ ('preprocessing', preprocessing.StandardScaler()), ('feature_selection', feat_sel), ('classification', clf)]) pl.fit(X,y) Now when I apply pl on a new set, pl.predict(X_classify); is RandomizedLogisticRegression going...

Python - Opening and changing large text files


python,replace,out-of-memory,large-files
I have a ~600MB Roblox type .mesh file, which reads like a text file in any text editor. I have the following code below: mesh = open("file.mesh", "r").read() mesh = mesh.replace("[", "{").replace("]", "}").replace("}{", "},{") mesh = "{"+mesh+"}" f = open("p2t.txt", "w") f.write(mesh) It returns: Traceback (most recent call last): File...

Replace nodejs for python?


python,node.js,webserver
i'm working in a HTML5 multiplayer game, and i need a server to sync player's movement, chat, battles, etc. So I'm looking for ways to use python instead nodejs, because i have I have more familiarity with python. The server is simple: var express = require('express'); var app = express();...

Why i get can not resolve method error in class android?


android,json
I use this tutorial: JSON and part of code to this: protected String doInBackground(String... urls) { TextView lbl=(TextView) findViewById(R.id.provCODE); person = new Person(); person.setName(lbl.getText().toString()); //person.setCountry("1853"); //person.setTwitter("1892"); return POST(urls[0],person); } but in the this line: person.setName(lbl.getText().toString()); i get this error: Can not resolve method. How can i solve that?what happen? my...

Count function counting only last line of my list


python,python-2.7
Count function counting only last line of my list N = int(raw_input()) cnt = [] for i in range(N): string = raw_input() for j in range(1,len(string)): if string[j] =='K': cnt.append('R') elif string[j] =='R': cnt.append('R') if string[0] == 'k': cnt.append('k') elif string[0] == 'R': cnt.append('R') print cnt.count('R') if I am giving...

Inserting a variable in MongoDB specifying _id field


python,mongodb,pymongo
I want to insert a variable, say, a = {1:2,3:4} into my database with a particular id "56". It is very clear from the docs that I can do the following: db.testcol.insert({"_id": "56", 1:2, 3:4}) However, I cannot figure out any way to insert "a" itself, specifying an id. In...

Uncaught error: Invalid type for google table column


javascript,json,google-maps,google-visualization
I recently started using google visualization due to its simplicity. But while implementing in my recent project i seem to have run into a bit of trouble. The program shown is used to pull data from a JSON object and utilize it to display data in the google tables. Now...

Find the tf-idf score of specific words in documents using sklearn


python,scikit-learn,tf-idf
I have code that runs basic TF-IDF vectorizer on a collection of documents, returning a sparse matrix of D X F where D is the number of documents and F is the number of terms. No problem. But how do I find the TF-IDF score of a specific term in...

Insert data in collection at Meteor's startup


javascript,json,meteor,data,startup
I would like to insert data at Meteor's startup. (And after from a JSON file) At startup, I create a new account and I would like to insert data and link it to this account once this one created. This is the code that creates the new account at startup:...

Strange Behavior: Floating Point Error after Appending to List


python,python-2.7,behavior
I am writing a simple function to step through a range with floating step size. To keep the output neat, I wrote a function, correct, that corrects the floating point error that is common after an arithmetic operation. That is to say: correct(0.3999999999) outputs 0.4, correct(0.1000000001) outputs 0.1, etc. Here's...

Use JSON file to insert data in database


javascript,json,mongodb,meteor,data
I'm using my JSON file like this to insert data in my collection : var content = JSON.parse(Assets.getText('test.json')); console.log('inserting...'); Profiles.insert({ user: id, data:content }; But I would like to have a "data's tree" like that : [ user: "rtegert23423131", firstname:"test", surname:"test2", // ... ] Not like that : [ user:...

Displaying a 32-bit image with NaN values (ImageJ)


python,image-processing,imagej
I wrote a multilanguage 3-D image denoising ImageJ plugin that does some operations on an image and returns the denoised image as a 1-D array. The 1-D array contains NaN values (around the edges). The 1-D array is converted back into an image stack and displayed. It is simply black....

trying to understand LSH through the sample python code


python,similarity,locality-sensitive-hash
the concise python code i study for is here Question A @ line 8 i do not really understand the syntax meaning for "res = res << 1" for the purpose of "get_signature" Question B @ line 49 (SOLVED BY myself through another Q&A) "xor = r1^r2" does not really...

Using counter on array for one value while keeping index of other values


python,collections
After reading the answers on this question How to count the frequency of the elements in a list? I was wondering how to count the frequency of something, and at the same time retreive some extra information, through something like an index. For example a = ['fruit','Item#001'] b = ['fruit','Item#002']...

Matplotlib: Plot the result of an SQL query


python,sql,matplotlib,plot
from sqlalchemy import create_engine import _mssql from matplotlib import pyplot as plt engine = create_engine('mssql+pymssql://**:****@127.0.0.1:1433/AffectV_Test') connection = engine.connect() result = connection.execute('SELECT Campaign_id, SUM(Count) AS Total_Count FROM Impressions GROUP BY Campaign_id') for row in result: print row connection.close() The above code generates an array: (54ca686d0189607081dbda85', 4174469) (551c21150189601fb08b6b64', 182) (552391ee0189601fb08b6b73', 237304) (5469f3ec0189606b1b25bcc0',...

Identify that a string could be a datetime object


python,regex,algorithm,python-2.7,datetime
If I knew the format in which a string represents date-time information, then I can easily use datetime.datetime.strptime(s, fmt). However, without knowing the format of the string beforehand, would it be possible to determine whether a given string contains something that could be parsed as a datetime object with the...

access the json encoded object returned by php in jquery


php,jquery,ajax,json
I want to post some data to php function by ajax, then get the encoded json object that the php function will return, then I want to get the information (keys and values) from this object, but I don't know how, here is my code: $.ajax({ url: "functions.php", dataType: "JSON",...

Sum of two variables in RobotFramework


python,automated-tests,robotframework
I have two variables: ${calculatedTotalPrice} = 42,42 ${productPrice1} = 43,15 I executed ${calculatedTotalPrice} Evaluate ${calculatedTotalPrice}+${productPrice1} I got 42,85,15 How can I resolve it?...

Check for duplicates in JSON


javascript,jquery,json,duplicates
I have a json object: var object1 = [{ "value1": "1", "value2": "2", "value3": "3", }, { "value1": "1", "value2": "5", "value3": "7", }, { "value1": "6", "value2": "9", "value3": "5", }, { "value1": "6", "value2": "9", "value3": "5", }] Now I want to take each record out of that...

How to change the IP address of Amazon EC2 instance using boto library


python,amazon-web-services,boto
How can I assign a new IP address (or Elastic IP) to an already existing AWS EC2 instance using boto library.

Python: can't access newly defined environment variables


python,bash,environment-variables
I can't access my env var: import subprocess, os print os.environ.get('PATH') # Works well print os.environ.get('BONSAI') # doesn't work But the env var is well added in my /home/me/.bashrc: BONSAI=/home/me/Utils/bonsai_v3.2 export BONSAI And I can access this env var from a new terminal....

Python recursive function not recursing


python,recursion
I'm trying to solve a puzzle, which is to reverse engineer this code, to get a list of possible passwords, and from those there should be one that 'stands out', and should work function checkPass(password) { var total = 0; var charlist = "abcdefghijklmnopqrstuvwxyz"; for (var i = 0; i...

JQuery mutiple post in the same time


javascript,php,jquery,json
I have three functions ,and every function post to special php page to get data.. and every function need some time because every php script need some time .. function nb1() { $.post("p1.php", { action: 1 }, function(data) { console.log(data); }, "json") .fail(function(data) { console.log("error"); }); } function nb2() {...

How do I read this list and parse it?


python,list
I'm using requests and the output I get from the sites API is a list, I've been stuck trying to parse it to get the data from it. I use r = requests.get(urlas, params=params) r.json() to get the data I want. Here is a snippet of the list [{'relation_type': None,...

Python Popen - wait vs communicate vs CalledProcessError


python,python-2.7,error-handling,popen
Continuing from my previous question I see that to get the error code of a process I spawned via Popen in python I have to call either wait() or communicate() (which can be used to access the Popen stdout and stderr attributes): app7z = '/path/to/7z.exe' command = [app7z, 'a', dstFile.temp,...

Pandas - Dropping multiple empty columns


python,pandas
I have some tables where the first 11 columns are populated with data, but all columns after this are blank. I tried: df=df.dropna(axis=1,how='all') which didn't work. I then used: df = df.drop(df.columns[range(11,36)], axis=1) Which worked on the first few tables, but then some of the tables were longer or shorter...

How do variables inside python modules work?


python,module,python-module
I am coming from a Java background with Static variables, and I am trying to create a list of commonly used strings in my python application. I understand there are no static variables in python so I have written a module as follows: import os APP_NAME = 'Window Logger' APP_DATA_FOLDER_PATH...

How does the class_weight parameter in scikit-learn work?


python,scikit-learn
I am having a lot of trouble understanding how the class_weight parameter in scikit-learn's Logistic Regression operates. The Situation I want to use logistic regression to do binary classification on a very unbalanced data set. The classes are labelled 0 (negative) and 1 (positive) and the observed data is in...

do calculation inside JSONArray in Java


java,arrays,json
I have a simple issue but cannot solve it as I am not very good at algorithm! I have an JSONArray in this form: [{"values":[{"time":1434976493,"value":"50"}, {"time":1434976494,"value":"100"}],"counter":"counter1"}, {"values":[{"time":1434976493,"value":"200"}, {"time":1434976494,"value":"300"}],"counter":"counter2"}, {"values":[{"time":1434976493,"value":"400"}, {"time":1434976494,"value":"600"}],"counter":"total"}] What I want to do is to get the integer value for counter 1 and counter 2 and then divide...

Python: histogram/ binning data from 2 arrays.


python,histogram,large-files
I have two arrays of data: one is a radius values and the other is a corresponding intensity reading at that intensity: e.g. a small section of the data. First column is radius and the second is the intensities. 29.77036614 0.04464427 29.70281027 0.07771409 29.63523525 0.09424901 29.3639355 1.322793 29.29596385 2.321502 29.22783249...

Sort when values are None or empty strings python


python,list,sorting,null
I have a list with dictionaries in which I sort them on different values. I'm doing it with these lines of code: def orderBy(self, col, dir, objlist): if dir == 'asc': sorted_objects = sorted(objlist, key=lambda k: k[col]) else: sorted_objects = sorted(objlist, key=lambda k: k[col], reverse=True) return sorted_objects Now the problem...

Twilio Client Python not Working in IOS Browser


javascript,python,ios,flask,twilio
I have created a simple twilio client application to make phone calls from Web Browser to phones. I used a sample Flask app to generate a secure Capability Token and used twilio.min.js library to handle calls from my HTML. The functionality works fine in Computer Browsers ans Android Phone Browsers,...