How do I implement a PatternAnalyzer in elastic4s and Elasticsearch to exclude results with a certain field value?

Question:

Tag: elasticsearch,playframework,elastic4s

I'm trying to perform a query on my index and get all reviews that do NOT have a reviewer with a gravatar image. To do this I have implemented a PatternAnalyzerDefinition with a host pattern:

"^https?\\:\\/\\/([^\\/?#]+)(?:[\\/?#]|$)"

that should match and extract the host of URLs such as:

https://www.gravatar.com/avatar/blablalbla?s=200&r=pg&d=mm

becomes:

www.gravatar.com
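
The pattern itself can be checked outside Elasticsearch. The following is a minimal sketch using plain java.util.regex, assuming the same pattern string as above; the object and method names (HostPatternCheck, extractHost) are made up for illustration and are not part of elastic4s. It only shows what capture group 1 contains, not how an analyzer uses the pattern.

```scala
import java.util.regex.Pattern

// Hypothetical helper, not part of elastic4s or Elasticsearch.
object HostPatternCheck {
  // The pattern string from the question, unchanged.
  val HostPattern: Pattern =
    Pattern.compile("^https?\\:\\/\\/([^\\/?#]+)(?:[\\/?#]|$)")

  /** Returns capture group 1 (the host) if the URL matches the pattern. */
  def extractHost(url: String): Option[String] = {
    val m = HostPattern.matcher(url)
    if (m.find()) Some(m.group(1)) else None
  }

  def main(args: Array[String]): Unit = {
    val url = "https://www.gravatar.com/avatar/blablalbla?s=200&r=pg&d=mm"
    println(extractHost(url)) // Some(www.gravatar.com)
  }
}
```

So the regex does isolate the host in its first capture group; whether that group ends up as a token depends on how the analyzer applies the pattern.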

The mapping:

clientProvider.getClient.execute {
  create.index(_index).analysis(
    phraseAnalyzer,
    PatternAnalyzerDefinition("host_pattern", regex = "^https?\\:\\/\\/([^\\/?#]+)(?:[\\/?#]|$)")
  ).mappings(
    "reviews" as (
      // .... other mappings
      "review" inner (
        "grade" typed LongType,
        "text" typed StringType index "not_analyzed",
        "reviewer" inner (
          "screenName" typed StringType index "not_analyzed",
          "profilePicture" typed StringType analyzer "host_pattern",
          "thumbPicture" typed StringType index "not_analyzed",
          "points" typed LongType index "not_analyzed"
        ),
        // .... other mappings
      )
    ) all(false)
  )
} map { response =>
  Logger.info("Create index response: {}", response)
} recover {
  case t: Throwable => play.Logger.error("Error creating index: ", t)
}

The query:

val reviewQuery = (search in path)
  .query(
    bool(
      must(
        not(
          termQuery("review.reviewer.profilePicture", "www.gravatar.com")
        )
      )
    )
  )
  .postFilter(
    bool(
      must(
        rangeFilter("review.grade") from 3
      )
    )
  )
  .size(size)
  .sort(by field "review.created" order SortOrder.DESC)

clientProvider.getClient.execute {
  reviewQuery
}.map(_.getHits.jsonToList[ReviewData])

Checking the index shows the following mapping for the reviewer:

reviewer: {
    properties: {
        id: {
            type: "long"
        },
        points: {
            type: "long"
        },
        profilePicture: {
            type: "string",
            analyzer: "host_pattern"
        },
        screenName: {
            type: "string",
            index: "not_analyzed"
        },
        state: {
            type: "string"
        },
        thumbPicture: {
            type: "string",
            index: "not_analyzed"
        }
    }
}

When I perform the query, the pattern matching does not seem to work: I still get reviews whose reviewer has a gravatar image. What am I doing wrong? Have I misunderstood the PatternAnalyzer?

I'm using "com.sksamuel.elastic4s" %% "elastic4s" % "1.5.9".


Answer:

I guess once again RTFM is in order here:

The docs state:

IMPORTANT: The regular expression should match the token separators, not the tokens themselves.

This means that in my case the text the regex matches, including www.gravatar.com, is treated as a separator and discarded, so it is never among the tokens produced when the field is analyzed.
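
The effect of "match the separators, not the tokens" can be approximated with plain java.util.regex. This is only a rough sketch: it splits the raw URL on the pattern the way a separator-based analyzer would, and it is not Elasticsearch's actual implementation (which, for example, may also lowercase tokens). The names SeparatorSemantics and splitOnPattern are hypothetical.

```scala
import java.util.regex.Pattern

// Hypothetical illustration, not Elasticsearch or elastic4s code.
object SeparatorSemantics {
  // The pattern string from the question, unchanged.
  val HostPattern: Pattern =
    Pattern.compile("^https?\\:\\/\\/([^\\/?#]+)(?:[\\/?#]|$)")

  /** Treats every match of the pattern as a separator and keeps what lies between. */
  def splitOnPattern(text: String): List[String] =
    HostPattern.split(text).toList.filter(_.nonEmpty)

  def main(args: Array[String]): Unit = {
    val url = "https://www.gravatar.com/avatar/blablalbla?s=200&r=pg&d=mm"
    // The matched prefix "https://www.gravatar.com/" is thrown away.
    println(splitOnPattern(url)) // List(avatar/blablalbla?s=200&r=pg&d=mm)
  }
}
```

Under these assumptions the only surviving token is the rest of the URL, so a term query for www.gravatar.com has nothing to match.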

Instead, use the Pattern Capture Token Filter, which emits what the capture groups match as tokens.

First declare a new CustomAnalyzerDefinition:

val hostAnalyzer = CustomAnalyzerDefinition(
    "host_analyzer",
    StandardTokenizer,
    PatternCaptureTokenFilter(
      name = "hostFilter",
      patterns = List[String]("^https?\\:\\/\\/([^\\/?#]+)(?:[\\/?#]|$)"),
      preserveOriginal = false
    )
  )

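Then add the analyzer to the field, referring to it by the name it was defined with ("host_analyzer"):

"review" inner (
  "reviewer" inner (
    "screenName" typed StringType index "not_analyzed",
    // the analyzer name must match the name given to CustomAnalyzerDefinition
    "profilePicture" typed StringType analyzer "host_analyzer",
    "thumbPicture" typed StringType index "not_analyzed",
    "points" typed LongType index "not_analyzed"
  )
)

Finally, register the analyzer when creating the index:

create.index(_index).analysis(
  someAnalyzer,
  phraseAnalyzer,
  hostAnalyzer
).mappings(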

And voilà, it works. A very handy way to check which tokens actually ended up in the index is to call the term vector endpoint for a document:

/[index]/[collection]/[id]/_termvector?fields=review.reviewer.profilePicture&pretty=true
