

In the CoreNLP pipeline, is it possible to use the Coref tool (dcoref) with the new dependency parser tool (depparse)? [closed]

java,stanford-nlp
This is how you would normally initialize a pipeline to run on some text: //stanford NLP static Properties props = new Properties(); static StanfordCoreNLP pipeline; static void initStanfordPipeline() { // creates a StanfordCoreNLP object, with POS tagging, lemmatization, NER, parsing, and coreference resolution props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse,...

Stanford Core NLP example code SemanticGraph exception

stanford-nlp
I have just tried out the Core NLP example code, included as StanfordCoreNlpDemo.java with the download. When trying to parse a chapter of a book, an exception is thrown from the Semantic Graph: Exception in thread "main" java.lang.NullPointerException at edu.stanford.nlp.semgraph.SemanticGraph.removeEdge(SemanticGraph.java:122) at edu.stanford.nlp.trees.UniversalEnglishGrammaticalStructure.expandPPConjunction(UniversalEnglishGrammaticalStructure.java:553) at...

How to not split English into separate letters in the Stanford Chinese Parser

python,nlp,stanford-nlp,segment,chinese-locale
I am using the Stanford Segmenter at http://nlp.stanford.edu/software/segmenter.shtml in Python. For the Chinese segmenter, whenever it encounters an English word, it will split the word into separate characters one by one, but I want to keep the characters together after the segmentation is done. For example: 你好abc我好 currently will become...
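One workaround, independent of the segmenter itself, is to re-merge consecutive Latin-alphabet tokens after segmentation. A Python sketch of that post-processing step (a heuristic, not part of the Stanford toolkit):

```python
import re

def merge_latin_runs(tokens):
    """Re-join consecutive tokens made of Latin letters/digits, which
    the Chinese segmenter may have split character by character."""
    merged = []
    for tok in tokens:
        if merged and re.fullmatch(r"[A-Za-z0-9]+", tok) \
                and re.fullmatch(r"[A-Za-z0-9]+", merged[-1]):
            merged[-1] += tok   # glue onto the previous Latin token
        else:
            merged.append(tok)
    return merged
```

Note the heuristic also merges two genuinely separate English words if the segmenter emits them adjacently, so it fits best when English appears as isolated words inside Chinese text.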

NLP Postagger can't grok imperatives?

stanford-nlp,pos-tagger
The Stanford NLP POS tagger claims imperative verbs were added in a recent version. I've inputted lots of text with abundant and obvious imperatives, but there seems to be no tag for them in the output. Must one, after all, train it for this POS?

How to suppress unmatched words in Stanford NER classifiers?

nlp,stanford-nlp,named-entity-recognition
I am new to Stanford NLP and NER and trying to train a custom classifier with a data sets of currencies and countries. My training data in training-data-currency.tsv looks like - USD CURRENCY GBP CURRENCY And, training data in training-data-countries.tsv looks like - USA COUNTRY UK COUNTRY And, classifiers properties...

configuring a separate model jar in stanford nlp

java,stanford-nlp
I have implemented logic to use Stanford NLP to get the location from a particular English sentence. I was using the following jars: stanford-corenlp-3.2.0.jar stanford-corenlp-3.2.0-models.jar The logic that I wrote is the following: public static edu.stanford.nlp.pipeline.StanfordCoreNLP snlp; /** * @see ServletContextListener#contextInitialized(ServletContextEvent) */ public void contextInitialized(ServletContextEvent arg0) { Properties props =...

How to use Stanford CoreNLP java library with Ruby for sentiment analysis?

java,ruby,twitter,nlp,stanford-nlp
I'm trying to do sentiment analysis on a large corpus of tweets in a local MongoDB instance with Ruby on Rails 4, Ruby 2.1.2 and Mongoid ORM. I've used the freely available https://loudelement-free-natural-language-processing-service.p.mashape.com API on Mashape.com, however it starts timing out after pushing through a few hundred tweets in rapid...

StanfordNLP Tokenizer

tokenize,stanford-nlp,misspelling
I use StanfordNLP to tokenize a set of messages written on smartphones. These texts have a lot of typos and do not respect the punctuation rules. Very often the blank spaces are missing, affecting the tokenization. For instance, the following sentences miss the blank space in "California.This" and "university,founded". Stanford University...
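Such glued-together sentences can be pre-processed with a simple regex pass before tokenization. A sketch; note it is a heuristic and will wrongly split abbreviations like "U.S.":

```python
import re

def add_missing_spaces(text):
    """Insert a space after ., ! or ? when glued to an uppercase letter,
    and after a comma glued to a letter.  Heuristic: breaks on
    abbreviations such as 'U.S.'."""
    text = re.sub(r"([.!?])([A-Z])", r"\1 \2", text)
    text = re.sub(r"(,)([A-Za-z])", r"\1 \2", text)
    return text
```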

Annotator for Relationship Extraction

regex,nlp,nltk,stanford-nlp,gate
I have a set of urls in a text file. For each url in that text file, I want to tag the entities and relationships in the text contained in that url. I am aware of the entity taggers like Stanford NER, NLTK and GATE which can perform the entity...

NLP shift-reduce parser is throwing a NullPointerException for sentiment calculation

nlp,stanford-nlp,sentiment-analysis,shift-reduce
I am trying to find out sentiments using NLP. The version I am using is 3.4.1. I have some junk data to process and it takes around 45 seconds to process using the default PCFG file. Here is the example: String text = "Nm n n 4 n n bkj nun4hmnun Onn...

NLP- Sentiment Processing for Junk Data takes time

nlp,stanford-nlp,sentiment-analysis,pos-tagger
I am trying to find the sentiment for the input text. This text is a junk sentence, and when I tried to find the sentiment, the annotation to parse the sentence took around 30 seconds. For normal text it takes less than a second. If I need to process...

Get TypedDependencies using StanfordParser Shift Reduce Parser

stanford-nlp,shift-reduce
I am trying to use the Stanford Shift Reduce Parser with the Spanish model supplied. I am noticing, however, that unlike the Lexicalized Parser, I cannot get the TypedDependencies, despite sending the adequate flag -outputFormat typedDependencies, as it can be seen in lexparser.bat/sh. Just in case, this is the Java...

Chinese sentence segmenter with Stanford coreNLP

java,nlp,tokenize,stanford-nlp
I'm using the Stanford coreNLP system with the following command: java -cp stanford-corenlp-3.5.2.jar:stanford-chinese-corenlp-2015-04-20-models.jar -Xmx3g edu.stanford.nlp.pipeline.StanfordCoreNLP -props StanfordCoreNLP-chinese.properties -annotators segment,ssplit -file input.txt And this is working great on small Chinese texts. However, I need to train an MT system which just requires me to segment my input. So I just need...

StanfordCoreNLP: Why multiple roots for SemanticGraph (e.g. dependency parsing)

nlp,stanford-nlp
In the definition of the SemanticGraph class, which is used for dependency parsing, the variable "roots" is defined as a collection of vertices: private final Collection<IndexedWord> roots; My question is: why a collection? In what cases would we need more than one vertex as the root?...

An index of Chinese characters organized by component radicals (Stanford CoreNLP)

java,jar,nlp,stanford-nlp
I want to use the one described here, part of the Stanford CoreNLP, as it looks promising but I can't understand how it works. I downloaded the entire CoreNLP but the .jar file mentioned in the README document, i.e. chinese_map_utils.jar is nowhere to be found. Do you think they're expecting...

Separately tokenizing and pos-tagging with CoreNLP

java,nlp,stanford-nlp
I'm having few problems with the way Stanford CoreNLP divides text into sentences, namely: It treats ! and ? (exclamation and question marks) inside a quoted text as a sentence end where it shouldn't, e.g.: He shouted "Alice! Alice!" - here it treats the ! after the first Alice as...

Stanford NLP: Chinese Part of Speech labels?

python,nlp,stanford-nlp,pos-tagger,part-of-speech
I am trying to find a table explaining each label in the Chinese part-of-speech tagger for the 2015.1.30 version. I couldn't find anything on this topic. The closest thing I could find was in the "Morphological features help POS tagging of unknown words across language varieties" article, but it doesn't...

CoreNLP API for N-grams?

nlp,stanford-nlp,n-gram,pos-tagger
Does CoreNLP have an API for getting unigrams, bigrams, trigrams, etc.? For example, I have a string "I have the best car ". I would love to get: I I have the the best car based on the string I am passing....
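CoreNLP itself focuses on linguistic annotations; n-grams over its token list are easy to compute by hand. A small Python sketch (the whitespace split() is a simplification standing in for a real tokenizer):

```python
def ngrams(tokens, n):
    """Return all n-grams (as tuples) from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "I have the best car".split()
unigrams = ngrams(tokens, 1)   # [('I',), ('have',), ...]
bigrams = ngrams(tokens, 2)    # [('I', 'have'), ('have', 'the'), ...]
```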

python corenlp batch parse

python,batch-processing,stanford-nlp
I am trying to batch parse documents using the corenlp Python wrapper. batch_parse() gives a generator; when I try to iterate over this generator it gives me the following error: Invalid maximum heap size: -XmxTrue Error: Could not create the Java Virtual Machine. Here is my code: from corenlp import batch_parse corenlp_dir...
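The "-XmxTrue" in the error indicates that the wrapper's memory option received the Python boolean True instead of a JVM heap-size string such as "3g". A hypothetical sketch of how such a flag gets assembled (the `memory` parameter name here is an assumption for illustration, not the wrapper's documented API):

```python
def build_java_command(classpath, main_class, memory="3g"):
    """Build a java invocation.  `memory` must be a JVM size string
    like '3g' or '512m'; passing the boolean True produces the
    invalid flag '-XmxTrue' seen in the error above."""
    if not isinstance(memory, str):
        raise TypeError("memory must be a string such as '3g'")
    return ["java", "-Xmx%s" % memory, "-cp", classpath, main_class]
```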

Different results performing Part of Speech tagging using Core NLP and Stanford Parser?

stanford-nlp,part-of-speech
The Part of Speech (POS) models that the Stanford Parser and Stanford CoreNLP use are different; that's why there is a difference in the output of the POS tagging performed through the Stanford Parser and CoreNLP. Online Core NLP Output The/DT man/NN is/VBZ smoking/NN ./. A/DT woman/NN rides/NNS a/DT horse/NN ./. Online Stanford...

Using WordNet with Stanford NLP

java,stanford-nlp,wordnet
I'm trying to get the WordnetSynAnnotation of a token, but it always returns null. I'm not sure what I'm missing; is there an annotator for WordnetSynAnnotation? Here is my full code: http://pastebin.com/x3kjU04C...

Does the Stanford NER CRF implementation use sentences in the training phase?

stanford-nlp
I am new to CRFs and some of my terminology might be skewed so bear with me. I'm assuming the Stanford NER implements a linear chain CRF. Let x be a sequence of words and y the sequence of corresponding tags. Call x an example and y its label. A...

Setting intercept in Stanford-NLP LogisticClassifier

stanford-nlp,logistic-regression
I want to instantiate a Stanford-NLP LogisticClassifier using features/weights being read in from a text file (from a classifier trained separately). The classifier I've trained (in Python, using scikit-learn) consists of weights, features, and also an intercept term. On the Stanford-NLP end, though, the classifier constructor doesn't take an intercept....

Error when using StanfordCoreNLP

java,stanford-nlp
I'm trying to use Stanford CoreNLP as a library in my java program. I use IntelliJ as the IDE. I was trying to test the library, so I wrote this code: import edu.stanford.nlp.pipeline.StanfordCoreNLP; import java.util.Properties; /** * Created by Benjamin on 15/5/4. */ public class SentimentAnaTest { public static void...

How to use serialized CRFClassifier with StanfordCoreNLP prop 'ner'

java,nlp,stanford-nlp
I'm using the StanfordCoreNLP API interface to programmatically do some basic NLP. I need to train a model on my own corpus, but I'd like to use the StanfordCoreNLP interface to do it, because it handles a lot of the dry mechanics behind the scenes and I don't need much...

Instruction for training model in Stanford Core NLP

stanford-nlp,sentiment-analysis,training-data
I am a novice in the area of sentiment analysis, and I am very interested to learn about training models, could you please explain each of the instructions contained in the following command? java -mx8g edu.stanford.nlp.sentiment.SentimentTraining -numHid 25 -trainPath train.txt -devPath dev.txt -train -model model.ser.gz what is the function of:...

How to collect output from a Python subprocess

python,subprocess,stanford-nlp,python-multithreading
I am trying to make a Python process that reads some input, processes it, and prints out the result. The processing is done by a subprocess (Stanford's NER); for illustration I will use 'cat'. I don't know exactly how much output NER will give, so I run a separate...
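A standard pattern is to read the child's stdout on a background thread and hand lines back through a queue, so the main thread never blocks. A self-contained sketch; a tiny unbuffered Python echo process stands in for 'cat' (or NER) so the example runs anywhere:

```python
import subprocess
import sys
import threading
import queue

def start_reader(cmd):
    """Start `cmd` and collect its stdout lines on a background
    thread via a queue, so reads never block the main thread."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, text=True)
    out = queue.Queue()

    def pump():
        for line in proc.stdout:   # ends when the child closes stdout
            out.put(line.rstrip("\n"))

    threading.Thread(target=pump, daemon=True).start()
    return proc, out

# Unbuffered echo process in place of 'cat' / NER.
echo = [sys.executable, "-u", "-c",
        "import sys\nfor line in sys.stdin: sys.stdout.write(line)"]
proc, out = start_reader(echo)
proc.stdin.write("hello\n")
proc.stdin.flush()
line = out.get(timeout=10)   # wait for the echoed line
proc.stdin.close()
proc.wait()
```

With a real NER subprocess the same structure applies; only the command list changes, and output buffering on the child's side may add latency.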

Stanford coreNLP: how to get label, position, and typed dependencies from a parse tree

stanford-nlp
I am using Stanford coreNLP to parse some text. I get multiple sentences. On these sentences I managed to extract Noun Phrases using TregexPattern. So I get a child Tree that is my Noun Phrase. I also managed to figure out the Head of the noun phrase. How is it...

Typed Dependency Parsing in NLTK Python

python,nltk,stanford-nlp
I have a sentence "I shot an elephant in my sleep" The typed dependency of the sentence is nsubj(shot-2, I-1) det(elephant-4, an-3) dobj(shot-2, elephant-4) prep(shot-2, in-5) poss(sleep-7, my-6) pobj(in-5, sleep-7) How do I get the typed dependency using the Stanford Parser (or any parser) with NLTK (preferably, but anything is fine)...

Swapping in the Berkeley parser in Stanford CoreNLP

nlp,stanford-nlp
I was using the Stanford NLP stack in my experiment and it was working nicely until the Stanford PCFG parser started behaving weirdly for some of the sentences. I found that the Berkeley parser at http://tomato.banatao.berkeley.edu:8080/parser/parser.html gives the correct parse tree for the sentences in my dataset. How could I swap in the Stanford POS tagger...

StanfordNLP lemmatization cannot handle -ing words

java,nlp,stanford-nlp,stemming,lemmatization
I've been experimenting with the Stanford NLP toolkit and its lemmatization capabilities. I am surprised by how it lemmatizes some words. For example: depressing -> depressing depressed -> depressed depresses -> depress It is not able to transform depressing and depressed into the same lemma. Something similar happens with confusing and confused, hopelessly...

StanfordNLP Spanish Tokenizer

tokenize,stanford-nlp
I want to tokenize a text in Spanish with StanfordNLP, and my problem is that the model splits any word matching the pattern "\d*s " (a word composed of digits and ending with an "s") into two tokens. If the word finishes with another letter, such as "e", the tokenizer...

Converting Stanford dependency relation to dot format

parsing,stanford-nlp
I am a newbie to this field. I have dependency relations in this form: amod(clarity-2, sound-1) nsubj(good-6, clarity-2) cop(good-6, is-3) advmod(good-6, also-4) neg(good-6, not-5) root(ROOT-0, good-6) nsubj(ok-10, camera-8) cop(ok-10, is-9) ccomp(good-6, ok-10) As mentioned in the links, we have to convert these dependency relations to dot format and then...
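The conversion itself is mechanical: parse each "rel(gov-i, dep-j)" line and emit a Graphviz edge. A Python sketch, assuming the "word-index" form shown above (words containing hyphens would need a smarter pattern):

```python
import re

# rel(gov-i, dep-j) -- assumes words themselves contain no hyphens
DEP_RE = re.compile(r"(\w+)\(([^-]+)-(\d+), ([^-]+)-(\d+)\)")

def deps_to_dot(dep_lines):
    """Convert Stanford typed-dependency lines like
    'amod(clarity-2, sound-1)' into a Graphviz dot digraph."""
    edges = []
    for line in dep_lines:
        m = DEP_RE.match(line.strip())
        if not m:
            continue
        rel, gov, gi, dep, di = m.groups()
        edges.append('  "%s-%s" -> "%s-%s" [label="%s"];'
                     % (gov, gi, dep, di, rel))
    return "digraph deps {\n" + "\n".join(edges) + "\n}"
```

The resulting text can be rendered with `dot -Tpng deps.dot -o deps.png`.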

Unknown symbol in nltk pos tagging for Arabic

python,nlp,nltk,stanford-nlp,pos-tagger
I have used nltk to tokenize some Arabic text. However, I ended up with some results like (u'an arabic character/word', '``') or (u'an arabic character/word', ':'). However, they do not provide the `` or : in the documentation, hence I would like to find out what this is. from nltk.tokenize.punkt...

Lazy parsing with Stanford CoreNLP to get sentiment only of specific sentences

java,performance,parsing,stanford-nlp,sentiment-analysis
I am looking for ways to optimize the performance of my Stanford CoreNLP sentiment pipeline. I want to get the sentiment of sentences, but only those which contain specific keywords given as input. I have tried two approaches: Approach 1: StanfordCoreNLP pipeline annotating the entire text with sentiment...

What is the default behavior of Stanford NLP's WordsToSentencesAnnotator when splitting a text into sentences?

nlp,stanford-nlp
Looking at WordToSentenceProcessor.java, DEFAULT_BOUNDARY_REGEX = "\\.|[!?]+"; led me to think that the text would get split into sentences based on ., ! and ?. However, if I pass the string D R E L I N. Okay. as input, e.g. using the command line interface: java -cp "*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP...

Stanford CoreNLP returning wrong results

java-7,stanford-nlp,eclipse-3.4,lemmatization
I am trying lemmatization with Stanford CoreNLP, following this question. My environment is: Java 1.7, Eclipse 3.4.0, Stanford CoreNLP version 3.4.1 (downloaded from here). My code snippet is: //...........lemmatization starts........................ Properties props = new Properties(); props.put("annotators", "tokenize, ssplit, pos, lemma"); StanfordCoreNLP pipeline = new StanfordCoreNLP(props, false); String text = "painting"; Annotation...

Why POS tagging algorithm tags `can't` as separate words?

stanford-nlp,pos-tagger
I'm using Stanford Log-linear Part-Of-Speech Tagger and here is the sample sentence that I tag: He can't do that When tagged I get this result: He_PRP ca_MD n't_RB do_VB that_DT As you can see, can't is split into two words, ca is marked as Modal (MD) and n't is marked...
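This split is the Penn Treebank tokenization convention: the negation clitic "n't" becomes its own token (tagged RB), and the remaining modal stem "ca" keeps the MD tag. If the original surface form is needed afterwards, the pieces can be rejoined; a minimal Python sketch (not part of the tagger):

```python
def merge_contractions(tokens):
    """Re-join PTB-style contraction pieces: 'ca' + "n't" -> "can't".
    The clitic list covers the common PTB splits."""
    clitics = ("n't", "'s", "'re", "'ve", "'ll", "'d", "'m")
    merged = []
    for tok in tokens:
        if merged and tok in clitics:
            merged[-1] += tok   # attach clitic to the preceding token
        else:
            merged.append(tok)
    return merged
```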

In CoreNLP what is the different between the default generated dependency trees?

stanford-nlp
Using Stanford CoreNLP, I generate dependency trees using default annotators. I view the XML output with the XSLT transformation provided on the project's website. I see three dependency tree categories each very similar, and they are: Uncollapsed dependencies Collapsed dependencies Collapsed dependencies with CC processed See an example - http://nlp.stanford.edu/software/example.xml....

How to extract an unlabelled/untyped dependency tree from a TreeAnnotation using Stanford CoreNLP?

java,stanford-nlp
The target language is Spanish. The English pipeline has support for typed dependencies whereas the Spanish pipeline, to my knowledge, does not. The goal is to produce a dependency tree from a TreeAnnotation where the end result is a list of directed edges. Is this possible with CoreNLP 3.4.1 and...

Identify prepositons and individual POS

nlp,stanford-nlp
I am trying to find the correct part of speech for each word in a paragraph. I am using the Stanford POS Tagger. However, I am stuck at a point: I want to identify prepositions in the paragraph. The Penn Treebank tagset says: IN Preposition or subordinating conjunction. How can I be sure...

Stanford CoreNLP Error creating edu.stanford.nlp.time.TimeExpressionExtractorImpl

c#,nlp,stanford-nlp
I am trying to learn the Stanford CoreNLP library. I am using C# with the posted example (https://sergeytihon.wordpress.com/2013/10/26/stanford-corenlp-is-available-on-nuget-for-fc-devs/). I loaded the package “Stanford.NLP.CoreNLP” (it added IKVM.NET) via nuget and downloaded the code. Unzipped the .jar models. My directory is correct. I get the following error: > edu.stanford.nlp.util.ReflectionLoading.ReflectionLoadingException was > unhandled...

Coreference resolution using Stanford CoreNLP

java,nlp,stanford-nlp
I am new to the Stanford CoreNLP toolkit and trying to use it for a project to resolve coreferences in news texts. In order to use the Stanford CoreNLP coreference system, we would usually create a pipeline, which requires tokenization, sentence splitting, part-of-speech tagging, lemmarization, named entity recoginition and parsing....

swiftly generate and sort full encoding dictionary and corresponding primary radicals

character-encoding,command-line-interface,stanford-nlp
Chinese characters, according to the Unihan encoding schema, can be indexed by their primary radical. The Stanford Word Segmenter has a command that can execute this, as described in its documentation, i.e. java -cp stanford-segmenter-VERSION.jar edu.stanford.nlp.trees.international.pennchinese.RadicalMap -infile whitespace_seperated_chinese_characters.input > each_character_denoted_by_radical.output I want to create a comprehensive table of Chinese characters...

How can I get the edges containing the “root” modifier dependency in the Stanford NLP parser?

nlp,stanford-nlp
I have created the dependency graph for my scenario that takes a text input. SemanticGraph dependencies = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class); I am successful in getting the "num" modifier dependency using the following code: List<SemanticGraphEdge> edgesContainingNumModifierDependency = dependencies.findAllRelns(GrammaticalRelation.valueOf("num")); However, I want to find the edges pertaining to the "root" and hence, the following...

Sentence-level to document-level sentiment analysis. Analysing news

stanford-nlp,sentiment-analysis
I need to perform sentiment analysis on news articles about a specific topic using the Stanford NLP tool. Such tool only allows sentence based sentiment analysis while I would like to extract a sentiment evaluation of the whole articles with respect to my topic. For instance, if my topic is...

Sentiment Analysis in Spanish with Stanford coreNLP

stanford-nlp,sentiment-analysis
I'm new here and wanted to know if anyone can help me with the following question. I'm doing sentiment analysis of text in Spanish using Stanford CoreNLP, but I cannot get a positive result. That is, if I analyze any English text, it analyzes it perfectly, but...

How to make a lightweight stanford-nlp.jar

stanford-nlp
I've noticed the whole library is quite large, ~300 MB, but I'm only using tokenize, ssplit, and pos. How can I make a lighter library? Many thanks. Best, Huang...

How can the NamedEntityTag be used as EntityMention in RelationMention in the RelationExtractor?

nlp,stanford-nlp
I'm trying to train my own NamedEntityRecognizer and RelationExtractor. I've managed the NER model, but the integration with the RelationExtractor is a bit tricky. I get the right NamedEntityTags, but the RelationMentions found by the extractor are only one-term and have no NamedEntity other than the default ones. I got input...

Stanford NLP: Sentence splitting without tokenization?

stanford-nlp
Can I detect sentences via the command line interface of Stanford NLP like Apache OpenNLP? https://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.sentdetect Based on the docs, Stanford NLP requires tokenization as per http://nlp.stanford.edu/software/corenlp.shtml...

Detect relation between two persons in text

nlp,stanford-nlp,opennlp
Goal is to find all the pairs of persons between which there is any kind of relation in a piece of text. Particularly, if we have this piece of text: Alice Wilson, doctor with more than 30 years of experience in suppressing virus epidemics, has met with the president of...

Stanford coreNLP : can a word in a sentence be part of multiple Coreference chains

nlp,stanford-nlp
The question is in the title. Using Stanford's NLP coref module, I am wondering if a given word can be part of multiple coreference chains. Or can it only be part of one chain. Could you give me examples of when this might occur. Similarly, can a word be part...

NLP - Error while Tokenization and Tagging etc [duplicate]

java,nlp,stanford-nlp
This question already has an answer here: How to fix: Unsupported major.minor version 51.0 error? 30 answers I want to identify all the tokens and also do PartsOfSpeech tagging using the Stanford NLP jar file. I have added all the required jar files into the build path of the project. The...

java.lang.NullPointerException while doing sentiment analysis with the stanford-nlp API

stanford-nlp
I am new to the stanford-nlp API. I am trying to do sentiment analysis with the Stanford API, but it's throwing an exception. Please see the logs below. Adding annotator tokenize Adding annotator ssplit Adding annotator pos Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [1.4 sec]. Adding annotator lemma Adding annotator ner...

StanfordNLP does not extract relations between entities

stanford-nlp
I'm trying out the StanfordNLP Relation Extractor which according to the page at http://nlp.stanford.edu/software/relationExtractor.shtml has 4 relations that it can extract : Live_In, Located_In, OrgBased_In, Work_For. My code is : Properties props = new Properties(); props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref, relation"); StanfordCoreNLP pipeline = new StanfordCoreNLP(props); String...

Data format for Stanford POS-tagger

stanford-nlp,dataformat
I am re-training the Stanford POS-tagger on my own data. I have trained two other taggers on the same data in the following one-token-per-line format: word1_TAG word2_TAG word3_TAG word4_TAG . Is this format ok for the Stanford tagger, or does it need to be one-sentence-per-line? word1_TAG word2_TAG word3_TAG word4_TAG ....
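If one-sentence-per-line data turns out to be needed, converting the one-token-per-line file is straightforward. A sketch, assuming (as in the sample above) that a lone "." token pair or a blank line marks a sentence boundary:

```python
def tokens_to_sentences(lines):
    """Convert one-token-per-line 'word_TAG' data into
    one-sentence-per-line strings.  A token whose word part is '.'
    (or a blank line) closes the current sentence -- an assumption
    about the data, adjust for your own boundary convention."""
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line:
            if current:
                sentences.append(" ".join(current))
                current = []
            continue
        current.append(line)
        if line.split("_")[0] == ".":
            sentences.append(" ".join(current))
            current = []
    if current:   # flush a trailing unterminated sentence
        sentences.append(" ".join(current))
    return sentences
```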

ssplit.eolonly with Chinese text

stanford-nlp
I am trying to parse a raw Chinese text file (one line per sentence) with the Stanford NN Dependency Parser. For English text I was able to use the 'ssplit' annotator with the 'ssplit.eolonly' option in order to split the document into sentences, however this option seems to fail for...

Stanford Word Segmenter for Chinese in Python how to return results without punctuation

python,stanford-nlp,punctuation,chinese-locale
I am trying to segment a Chinese sentence with the Stanford Word Segmenter in Python, but currently the results have punctuation marks in them. I want to return results without the punctuation, only the words. What is the best way to do that? I tried Googling for an answer, but...
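Independently of the segmenter, the punctuation tokens can be filtered out afterwards. A Python sketch using Unicode categories, which also catches full-width CJK punctuation such as "，" and "。":

```python
import unicodedata

def strip_punct_tokens(tokens):
    """Drop tokens that consist only of punctuation characters
    (Unicode general category P*), including full-width CJK marks."""
    def is_punct(tok):
        return all(unicodedata.category(ch).startswith("P") for ch in tok)
    return [t for t in tokens if t and not is_punct(t)]
```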

How to Identify mentions in a text?

nlp,stanford-nlp
I am looking for rule-based methods or any other methods to identify all mentions in a text. I have found several libraries that give coreferences but no exact options for only mentions. What I want is something like below: Input text: [This painter]'s indulgence of visual fantasy, and appreciation of...

incompatible types: Object cannot be converted to CoreLabel

stanford-nlp
I'm trying to use the Stanford tokenizer with the following example from their website: import java.io.FileReader; import java.io.IOException; import java.util.List; import edu.stanford.nlp.ling.CoreLabel; import edu.stanford.nlp.ling.HasWord; import edu.stanford.nlp.process.CoreLabelTokenFactory; import edu.stanford.nlp.process.DocumentPreprocessor; import edu.stanford.nlp.process.PTBTokenizer; public class TokenizerDemo { public static void main(String[] args) throws IOException { for (String arg : args) { // option...

Stanford CoreNLP wrong coreference resolution

nlp,stanford-nlp
I am still playing with Stanford's CoreNLP and I am encountering strange results on a very trivial test of Coreference resolution. Given the two sentences : The hotel had a big bathroom. It was very clean. I would expect "It" in sentence 2 to be coreferenced by "bathroom" or at...

Stanford Parser - Factored model and PCFG

parsing,nlp,stanford-nlp,sentiment-analysis,text-analysis
What is the difference between the factored and PCFG models of stanford parser? (In terms of theoretical working and mathematical perspective)

software to extract word functions like subject, predicate, object etc

nlp,stanford-nlp
I need to extract the relations of the words in a sentence. I'm mostly interested in identifying a subject, predicate, and an object. For example, for the following sentence: She gave him a pen I'd like to have: She_subject gave_predicate him a pen_object. Can Stanford NLP do that? I've tried...

Getting locations in Stanford CoreNLP

java,stanford-nlp
I am using stanford-corenlp-3.2.0.jar and stanford-corenlp-3.2.0-models.jar for identifying locations in a particular sentence. However, I have observed that stanford-nlp is not able to identify the location if the word is passed in lowercase. For example: "Find a restaurant in London". Here Stanford will identify London as a location. However, if...

How to integrate the GATE Twitter PoS model with Stanford NLP?

twitter,machine-learning,stanford-nlp,sentiment-analysis
I'm currently using Stanford NLP library for sentiment analysis of a twitter stream (version 3.3.0 but that’s not set.) I was looking for ways to increase the accuracy when I came across this https://gate.ac.uk/wiki/twitter-postagger.html I'm relatively new to sentiment analysis but am I right in saying that if I choose...

Choosing correct word for the given string

nlp,stanford-nlp
Suppose the given word is "connnggggggrrrraaatsss" and we need to convert it to congrats. Or, for another example, "looooooovvvvvveeeeee" should be changed to "love". Here the given letters can be repeated any number of times, but the word should be changed to its correct form. We need to...
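A common heuristic for such elongated words is to collapse repeated-letter runs and then validate the candidates against a dictionary. A sketch of the collapsing step (the dictionary lookup is omitted; words with legitimate double letters like "cool" need max_run=2 plus a lookup rather than max_run=1):

```python
import re

def squeeze_repeats(word, max_run=2):
    """Collapse runs of a repeated character down to at most `max_run`
    occurrences, e.g. 'looooovvvvveeee' -> 'love' with max_run=1.
    A dictionary check would then pick the real word among candidates."""
    return re.sub(r"(.)\1+",
                  lambda m: m.group(1) * min(len(m.group(0)), max_run),
                  word)
```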

Text tokenization with Stanford NLP : Filter unrequired words and characters

java,machine-learning,tokenize,stanford-nlp
I use Stanford NLP for string tokenization in my classification tool. I want to get only meaningful words, but I'm getting non-word tokens (like ---, >, ., etc.) and unimportant words like am, is, to (stop words). Does anybody know a way to solve this problem?

Efficient batch processing with Stanford CoreNLP

batch-file,stanford-nlp
Is it possible to speed up batch processing of documents with CoreNLP from command line so that models load only one time? I would like to trim any unnecessarily repeated steps from the process. I have 320,000 text files and I am trying to process them with CoreNLP. The desired...

CoreNLP ConLL format with CollapsedCCProcessedDependenciesAnnotation

parsing,stanford-nlp
I am using the recent version of CoreNLP. My task is to parse a text and get an output in conll format with CollapsedCCProcessedDependenciesAnnotation. I run the following command time java -cp $CoreNLP/javanlp-core.jar edu.stanford.nlp.pipeline.StanfordCoreNLP -props $CoreNLP/config.properties -file 12309959 -outputFormat conll depparse.model = english_SD.gz The problem is how to get CollapsedCCProcessedDependenciesAnnotation....

How to get the Stanford parser output as a list of nodes and edges?

java,graph,nodes,stanford-nlp,edges
I'm processing a batch of text files, and I need to use the Stanford parser's output as a numeric list of nodes and edges where Nodes have IDs and labels, edges consist of two node ids and an edge weight like: Node List: 1  A , 2  B... Edge list: 1 2 10,...

Stanford NLP: Tokenize output on a single line?

stanford-nlp
Can we have a tokenizer output on a single line like that of Apache OpenNLP with the command line tool? http://nlp.stanford.edu/software/tokenizer.shtml https://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.tokenizer

Annotating a treebank with lexical information (Head Words) in JAVA

java,nlp,stanford-nlp,lexical-analysis
I have a treebank with syntactic parse tree for each sentence as given below: (S (NP (DT The) (NN government)) (VP (VBZ charges) (SBAR (IN that) (S (PP (IN between) (NP (NNP July) (CD 1971)) (CC and) (NP (NNP July) (CD 1992))) (, ,) (NP (NNP Rostenkowski)) (VP (VBD placed)...

How should I figure out the POS tag of “last” in this sentence?

stanford-nlp
I'm trying to use Stanford CoreNLP to tag a sentence. "How long does a soccer game last?" It seems on CoreNLP demo the token "last" is tagged as JJ instead of VB. Is there a way to fix this?...

Processing input before giving input to parser

parsing,stanford-nlp
What kind of processing should be done to the input which is given to the parser? As of now I am using stanford-parser.jar, but there is also stanford-corenlp.jar. What is the difference between the parser.jar and coreNLP.jar parsing methods? As per the CoreNLP documentation you can pass the operation you...

Stanford NLP - Using Parsed or Tagged text to generate Full XML

parsing,nlp,stanford-nlp,pos-tagging
I'm trying to extract data from the PennTreeBank, Wall Street Journal corpus. Most of it already has the parse trees, but some of the data is only tagged. i.e. wsj_DDXX.mrg and wsj_DDXX.pos files. I would like to use the already parsed trees and tagged data in these files so as...

How can I use Stanford NLP commercially?

stanford-nlp
I'm working at a company that makes toy cars that can talk with children. We want to use Stanford CoreNLP as a parser. However, it is licensed under the GPL: it doesn't allow using the NLP tools commercially. Can I purchase another license from the Stanford NLP group? Or can't I use...

How to train a naive bayes classifier with pos-tag sequence as a feature?

machine-learning,nltk,stanford-nlp,text-classification,naivebayes
I have two classes of sentences. Each has reasonably distinct pos-tag sequence. How can I train a Naive-Bayes classifier with POS-Tag sequence as a feature? Does Stanford CoreNLP/NLTK (Java or Python) provide any method for building a classifier with pos-tag as a feature? I know in python NaiveBayesClassifier allows for...

How to configure Stanford QNMinimizer to get similar results to scipy.optimize.minimize L-BFGS-B

java,optimization,machine-learning,scipy,stanford-nlp
I want to configure the QNMinimizer from the Stanford Core NLP lib to get nearly the same optimization results as the scipy.optimize L-BFGS-B implementation, or to get a standard L-BFGS configuration that is suitable for most cases. I set the standard parameters as follows: The Python example I want to copy: scipy.optimize.minimize(neuralNetworkCost,...

Get list of annotators in Stanford CoreNLP

stanford-nlp
I'm customizing Stanford CoreNLP by adding some new Annotators, each one with its requirements. Is there a way to get the list of requirements and satisfactions from the StanfordCoreNLP object? For example, I instantiate the CoreNLP object: Properties props = new Properties(); props.setProperty("annotators", "tokenize, ssplit, pos, lemma"); StanfordCoreNLP pipeline =...

StanfordCoreNLP: why two different data structures for cons. parse and dependency parse?

nlp,stanford-nlp
Why is Stanford CoreNLP using different data structures to represent its trees (e.g. dependency trees with 'BasicDependenciesAnnotation' and constituency trees with 'TreeAnnotation')? It seems like these annotations are representable with the same data structure (like a DAG with labels). Is there any mechanism to cast these to each other? (at least...

How to replace a word by its most representative mention using Stanford CoreNLP Coreferences module

java,nlp,stanford-nlp
I am trying to figure out the way to rewrite sentences by "resolving" (replacing words with) their coreferences using Stanford Corenlp's Coreference module. The idea is to rewrite a sentence like the following : John drove to Judy’s house. He made her dinner. into John drove to Judy’s house. John...