

Bag-of-words representation using scikit-learn plus SnowballStemmer

python,python-3.x,scikit-learn,nltk
I have a list with songs, something like list2 = ["first song", "second song", "third song"...] Here is my code: from sklearn.feature_extraction.text import CountVectorizer from nltk.corpus import stopwords vectorizer = CountVectorizer(stop_words=stopwords.words('english')) bagOfWords = vectorizer.fit(list2) bagOfWords = vectorizer.transform(list2) And it's working, but I want to stem a list of my words....
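A common pattern is to wrap scikit-learn's default analyzer with the stemmer. The sketch below is one way to do it; it substitutes scikit-learn's built-in English stop list for stopwords.words('english') only to avoid the NLTK data download, and the three songs are stand-in data:

```python
# Sketch: stem tokens inside CountVectorizer via a custom analyzer.
# Assumes scikit-learn and NLTK are installed.
from sklearn.feature_extraction.text import CountVectorizer
from nltk.stem.snowball import SnowballStemmer

stemmer = SnowballStemmer('english')
# reuse the default tokenizing / stop-word analyzer as the first stage
base_analyzer = CountVectorizer(stop_words='english').build_analyzer()

def stemmed_analyzer(doc):
    # run the default analyzer, then stem each surviving token
    return [stemmer.stem(token) for token in base_analyzer(doc)]

vectorizer = CountVectorizer(analyzer=stemmed_analyzer)
bag_of_words = vectorizer.fit_transform(["first song", "second songs", "third song"])
print(sorted(vectorizer.vocabulary_))
```

Because the analyzer stems before the vocabulary is built, "song" and "songs" collapse into one feature.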

How Can I Access the Brown Corpus in Java (aka outside of NLTK)

java,nlp,nltk,corpus,tagged-corpus
I'm trying to write a program that makes use of natural language parts-of-speech in Java. I've been searching on Google and haven't found the entire Brown Corpus (or another corpus of tagged words). I keep finding NLTK information, which I'm not interested in. I want to be able to load...

NLTK: sentiment analysis: result one value

python,nltk
So sorry for posting this, as the answer probably is in either this: NLTK sentiment analysis is only returning one value or this post: Python NLTK not sentiment calculate correct but I don't get how to apply it to my code. I'm a huge newbie at Python and NLTK and...

How to compare WordNet synsets with another word?

python,nltk,wordnet
I need to check whether some word is in the synsets of another word, for example cat and dog. First I need to find the synsets of cat with this code: list= wn.synsets('cat') then the list of synsets is returned: [Synset('cat.n.01'), Synset('guy.n.01'), Synset('cat.n.03'), Synset('kat.n.01'), Synset('cat-o'-nine-tails.n.01'), Synset('caterpillar.n.02'), Synset('big_cat.n.01'), Synset('computerized_tomography.n.01'), Synset('cat.v.01'), Synset('vomit.v.01') So,...

Python NLTK - Making a 'Dictionary' from a Corpus and Saving the Number Tags

python,nlp,nltk,corpus,tagged-corpus
I'm not super experienced with Python, but I want to do some Data analytics with a corpus, so I'm doing that part in NLTK Python. I want to go through the entire corpus and make a dictionary containing every word that appears in the corpus dataset. I want to be...

Error when accessing synonyms in python using nltk?

python,python-2.7,nltk
I have written a very simple piece of code to try and print the synonyms associated with a word. import nltk from nltk.corpus import wordnet as wn wordNetSynset = wn.synsets('small') for synSet in wordNetSynset: for synWords in synSet.lemma_names: synonymList.add(synWords) print synonymList However, I get the following error: Traceback (most recent...

UnicodeDecodeError for medieval characters

python,unicode,encoding,utf-8,nltk
I am trying to run an nltk tokenize program on medieval texts. These texts use medieval characters such as yogh (ȝ), thorn (þ), and eth (ð). When I run the program (pasted below) with standard unicode (utf-8) encoding, I get the following error: Traceback (most recent call last): File "me_scraper_redux2.py",...
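The usual fix is to open the file with an explicit UTF-8 encoding rather than relying on the platform default. A minimal, self-contained sketch (the sample text and temp file are made up):

```python
# Round-trip medieval characters through a UTF-8 file; io.open takes an
# encoding argument on both Python 2 and 3.
import io
import os
import tempfile

medieval = u'\u00fee \u021dong man wi\u00f0 him'  # thorn, yogh, eth
path = os.path.join(tempfile.gettempdir(), 'me_sample.txt')

with io.open(path, 'w', encoding='utf-8') as f:
    f.write(medieval)
with io.open(path, encoding='utf-8') as f:
    loaded = f.read()

tokens = loaded.split()  # nltk.word_tokenize(loaded) also accepts this unicode string
print(tokens)
```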

How to train a naive bayes classifier with pos-tag sequence as a feature?

machine-learning,nltk,stanford-nlp,text-classification,naivebayes
I have two classes of sentences. Each has reasonably distinct pos-tag sequence. How can I train a Naive-Bayes classifier with POS-Tag sequence as a feature? Does Stanford CoreNLP/NLTK (Java or Python) provide any method for building a classifier with pos-tag as a feature? I know in python NaiveBayesClassifier allows for...

Why does PorterStemmer in NLTK convert my “string” into u“string”?

python,nltk,sax,stemming
import nltk import string from nltk.corpus import stopwords from collections import Counter def get_tokens(): with open('comet_interest.xml','r') as bookmark: text=bookmark.read() lowers=text.lower() no_punctuation=lowers.translate(None,string.punctuation) tokens=nltk.word_tokenize(no_punctuation) return tokens #remove stopwords tokens=get_tokens() filtered = [w for w in tokens if not w in stopwords.words('english')] count = Counter(filtered) print count.most_common(10) #stemming from nltk.stem.porter import * def...
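The u prefix just marks a Python 2 unicode string, which NLTK 3 returns by design; u'interest' == 'interest' holds for ASCII text, so counting is unaffected. A quick check:

```python
# PorterStemmer returns unicode strings; they compare and count like
# byte strings for plain ASCII content.
from collections import Counter
from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()
tokens = ['interests', 'interested', 'interest']
stems = [stemmer.stem(t) for t in tokens]
print(Counter(stems).most_common(1))
```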

Navigate an NLTK tree (follow-up)

python,tree,nlp,nltk
I've asked the question how I can properly navigate through an NLTK tree. How do I properly navigate through an NLTK tree (or ParentedTree)? I would like to identify a certain leaf with the parent node "VBZ", then I would like to move from there further up the tree and...

NLTK fcfg sem value is awkward

python,nlp,nltk,context-free-grammar
My FCFG that I used for this sentence was S[SEM=<?vp(?np)>] -> NP[NUM=?n, SEM=?np] VP[NUM=?n,SEM=?vp] VP[NUM=?n,SEM=<?v(?obj)>] -> TV[NUM=?n,SEM=?v] DET NP[SEM=?obj NP[NUM=?n, SEM=?np] -> N[NUM=?n, SEM=?np] N[NUM=sg, SEM=<\P.P(I)>] -> 'I' TV[NUM=sg,SEM=<\x y.(run(y,x))>] -> 'run' DET -> "a" N[NUM=sg, SEM=<\P.P(race)>] -> 'race' I want to parse out the sentence "I run a race"...

Finding Word Stems in nltk python

nltk
(screenshot: http://pik.vn/2015c740128e-11bc-40b0-8354-7fa58579d1d1.png) I don't know how the [0] in the above works...

How do I remove 1 instance of x characters in a string and find the word it makes in Python3?

python,python-3.x,nltk
This is what I have so far, but I'm stuck. I'm using nltk for the word list and trying to find all the words with the letters in "sand". From this list I want to find all the words I can make from the remaining letters. import nltk.corpus.words.words() pwordlist =...
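One download-free way to test whether a word can be built from the letters of "sand" is multiset containment via collections.Counter; the tiny wordlist below stands in for nltk.corpus.words.words():

```python
# A word is buildable if it needs no letter more often than the pool
# provides it; Counter makes that a one-line comparison.
from collections import Counter

def buildable(word, letters):
    need = Counter(word)
    have = Counter(letters)
    return all(have[ch] >= count for ch, count in need.items())

wordlist = ['and', 'sad', 'band', 'an', 'dans']  # stand-in for the NLTK word list
hits = [w for w in wordlist if buildable(w, 'sand')]
print(hits)
```

Removing one instance of each used letter is the same Counter subtraction: Counter('sand') - Counter('and') leaves the unused letters.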

Python NLTK Brill Tagger does not have SymmetricProximateTokensTemplate, ProximateTokensTemplate, ProximateTagsRule, ProximateWordsRule

python,tags,nltk,pos-tagger
When I try importing, from nltk.tag.brill import SymmetricProximateTokensTemplate, ProximateTokensTemplate from nltk.tag.brill import ProximateTagsRule, ProximateWordsRule Python throws an ImportError: ImportError: cannot import name 'SymmetricProximateTokensTemplate' What's the problem? But this works: from nltk.tag import brill ...

UnicodeDecodeError unexpected end of data while stemming over dataset

python,unicode,pandas,nltk,stemming
I am new to python and I am trying to work on a small chunk of Yelp! dataset which was in JSON but I converted to CSV, using pandas libraries and NLTK. While doing preprocessing of data, I first try to remove all the punctuations and also the most common...

Text mining and NLP: from R to Python

python,r,nltk,text-mining,tm
First of all, I should say that I am new to Python. At the moment, I am "translating" a lot of R code into Python and learning along the way. This question relates to this one replicating R in Python (in there they actually suggest to wrap it up using rpy2, which...

Bigrams with list as input using NLTK module

python,nltk
I have the following list, which I'd like to rearrange into bigrams: filtered_words = ['friends', 'friend', 'know', 'hate', 'love', 'you?', 'like', 'name?'] Then, when applying the bigrams() function the following way: list(bigrams(filtered_words)) I get: 'list' object is not callable. I also tried list(bigrams([filtered_words])), with the same result....
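bigrams() itself accepts a flat list; the "'list' object is not callable" error usually means the name list was rebound earlier in the session (del list restores the builtin). A minimal sketch:

```python
# nltk.bigrams returns a generator of adjacent pairs; materialize it
# with list() (make sure the builtin list has not been shadowed).
from nltk import bigrams

filtered_words = ['friends', 'friend', 'know', 'hate', 'love']
pairs = list(bigrams(filtered_words))
print(pairs)
```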

Many Repeated Tuples in list

python,list,tuples,nltk
I am having trouble handling the tuples in the list. Let's assume that we have a list which consists of a lot of tuples. simpleTag=[**('samsung', 'ADJ')**, ('user', 'NOUN'), ('huh', 'NOUN'), ('weird', 'NOUN'), (':', '.'), ('MDPai05', 'NOUN'), (':', '.'), ('Samsung', 'NOUN'), ('Electronics', 'NOUN'), ('to', 'PRT'), ('Build', 'NOUN'), ('$',...

TypeError: __init__() got an unexpected keyword argument 'shuffle'

python,nltk
I got this error while running my code in Python 2.6.6; there is no issue while running in Python 3.4.3. usr/lib64/python2.6/site-packages/sklearn/feature_selection/univariate_selection.py:319: UserWarning: Duplicate scores. Result may depend on feature ordering.There are probably duplicate features, or you used a classification score for a regression task. warn("Duplicate scores. Result may depend...

ValueError when using a variable to call a function

python,variables,nltk
I am trying to write a simple script that starts with a word and then keeps printing words that rhyme with the one before it (i.e. egg, aaberg, mpeg). It uses NLTK. However whilst running the code I get an error: Traceback (most recent call last): File "C:\Users\myname\Google Drive\Python codes\Rhyming...

NLTK can't open files (UnicodeDecodeError)

python,nltk
I got a task to work with some files and I need to use NLTK. I work with the Harry Potter books and short stories by J. K. Rowling. Some files open cleanly, I can count words, sentences, etc., but I have a problem. When I try to open big...

Abbreviation Reference for NLTK Parts of Speech

python,nlp,nltk
I'm using nltk to find the parts of speech for each word in a sentence. It returns abbreviations that I both can't fully intuit and can't find good documentation for. Running: import nltk sample = "There is no spoon." tokenized_words = nltk.word_tokenize(sample) tagged_words = nltk.pos_tag(tokenized_words) print tagged_words Returns: [('There', 'EX'),...
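nltk.help.upenn_tagset('EX') prints the official gloss once the 'tagsets' data package is installed (nltk.download('tagsets')). As a download-free stand-in, here is a small hand-written glossary of the Penn Treebank meanings for the tags in the output above:

```python
# Hand-written glossary for a few Penn Treebank tags; for the full,
# authoritative list use nltk.help.upenn_tagset() after downloading
# the 'tagsets' data package.
PENN_TAGS = {
    'EX': 'existential there',
    'VBZ': 'verb, 3rd person singular present',
    'DT': 'determiner',
    'NN': 'noun, common, singular or mass',
    'IN': 'preposition or subordinating conjunction',
}

for tag in ('EX', 'VBZ', 'DT', 'NN'):
    print(tag, '->', PENN_TAGS[tag])
```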

Counting the number of specified words

python,count,nltk
I want to count the number of occurrences of 'america' and 'citizen' in the 'inaugural' files whose names start with 1789 and 1793. cfd = nltk.ConditionalFreqDist( (target, file[:4]) for fileid in inaugural.fileids() for w in inaugural.words(fileid) for target in ['america', 'citizen'] if w.lower().startswith(target)) year = ['1789', '1793'] word =...

Negation handling in NLP

python,regex,nlp,nltk,text-processing
I'm currently working on a project, where I want to extract emotion from text. As I'm using conceptnet5 (a semantic network), I can't however simply prefix words in a sentence that contains a negation-word, as those words would simply not show up in conceptnet5's API. Here's an example: The movie...

Data Mining and Text Mining

nlp,bigdata,nltk,data-mining,text-mining
What is the difference between Data Mining and Text Mining? Both refer to turning unstructured data into structured data. Do both work in the same fashion? Please provide some clarity on that.

Typed Dependency Parsing in NLTK Python

python,nltk,stanford-nlp
I have a sentence "I shot an elephant in my sleep" The typed dependency of the sentence is nsubj(shot-2, I-1) det(elephant-4, an-3) dobj(shot-2, elephant-4) prep(shot-2, in-5) poss(sleep-7, my-6) pobj(in-5, sleep-7) How do I get the typed dependency using Stanford Parser (or any parser) using NLTK (preferably, but anything is fine)...

Get the word from a WordNet synset

python,python-3.x,nltk,wordnet
Given a synset like this: Synset("pascal_celery.n.01") I want to get the word(s) it represents: "pascal celery" Currently, I'm doing this: synset.name().split(".")[0] but this doesn't convert underscores into spaces. Is there an inbuilt way of doing this?...
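synset.lemma_names()[0].replace('_', ' ') is one route; if you only have the dotted name, plain string surgery works too. A sketch of the latter:

```python
# Strip the '.pos.nn' suffix and turn WordNet's underscores back into
# spaces; works on the name string alone, no corpus lookup needed.
def synset_word(synset_name):
    # 'pascal_celery.n.01' -> 'pascal celery'
    return synset_name.split('.')[0].replace('_', ' ')

print(synset_word('pascal_celery.n.01'))
```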

FCFG error in NLTK, Python. Grammar Issue

python,nlp,nltk,context-free-grammar
A line in a feature-based context free grammar I am writing in Python using NLTK gives me the following error. Error parsing feature structure ADJ[SEM=<\x.x(\y.(some(y))>] -> 'some' ^ Expected logic expression I thought the expression after SEM= was a logic expression. ...

How do I use NLTK's default tokenizer to get spans instead of strings?

python,nltk,tokenize
NLTK's default tokenizer, nltk.word_tokenize, chains two tokenizers: a sentence tokenizer and then a word tokenizer that operates on sentences. It does a pretty good job out of the box. >>> nltk.word_tokenize("(Dr. Edwards is my friend.)") ['(', 'Dr.', 'Edwards', 'is', 'my', 'friend', '.', ')'] I'd like to use this same algorithm...
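There is no span-returning variant of the whole chain in older NLTK, but you can re-align any tokenizer's output with the source text by scanning forward. A hedged sketch (Treebank-style tokenizers rewrite some tokens, e.g. quotes, which this simply skips):

```python
# Recover (start, end) character offsets for tokens by searching for
# each token from the position where the previous one ended.
def token_spans(text, tokens):
    spans, pos = [], 0
    for token in tokens:
        start = text.find(token, pos)
        if start == -1:
            continue  # token was rewritten by the tokenizer; no exact match
        spans.append((start, start + len(token)))
        pos = start + len(token)
    return spans

text = "(Dr. Edwards is my friend.)"
tokens = ['(', 'Dr.', 'Edwards', 'is', 'my', 'friend', '.', ')']
print(token_spans(text, tokens))
```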

Unknown symbol in nltk pos tagging for Arabic

python,nlp,nltk,stanford-nlp,pos-tagger
I have used nltk to tokenize some Arabic text. However, I ended up with some results like (u'an arabic character/word', '``') or (u'an arabic character/word', ':'). However, they do not provide the `` or : in the documentation, hence I would like to find out what this is. from nltk.tokenize.punkt...

Python 2.7: Lesk algorithm returns None

python-2.7,nltk,disambiguation,word-sense-disambiguation
I am creating a program that will disambiguate ambiguous words and I was using nltk. Now, when I came to the stage to use the Lesk algorithm I am having some trouble. For example, if I try: c = lesk('There sign bothered consider inverse logic namely mental illness substance abuse might...

Editing the NLTK Corpus

python,nltk,corpus,tagged-corpus
In addition to the corpus that comes with nltk I want to train it with my own corpus that follows the same part of speech rules. How can I find the corpus that it is using, and how can I add my own corpus (in addition, not as a replacement)?...

How to match integers in NLTK CFG?

python,regex,nlp,nltk
If I want to define a grammar in which one of the tokens will match an integer, how can i achieve it using nltk's string CFG? For example - S -> SK SO FK SK -> 'SELECT' SO -> '\d+' FK -> 'FROM' ...

NLTK PunktSentenceTokenizer ellipsis splitting

python,python-2.7,nltk,tokenize
I'm working with NLTK PunktSentenceTokenizer and I'm facing a situation where a text contains multiple sentences separated by the ellipsis character (...). Here is the example I'm working on: >>> from nltk.tokenize import PunktSentenceTokenizer >>> pst = PunktSentenceTokenizer() >>> pst.sentences_from_text("Horrible customer service... Cashier was rude... Drive thru took hours......

Errors when nltk.gaac.demo() is run

nltk
When I run nltk.gaac.demo() I get the following errors. Can you please help me check whether I am missing something? I am using nltk 3.0.1. Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:38:22) [MSC v.1600 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information....

How to check which versions of nltk and scikit-learn are installed?

python,linux,shell,scikit-learn,nltk
In a shell script I am checking whether these packages are installed; if not installed, then install them. So within the shell script: import nltk echo nltk.__version__ but it stops the shell script at the import line. In the linux terminal I tried to check in this manner: which nltk which gives nothing. Thought...

How to pass in an estimator to NLTK's NgramModel?

python,nlp,nltk,n-gram,linguistics
I am using NLTK to train a bigram model using a Laplace estimator. The constructor for the NgramModel is: def __init__(self, n, train, pad_left=True, pad_right=False, estimator=None, *estimator_args, **estimator_kwargs): After some research, I found that a syntax that works is the following: bigram_model = NgramModel(2, my_corpus, True, False, lambda f, b:LaplaceProbDist(f))...

Django webapp (on an Apache2 server) hangs indefinitely when importing nltk in views.py

python,django,apache,ubuntu,nltk
To elaborate a little more from the title, I'm having issues importing nltk to use in a django web app. I've deployed the web app on an apache2 server. When I import nltk in views.py, the web page refuses to load and eventually times out after a few minutes of...

Print the 10 most frequently occurring words of a text, both including and excluding stopwords

python,nltk,word-frequency,find-occurrences
I got the question from here with my changes. I have following code: from nltk.corpus import stopwords >>> def content_text(text): stopwords = nltk.corpus.stopwords.words('english') content = [w for w in text if w.lower() in stopwords] return content How can I print the 10 most frequently occurring words of a text that...
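With FreqDist this is a one-liner each way; note the snippet above keeps stopwords (w.lower() in stopwords), so use not in to exclude them. The sketch below uses a tiny inline stop list in place of nltk.corpus.stopwords.words('english') to avoid the data download:

```python
# Top-10 word frequencies, with and without stopwords, via FreqDist.
from nltk import FreqDist

stop = {'the', 'a', 'is', 'in', 'of', 'and'}  # stand-in for the NLTK list
words = 'the cat sat in the hat and the cat ran'.split()

with_stop = FreqDist(w.lower() for w in words).most_common(10)
without_stop = FreqDist(w.lower() for w in words if w.lower() not in stop).most_common(10)
print(with_stop)
print(without_stop)
```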

use java in python 3.4 with nltk

java,python,nltk
I want to use stanford-tagger in my project but below error is occur. File "C:\Python34\lib\site-packages\nltk\tag\stanford.py", line 59, in tag return self.tag_sents([tokens])[0] File "C:\Python34\lib\site-packages\hazm\POSTagger.py", line 25, in tag_sents return super(stanford.POSTagger, self).tag_sents(refined) File "C:\Python34\lib\site-packages\nltk\tag\stanford.py", line 64, in tag_sents config_java(options=self.java_options, verbose=False) File "C:\Python34\lib\site-packages\nltk\internals.py", line 82, in config_java _java_bin =...

How can I find all substrings that have this pattern: some_word.some_other_word with python?

python,regex,nltk
I am trying to clean up some very noisy user-generated web data. Some people do not add a space after a period that ends the sentence. For example, "Place order.Call us if you have any questions." I want to extract each sentence, but when I try to parse a sentence...
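A lookahead regex can reinsert the missing space after a sentence-ending period; treat it as a noisy-data heuristic, since abbreviations like U.S.A would be split too:

```python
# Insert a space after any period that is immediately followed by a
# letter; the lookahead keeps the letter itself untouched.
import re

text = "Place order.Call us if you have any questions."
fixed = re.sub(r'\.(?=[A-Za-z])', '. ', text)
print(fixed)
```

After this pass, a sentence tokenizer sees the usual "period plus space" boundary.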

How to properly navigate an NLTK parse tree?

python,tree,nlp,nltk
NLTK is driving me nuts again. How do I properly navigate through an NLTK tree (or ParentedTree)? I would like to identify a certain leaf with the parent node "VBZ", then I would like to move from there further up the tree and to the left to identify the NP...

Create Dictionary from Penn Treebank Corpus sample from NLTK?

python,dictionary,nlp,nltk,corpus
I know that the Treebank corpus is already tagged, but unlike the Brown corpus, I can't figure out how to get a dictionary of tags. For instance, >>> from nltk.corpus import brown >>> wordcounts = nltk.ConditionalFreqDist(brown.tagged_words()) This doesn't work on the Treebank corpus?...

Strip Numbers From String in Python [duplicate]

python,nltk
This question already has an answer here: Remove specific characters from a string in python 10 answers Is there an efficient way to strip out numbers from a string in python? Using nltk or base python? Thanks, Ben...
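A single regex pass handles this without NLTK:

```python
# Delete every run of digits from the string.
import re

def strip_numbers(s):
    return re.sub(r'\d+', '', s)

print(strip_numbers('4 score and 7 years ago'))
```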

NLTK getting dependencies from raw text

python-2.7,nlp,nltk
I need to get dependencies in sentences from raw text using NLTK. As far as I understood, the Stanford parser allows us just to create a tree, but I didn't find out how to get sentence dependencies from this tree (maybe it's possible, maybe not). So I've started using MaltParser. Here is...

filtering stopwords near punctuation

python,nlp,nltk
I am trying to filter out stopwords in my text like so: clean = ' '.join([word for word in text.split() if word not in (stopwords)]) The problem is that text.split() has elements like 'word.' that don't match to the stopword 'word'. I later use clean in sent_tokenize(clean), however, so I...
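Stripping surrounding punctuation from each token before the stopword test, while keeping the original token in the output, avoids the mismatch:

```python
# Compare a punctuation-stripped, lowercased copy of each token against
# the stop list, but emit the token as it appeared in the text.
import string

stopwords = {'the', 'word', 'is'}  # stand-in stop list
text = 'The word. is missing here!'
clean = ' '.join(
    w for w in text.split()
    if w.strip(string.punctuation).lower() not in stopwords
)
print(clean)
```

Because the original tokens survive, sent_tokenize(clean) still sees the punctuation it needs.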

How to Break a sentence into a few words

python-2.7,parsing,nlp,nltk
I want to ask how to break a sentence into a few words. Is this done using NLP (Natural Language Processing) in Python with NLTK, or with a parser?

Extract word from a list of synsets in NLTK for Python

python,nltk
Using this [x for x in wn.all_synsets('n')] I am able to get a list allnouns with all nouns from Wordnet with help from NLTK. The list allnouns looks like this Synset('pile.n.01'), Synset('compost_heap.n.01'), Synset('mass.n.03') and so on. Now I am able to get any element by using allnouns[2] and this should...

Find the sentence’s index (sentences in a list) of a specific word in Python

python,list,indexing,nltk
I currently have a file that contains a list that looks like example = ['Mary had a little lamb' , 'Jack went up the hill' , 'Jill followed suit' , 'i woke up suddenly' , 'it was a really bad dream...'] I would like to find the index of...
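enumerate gives each sentence's position while you test membership; split() is a crude word boundary here (nltk.word_tokenize would be stricter):

```python
# Return the indexes of every sentence whose words include the target.
def sentence_indexes(sentences, word):
    return [i for i, s in enumerate(sentences) if word in s.split()]

example = ['Mary had a little lamb', 'Jack went up the hill',
           'Jill followed suit', 'i woke up suddenly',
           'it was a really bad dream...']
print(sentence_indexes(example, 'up'))
```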

How to un-stem a word in Python?

python,nlp,nltk
I want to know if there is any way that I can un-stem them to a normal form? The problem is that I have thousands of words in different forms e.g. eat, eaten, ate, eating and so on and I need to count the frequency of each word. All of these...
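A stemmer cannot be inverted, but you can group the original surface forms under their shared stem and count each group as one unit. Note the limitation in this sketch: irregular forms like 'ate' get a different stem and stay separate, which is where a lemmatizer or a hand-made lookup table would be needed:

```python
# Group surface forms by their Porter stem and count per group;
# the word list is made-up sample data.
from collections import defaultdict
from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()
words = ['eat', 'eats', 'eating', 'ate', 'run']

groups = defaultdict(list)
for w in words:
    groups[stemmer.stem(w)].append(w)

counts = {stem: len(forms) for stem, forms in groups.items()}
print(dict(groups))
print(counts)
```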

Annotator for Relationship Extraction

regex,nlp,nltk,stanford-nlp,gate
I have a set of urls in a text file. For each url in that text file, I want to tag the entities and relationships in the text contained in that url. I am aware of the entity taggers like Stanford NER, NLTK and GATE which can perform the entity...

NLTK: Parsing sentences using a simple grammar and part of speech tags

python,parsing,nltk,grammar
For a sentence like "This is a simple sentence" which has been part of speech tagged to: [('This', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('simple', 'JJ'), ('sentence', 'NN')] And using the following grammar: my_grammar = nltk.CFG.fromstring(""" ... S -> DP VP ... DP -> Det NP ... NP -> Adj N...

Python, ImportError: cannot import name AbstractLazySequence

python,nltk,importerror
I am using nltk but the problem I am facing does not seem to be related to nltk specifically. I have a module named util.tokenize inside which there are some classes and I have the following first line: util/tokenizer.py from nltk.tokenize.regexp import RegexpTokenizer ... class SentTokenizer(object): def __init__(self, stem=False, pattern='[^\w\-\']+'):...

Referring to a directory in a Flask app doesn't work unless the path is absolute

python,flask,nltk
I downloaded nltk data into the data directory in my Flask app. The views reside in a blueprint in another directory on the same level as the data directory. In the view I'm trying to set the path to the data, but it doesn't work. nltk.data.path.append('../nltk_data/') This doesn't work. If...

How to find the Lexical Category of a word in wordnet using NLTK(python)

python,nltk,wordnet
The WordNet demo as shown here displays the lexical information of a file in its search result. For example, the word motion has many lexical categories (as it has many "senses"), one of them being "verb.motion". I have seen the other questions but they do not explain how...

How to remove a custom word pattern from a text using NLTK with Python

python,regex,nlp,nltk,tokenize
I am currently working on a project analyzing the quality of examination paper questions. In here I am using Python 3.4 with NLTK. So first I want to take out each question separately from the text. The question paper format is given below. (Q1). What is web 3.0? (Q2). Explain about blogs....

wordnet lemmatizer in NLTK is not working for adverbs [duplicate]

python,nlp,nltk,wordnet
This question already has an answer here: Getting adjective from an adverb in nltk or other NLP library 1 answer from nltk.stem import WordNetLemmatizer x = WordNetLemmatizer() x.lemmatize("angrily", pos='r') Out[41]: 'angrily' Here is reference documnetation for pos tags in nltk wordnet, http://www.nltk.org/_modules/nltk/corpus/reader/wordnet.html I may be missing some basic things....

Bad zip file error while using nltk pos tagger

python,nltk
I'm trying to use the NLTK POS-tagger, but am getting a "zipfile.BadZipfile: File is not a zip file" error. The error comes from this code: import nltk sentence = "I love python" tokens = nltk.word_tokenize(sentence) pos_tags = nltk.pos_tag(tokens) print nltk.ne_chunk(pos_tags, binary=True) I found this question related to my problem. Unfortunately...

Python: NLTK: access element of list of list

python,nltk
I have following list of lists, as a result after tokenizing: [['Who', 'are', 'you', '?'], ['I', 'do', 'not', 'know', 'who', 'you', 'are'], ['What', 'is', 'your', 'name', '?']] Now I would like to have a list containing the "simple" elements, e.g.: ['Who','are','you','?','I','do','not','know','who'...] I have already tried everything I could possibly think...
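itertools.chain.from_iterable flattens exactly one level of nesting:

```python
# Flatten a list of token lists into one flat token list.
from itertools import chain

tokenized = [['Who', 'are', 'you', '?'],
             ['I', 'do', 'not', 'know', 'who', 'you', 'are'],
             ['What', 'is', 'your', 'name', '?']]
flat = list(chain.from_iterable(tokenized))
print(flat)
```

The equivalent comprehension is [tok for sent in tokenized for tok in sent].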

NLTK POS tagset help not working

nltk,pos-tagger
I downloaded NLTK, but tagset help is not working. Whenever I try to access tagset meanings by: nltk.help.upenn_tagset('NN') I get the result: Traceback (most recent call last): File "<pyshell#30>", line 1, in <module> nltk.help.upenn_tagset('NN') File "C:\Python34\lib\site-packages\nltk\help.py", line 25, in upenn_tagset _format_tagset("upenn_tagset", tagpattern) File "C:\Python34\lib\site-packages\nltk\help.py", line 39, in _format_tagset tagdict =...

How to output NLTK chunks to file?

python,regex,file-io,nlp,nltk
I have this python script where I am using nltk library to parse,tokenize,tag and chunk some lets say random text from the web. I need to format and write in a file the output of chunked1,chunked2,chunked3. These have type class 'nltk.tree.Tree' More specifically I need to write only the lines...

Stanford Entity Recognizer (caseless) in Python Nltk

python,nlp,nltk
I am trying to figure out how to use the caseless version of the entity recognizer from NLTK. I downloaded http://nlp.stanford.edu/software/stanford-ner-2015-04-20.zip and placed it in the site-packages folder of python. Then I downloaded http://nlp.stanford.edu/software/stanford-corenlp-caseless-2015-04-20-models.jar and placed it in the folder. Then I ran this code in NLTK from nltk.tag.stanford import...

Semantics - creating grammar in NLTK

python,nltk,semantics,context-free-grammar
I'm trying to expand the simple-sem.fcfg (link here: https://github.com/nltk/nltk_teach/blob/master/examples/grammars/book_grammars/simple-sem.fcfg) so that it supports coordination of phrases. I want it to successfully parse a sentence like: Irene walks and Angus barks. Since this is represented as walk(Irene) & bark(Angus), I think the best way to achieve this is by adding a...

Removing URL features from tokens in NLTK

python,django,python-2.7,twitter,nltk
I'm building a little 'trending' algorithm. The tokeniser works as originally intended, bar a couple of hiccups around URLs, which are causing some problems. Obviously, as I'm pulling info from twitter, there are a lot of t.co URL shortner type links. I'd like to remove these as not 'words', preferably...

nltk sentence tokenizer, consider new lines as sentence boundary

python,nlp,nltk,tokenize
I am using nltk's PunktSentenceTokenizer to tokenize a text into a set of sentences. However, the tokenizer doesn't seem to consider a new paragraph or new lines as a new sentence. >>> from nltk.tokenize.punkt import PunktSentenceTokenizer >>> tokenizer = PunktSentenceTokenizer() >>> tokenizer.tokenize('Sentence 1 \n Sentence 2. Sentence 3.') ['Sentence 1 \n...
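Punkt has no newline option, but running it per line (or per blank-line paragraph) forces '\n' to act as a sentence boundary. In this sketch the sentence tokenizer is injected so the example runs without the Punkt models; pass PunktSentenceTokenizer().tokenize in practice:

```python
# Split on newlines first, then let the sentence tokenizer handle each
# line on its own, so a line break always ends a sentence.
def split_with_newlines(text, sent_tokenize):
    sentences = []
    for line in text.split('\n'):
        line = line.strip()
        if line:
            sentences.extend(sent_tokenize(line))
    return sentences

# naive stand-in tokenizer for demonstration only
naive = lambda chunk: [s.strip() + '.' for s in chunk.split('.') if s.strip()]
print(split_with_newlines('Sentence 1 \n Sentence 2. Sentence 3.', naive))
```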

Check if items in list a are found in list b and return list c with matching indexes of list b in Python

python,list,nltk
I have list a = ["string2" , "string4"] and list b = ["string1" , "string2" , "string3" , "string4" , "string5"] and I want to check if "string2" and "string4" from list a match those in list b and, if they do, append list c with their corresponding indexes in...
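list.index returns the first matching position in b, so a guarded comprehension does it:

```python
# Collect the index in b of every item of a that b contains.
a = ['string2', 'string4']
b = ['string1', 'string2', 'string3', 'string4', 'string5']
c = [b.index(x) for x in a if x in b]
print(c)
```

For large lists, build a dict of positions first ({v: i for i, v in enumerate(b)}) to avoid repeated linear scans.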

Parsing multiple sentences with MaltParser using NLTK

java,python,parsing,nlp,nltk
There have been many MaltParser and/or NLTK related questions: Malt Parser throwing class not found exception How to use malt parser in python nltk MaltParser Not Working in Python NLTK NLTK MaltParser won't parse Dependency parser using NLTK and MaltParser Dependency Parsing using MaltParser and NLTK Parsing with MaltParser engmalt...

Is there a Spanish to English dictionary for use with python 3? [closed]

python,dictionary,nltk,python-3.4
I am trying to create my own digital Spanish-to-English database by translating the entire Spanish corpus found in nltk 3.0 for Python 3. I am using the Google tool-kit to do the translating and it's proving to be a very slow process. I am wondering if there exists...

Python match string to string exactly

python,text,nltk
Given a string, I want to identify whether two strings are within it. For example, given "The dog barks loudly .", I want to search for "dog" and "barks loudly". If the sentence were "The dogged man .", however, I would NOT want to match 'dog' to 'dogged'. I...
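Regex word boundaries (\b) keep 'dog' from matching inside 'dogged':

```python
# Wrap the (escaped) phrase in word boundaries so only whole-word
# occurrences match.
import re

def contains(sentence, phrase):
    return re.search(r'\b%s\b' % re.escape(phrase), sentence) is not None

print(contains('The dog barks loudly .', 'barks loudly'))
print(contains('The dogged man .', 'dog'))
```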

To find frequency of every word in text file in python

python,python-3.x,nltk
I want to find the frequency of all words in my text file so that I can find the most frequently occurring words. Can someone please tell me the command to use for that? import nltk text1 = "hello he heloo hello hi " # example text fdist1...

how to create the bigram matrix in my code?

python,nltk
I want to make a matrix of the bigram model. How can I do it? Any suggestions which match my code, please? import nltk from collections import Counter import codecs with codecs.open("Pezeshki339.txt",'r','utf8') as file: for line in file: token=line.split() spl = 80*len(token)/100 train = token[:int(spl)] test = token[int(spl):] print(len(test)) print(len(train))...

different nltk results in django and at command line

python,django,django-views,nltk
I have a django 1.8 view that looks like this: def sourcedoc_parse(request, sourcedoc_id): sourcedoc = Sourcedoc.objects.get(pk=sourcedoc_id) nltk.data.path.append('/root/nltk_data') new_words = [] english_vocab = set(w.lower() for w in nltk.corpus.gutenberg.words()) #<---the line where the error occurs results = {} template = 'sourcedoc_parse.html' params = {'sourcedoc': sourcedoc,'results': results, 'new_words': new_words, 'BASE_URL': BASE_URL} return render_to_response(template,...

NLTK - Get and Simplify List of Tags

python,nltk,corpus,tagged-corpus
I'm using the Brown Corpus. I want some way to print out all the possible tags and their names (not just tag abbreviations). There are also quite a few tags, is there a way to 'simplify' the tags? By simplify I mean combine two extremely similar tags into one and...

TypeError: 'WordListCorpusReader' object has no attribute '__getitem__' while using nltk.classify.apply_features

python,machine-learning,nlp,classification,nltk
I'm following this tutorial to learn NaiveBayes on this site. The code I have is: from nltk.corpus import names from nltk.classify import apply_features def gender_features(word): return {'last_letter': word[-1]} labeled_names = ([(name, 'male') for name in names.words('male.txt')] + [(name, 'female') for name in names.words('female.txt')]) feature_sets = [(gender_features(n), gender) for (n, gender)...

Python NLTK pos_tag not returning the correct part-of-speech tag

python,machine-learning,nlp,nltk,pos-tagger
Having this: text = word_tokenize("The quick brown fox jumps over the lazy dog") And running: nltk.pos_tag(text) I get: [('The', 'DT'), ('quick', 'NN'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'NNS'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'NN'), ('dog', 'NN')] This is incorrect. The tags for quick brown lazy in the sentence should be:...

Python convert list of multiple words to single words

python,nlp,nltk
I have a list of words for example: words = ['one','two','three four','five','six seven'] And I am trying to create a new list where each item in the list is just one word so I would have: words = ['one','two','three','four','five','six','seven'] Would the best thing to do be...
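Splitting each item on whitespace and flattening in one comprehension does it:

```python
# Each item may hold one or more space-separated words; split and
# flatten in a single pass.
words = ['one', 'two', 'three four', 'five', 'six seven']
singles = [w for item in words for w in item.split()]
print(singles)
```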