FAQ Database Discussion Community


When using a topic model, how should we set up a “stop words” list?

stop-words,lda,topic-modeling,text-classification
There are some standard stop lists that give words like "a", "the", "of", and "not" to be removed from a corpus. However, I'm wondering: should the stop list change case by case? For example, I have 10K articles from a journal; because of the structure of an article, basically you will...
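One common way to make a stop list corpus-specific is to extend a standard list with words that appear in almost every document, since such words carry little topical signal. The sketch below assumes a simple letters-only tokenizer and a hypothetical document-frequency cutoff (`max_doc_fraction`); the right threshold depends on the corpus and is not taken from the question.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split on runs of ASCII letters (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def corpus_stop_words(docs, max_doc_fraction=0.8, base_stop_words=frozenset()):
    """Return the base stop words plus every word that occurs in more than
    max_doc_fraction of the documents (a hypothetical cutoff)."""
    if not docs:
        return set(base_stop_words)
    doc_freq = Counter()
    for doc in docs:
        # Count each word at most once per document (document frequency).
        doc_freq.update(set(tokenize(doc)))
    n_docs = len(docs)
    frequent = {w for w, df in doc_freq.items() if df / n_docs > max_doc_fraction}
    return set(base_stop_words) | frequent

docs = [
    "Abstract: we study topic models.",
    "Abstract: we study stop words.",
    "Abstract: results follow.",
]
print(sorted(corpus_stop_words(docs, 0.8, {"we"})))
```

Here "abstract" appears in every article, so it joins the list alongside the standard entry "we", while "study" (two of three documents) stays below the cutoff.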

Solr Cloud Managed Resources

solr,solrcloud,synonym,stop-words
I am implementing Solr Cloud for the first time. I've worked with normal Solr and have that down pretty well, but I'm not finding a lot on what you can and can't do with Solr Cloud. So my question is about Managed Resources. I know you can CRUD stop words...
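For reference, Solr's managed stop words are edited over a REST endpoint of the form `/solr/<collection>/schema/analysis/stopwords/<list>` (used together with `solr.ManagedStopFilterFactory` in the schema). The sketch below only builds the HTTP requests with the standard library and does not send them; the base URL, the collection name `mycollection`, and the list name `english` are hypothetical placeholders. As I understand it, changes take effect only after the collection is reloaded.

```python
import json
from urllib.parse import quote
from urllib.request import Request

def _stopwords_url(base_url, collection, list_name):
    # Managed stop-word list endpoint for one collection.
    return f"{base_url}/{collection}/schema/analysis/stopwords/{list_name}"

def list_stop_words_request(base_url, collection, list_name):
    """GET the current contents of a managed stop-word list."""
    return Request(_stopwords_url(base_url, collection, list_name), method="GET")

def add_stop_words_request(base_url, collection, list_name, words):
    """PUT a JSON array of words to add them to the managed list."""
    return Request(
        _stopwords_url(base_url, collection, list_name),
        data=json.dumps(words).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

def delete_stop_word_request(base_url, collection, list_name, word):
    """DELETE a single word from the managed list."""
    url = f"{_stopwords_url(base_url, collection, list_name)}/{quote(word)}"
    return Request(url, method="DELETE")

# Hypothetical local node and collection; pass a request to
# urllib.request.urlopen(...) to actually send it.
BASE = "http://localhost:8983/solr"
req = add_stop_words_request(BASE, "mycollection", "english", ["foo", "bar"])
print(req.get_method(), req.full_url)
```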

Including multi-word stopwords in Solr

solr,stop-words
Is it possible to include multi-word stopwords in Solr's StopFilterFactory? If so, how? Right now I first map all the multi-word stopwords to a single synonym in the synonyms.txt file and then list that one synonym in stopwords.txt, but it's not working...
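Solr's stop filter generally compares individual tokens, so a phrase entry is not matched as a unit on its own. One workaround outside Solr (a different technique from the synonym approach in the question) is to strip multi-word stop phrases from the text before it is sent for indexing. The sketch below assumes whitespace-separated phrases and case-insensitive, whole-word matching; the phrase list is hypothetical, and the same cleanup would also have to be applied to queries to keep them consistent with the index.

```python
import re

def build_phrase_pattern(phrases):
    """Compile one regex matching any stop phrase as whole words, case-insensitively.
    Longer phrases come first so 'new york city' wins over 'new york'."""
    ordered = sorted(phrases, key=len, reverse=True)
    # Escape each word and allow any run of whitespace between words.
    alternation = "|".join(
        r"\s+".join(re.escape(word) for word in phrase.split()) for phrase in ordered
    )
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)

def remove_stop_phrases(text, pattern):
    """Replace each matched phrase with a space, then collapse repeated whitespace."""
    return " ".join(pattern.sub(" ", text).split())

# Hypothetical multi-word stop phrases.
pattern = build_phrase_pattern(["new york", "new york city", "in order to"])
print(remove_stop_phrases("We moved to New York City in order to work", pattern))
```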

How to delete certain words from a variable or a list in Python

python,text,replace,stop-words
common_words = set(['je', 'tek', 'u', 'još', 'a', 'i', 'bi', 's', 'sa', 'za', 'o', 'kojeg', 'koju', 'kojom', 'kojoj', 'kojega', 'kojemu', 'će', 'što', 'li', 'da', 'od', 'do', 'su', 'ali', 'nego', 'već', 'no', 'pri', 'se', 'li', 'ili', 'ako', 'iako', 'bismo', 'koji', 'što', 'da', 'nije', 'te', 'ovo', 'samo', 'ga', 'kako', 'će', 'dobro', 'to', 'sam',...
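Because the stop words are already in a set, removing them is a membership test while rebuilding the string or list. The sketch below uses a small hypothetical subset of the words from the question (the full list is truncated above), compares case-insensitively, and ignores punctuation attached to a token when comparing; it does not do any smarter tokenization.

```python
import string

def remove_words(text, words_to_remove):
    """Drop every whitespace-separated token whose bare, lowercased form is in
    words_to_remove; kept tokens retain their original spelling and punctuation."""
    stop = {w.lower() for w in words_to_remove}
    kept = [tok for tok in text.split() if tok.strip(string.punctuation).lower() not in stop]
    return " ".join(kept)

def remove_words_from_list(tokens, words_to_remove):
    """Same filtering for a list of words instead of a string."""
    stop = {w.lower() for w in words_to_remove}
    return [tok for tok in tokens if tok.lower() not in stop]

# Hypothetical subset of common_words from the question.
common_words = {"je", "i", "to", "da"}
print(remove_words("Ovo je dobro, i to je sve.", common_words))
print(remove_words_from_list(["Je", "ovo", "dobro"], common_words))
```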