agrep string matching in R

r,string-matching,tm,agrep,qdap
I have two lists of product names. My problem is that "Operating system" matches "system", "cooling system", etc., but it should match only "Operating" or "OS". Another example: "Key Board" should match "key" or "KB", but not "Mother Board" or just "Board". How to give...

replace string in R giving a vector of patterns and vector of replacements

r,stringr,qdap
Given a string with different placeholders I want to replace, does R have a function that replaces all of them given a vector of patterns and a vector of replacements? I have managed to accomplish that with a list and a loop: > library(stringr) > tt_ori <- 'I have [%VAR1%]...

R qdap::mgsub, how to pass a pattern with a regular expression?

r,qdap
In a previous question (replace string in R giving a vector of patterns and vector of replacements) I found that mgsub takes as its pattern a string that does not need to be escaped. That is good when you want to replace text like '[%.+%]' as a literal string, but...

R: TM package Finding Word Frequency from a Single Column

r,tm,qdap
I've recently been working on trying to find the word frequency within a single column in a data.frame in R using the tm package. While the data.frame itself has many columns that are both numeric and character based, I'm only interested in a single column that is pure text. While...

Extract and count common word-pairs from character vector

r,regex-lookarounds,tm,qdap
How can someone find frequent pairs of adjacent words in a character vector? Using the crude data set, for example, some common pairs are "crude oil", "oil market", and "million barrels". The code for the small example below tries to identify frequent terms and then, using a positive lookahead assertion,...

Can't plot Zipf's law in R

r,distribution,tm,qdap
I have a big list of terms and their frequency loaded from a text file and I converted it to a table:
    myTbl = read.table("word_count.txt") # read text file
    colnames(myTbl)<-c("term", "frequency")
    head(myTbl, n = 10)
    > head(myTbl, n = 10)
      term frequency
    1 de 35945
    2 i 34850
    3 \xe3n...