python,regex,string,loops,twitter , How to match words in 2 list against another string of words without sub-string matching in Python?


How to match words in 2 list against another string of words without sub-string matching in Python?

Question:

Tag: python,regex,string,loops,twitter

I have 2 lists with keywords in them:

slangNames = [Vikes, Demmies, D, MS Contin]
riskNames = [enough, pop, final, stress, trade]

i also have a dictionary called overallDict, that contains tweets. The key value pairs are {ID: Tweet text) For eg:

{1:"Vikes is not enough for me", 2:"Demmies is okay", 3:"pop a D"}

I am trying to isolate only those tweets that have atleast one keyword from both slangNames and riskNames. So the tweet has to have any keyword from slangNames AND any keyword from riskNames. So from the above example, my code should return keys 1 and 3, i.e.,

{1:"Vikes is not enough for me", 3:"pop a D"}. 

But my code is picking up substrings instead of complete words. So basically, anything withthe letter 'D' is getting picked up. How do I match these as whole words and not substrings? Please help. Thanks!

My code so far is as below:

for key in overallDict:
    if any(x in overallDict[key] for x in strippedRisks) and (any(x in overallDict[key] for x in strippedSlangs)):
        output.append(key)

Answer:

Store slangNames and riskNames as sets, split the strings and check if any of the words appear in both sets

slangNames = set(["Vikes", "Demmies", "D", "MS", "Contin"])
riskNames = set(["enough", "pop", "final", "stress", "trade"])
d =  {1: "Vikes is not enough for me", 2:"Demmies is okay", 3:"pop a D"}

for k,v in d.items():
    spl = v.split() # split once
    if any(word in slangNames for word in spl) and any(word  in riskNames for word in spl):
        print(k,v)

Output:

1 Vikes is not enough for me
3 pop a D

Or use not set.isdisjoint:

slangNames = set(["Vikes", "Demmies", "D", "MS", "Contin"])
riskNames = set(["enough", "pop", "final", "stress", "trade"])
d =  {1: "Vikes is not enough for me", 2:"Demmies is okay", 3:"pop a D"}

for k,v in d.items():
    spl = v.split()
    if not slangNames.isdisjoint(spl) and not riskNames.isdisjoint(spl):
        print(k, v)

Using any should be the most efficient as we will short circuit on the first match. Two sets are disjoint if their intersection is an empty set so if if not slangNames.isdisjoint(spl) is True at least one common word appears.

If MS Contin is actually one word you also need to catch that:

import re
slangNames = set(["Vikes", "Demmies", "D"])
r = re.compile(r"\bMS Contin\b")
riskNames = set(["enough", "pop", "final", "stress", "trade"])
d =  {1: "Vikes is not enough for me", 2:"Demmies is okay", 3:"pop a D"}

for k,v in d.items():
    spl = v.split()
    if (not slangNames.isdisjoint(spl) or r.search(v)) and not riskNames.isdisjoint(spl):
        print(k,v)

Related:


How to write RegEx for inserting line break for line length more than 30 characters?


regex
I am using a text editor which lets use regular expression to find / replace text. I have a large text file. I want to insert new line in each lines which are more than 30 characters. I want the line to break after 30th character (doesnt matter if a...

Strange Behavior: Floating Point Error after Appending to List


python,python-2.7,behavior
I am writing a simple function to step through a range with floating step size. To keep the output neat, I wrote a function, correct, that corrects the floating point error that is common after an arithmetic operation. That is to say: correct(0.3999999999) outputs 0.4, correct(0.1000000001) outputs 0.1, etc. Here's...

Count function counting only last line of my list


python,python-2.7
Count function counting only last line of my list N = int(raw_input()) cnt = [] for i in range(N): string = raw_input() for j in range(1,len(string)): if string[j] =='K': cnt.append('R') elif string[j] =='R': cnt.append('R') if string[0] == 'k': cnt.append('k') elif string[0] == 'R': cnt.append('R') print cnt.count('R') if I am giving...

In sklearn, does a fitted pipeline reapply every transform?


python,scikit-learn,pipeline,feature-selection
Apologies if this is obvious but I couldn't find a clear answer to this: Say I've used a pretty typical pipeline: feat_sel = RandomizedLogisticRegression() clf = RandomForestClassifier() pl = Pipeline([ ('preprocessing', preprocessing.StandardScaler()), ('feature_selection', feat_sel), ('classification', clf)]) pl.fit(X,y) Now when I apply pl on a new set, pl.predict(X_classify); is RandomizedLogisticRegression going...

Python recursive function not recursing


python,recursion
I'm trying to solve a puzzle, which is to reverse engineer this code, to get a list of possible passwords, and from those there should be one that 'stands out', and should work function checkPass(password) { var total = 0; var charlist = "abcdefghijklmnopqrstuvwxyz"; for (var i = 0; i...

How does the class_weight parameter in scikit-learn work?


python,scikit-learn
I am having a lot of trouble understanding how the class_weight parameter in scikit-learn's Logistic Regression operates. The Situation I want to use logistic regression to do binary classification on a very unbalanced data set. The classes are labelled 0 (negative) and 1 (positive) and the observed data is in...

How to Match a string with the format: “20959WC-01” in php?


php,regex
i want to restrict a user to enter a value which is similar to the value "20959WC-01", means it must contains 5 integers followed by two character, a '-' and two integers, can anyone please give me a solution to sort out this problem. Thanks in advance :) ...

Get number from string


regex
I am trying to get the enclosed number between two slashes in a URL using regex. The code regex I have is not working, I am fairly new to regex and don't really understand it. The regex: http:\/\/?www\.?example\.com\/g\/(^\d$)\/\w The URL: http://www.example.com/g/1337/Game-Title Trying to get the "1337", which is the PlaceId....

How many characters are visible like a space, but are not space characters?


php,regex
If I want to discover the hexadecimal equivalent of a space in PHP I can play with bin2hex: php > echo var_dump(bin2hex(" ")); string(2) "20" I can also obtain space character from "20" php > echo var_dump(hex2bin("20")); string(1) " " But there exist Unicode versions of a "visible" space: php...

How to create the javascript regular expression for number with some special symbols


javascript,regex
what can be the java-script regular expression which gives the numbers with some symbols For example following condition must be pass. Number can start with $ Can have the . or , : symbols between and % sign at the send. Passing valus: $233 48.23% 278 22.33 45:23 10,000 Number...

How to remove structure with python from this case?
python,python-2.7
How to remove "table" from HTML using python? I had case like this: paragraph = ''' <p>Lorem ipsum dolor sit amet, consectetur adipisicing elit. Quidem molestiae consequuntur officiis corporis sint.<br /><br /> <table> <tr> <td> text title </td> <td> text title 2 </td> </tr> </table> <p> lorem ipsum</p> ''' how...

Sum of two variables in RobotFramework


python,automated-tests,robotframework
I have two variables: ${calculatedTotalPrice} = 42,42 ${productPrice1} = 43,15 I executed ${calculatedTotalPrice} Evaluate ${calculatedTotalPrice}+${productPrice1} I got 42,85,15 How can I resolve it?...

How do variables inside python modules work?


python,module,python-module
I am coming from a Java background with Static variables, and I am trying to create a list of commonly used strings in my python application. I understand there are no static variables in python so I have written a module as follows: import os APP_NAME = 'Window Logger' APP_DATA_FOLDER_PATH...

Displaying a 32-bit image with NaN values (ImageJ)


python,image-processing,imagej
I wrote a multilanguage 3-D image denoising ImageJ plugin that does some operations on an image and returns the denoised image as a 1-D array. The 1-D array contains NaN values (around the edges). The 1-D array is converted back into an image stack and displayed. It is simply black....

Get all prices with $ from string into an array in Javascript


javascript,regex,currency
var string = 'Our Prices are $355.00 and $550, down form $999.00'; How can I get those 3 prices into an array?...

Reg ex matching a word


regex
I need to match only first two files, out of four files listed below: ABD_DEF_GHIJ_20150611 ABD_DEF_GHIJ ABD_DEF_GHIJ_FX_20150611 ABD_DEF_GHIJ_FX I am using reg ex - ABD_DEF_GHIJ(_\d{8}|\b) and it's working fine. I would like to know if my solution is ok or there is any better alternate solution....

match line break except line begin with spcific word or blank line


regex,notepad++
If I have text that the line breaks is broken: Chapter 1 Lorem ipsum dolor sit amet, consectetur adipisci ng elit, sed do eiusmod tempor incididunt ut la bore et dolore magna aliqua. Ut enim ad minim ve niam, quis nostrud exercitation ullamco labo ris nisi ut aliquip ex ea...

Pandas Dataframe Complex Calculation


python,python-2.7,pandas,dataframes
I have the following dataframe,df: Year totalPubs ActualCitations 0 1994 71 191.002034 1 1995 77 2763.911781 2 1996 69 2022.374474 3 1997 78 3393.094951 I want to write code that would do the following: Citations of currentyear / Sum of totalPubs of the two previous years I want something to...

Twilio Client Python not Working in IOS Browser


javascript,python,ios,flask,twilio
I have created a simple twilio client application to make phone calls from Web Browser to phones. I used a sample Flask app to generate a secure Capability Token and used twilio.min.js library to handle calls from my HTML. The functionality works fine in Computer Browsers ans Android Phone Browsers,...

Finding embeded xpaths in a String


java,regex
I have a string where I have the user should be able to specify xpaths that will be evaluated at runtime. I was thinking about having a the following way to specify it. String = "Hi my name is (/message/user) how can i help you with (/message/message) "; How can...

Replace nodejs for python?


python,node.js,webserver
i'm working in a HTML5 multiplayer game, and i need a server to sync player's movement, chat, battles, etc. So I'm looking for ways to use python instead nodejs, because i have I have more familiarity with python. The server is simple: var express = require('express'); var app = express();...

Regex that allow void fractional part of number


c#,regex
@"[+-]?\d+(\.\d+)?" -this is a regex I have wrote for numbers it allows [+-] minus before the number digits before and digits after the point the question is how to change this to allow "not finished" values so that input of "5." - is fine too ?...

Inserting a variable in MongoDB specifying _id field


python,mongodb,pymongo
I want to insert a variable, say, a = {1:2,3:4} into my database with a particular id "56". It is very clear from the docs that I can do the following: db.testcol.insert({"_id": "56", 1:2, 3:4}) However, I cannot figure out any way to insert "a" itself, specifying an id. In...

How to check for multiple attributes in a list


python,python-2.7
I am making a TBRPG game using Python 2.7, and i'm currently making a quest system. I wanted to make a function that checks all of the quests in a list, in this case (quests), and tells you if any of of the quests in the list have the same...

Python - Opening and changing large text files


python,replace,out-of-memory,large-files
I have a ~600MB Roblox type .mesh file, which reads like a text file in any text editor. I have the following code below: mesh = open("file.mesh", "r").read() mesh = mesh.replace("[", "{").replace("]", "}").replace("}{", "},{") mesh = "{"+mesh+"}" f = open("p2t.txt", "w") f.write(mesh) It returns: Traceback (most recent call last): File...

Calling function and passing arguments multiple times


python,function,loops
I want to call the function multiple time and use it's returned argument everytime when it's called. For example: def myfunction(first, second, third): return (first+1,second+1,third+1) 1st call: myfunction(1,2,3) 2nd call is going to be pass returned variables: myfunction(2,3,4) and loop it until defined times. How can I do such loop?...

Regex with whitespaces and preceding zeros


regex,sas
I want to match the string 11 with a regular Expression in SAS. The 11 can be preceded by zero or more 0 and/or by white spaces. Any other character is not allowed. Likewise, if anything there should only be white spaces following the 11. Examples: Match: 0000011 11 11<space><space>...

PHP Regular Expressions Counting starting consonants in a string


php,regex
I need to find out how many starting consonants a word has. The number is used later in the program. The code below does work, I am wondering if it is possible to do this with a regular expression. $mystring ="SomeStringExample"; $mystring2 =("bcdfghjklmnpqrstvwxyzABCDFGHJKLMNPQRSTWVXYZ"); $var = strspn($mystring, $mystring2); Using a regular...

Identify that a string could be a datetime object


python,regex,algorithm,python-2.7,datetime
If I knew the format in which a string represents date-time information, then I can easily use datetime.datetime.strptime(s, fmt). However, without knowing the format of the string beforehand, would it be possible to determine whether a given string contains something that could be parsed as a datetime object with the...

trying to understand LSH through the sample python code


python,similarity,locality-sensitive-hash
the concise python code i study for is here Question A @ line 8 i do not really understand the syntax meaning for "res = res << 1" for the purpose of "get_signature" Question B @ line 49 (SOLVED BY myself through another Q&A) "xor = r1^r2" does not really...

How do I read this list and parse it?


python,list
I'm using requests and the output I get from the sites API is a list, I've been stuck trying to parse it to get the data from it. I use r = requests.get(urlas, params=params) r.json() to get the data I want. Here is a snippet of the list [{'relation_type': None,...

The event loop is already running


python,python-3.x,pyqt,pyqt4
I have the following 5 files: gui.py # -*- coding: utf-8 -*- from PyQt4 import QtCore, QtGui try: _fromUtf8 = QtCore.QString.fromUtf8 except AttributeError: def _fromUtf8(s): return s try: _encoding = QtGui.QApplication.UnicodeUTF8 def _translate(context, text, disambig): return QtGui.QApplication.translate(context, text, disambig, _encoding) except AttributeError: def _translate(context, text, disambig): return QtGui.QApplication.translate(context, text, disambig)...

group indices of list in list of lists


python,list
I am looking for an elegant solution for the following problem. I have a list of ints and I want to create a list of lists where the indices with the same value are grouped together in the order of the occurrences of said list. [2, 0, 1, 1, 3,...

Sort when values are None or empty strings python


python,list,sorting,null
I have a list with dictionaries in which I sort them on different values. I'm doing it with these lines of code: def orderBy(self, col, dir, objlist): if dir == 'asc': sorted_objects = sorted(objlist, key=lambda k: k[col]) else: sorted_objects = sorted(objlist, key=lambda k: k[col], reverse=True) return sorted_objects Now the problem...

How to use template within Django template?


python,html,django,templates,django-1.4
I have the django template like below: <a href="https://example.com/url{{ mylist.0.id }}" target="_blank"><h1 class="title">{{ mylist.0.title }}</h1></a> <p> {{ mylist.0.text|truncatewords:50 }}<br> ... (the actual template is quite big) It should be used 10 times on the same page, but 'external' html elements are different: <div class="row"> <div class="col-md-12 col-lg-12 block block-color-1"> *django...

Find the tf-idf score of specific words in documents using sklearn


python,scikit-learn,tf-idf
I have code that runs basic TF-IDF vectorizer on a collection of documents, returning a sparse matrix of D X F where D is the number of documents and F is the number of terms. No problem. But how do I find the TF-IDF score of a specific term in...

Pandas - Dropping multiple empty columns


python,pandas
I have some tables where the first 11 columns are populated with data, but all columns after this are blank. I tried: df=df.dropna(axis=1,how='all') which didn't work. I then used: df = df.drop(df.columns[range(11,36)], axis=1) Which worked on the first few tables, but then some of the tables were longer or shorter...

Using counter on array for one value while keeping index of other values


python,collections
After reading the answers on this question How to count the frequency of the elements in a list? I was wondering how to count the frequency of something, and at the same time retreive some extra information, through something like an index. For example a = ['fruit','Item#001'] b = ['fruit','Item#002']...

represent an index inside a list as x,y in python


python,list,numpy,multidimensional-array
I have a list which contains 1000 integers. The 1000 integers represent 20X50 elements of dimensional array which I read from a file into the list. I need to walk through the list with an indicator in order to find close elements to each other. I want that my indicator...

MySQL substring match using regular expression; substring contain 'man' not 'woman'


mysql,regex
I have an issue while I fetch data from database using regular expression. While I search for 'man' in tags it returns tags contains 'woman' too; because its substring. SELECT '#hellowomanclothing' REGEXP '^(.)*[^wo]man(.)*$'; # returns 0 correct, it contains 'woman' SELECT '#helloowmanclothing' REGEXP '^(.)*[^wo]man(.)*$'; # returns 0 incorrect, it can...

regex - Match filename with or without extension


regex,logstash-grok
Need a regex pattern to match all of the following: hello hello. hello.cc I tried \b\w+\.?\w+?\b, but this doesn't match "hello." (the second string mentioned above)....

Matplotlib: Plot the result of an SQL query


python,sql,matplotlib,plot
from sqlalchemy import create_engine import _mssql from matplotlib import pyplot as plt engine = create_engine('mssql+pymssql://**:****@127.0.0.1:1433/AffectV_Test') connection = engine.connect() result = connection.execute('SELECT Campaign_id, SUM(Count) AS Total_Count FROM Impressions GROUP BY Campaign_id') for row in result: print row connection.close() The above code generates an array: (54ca686d0189607081dbda85', 4174469) (551c21150189601fb08b6b64', 182) (552391ee0189601fb08b6b73', 237304) (5469f3ec0189606b1b25bcc0',...

Regular Expression for whole world


regex,c#-4.0,vb6
First of all, I use C# 4.0 to parse the code of a VB6 application. I have some old VB6 code and about 500+ copies of it. And I use a regular expression to grab all kinds of global variables from the code. The code is described as "Yuck" and...

SQLAlchemy. 2 different relationships for 1 column


python,sqlalchemy
I have a simple many-to-many relationship with associated table: with following data: matches: users: users_mathces: ONE user can play MANY matches and ONE match can involve up to TWO users I want to realize proper relationships in both "Match" and "User" classes users_matches_table = Table('users_matches', Base.metadata, Column('match_id', Integer, ForeignKey('matches.id', onupdate="CASCADE",...

Regex to remove `.` from a sub-string enclosed in square brackets


c#,.net,regex,string,replace
I have this regex in C#: \[.+?\] This regex extracts the sub-strings enclosed between square brackets. But before doing that I want to remove . inside these sub-strings. For example, the string hello,[how are yo.u?]There are [300.2] billion stars in [Milkyw.?ay]. should become hello,[how are you?]There are [3002] billion stars...

Python: histogram/ binning data from 2 arrays.


python,histogram,large-files
I have two arrays of data: one is a radius values and the other is a corresponding intensity reading at that intensity: e.g. a small section of the data. First column is radius and the second is the intensities. 29.77036614 0.04464427 29.70281027 0.07771409 29.63523525 0.09424901 29.3639355 1.322793 29.29596385 2.321502 29.22783249...

How to change the IP address of Amazon EC2 instance using boto library


python,amazon-web-services,boto
How can I assign a new IP address (or Elastic IP) to an already existing AWS EC2 instance using boto library.

Please can someone help me understand the exec method for regular expressions?


javascript,regex
The best place I have found for the exec method is Eloquent Javascript Chapter 9: "Regular expressions also have an exec (execute) method that will return null if no match was found and return an object with information about the match otherwise. An object returned from exec has an index...

Swing regular expression for phone number validation


java,regex
I want to validate phone number field in swing, so I am writing code to allow user to enter only digits, comma, spaces. For this I am using regular expression, when user enter characters or other than the pattern text field will consume. My code is not working. Can anyone...