Toolbox: Check if text from a DataFrame is part of another phrase with Python and Pandas

Imagine that we have a database of specific words. We expect them to appear inside longer strings; for example, they might appear in a URL. Pandas has the method Series.isin(), but it checks only for exact matches. Another method, Series.str.contains(), checks whether each string in our Series contains a specific phrase. However, we have the reverse problem: we want to find out whether the strings in our Series are part of another text fragment.
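To make the distinction concrete, here is a small sketch of why neither built-in method answers the reversed question. The word list and URL are illustrative values only, and regex=False is passed so that the URL is treated as a literal string rather than a regular expression.

```python
import pandas as pd

# Illustrative data: short words and one longer phrase (a URL).
series = pd.Series(['apple', 'orange', 'berry'])
phrase = 'https://www.apple.com/'

# isin() compares whole values, and no word equals the full URL.
exact = series.isin([phrase])

# str.contains() asks whether each word contains the URL,
# which is the opposite direction of what we need.
contains = series.str.contains(phrase, regex=False)

print(exact.tolist())     # [False, False, False]
print(contains.tolist())  # [False, False, False]
```

Both checks return False for 'apple', even though 'apple' clearly appears inside the URL.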

How do we do it? The minimum viable solution is to use a lambda expression:

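import pandas as pd

# Words we are looking for and the longer text that may contain them
texts = ['apple', 'orange', 'berry']
phrase = 'https://www.apple.com/'

series = pd.Series(texts)

# "x in phrase" already evaluates to True or False for each word
test_output = series.apply(lambda x: x in phrase)
print(test_output)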
0     True
1    False
2    False
dtype: bool

The output is a boolean Series that we can later use as a mask to select the records of interest.
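As a follow-up, the sketch below shows one way such a boolean Series might be used to filter a DataFrame. The DataFrame, its column names ('id', 'word'), and the helper is_part_of are hypothetical and exist only for illustration. The helper also guards against missing values, because the in operator raises a TypeError when its left operand is not a string.

```python
import pandas as pd

phrase = 'https://www.apple.com/'

# Hypothetical table of words; one entry is missing on purpose.
df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'word': ['apple', 'orange', None, 'berry'],
})

def is_part_of(word, text):
    """Return True only when word is a string that occurs inside text."""
    return isinstance(word, str) and word in text

# Boolean mask: True where the word appears inside the phrase.
mask = df['word'].apply(lambda word: is_part_of(word, phrase))

# Keep only the matching rows.
matches = df[mask]

print(matches['word'].tolist())  # ['apple']
print(bool(mask.any()))          # True
```

The same mask can be inverted with ~mask to keep the rows whose words do not occur in the phrase.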

Szymon