{"id":526,"date":"2021-10-08T13:30:59","date_gmt":"2021-10-08T13:30:59","guid":{"rendered":"https:\/\/ml-gis-service.com\/?p=526"},"modified":"2021-10-08T13:35:29","modified_gmt":"2021-10-08T13:35:29","slug":"toolbox-check-if-the-text-from-dataframe-is-a-part-of-another-phrase-with-python-and-pandas","status":"publish","type":"post","link":"https:\/\/ml-gis-service.com\/index.php\/2021\/10\/08\/toolbox-check-if-the-text-from-dataframe-is-a-part-of-another-phrase-with-python-and-pandas\/","title":{"rendered":"Toolbox: Check if the text from DataFrame is a part of another phrase with Python and Pandas"},"content":{"rendered":"\n<p>Imagine that we have a database of specific words. <strong>We expect them to be a part of a longer text; <\/strong>for example, they might appear in a URL. Pandas has the method <code>isin()<\/code>, but it checks only for exact matches. Another method, <code>str.contains()<\/code>, checks whether each string in our <code>Series<\/code> contains a specific phrase. However, we have the reverse problem: we want to find out whether the strings in our <code>Series<\/code> are substrings of another text fragment.<\/p>\n\n\n\n<p>How to do it? 
A minimal solution is to apply a lambda expression to the <code>Series<\/code>:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">import pandas as pd\n\ntexts = ['apple', 'orange', 'berry']\nphrase = 'https:\/\/www.apple.com\/'\n\nseries = pd.Series(texts)\n\n# True where the element is a substring of phrase\ntest_output = series.apply(lambda x: x in phrase)\nprint(test_output)<\/pre>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">0     True\n1    False\n2    False\ndtype: bool<\/pre>\n\n\n\n<p>The output is a boolean <code>Series<\/code> that we can later use as a mask to select the records of interest.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Test if a record from a DataFrame is a part of another 
phrase<\/p>\n","protected":false},"author":1,"featured_media":527,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2,68,79,3,17],"tags":[110,113,116,64,7,112,111,115,114],"class_list":["post-526","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-engineering","category-natural-language-processing","category-pandas","category-python","category-scripts","tag-check-if-text-is-part-of-other-text","tag-lambda","tag-match-records-with-sentence","tag-pandas","tag-python","tag-records","tag-series","tag-similar-words-in-dataframe","tag-string-processing"],"_links":{"self":[{"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/posts\/526","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/comments?post=526"}],"version-history":[{"count":3,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/posts\/526\/revisions"}],"predecessor-version":[{"id":532,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/posts\/526\/revisions\/532"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/media\/527"}],"wp:attachment":[{"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/media?parent=526"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/categories?post=526"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/tags?post=526"}],"curies":[{"name":"wp","href":"https:\/\/a
pi.w.org\/{rel}","templated":true}]}}