{"id":1091,"date":"2023-01-23T16:11:04","date_gmt":"2023-01-23T16:11:04","guid":{"rendered":"https:\/\/ml-gis-service.com\/?p=1091"},"modified":"2023-01-23T16:11:05","modified_gmt":"2023-01-23T16:11:05","slug":"toolbox-name-and-frequency-of-unique-elements-from-a-list-in-python","status":"publish","type":"post","link":"https:\/\/ml-gis-service.com\/index.php\/2023\/01\/23\/toolbox-name-and-frequency-of-unique-elements-from-a-list-in-python\/","title":{"rendered":"Toolbox: Name and Frequency of unique elements from a List in Python"},"content":{"rendered":"\n<p>Sometimes we need the names and counts of the unique elements in a list or array. Usually, we use the <code>pandas<\/code> method <code>.value_counts()<\/code>, which returns a frequency table. In some cases, though, <code>pandas<\/code> is not convenient. Maybe our processing pipeline uses only NumPy (a common scenario with ML models), but at some intermediate data processing step we need to check the counts of categorical values. My current use case: I perform clustering and want to log how many values were assigned to each label.<\/p>\n\n\n\n<p>The function for this task is short:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">from typing import Dict, Iterable\nimport numpy as np\n\n\ndef get_categories_frequency(ds: Iterable) -> Dict:\n    \"\"\"\n    Calculate how many times each unique category appears in a list-like object.\n\n    Parameters\n    ----------\n    ds : Iterable\n        List-like object with categorical data.\n\n    Returns\n    -------\n    : Dict\n        Dict with ``{category: frequency}`` records.\n    \"\"\"\n    # np.unique returns the sorted unique values and, with return_counts=True, their counts\n    lbls, counts = np.unique(ds, return_counts=True)\n    # Pair each unique label with its count\n    zipped = dict(zip(lbls, counts))\n    return zipped\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Count unique 
categories in a list<\/p>\n","protected":false},"author":1,"featured_media":1093,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2,18,3,17],"tags":[221,241,196,10,7,240],"class_list":["post-1091","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-engineering","category-data-science","category-python","category-scripts","tag-dictionary","tag-frequency","tag-list","tag-numpy","tag-python","tag-unique"],"_links":{"self":[{"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/posts\/1091","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/comments?post=1091"}],"version-history":[{"count":5,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/posts\/1091\/revisions"}],"predecessor-version":[{"id":1097,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/posts\/1091\/revisions\/1097"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/media\/1093"}],"wp:attachment":[{"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/media?parent=1091"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/categories?post=1091"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/tags?post=1091"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}