{"id":368,"date":"2021-07-19T17:25:02","date_gmt":"2021-07-19T17:25:02","guid":{"rendered":"https:\/\/ml-gis-service.com\/?p=368"},"modified":"2021-07-19T17:25:03","modified_gmt":"2021-07-19T17:25:03","slug":"toolbox-mongodb-nested-bson-to-the-flattened-dataframe","status":"publish","type":"post","link":"https:\/\/ml-gis-service.com\/index.php\/2021\/07\/19\/toolbox-mongodb-nested-bson-to-the-flattened-dataframe\/","title":{"rendered":"Toolbox: MongoDB nested bson to the flattened DataFrame"},"content":{"rendered":"\n<p>Nested structures in the MongoDB dumps are very common. Direct transformation of those entities to the DataFrame leads to the strange results where a single entry in a DataFrame is a whole dictionary. Do you want to parse those nested structures and create DataFrame with flattened columns? Use function from the toolbox!<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">import pandas as pd\nfrom bson import json_util\n\n\ndef nested_bson_to_df(bson_file):\n    \"\"\"\n    Function transforms input bson files (from the MongoDB) with nested structures into a DataFrame.\n\n    INPUT:\n    :param bson_file: (str) bson file path from the MongoDB database.\n\n    OUTPUT:\n    :returns: (pandas.DataFrame)\n    \"\"\"\n    with open(bson_file, 'r') as inp_str:\n        data = json_util.loads(inp_str.read())\n\n    normalized = pd.json_normalize(data)\n    return normalized<\/pre>\n\n\n\n<p>What happend?<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>First we open <code>bson_file<\/code> as a string and parse it with <code>json_util.loads()<\/code> method from the <code>bson<\/code> package.<\/li><li>Next we normalize nested structures of the <code>data<\/code> with <code>pandas.json_normalize()<\/code> method.<\/li><li>Function returns the flattened <code>DataFrame<\/code>!<\/li><\/ol>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>How to read and flatten nested bson files with Python and pandas.<\/p>\n","protected":false},"author":1,"featured_media":369,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2,79,3,17],"tags":[85,84,80,64,82,83,81,7],"class_list":["post-368","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-engineering","category-pandas","category-python","category-scripts","tag-dataframe","tag-flatten","tag-mongodb","tag-pandas","tag-parse-nested-bson","tag-parse-nested-json","tag-pymongo","tag-python"],"_links":{"self":[{"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/posts\/368","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/comments?post=368"}],"version-history":[{"count":3,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/posts\/368\/revisions"}],"predecessor-version":[{"id":533,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/posts\/368\/revisions\/533"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/media\/369"}],"wp:attachment":[{"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/media?parent=368"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/categories?post=368"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ml-gis-service.com\/index.php\/wp-json\/wp\/v2\/tags?post=368"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}