Toolbox: From nested MongoDB BSON to a flattened DataFrame
July 19, 2021
Nested structures are very common in MongoDB dumps. Converting those documents directly into a DataFrame gives odd results: a single cell of the DataFrame holds a whole dictionary. Do you want to parse those nested structures and build a DataFrame with flattened columns? Use this function from the toolbox!
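To see the problem, here is a minimal sketch with two hypothetical documents (the field names are made up for illustration). Passing them straight to the DataFrame constructor leaves the nested sub-document in one cell, while pandas.json_normalize() spreads it over dot-separated columns.

```python
import pandas as pd

# Hypothetical documents shaped like typical MongoDB records with a nested sub-document.
docs = [
    {"_id": 1, "name": "Alice", "address": {"city": "Berlin", "zip": "10115"}},
    {"_id": 2, "name": "Bob", "address": {"city": "Paris", "zip": "75001"}},
]

# Direct construction: the whole nested dict ends up in a single cell.
raw = pd.DataFrame(docs)
print(raw.columns.tolist())   # ['_id', 'name', 'address']
print(raw.loc[0, "address"])  # {'city': 'Berlin', 'zip': '10115'}

# json_normalize flattens nested keys into dotted column names.
flat = pd.json_normalize(docs)
print(flat.columns.tolist())  # ['_id', 'name', 'address.city', 'address.zip']
```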
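import pandas as pd
from bson import json_util


def nested_bson_to_df(bson_file):
    """
    Function transforms an input MongoDB dump with nested structures into a flattened DataFrame.

    INPUT:
    :param bson_file: (str) path to a MongoDB dump stored as (Extended) JSON text,
        e.g. a JSON array of documents.
    OUTPUT:
    :returns: (pandas.DataFrame) one column per (nested) field, named with dots.
    """
    # Read the file as text and decode MongoDB Extended JSON into Python objects.
    with open(bson_file, 'r') as inp_str:
        data = json_util.loads(inp_str.read())
    # Flatten nested dictionaries into dot-separated columns, e.g. "address.city".
    normalized = pd.json_normalize(data)
    return normalized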
What happened?
- First, we open bson_file as a string and parse it with the json_util.loads() method from the bson package.
- Next, we normalize the nested structures of data with the pandas.json_normalize() method.
- The function returns the flattened DataFrame!