Toolbox: MongoDB nested BSON to a flattened DataFrame
Nested structures are very common in MongoDB dumps. Converting those documents directly to a DataFrame gives strange results, where a single cell holds a whole dictionary. Do you want to parse those nested structures and create a DataFrame with flattened columns? Use this function from the toolbox!
import pandas as pd
from bson import json_util


def nested_bson_to_df(bson_file):
    """
    Function transforms an input bson file (from MongoDB) with nested structures into a DataFrame.

    INPUT:

    :param bson_file: (str) bson file path from the MongoDB database.

    OUTPUT:

    :returns: (pandas.DataFrame)
    """
    # The file is read as text, so it is expected to hold MongoDB JSON
    # that json_util.loads() can parse into Python objects.
    with open(bson_file, 'r') as inp_str:
        data = json_util.loads(inp_str.read())

    # Expand nested dictionaries into flat, dot-separated columns.
    normalized = pd.json_normalize(data)
    return normalized
- First we open bson_file as a string and parse it with the json_util.loads() method from the bson package.
- Next we normalize nested structures of the parsed data with pd.json_normalize().
- Function returns the flattened DataFrame.
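
To see what the normalization step changes, here is a minimal sketch that uses pandas only. The records list is hypothetical sample data standing in for documents that have already been parsed from a MongoDB export; the field names are made up for illustration.

```python
import pandas as pd

# Hypothetical records, shaped like documents already parsed from a MongoDB export.
records = [
    {"_id": 1, "user": {"name": "Ann", "address": {"city": "Oslo"}}},
    {"_id": 2, "user": {"name": "Bob", "address": {"city": "Rome"}}},
]

# Direct conversion keeps each nested document as a single dict-valued cell.
raw = pd.DataFrame(records)

# json_normalize expands nested keys into dot-separated column names.
flat = pd.json_normalize(records)

print(raw.columns.tolist())
print(flat.columns.tolist())
```

In the direct conversion the user column holds whole dictionaries, while the normalized frame gets one column per nested key, such as user.name and user.address.city.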