Toolbox: MongoDB nested bson to the flattened DataFrame
July 19, 2021
Nested structures are very common in MongoDB dumps. Converting those documents directly to a DataFrame gives strange results: a single cell can hold a whole dictionary. Do you want to parse those nested structures and create a DataFrame with flattened columns? Use this function from the toolbox!
import pandas as pd
from bson import json_util


def nested_bson_to_df(bson_file):
    """
    Function transforms input bson files (from the MongoDB) with nested structures into a DataFrame.

    INPUT:

    :param bson_file: (str) bson file path from the MongoDB database.

    OUTPUT:

    :returns: (pandas.DataFrame)
    """
    with open(bson_file, 'r') as inp_str:
        data = json_util.loads(inp_str.read())

    normalized = pd.json_normalize(data)
    return normalized
What happened?
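- First, we read bson_file as text and parse it with the json_util.loads() function from the bson package. This function expects a JSON string (MongoDB Extended JSON), so the file has to be a text export rather than a binary dump.
- Next, we normalize the nested structures of data with the pandas.json_normalize() function.
- The function returns the flattened DataFrame!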
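To make the flattening step more concrete, here is a minimal pure-Python sketch of the idea: nested dictionaries are walked recursively and their keys are joined with a dot, which is the default separator pandas.json_normalize() uses for column names. The helper flatten_record() and the sample document are hypothetical and only illustrate the naming scheme; this is not how pandas implements it internally.

```python
def flatten_record(record, parent_key="", sep="."):
    """Flatten nested dicts into a single-level dict with dotted keys.

    Hypothetical helper for illustration only, not part of pandas or bson.
    """
    flat = {}
    for key, value in record.items():
        # Build the column name, e.g. "user" + "." + "name" -> "user.name".
        column = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict) and value:
            # Non-empty nested dict: recurse and merge its flattened keys.
            flat.update(flatten_record(value, column, sep))
        else:
            # Scalars, lists and empty dicts are kept as a single value.
            flat[column] = value
    return flat


# Hypothetical document, shaped like a parsed MongoDB record.
doc = {
    "_id": "60f5a1",
    "user": {"name": "Ann", "address": {"city": "Oslo"}},
    "tags": ["a", "b"],
}

flat = flatten_record(doc)
print(flat)
```

Each key of the flattened dictionary corresponds to one column of the resulting DataFrame, so a nested field such as the city ends up under user.address.city. Lists are left untouched in this sketch.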