Toolbox: MongoDB nested bson to the flattened DataFrame

Nested structures are very common in MongoDB dumps. Converting such documents directly to a DataFrame gives awkward results: a single cell ends up holding a whole dictionary. Do you want to parse those nested structures and get a DataFrame with flattened columns? Use this function from the toolbox!
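
To make the problem concrete, here is a minimal sketch of what the plain DataFrame constructor does with nested documents. The records and their field names (user, address, city) are made up for illustration and only mimic the shape of MongoDB documents.

```python
import pandas as pd

# Hypothetical records shaped like documents from a MongoDB collection.
records = [
    {"_id": 1, "user": {"name": "Ann", "address": {"city": "Oslo"}}},
    {"_id": 2, "user": {"name": "Bob", "address": {"city": "Rome"}}},
]

# The plain constructor only looks at top-level keys,
# so each nested document is stored as one dict per cell.
raw = pd.DataFrame(records)

print(list(raw.columns))
print(type(raw.loc[0, "user"]))
```

The user column here holds Python dictionaries, which is hard to filter, group, or export.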

import pandas as pd
from bson import json_util


def nested_bson_to_df(bson_file):
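    """
    Function transforms a MongoDB export with nested structures into a flattened DataFrame.

    Note: json_util.loads() parses MongoDB Extended JSON text (for example
    a JSON array written by `mongoexport --jsonArray`), not binary .bson dumps.

    INPUT:
    :param bson_file: (str) path to the file exported from the MongoDB database.

    OUTPUT:
    :returns: (pandas.DataFrame) one row per document, one column per nested field.
    """
    # Read the whole file as text and decode MongoDB-specific types (ObjectId, dates, ...).
    with open(bson_file, 'r', encoding='utf-8') as inp_str:
        data = json_util.loads(inp_str.read())

    # Flatten nested dictionaries into dot-separated column names.
    normalized = pd.json_normalize(data)
    return normalized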

What happened?

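  1. First, we read bson_file as text and parse it with the json_util.loads() function from the bson package, which also decodes MongoDB-specific types such as ObjectId and dates.
  2. Next, we flatten the nested structures with the pandas.json_normalize() function. Each nested key becomes its own column, named with its dot-separated path.
  3. The function returns the flattened DataFrame!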
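
The flattening step can be tried without a database. The sketch below uses made-up in-memory records (the field names are assumptions, not part of any real collection) and calls only pandas.json_normalize(), so the file reading and json_util parsing steps are left out.

```python
import pandas as pd

# Hypothetical documents, already parsed into Python dictionaries.
records = [
    {"_id": 1, "user": {"name": "Ann", "address": {"city": "Oslo"}}},
    {"_id": 2, "user": {"name": "Bob", "address": {"city": "Rome"}}},
]

# Default behaviour: nested keys are joined with a dot.
flat = pd.json_normalize(records)
print(sorted(flat.columns))

# The optional sep argument changes the separator, which helps when
# dots in column names are inconvenient (e.g. attribute-style access).
flat_underscore = pd.json_normalize(records, sep="_")
print(sorted(flat_underscore.columns))
```

Every leaf value now has its own column, so flat["user.address.city"] is an ordinary column of strings.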

Szymon