Toolbox: Python List of Dicts to JSONL (json lines)

Converting a Python dict to JSON is simple: the json module provides json.dump, which writes to a file object, and json.dumps, which returns a string. For streams of loosely structured records, though, a related format has become a popular choice: JSON Lines (JSONL). It is a text file in which every line is a valid JSON value, and lines are separated by the newline character \n. This structure maps naturally onto a Python list of dicts. It could be, for example, a stream of website events where each event carries its own properties, such as user actions or viewed products:

{"user": "xyz", "action": "click", "element": "submit button"}
{"user": "zla", "action": "products view", "items": ["product a", "product x"]}
{"user": "iks", "action": "add to cart", "items": ["product b"], "item properties": {"price": 3.5, "color": "silver"}}

How can we store such a list in a file?

Here is a function that can be used for this task:

import gzip
import json


def dicts_to_jsonl(data: list, filename: str, compress: bool = True) -> None:
    """
    Save a list of dicts to a JSON Lines file.

    :param data: (list) list of dicts to be stored,
    :param filename: (str) path to the output file. If the .jsonl suffix is missing, the function
        appends it to the filename.
    :param compress: (bool) should the file be compressed with gzip?
    """

    sjsonl = '.jsonl'
    sgz = '.gz'

    # Check filename

    if not filename.endswith(sjsonl):
        filename = filename + sjsonl

    # Save data
    
    if compress:
        filename = filename + sgz
        with gzip.open(filename, 'w') as compressed:
            for ddict in data:
                jout = json.dumps(ddict) + '\n'
                jout = jout.encode('utf-8')
                compressed.write(jout)
    else:
        with open(filename, 'w') as out:
            for ddict in data:
                jout = json.dumps(ddict) + '\n'
                out.write(jout)

The function works as follows:

  • It takes a list of dicts and a filename as the main parameters. The optional compress parameter gzips the output, which is handy if you stream or store those files.
  • First, it makes sure the filename is valid by appending the .jsonl suffix if it is missing.
  • Next, if compress is True, the function appends .gz to the filename, opens the gzipped file, serializes each dict to a JSON string followed by a newline, encodes that string as UTF-8 bytes, and writes it to the file.
  • Otherwise, each dict is serialized to a JSON string and written as one line of a plain JSON Lines file.
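
Reading the data back is the mirror image of the steps above. The following is a minimal sketch of a hypothetical counterpart, jsonl_to_dicts, which is not part of the original function. It assumes that compressed files can be recognized by the .gz suffix that dicts_to_jsonl appends.

```python
import gzip
import json


def jsonl_to_dicts(filename: str) -> list:
    """Hypothetical helper: read a JSON Lines file into a list of dicts."""
    # Assumption: gzipped files carry the .gz suffix, as written by dicts_to_jsonl.
    opener = gzip.open if filename.endswith('.gz') else open
    records = []
    # Mode 'rt' makes gzip.open decode bytes to text, so both branches yield str lines.
    with opener(filename, 'rt', encoding='utf-8') as source:
        for line in source:
            line = line.strip()
            if line:  # skip blank lines, such as a stray empty line at the end
                records.append(json.loads(line))
    return records
```

For instance, if dicts_to_jsonl(events, 'events') was called with the default compress=True, the output lands in events.jsonl.gz, and jsonl_to_dicts('events.jsonl.gz') should return a list equal to the original one.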
Szymon