Toolbox: Python List of Dicts to JSONL (json lines)
April 27, 2022
Converting a Python dict to JSON is very simple. We can use the json module and its json.dump or json.dumps functions, and voilà! We have our JSON. But nowadays, with unstructured data streams, another JSON-based format has become a popular choice: JSON Lines. It is a text file where each line is a valid JSON value and lines are separated by the newline character \n. It is natural to think about this structure in terms of a Python list of dicts. It could be a stream of website events with additional properties, for example, user actions or viewed products:
{"user": "xyz", "action": "click", "element": "submit button"}
{"user": "zla", "action": "products view", "items": ["product a", "product x"]}
{"user": "iks", "action": "add to cart", "items": ["product b"], "item properties": {"price": 3.5, "color": "silver"}}
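To make the "one valid JSON value per line" idea concrete, here is a minimal sketch that parses two of the sample events from a string. The names jsonl_text and events are only illustrative and do not come from any library.

```python
import json

# Two of the sample events, stored as JSON Lines text (one object per line).
jsonl_text = (
    '{"user": "xyz", "action": "click", "element": "submit button"}\n'
    '{"user": "zla", "action": "products view", "items": ["product a", "product x"]}\n'
)

# Each line is parsed on its own; there is no enclosing list and no commas
# between records, which is what makes the format easy to stream or append to.
events = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]

print(len(events))         # 2
print(events[1]["items"])  # ['product a', 'product x']
```

The result is an ordinary Python list of dicts, which is the structure we want to store.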
How to store this object?
Here is a function that can be used for this task:
import gzip
import json


def dicts_to_jsonl(data: list, filename: str, compress: bool = True) -> None:
    """
    Save a list of dicts to a JSON Lines file.

    :param data: (list) list of dicts to be stored,
    :param filename: (str) path to the output file. If the .jsonl suffix is
        missing, the function appends it to the filename.
    :param compress: (bool) should the file be compressed into a gzip archive?
    """
    sjsonl = '.jsonl'
    sgz = '.gz'

    # Check filename
    if not filename.endswith(sjsonl):
        filename = filename + sjsonl

    # Save data
    if compress:
        filename = filename + sgz
        # gzip.open in 'w' mode is binary, so each line is encoded to bytes
        with gzip.open(filename, 'w') as compressed:
            for ddict in data:
                jout = json.dumps(ddict) + '\n'
                jout = jout.encode('utf-8')
                compressed.write(jout)
    else:
        with open(filename, 'w') as out:
            for ddict in data:
                jout = json.dumps(ddict) + '\n'
                out.write(jout)
The function works as follows:
- It takes a list of dicts and a filename as the main parameters. The optional compress parameter gzips the data, a handy feature if you stream or store those files.
- In the first step, it makes sure the filename ends with .jsonl, appending the suffix if it is missing.
- In the next step, if compress is set to True, the function appends .gz to the filename, opens the gzipped file, serializes each dict to a JSON string, encodes it as UTF-8 bytes, and writes it as one line.
- Otherwise, each dict is serialized to a JSON string and written as one line of a plain JSON Lines file.
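Reading such a file back is the mirror image of the steps above. The sketch below is not part of the original function: jsonl_to_dicts is a hypothetical helper name, and it assumes that gzipped files carry the .gz suffix, as the writer above produces. The round trip writes a small gzipped file in a temporary directory so the example runs on its own.

```python
import gzip
import json
import os
import tempfile


def jsonl_to_dicts(filename: str) -> list:
    """Read a .jsonl or .jsonl.gz file back into a list of dicts (hypothetical helper)."""
    # Assumption: gzipped files end with '.gz'. Mode 'rt' makes gzip.open
    # return decoded text instead of bytes, so both branches read str lines.
    opener = gzip.open if filename.endswith('.gz') else open
    records = []
    with opener(filename, 'rt', encoding='utf-8') as source:
        for line in source:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


# Round trip: write two events as gzipped JSON Lines, then read them back.
events = [
    {"user": "xyz", "action": "click", "element": "submit button"},
    {"user": "iks", "action": "add to cart", "items": ["product b"]},
]
path = os.path.join(tempfile.mkdtemp(), "events.jsonl.gz")
with gzip.open(path, "wt", encoding="utf-8") as target:
    for event in events:
        target.write(json.dumps(event) + "\n")

restored = jsonl_to_dicts(path)
print(restored == events)  # True
```

Opening the gzip file in text mode ('wt' or 'rt') is an alternative to encoding each line by hand. Both approaches produce the same bytes on disk.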