
## Toolbox: Shapefile to GeoJSON with Python

Shapefile is not an optimal data format, and its main disadvantage is the number of files that make up a single dataset. You can read more about this problem here. Below we present a simple Python function that converts shapefiles into:

• a single GeoJSON file with all geometries and attributes from the core shapefile,
• a single gzipped GeoJSON file that is approximately 2 times smaller than the raw JSON.
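To make the multi-file problem concrete, here is a small stdlib-only sketch that groups the files in a directory by base name, so each shapefile shows up together with its sidecar files (`.shx`, `.dbf`, `.prj` and so on). The function name `shapefile_components` is hypothetical, and treating only base names that have a `.shp` file as shapefiles is an assumption of this sketch.

```python
from collections import defaultdict
from pathlib import Path


def shapefile_components(directory: str) -> dict[str, list[str]]:
    """Map each shapefile's base name to the sorted names of files sharing it."""
    files = [p for p in Path(directory).iterdir() if p.is_file()]
    # Assumption: a base name counts as a shapefile only if a .shp file exists
    stems = {p.stem for p in files if p.suffix.lower() == '.shp'}
    groups: dict[str, list[str]] = defaultdict(list)
    for p in files:
        if p.stem in stems:
            groups[p.stem].append(p.name)
    return {stem: sorted(names) for stem, names in groups.items()}
```

For a folder holding `roads.shp`, `roads.shx`, `roads.dbf` and `roads.prj`, the result is `{'roads': ['roads.dbf', 'roads.prj', 'roads.shp', 'roads.shx']}`: four files for one dataset.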

First, let's look at a basic solution:

```python
import os
import geopandas as gpd


def shp_2_json(shapefiles_path: str, output_path: str) -> None:
    """
    Function transforms given directory with shapefiles into geojsons.

    :param shapefiles_path: (str) Path to the directory with shapefiles.
    :param output_path: (str) Path to the output directory for GeoJSON files.
    """

    shp = '.shp'
    geojson = 'json'
    shapefiles = [fname for fname in os.listdir(shapefiles_path) if fname.endswith(shp)]
    # Make sure the output directory exists
    os.makedirs(output_path, exist_ok=True)

    for sfile in shapefiles:
        # Get and create paths, e.g. 'roads.shp' -> 'roads.json'
        spath = os.path.join(shapefiles_path, sfile)
        gjfile = sfile[:-3] + geojson
        gjpath = os.path.join(output_path, gjfile)

        # Read the shapefile (its sidecar files are picked up automatically)
        gdf = gpd.read_file(spath)
        gdf.to_file(gjpath, driver='GeoJSON')
```

This function requires the GeoPandas package. It reads all shapefiles from a given directory and writes them as GeoJSON files to another directory. The advantage of the JSON format is that each dataset is a single file, and it integrates easily with non-relational databases and web apps. The downside is the output size, which is usually a few times larger than the full set of shapefile components. If we only want to store the data or send it elsewhere, we should consider compressing it, for example with the gzip module. It takes just one more condition in our function:
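To check the size claim on your own data, a minimal sketch can compare an output file with the combined size of the shapefile's components. The helper names `bundle_size` and `size_ratio` are hypothetical, and the `COMPONENTS` tuple is an assumed common subset of shapefile extensions, not an exhaustive list.

```python
from pathlib import Path

# Assumption: a common subset of shapefile components; real datasets may have more
COMPONENTS = ('.shp', '.shx', '.dbf', '.prj', '.cpg')


def bundle_size(shp_path: str) -> int:
    """Total size in bytes of the shapefile components that exist on disk."""
    shp = Path(shp_path)
    paths = (shp.with_suffix(ext) for ext in COMPONENTS)
    return sum(p.stat().st_size for p in paths if p.is_file())


def size_ratio(shp_path: str, output_path: str) -> float:
    """How many times larger the output file is than the shapefile bundle."""
    return Path(output_path).stat().st_size / bundle_size(shp_path)
```

Calling `size_ratio('data/roads.shp', 'out/roads.json')` (hypothetical paths) after a conversion gives a value above 1 when the GeoJSON is larger than the original bundle; running it again on the `.gz` file shows how much compression recovers.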

```python
import gzip
import os
import shutil
import geopandas as gpd


def shp_2_json(shapefiles_path: str, output_path: str, compress: bool = True) -> None:
    """
    Function transforms given directory with shapefiles into geojsons.

    :param shapefiles_path: (str) Path to the directory with shapefiles.
    :param output_path: (str) Path to the output directory for GeoJSON files.
    :param compress: (bool) Compress output JSON into gz archive.
    """

    shp = '.shp'
    geojson = 'json'
    shapefiles = [fname for fname in os.listdir(shapefiles_path) if fname.endswith(shp)]
    # Make sure the output directory exists
    os.makedirs(output_path, exist_ok=True)

    for sfile in shapefiles:
        # Get and create paths, e.g. 'roads.shp' -> 'roads.json'
        spath = os.path.join(shapefiles_path, sfile)
        gjfile = sfile[:-3] + geojson
        gjpath = os.path.join(output_path, gjfile)

        gdf = gpd.read_file(spath)
        gdf.to_file(gjpath, driver='GeoJSON')

        if compress:
            gzipped = gjpath + '.gz'
            # Stream the raw JSON into a gzip archive
            with open(gjpath, 'rb') as f_in:
                with gzip.open(gzipped, 'wb') as f_out:
                    shutil.copyfileobj(f_in, f_out)
            # Remove the raw JSON once both files are closed
            os.remove(gjpath)
```

Now the JSON file is compressed, and the resulting files are roughly 2 times smaller than in the initial version. The last line removes the raw JSON file, so only the `.gz` archive remains. The os, shutil and gzip modules are part of Python's standard library, so there is no need to install them separately.
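The compressed file does not have to be unpacked to disk before use: `gzip.open` in text mode can be passed straight to the `json` module. The sketch below shows that round trip with two hypothetical helpers. As an alternative to writing a raw file and then copying it, `write_geojson_gz` writes GeoJSON text directly into the archive; where that text comes from is up to the caller (with GeoPandas, one option would be the string returned by `GeoDataFrame.to_json()`, which may differ in small details from the `to_file` output).

```python
import gzip
import json


def write_geojson_gz(geojson_text: str, path: str) -> None:
    """Write GeoJSON text straight into a gzip archive, skipping the raw .json file."""
    with gzip.open(path, 'wt', encoding='utf-8') as f:
        f.write(geojson_text)


def read_geojson_gz(path: str) -> dict:
    """Decompress on the fly and parse the GeoJSON into a dict."""
    with gzip.open(path, 'rt', encoding='utf-8') as f:
        return json.load(f)
```

Because gzip is lossless, `read_geojson_gz` returns the same structure that was written; the actual compression ratio depends on how repetitive the coordinates and attributes are.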
