Toolbox: Shapefile to GeoJSON with Python
Shapefile is not an optimal data format, and its main disadvantage is the number of files that make up a single object. We may read more about this problem here. Here we present a simple Python function to convert shapefiles into:
- a single GeoJSON file with all geometries and features from the core shapefile,
- a single gzipped GeoJSON that is approximately half the size of the raw JSON.
First, we take a look at a basic solution:
import os

import geopandas as gpd


def shp_2_json(shapefiles_path: str, output_path: str) -> None:
    """
    Function transforms given directory with shapefiles into geojsons.

    :param shapefiles_path: (str) Path to the directory with shapefiles.
    :param output_path: (str) Path to the output directory for geojson files.
    """
    shp = '.shp'
    geojson = 'json'

    # Make sure the output directory exists before writing into it
    os.makedirs(output_path, exist_ok=True)

    shapefiles = [fname for fname in os.listdir(shapefiles_path) if fname.endswith(shp)]

    for sfile in shapefiles:
        # Get and create paths ('name.shp' becomes 'name.json')
        spath = os.path.join(shapefiles_path, sfile)
        gjfile = sfile[:-3] + geojson
        gjpath = os.path.join(output_path, gjfile)

        # Load & save
        gdf = gpd.read_file(spath)
        gdf.to_file(gjpath, driver='GeoJSON')
This function requires the GeoPandas module to work. It reads all shapefiles from a given directory and converts them into GeoJSON files in a different directory. The advantage of the JSON structure is that it is a single file and is easily integrated with non-relational databases and web apps. The problem is the size of the output, which is usually a few times larger than the full set of shapefile components. If we only want to store the data or send it elsewhere, we should consider archiving it, for example with the gzip module. This adds one more condition to our function:
import gzip
import os
import shutil

import geopandas as gpd


def shp_2_json(shapefiles_path: str, output_path: str, compress=True) -> None:
    """
    Function transforms given directory with shapefiles into geojsons.

    :param shapefiles_path: (str) Path to the directory with shapefiles.
    :param output_path: (str) Path to the output directory for geojson files.
    :param compress: (bool) Compress output JSON into gz archive.
    """
    shp = '.shp'
    geojson = 'json'

    # Make sure the output directory exists before writing into it
    os.makedirs(output_path, exist_ok=True)

    shapefiles = [fname for fname in os.listdir(shapefiles_path) if fname.endswith(shp)]

    for sfile in shapefiles:
        # Get and create paths ('name.shp' becomes 'name.json')
        spath = os.path.join(shapefiles_path, sfile)
        gjfile = sfile[:-3] + geojson
        gjpath = os.path.join(output_path, gjfile)

        # Load & save
        gdf = gpd.read_file(spath)
        gdf.to_file(gjpath, driver='GeoJSON')

        if compress:
            gzipped = gjpath + '.gz'
            # Stream the raw JSON into the gz archive
            with open(gjpath, 'rb') as f_in:
                with gzip.open(gzipped, 'wb') as f_out:
                    shutil.copyfileobj(f_in, f_out)

            # remove JSON
            os.remove(gjpath)
Now the function compresses each JSON file, and the resulting archives are roughly half the size of the uncompressed output. The last line removes the raw file. The os, shutil and gzip modules are part of the Python standard library, so there is no need to install them separately.
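Once the data is stored as a gz archive, it still has to be read back at some point. The sketch below is one possible way to do that with the standard library only. The helper names load_gzipped_geojson and size_ratio, as well as the sample points, are hypothetical and serve only as an illustration; they are not part of the function above.

```python
import gzip
import json
import os
import tempfile


def load_gzipped_geojson(path: str) -> dict:
    """Read a gzipped GeoJSON file and return the parsed content as a dict."""
    # 'rt' opens the archive in text mode, so json.load receives decoded text
    with gzip.open(path, "rt", encoding="utf-8") as f_in:
        return json.load(f_in)


def size_ratio(raw_path: str, gz_path: str) -> float:
    """Return how many times larger the raw file is than the archive."""
    return os.path.getsize(raw_path) / os.path.getsize(gz_path)


if __name__ == "__main__":
    # Hypothetical sample data, used only for illustration
    collection = {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "properties": {"id": i},
                "geometry": {"type": "Point", "coordinates": [i * 0.5, i * 0.25]},
            }
            for i in range(1000)
        ],
    }

    with tempfile.TemporaryDirectory() as tmp:
        raw_path = os.path.join(tmp, "points.json")
        gz_path = raw_path + ".gz"

        # Write the raw JSON, then compress it into a gz archive
        with open(raw_path, "w", encoding="utf-8") as f_out:
            json.dump(collection, f_out)
        with open(raw_path, "rb") as f_in, gzip.open(gz_path, "wb") as f_out:
            f_out.write(f_in.read())

        print("Size ratio:", size_ratio(raw_path, gz_path))
        print("Round trip OK:", load_gzipped_geojson(gz_path) == collection)
```

The actual ratio depends on the data, so it is worth measuring on real files rather than relying on a fixed factor.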