Shapefiles to GeoJSON with Python and Geopandas

Toolbox: Shapefile to GeoJSON with Python

A shapefile is not an optimal data format, and its main disadvantage is the number of files that make up a single dataset. We may read more about this problem here. Below, we present a simple Python function that converts shapefiles into:

  • a single GeoJSON file with all geometries and attributes from the source shapefile,
  • a single gzipped GeoJSON file that is approximately 2 times smaller than the raw JSON.
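To make the multi-file problem concrete: one shapefile layer, for example roads, is stored as at least three files, roads.shp (geometries), roads.shx (shape index) and roads.dbf (attributes), usually accompanied by sidecar files such as roads.prj (projection). The sketch below groups the files of a directory by their common base name. group_shapefile_parts is a hypothetical helper, not part of GeoPandas, and it assumes that all parts of a layer share the same base name.

```python
import os
from collections import defaultdict


def group_shapefile_parts(directory: str) -> dict:
    """Map each shapefile base name to the sorted list of its file extensions (hypothetical helper)."""
    parts = defaultdict(list)
    for fname in os.listdir(directory):
        # 'roads.shp' -> ('roads', '.shp')
        stem, ext = os.path.splitext(fname)
        parts[stem].append(ext.lower())
    # Keep only the groups that actually contain a .shp main file
    return {stem: sorted(exts) for stem, exts in parts.items() if '.shp' in exts}
```

For a directory with roads.shp, roads.shx, roads.dbf, roads.prj and an unrelated notes.txt, it returns {'roads': ['.dbf', '.prj', '.shp', '.shx']}: four files on disk for what is logically one dataset.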

First, let's take a look at a basic solution:

import os
import geopandas as gpd


def shp_2_json(shapefiles_path: str, output_path: str) -> None:
    """
    Function transforms all shapefiles in a given directory into GeoJSON files.

    :param shapefiles_path: (str) Path to the directory with shapefiles.
    :param output_path: (str) Path to the output directory for GeoJSON files.
    """

    shp = '.shp'
    geojson = '.json'
    # Match both .shp and .SHP extensions
    shapefiles = [fname for fname in os.listdir(shapefiles_path) if fname.lower().endswith(shp)]

    # Make sure that the output directory exists
    os.makedirs(output_path, exist_ok=True)

    for sfile in shapefiles:
        # Get and create paths: roads.shp -> roads.json
        spath = os.path.join(shapefiles_path, sfile)
        gjfile = os.path.splitext(sfile)[0] + geojson
        gjpath = os.path.join(output_path, gjfile)

        # Load & save
        gdf = gpd.read_file(spath)
        gdf.to_file(gjpath, driver='GeoJSON')

This function requires the GeoPandas package. It reads all shapefiles from a given directory and saves them as GeoJSON files in the output directory. The advantage of the JSON structure is that it is a single file, and it integrates easily with non-relational databases and web apps. The problem is the size of the output, which is usually a few times larger than the full set of shapefile component files. If we only want to store the data or send it elsewhere, we should consider compressing it, for example with the gzip module. It takes just one more condition in our function:

import gzip
import os
import shutil
import geopandas as gpd


def shp_2_json(shapefiles_path: str, output_path: str, compress: bool = True) -> None:
    """
    Function transforms all shapefiles in a given directory into GeoJSON files.

    :param shapefiles_path: (str) Path to the directory with shapefiles.
    :param output_path: (str) Path to the output directory for GeoJSON files.
    :param compress: (bool) Compress each output JSON into a .gz archive.
    """

    shp = '.shp'
    geojson = '.json'
    # Match both .shp and .SHP extensions
    shapefiles = [fname for fname in os.listdir(shapefiles_path) if fname.lower().endswith(shp)]

    # Make sure that the output directory exists
    os.makedirs(output_path, exist_ok=True)

    for sfile in shapefiles:
        # Get and create paths: roads.shp -> roads.json
        spath = os.path.join(shapefiles_path, sfile)
        gjfile = os.path.splitext(sfile)[0] + geojson
        gjpath = os.path.join(output_path, gjfile)

        # Load & save
        gdf = gpd.read_file(spath)
        gdf.to_file(gjpath, driver='GeoJSON')

        if compress:
            # Stream the raw JSON into roads.json.gz
            gzipped = gjpath + '.gz'
            with open(gjpath, 'rb') as f_in, gzip.open(gzipped, 'wb') as f_out:
                shutil.copyfileobj(f_in, f_out)
            # Remove the uncompressed JSON
            os.remove(gjpath)

Now each JSON file is compressed, and the resulting archives are approximately 2 times smaller than the raw JSON files from the initial setting. The last line removes the uncompressed file. The os, shutil and gzip modules are part of the Python standard library, so there's no need to install them separately.
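If the raw JSON is never needed on disk, the intermediate file can be skipped, and the archive can be read back without unpacking it first. Below is a minimal sketch, assuming the GeoJSON is already available as a string. save_gzipped_geojson and load_gzipped_geojson are hypothetical helpers built only on the standard gzip and json modules, not functions from GeoPandas.

```python
import gzip
import json


def save_gzipped_geojson(geojson_text: str, path: str) -> None:
    """Write a GeoJSON string straight into a .gz archive, with no temporary raw file."""
    # 'wt' opens the archive in text mode, so str can be written directly
    with gzip.open(path, 'wt', encoding='utf-8') as f_out:
        f_out.write(geojson_text)


def load_gzipped_geojson(path: str) -> dict:
    """Read a gzipped GeoJSON file back into a Python dict."""
    # 'rt' decompresses on the fly and yields text for json.load
    with gzip.open(path, 'rt', encoding='utf-8') as f_in:
        return json.load(f_in)
```

With GeoPandas, save_gzipped_geojson(gdf.to_json(), 'roads.json.gz') could replace the to_file, copy and remove steps, although the text produced by GeoDataFrame.to_json() is not guaranteed to be identical to the output of the GeoJSON driver. The 'features' list of the loaded dictionary can be turned back into a GeoDataFrame with gpd.GeoDataFrame.from_features().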

Szymon