Zonal Statistics: A Comprehensive Guide
Zonal statistics calculates aggregate statistics (e.g., mean, sum, maximum) of the pixel values of a raster that fall within areas defined by a Polygon dataset. The method has diverse applications such as determining vegetation health in an agricultural field, assessing surface water availability, or approximating building heights from DSM rasters.
Building Polygon features colored based on DSM aggregate values.
Applications
- Height approximation: Estimate the height of buildings from DSM
- Crop yield estimation: Determine vegetation health within crop polygons using NDVI rasters
- Water resource management: Assess surface water availability in watersheds using NDWI rasters
- Forestry: Track changes in forest cover over time within administrative boundaries
Implementing zonal statistics
The analysis involves first determining the pixels within a polygon and then calculating their aggregate statistics, such as their count, mean, sum, or maximum.
Implementation steps
To perform this operation, first load a raster and a vector dataset for spatially overlapping areas, then perform zonal aggregations as shown below using the xarray and GeoPandas libraries.
- Load overlapping the raster and vector data
- For the extent of each Polygon, clip the raster to create a mask array
- Perform an aggregation operation (e.g., mean, sum, max) on each masked array
- Return the results in a GeoDataFrame, where each row corresponds to aPolygonand the columns contain the value for the aggregate zonal statistics
Example UDF
This example UDF loads the building footprint table gdf and DSM raster arr from S3 only for a section defined by bbox, which corresponds to a map tile. It then calculates the zonal statistics and returns the results as the GeoDataFrame gdf_zonal. This process enables Fused Workbench to perform the calculation dynamically as users scroll and zoom in on the map.
@fused.udf
def udf(
    bbox: fused.types.TileGDF=None, min_zoom=15, table="s3://fused-asset/infra/building_msft_us/"
):
    dsm_to_tile = fused.load("https://github.com/fusedio/udfs/tree/eda5aec/public/DSM_JAXA_Example").utils.dsm_to_tile
    utils = fused.load("https://github.com/fusedio/udfs/tree/2a76b6a/public/common/").utils
    gdf = utils.table_to_tile(bbox, table, min_zoom)
    arr = dsm_to_tile(bbox, z_levels=[4, 6, 9, 11], verbose=False)
    gdf_zonal = utils.geom_stats(gdf, arr)
    return gdf_zonal
The UDF would return a GeoDataFrame like the following, with an aggregate stats column for each input Polygon:
                                                geometry      stats  count
660    POLYGON ((-122.39806 37.76221, -122.39806 37.7...      22.00      5
661    POLYGON ((-122.39881 37.76219, -122.39876 37.7...      21.99    193
1452   POLYGON ((-122.39619 37.76326, -122.39613 37.7...      13.50      2
1458   POLYGON ((-122.39774 37.76327, -122.39776 37.7...      11.00     28
3033   POLYGON ((-122.39680 37.76362, -122.39701 37.7...      10.20      5
...
Conclusion
While zonal statistics is a valuable tool, it's often just the starting point for more complex analyses. Combining zonal statistics with other techniques such timeseries analysis or machine learning approaches can provide even richer insights into your data.