Zonal Statistics: A Comprehensive Guide
Zonal statistics calculates aggregate statistics (e.g., mean, sum, maximum) of the pixel values of a raster that fall within areas defined by a Polygon
dataset. The method has diverse applications such as determining vegetation health in an agricultural field, assessing surface water availability, or approximating building heights from DSM rasters.
Building Polygon
features colored based on DSM aggregate values.
Applications
- Height approximation: Estimate the height of buildings from DSM
- Crop yield estimation: Determine vegetation health within crop polygons using NDVI rasters
- Water resource management: Assess surface water availability in watersheds using NDWI rasters
- Forestry: Track changes in forest cover over time within administrative boundaries
Implementing zonal statistics
The analysis involves first determining the pixels within a polygon and then calculating their aggregate statistics, such as their count, mean, sum, or maximum.
Implementation steps
To perform this operation, first load a raster and a vector dataset for spatially overlapping areas, then perform zonal aggregations as shown below using the xarray
and GeoPandas
libraries.
- Load overlapping the raster and vector data
- For the extent of each
Polygon
, clip the raster to create a mask array - Perform an aggregation operation (e.g., mean, sum, max) on each masked array
- Return the results in a
GeoDataFrame
, where each row corresponds to aPolygon
and the columns contain the value for the aggregate zonal statistics
Example UDF
This example UDF loads the building footprint table gdf
and DSM raster arr
from S3 only for a section defined by bbox
, which corresponds to a map tile. It then calculates the zonal statistics and returns the results as the GeoDataFrame
gdf_zonal
. This process enables Fused Workbench to perform the calculation dynamically as users scroll and zoom in on the map.
@fused.udf
def udf(
bbox: fused.types.TileGDF=None, min_zoom=15, table="s3://fused-asset/infra/building_msft_us/"
):
dsm_to_tile = fused.load("https://github.com/fusedio/udfs/tree/eda5aec/public/DSM_JAXA_Example").utils.dsm_to_tile
utils = fused.load("https://github.com/fusedio/udfs/tree/2a76b6a/public/common/").utils
gdf = utils.table_to_tile(bbox, table, min_zoom)
arr = dsm_to_tile(bbox, z_levels=[4, 6, 9, 11], verbose=False)
gdf_zonal = utils.geom_stats(gdf, arr)
return gdf_zonal
The UDF would return a GeoDataFrame
like the following, with an aggregate stats
column for each input Polygon
:
geometry stats count
660 POLYGON ((-122.39806 37.76221, -122.39806 37.7... 22.00 5
661 POLYGON ((-122.39881 37.76219, -122.39876 37.7... 21.99 193
1452 POLYGON ((-122.39619 37.76326, -122.39613 37.7... 13.50 2
1458 POLYGON ((-122.39774 37.76327, -122.39776 37.7... 11.00 28
3033 POLYGON ((-122.39680 37.76362, -122.39701 37.7... 10.20 5
...
Conclusion
While zonal statistics is a valuable tool, it's often just the starting point for more complex analyses. Combining zonal statistics with other techniques such timeseries analysis or machine learning approaches can provide even richer insights into your data.