Raster to H3: A Deep Dive
Converting raster data to the H3 grid system bins pixel values into hexagonal cells, which makes rasters easier to aggregate and analyze alongside other geospatial data. The approach shown here uses DuckDB to aggregate NumPy arrays by H3 index.
Applications
- Earth Observation imagery analysis
- Agricultural parcels and field-level data
- Global environmental
- Land cover and land use change detection
Implementing Raster to H3
Implementation steps
- Load and chunk the raster into manageable parts
- Optionally coarsen the data to reduce resolution and speed up processing
- Bin each pixel to an H3 index based on its point location (latitude and longitude)
- Aggregate the data by H3 indices
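The four steps above can be sketched without any geospatial dependencies. This is a minimal illustration under stated assumptions, not the implementation behind the UDF that follows: `chunk_window`, `coarsen`, `toy_cell`, and `bin_and_aggregate` are hypothetical helpers, the row-major chunk numbering is an assumption, and `toy_cell` snaps points to a square latitude/longitude grid as a stand-in for a real H3 lookup.

```python
import math
from collections import defaultdict


def chunk_window(chunk_id, x_chunks, y_chunks, width, height):
    """Return (row_start, row_stop, col_start, col_stop) for one chunk.

    Assumes chunks are numbered row-major; a real helper may differ.
    """
    if not 0 <= chunk_id < x_chunks * y_chunks:
        raise ValueError("chunk_id out of range")
    y_idx, x_idx = divmod(chunk_id, x_chunks)
    col_start = x_idx * width // x_chunks
    col_stop = (x_idx + 1) * width // x_chunks
    row_start = y_idx * height // y_chunks
    row_stop = (y_idx + 1) * height // y_chunks
    return row_start, row_stop, col_start, col_stop


def coarsen(grid, factor):
    """Sum non-overlapping factor x factor blocks of a 2D list.

    Trailing rows or columns that do not fill a whole block are dropped.
    """
    rows = len(grid) // factor
    cols = len(grid[0]) // factor
    return [
        [
            sum(
                grid[r * factor + i][c * factor + j]
                for i in range(factor)
                for j in range(factor)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def toy_cell(lat, lng, size_deg=1.0):
    """Stand-in for an H3 lookup: snap a point to a square grid cell."""
    return (math.floor(lat / size_deg), math.floor(lng / size_deg))


def bin_and_aggregate(points):
    """Sum values per cell. points is an iterable of (lat, lng, value)."""
    totals = defaultdict(float)
    for lat, lng, value in points:
        totals[toy_cell(lat, lng)] += value
    return dict(totals)


# Made-up raster size: 4000 x 8000 pixels split into 20 x 40 chunks.
window = chunk_window(21, x_chunks=20, y_chunks=40, width=4000, height=8000)

grid = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
small = coarsen(grid, 2)

points = [(10.2, 20.7, 1.0), (10.9, 20.1, 2.0), (11.5, 20.5, 4.0)]
totals = bin_and_aggregate(points)
```

Summing in `coarsen` preserves totals, which suits count-like data; averaging would be the natural choice for continuous measurements. Replacing `toy_cell` with an actual H3 lookup turns the grouping keys into H3 cell IDs while the rest of the pipeline stays the same.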
Example UDF
import fused


@fused.udf
def udf(
    tiff_path: str = "s3://fused-asset/gfc2020/JRC_GFC2020_V1_S10_W40.tif",
    chunk_id: int = 0,
    x_chunks: int = 20,
    y_chunks: int = 40,
    h3_size: int = 6,
):
    import pandas as pd

    utils = fused.load("https://github.com/fusedio/udfs/blob/main/public/common/").utils

    # Read one chunk of the raster as a table of points (lat, lng, data).
    df_tiff = utils.chunked_tiff_to_points(tiff_path, i=chunk_id, x_chunks=x_chunks, y_chunks=y_chunks)

    # Bin each point to an H3 cell and collect the pixel values per cell.
    qr = f"""
        SELECT
            h3_latlng_to_cell(lat, lng, {h3_size}) AS hex,
            AVG(lat) AS lat,
            AVG(lng) AS lng,
            ARRAY_AGG(data) AS agg_data
        FROM df_tiff
        GROUP BY 1
    """
    df = utils.run_query(qr, return_arrow=True)
    df = df.to_pandas()

    # Sum the collected pixel values for each cell.
    df["agg_data"] = df.agg_data.map(lambda x: pd.Series(x).sum())
    # Convert the integer cell ID to its hexadecimal string form.
    df["hex"] = df["hex"].map(lambda x: hex(x)[2:])
    # Scale the sums to 0-100 relative to the largest cell in this chunk.
    df["metric"] = df.agg_data / df.agg_data.max() * 100

    gdf = utils.df_to_gdf(df)
    return gdf
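
Two details of the UDF's post-processing are easy to check in isolation. `hex(x)[2:]` turns an integer cell ID into its hexadecimal string form, and `metric` rescales the per-cell sums to a 0-100 range relative to the largest cell in the chunk. Because `metric` is normalized per chunk, results from different `chunk_id` values are better combined on the raw sums first. The sketch below uses plain Python with made-up values; `to_hex_string`, `merge_chunks`, and `add_metric` are hypothetical helpers for illustration, not part of Fused.

```python
from collections import defaultdict


def to_hex_string(cell_id: int) -> str:
    """Same conversion as the UDF: integer cell ID to lowercase hex string."""
    return hex(cell_id)[2:]


def merge_chunks(chunk_results):
    """Sum raw per-cell totals across chunks.

    chunk_results is an iterable of {hex_string: total} dicts, one per chunk.
    A cell that straddles a chunk boundary appears in more than one dict.
    """
    merged = defaultdict(float)
    for result in chunk_results:
        for cell, total in result.items():
            merged[cell] += total
    return dict(merged)


def add_metric(totals):
    """Rescale totals to 0-100 relative to the largest cell.

    The zero-peak guard is an addition; the UDF does not include one.
    """
    peak = max(totals.values())
    if peak == 0:
        return {cell: 0.0 for cell in totals}
    return {cell: total / peak * 100 for cell, total in totals.items()}


# Round trip between the string and integer forms of a cell ID.
cell_id = int("8928308280fffff", 16)
cell_str = to_hex_string(cell_id)

# Cell IDs and totals below are made up for illustration.
chunk_a = {"86283082fffffff": 30.0, "862830807ffffff": 25.0}
chunk_b = {"86283082fffffff": 20.0}
merged = merge_chunks([chunk_a, chunk_b])
metric = add_metric(merged)
```

Merging on raw sums and normalizing once at the end keeps `metric` comparable across the whole raster, rather than relative to each chunk's own maximum.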