Skip to main content

Raster to H3: A Deep Dive

Transforming raster data to the H3 grid system is a powerful technique that offers significant advantages in geospatial data analysis and processing. This conversion process uses DuckDB to aggregate numpy arrays by H3 indices. It opens up new possibilities for efficient raster analysis.

Application

Earth Observation imagery analysis

  • Agricultural parcels and field-level data
  • Global environmental
  • Land cover and land use change detection

Implementing Raster to H3

Implementation steps

  1. Load and chunk the raster into manageable parts
  2. Optionally coarsen the data to reduce resolution and speed up processing
  3. Bin the raster data to H3 indices based on points
  4. Aggregate the data by H3 indices

Example UDF

@fused.udf
def udf(
tiff_path: str = "s3://fused-asset/gfc2020/JRC_GFC2020_V1_S10_W40.tif",
chunk_id: int = 0,
x_chunks: int = 20,
y_chunks: int = 40,
h3_size=6,
):
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

utils = fused.load("https://github.com/fusedio/udfs/blob/main/public/common/").utils

df_tiff = utils.chunked_tiff_to_points(tiff_path, i=chunk_id, x_chunks=x_chunks, y_chunks=y_chunks)

qr = f"""
SELECT
h3_latlng_to_cell(lat, lng, {h3_size}) AS hex,
AVG(lat) as lat, avg(lng) AS lng,
ARRAY_AGG(data) AS agg_data
FROM df_tiff
group by 1
"""

df = utils.run_query(qr, return_arrow=True)
df = df.to_pandas()
df["agg_data"] = df.agg_data.map(lambda x: pd.Series(x).sum())
df["hex"] = df["hex"].map(lambda x: hex(x)[2:])
df["metric"] = df.agg_data / df.agg_data.max() * 100
gdf = utils.df_to_gdf(df)
return gdf