Raster to H3: A Deep Dive
Transforming raster data to the H3 grid system is a powerful technique for geospatial analysis and processing. Each pixel is assigned to a hexagonal H3 cell, and the cell index becomes a shared key that values can be grouped and aggregated by. The conversion described here uses DuckDB to aggregate NumPy arrays by H3 index, which makes it practical to process large rasters efficiently.
Applications
- Earth Observation imagery analysis
- Agricultural parcels and field-level data
- Global environmental monitoring
- Land cover and land use change detection
Implementing Raster to H3
Implementation steps
- Load and chunk the raster into manageable parts
- Optionally coarsen the data to reduce resolution and speed up processing
- Convert the pixels to points and bin each point to an H3 index
- Aggregate the data by H3 indices
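The first three steps can be sketched with NumPy and pandas alone. This is a minimal sketch, not the Fused `chunked_tiff_to_points` helper. It assumes the raster is already in memory as a 2-D NumPy array on a north-up latitude/longitude grid and that chunks are numbered row-major. The function names `chunk_window`, `coarsen` and `to_points` are hypothetical.

```python
import numpy as np
import pandas as pd


def chunk_window(height, width, i, x_chunks, y_chunks):
    """Row and column slices for chunk i of an x_chunks-by-y_chunks grid.

    Assumes row-major numbering: i // x_chunks is the chunk row and
    i % x_chunks the chunk column. Integer edges spread any remainder
    pixels across the chunks.
    """
    if not 0 <= i < x_chunks * y_chunks:
        raise ValueError(f"chunk index {i} out of range")
    cy, cx = divmod(i, x_chunks)
    row_edges = [k * height // y_chunks for k in range(y_chunks + 1)]
    col_edges = [k * width // x_chunks for k in range(x_chunks + 1)]
    return (slice(row_edges[cy], row_edges[cy + 1]),
            slice(col_edges[cx], col_edges[cx + 1]))


def coarsen(arr, factor):
    """Downsample a 2-D array by averaging factor-by-factor blocks.

    Rows and columns that do not fill a whole block are trimmed.
    """
    h = arr.shape[0] // factor * factor
    w = arr.shape[1] // factor * factor
    blocks = arr[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))


def to_points(arr, west, north, res_x, res_y):
    """One (lat, lng, data) row per pixel centre of a north-up raster.

    west/north give the top-left corner; res_x/res_y are positive pixel
    sizes in degrees.
    """
    rows, cols = np.indices(arr.shape)
    return pd.DataFrame({
        "lat": (north - (rows + 0.5) * res_y).ravel(),
        "lng": (west + (cols + 0.5) * res_x).ravel(),
        "data": np.asarray(arr).ravel(),
    })
```

For example, chunk 0 of a 4x4 array split into a 2x2 grid is its top-left 2x2 block, and `coarsen(block, 2)` reduces it to a single pixel holding the block mean. When passing a chunk to `to_points`, shift the origin by the chunk offset (`west + cs.start * res_x`, `north - rs.start * res_y`). After coarsening, also multiply the pixel sizes by the coarsening factor.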
Example UDF
```python
import fused  # preloaded inside Fused; needed when running the UDF locally


@fused.udf
def udf(
    tiff_path: str = "s3://fused-asset/gfc2020/JRC_GFC2020_V1_S10_W40.tif",
    chunk_id: int = 0,
    x_chunks: int = 20,
    y_chunks: int = 40,
    h3_size: int = 6,
):
    import pandas as pd

    # Shared helpers from the Fused public UDF repository
    utils = fused.load("https://github.com/fusedio/udfs/blob/main/public/common/").utils

    # Read one chunk of the GeoTIFF as points with lat, lng and data columns
    df_tiff = utils.chunked_tiff_to_points(tiff_path, i=chunk_id, x_chunks=x_chunks, y_chunks=y_chunks)

    # Bin each point to an H3 cell at resolution h3_size and collect the values per cell
    qr = f"""
        SELECT
            h3_latlng_to_cell(lat, lng, {h3_size}) AS hex,
            AVG(lat) AS lat,
            AVG(lng) AS lng,
            ARRAY_AGG(data) AS agg_data
        FROM df_tiff
        GROUP BY 1
    """
    df = utils.run_query(qr, return_arrow=True).to_pandas()

    # Sum the collected values for each cell
    df["agg_data"] = df["agg_data"].map(lambda x: pd.Series(x).sum())
    # Convert the integer H3 index to its hexadecimal string form
    df["hex"] = df["hex"].map(lambda x: hex(x)[2:])
    # Scale to 0-100 relative to the largest per-cell total in this chunk
    df["metric"] = df["agg_data"] / df["agg_data"].max() * 100
    return utils.df_to_gdf(df)
```
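The aggregation in the UDF above can be reproduced in plain pandas to show what the SQL query and the post-processing compute. This sketch rests on stated assumptions. `aggregate_by_cell` and `toy_cell` are hypothetical names. `toy_cell` is a square-grid stand-in for DuckDB's `h3_latlng_to_cell`, so the example runs without the H3 extension. The sketch sums values directly instead of collecting them with `ARRAY_AGG` first, which yields the same `agg_data`.

```python
import math

import pandas as pd


def toy_cell(lat, lng, size=1.0):
    """Hypothetical stand-in for h3_latlng_to_cell: a square-grid cell id.

    Cells are size-by-size degree squares; the id packs row and column
    into one non-negative integer (valid for size >= 0.001).
    """
    row = math.floor((lat + 90) / size)
    col = math.floor((lng + 180) / size)
    return row * 1_000_000 + col


def aggregate_by_cell(points, cell_fn):
    """Group (lat, lng, data) points by cell, mirroring the UDF's query."""
    cells = [cell_fn(lat, lng) for lat, lng in zip(points["lat"], points["lng"])]
    out = (
        points.assign(hex=cells)
        .groupby("hex", sort=False)
        .agg(lat=("lat", "mean"), lng=("lng", "mean"), agg_data=("data", "sum"))
        .reset_index()
    )
    # Same post-processing as the UDF: hex-string ids and a 0-100 score
    out["hex"] = out["hex"].map(lambda x: format(int(x), "x"))
    out["metric"] = out["agg_data"] / out["agg_data"].max() * 100
    return out
```

Passing a real H3 function as `cell_fn` would produce actual H3 cells. For example, with h3-py 4.x, `lambda lat, lng: h3.str_to_int(h3.latlng_to_cell(lat, lng, 6))` returns integer cell IDs, which the hex-string conversion expects. The final `utils.df_to_gdf` step that builds a GeoDataFrame is not reproduced here.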