DuckDB
Fused interfaces DuckDB with the Python duckdb
library.
1. Create a DuckDB client and set home_directory
@fused.udf
def udf(bbox: fused.types.TileGDF=None, agg_factor=3, min_count=5):
import duckdb
utils = fused.load("https://github.com/fusedio/udfs/tree/f928ee1/public/common/").utils
con = duckdb.connect()
con.sql("""
SET home_directory='/tmp/';
install 'httpfs'; load 'httpfs';
""")
2. Write a query
This example UDF shows how to query the NYC Taxi dataset with a DuckDB.
@fused.udf
def udf(bbox: fused.types.TileGDF=None, agg_factor=3, min_count=5):
import duckdb
utils = fused.load("https://github.com/fusedio/udfs/tree/f928ee1/public/common/").utils
con = duckdb.connect()
con.sql("""
SET home_directory='/tmp/';
install 'httpfs'; load 'httpfs';
""")
df = con.sql(f"""
SELECT
round(pickup_longitude*{agg_factor},3)/{agg_factor} lng,
round(pickup_latitude*{agg_factor},3)/{agg_factor} lat,
count(1) cnt
FROM
read_parquet('https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2010-01.parquet')
GROUP BY
round(pickup_longitude*{agg_factor},3),
round(pickup_latitude*{agg_factor},3)
HAVING
cnt>{min_count} AND
lat>40 AND
lat<41
""").df()
print("Number of trips:", df.cnt.sum())
gdf = utils.geo_convert(df)
return gdf