
The Strength in Weak Data Part 3: Prepping the Model Dataset

· 4 min read
Kristin Scholten
Data Scientist @ Nationwide

Hello friends, thanks for following my journey so far. To catch you up, I'm trying to solve the problem of farmers and traders relying on weak and untimely predictions of corn yield. Weak because they are at the national level and untimely because the predictions come once a month.

So here's the deal: farmers and traders have been relying on national-level corn yield predictions that are not only weak but also painfully slow, arriving just once a month. Imagine making critical decisions based on a single data point each month.

Not ideal, right? That's exactly the issue we're tackling in this blog post series.

Streamlining Infrastructure Risk Analysis with Fused

· 2 min read
Jacob Prince-Bieker
Senior ML Engineer @ VIDA

VIDA uses dozens of latest-generation climate models, collectively part of the Coupled Model Intercomparison Project 6 (CMIP6), to get the most up-to-date climate information. These models provide a range of possible futures under different emission scenarios, and they also differ in how each one produces its forecast. Using this information, we can build ensembles from the models and have higher confidence in the risks and hazards we derive from them and present to our customers.

In this blog post, I show how I created a UDF to pre-process and visually inspect the Zarrs we generate as the output from our climate risk models.
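To make the ensemble idea concrete, here is a minimal sketch in plain Python. It is not VIDA's actual pipeline: the model names and warming values are made up for illustration, and `ensemble_summary` is a hypothetical helper. The mean stands in for the ensemble estimate, and the standard deviation gives a rough sense of how much the models disagree.

```python
from statistics import mean, stdev


def ensemble_summary(projections):
    """Return (mean, spread) across model projections for one location and scenario."""
    values = list(projections.values())
    if len(values) < 2:
        raise ValueError("need at least two models to form an ensemble")
    return mean(values), stdev(values)


# Made-up warming projections in degrees C, keyed by hypothetical model names.
projections = {"model_a": 2.0, "model_b": 3.0, "model_c": 4.0}

ens_mean, ens_spread = ensemble_summary(projections)
print(f"ensemble mean: {ens_mean:.2f}, spread: {ens_spread:.2f}")
```

A small spread relative to the mean suggests the models agree, which is one reason an ensemble can support higher confidence than any single model.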

Map Overture Buildings and Foursquare Places with Leafmap

· One min read
Qiusheng Wu
Associate Professor @ University of Tennessee
Plinio Guzman
Founding Engineer @ Fused

Dr. Qiusheng Wu is an Associate Professor of Geography and Sustainability at the University of Tennessee and a Founding Editorial Board Member at the Cloud-Native Geospatial Forum (CNG). As part of his commitment to making open-source geospatial analysis and visualization more accessible, he has developed several widely used open-source packages, including geemap, leafmap, and segment-geospatial.

In this notebook, Qiusheng shows a few examples of how Cloud Native Geospatial datasets help you easily load data into a Jupyter Notebook environment using leafmap. His practical examples showcase how you can call the Overture Maps UDF and Foursquare Places UDF to load data for a custom area of interest and render it in a Leaflet map.


From query to map: Exploring GeoParquet Overture Maps with Ibis, DuckDB, and Fused

· 3 min read
Naty Clementi
Sr. Software Engineer - Ibis Project Committer

Naty is a Senior Software Engineer and a contributor to Ibis, the portable Python dataframe library. One of her main contributions was enabling the DuckDB spatial extension for Ibis in 2023.

In this blog post, she shows us how to leverage the spatial extension in DuckDB with Ibis to query Overture data. Ibis works by compiling Python expressions into SQL: you write Python dataframe-like code, and Ibis takes care of the SQL. Thanks to Ibis's integration with Pandas and GeoPandas, you only need to call to_pandas() to get your expression as a GeoDataFrame.

Creating an app to model road mobility networks in Lima, Peru

· 3 min read
Claudio Ortega
Head of AI @ Vitrus

In December 2023, I visited the Institute for Metropolitan Planning (IMP) in Lima. The director had invited me to share some of my geospatial analysis projects from my master's studies and explore potential collaborations. Around that time, Lima's mayor had announced a bold infrastructure initiative: building 60 flyover bridges to ease traffic congestion in one of the most gridlocked cities in Latin America.

When I asked how the city was simulating the impact of new network designs on urban mobility, the answer was: "We are not simulating anything, our budget is constrained, and there is no political will to solve this problem." I couldn't think of anything else after this meeting. I started thinking about how I could create an easy-to-use tool to simulate urban mobility using open-source data, tools with no subscriptions or licenses, and without data privacy concerns.

My first attempt with FastAPI and React came to an unfortunate halt. Fused allowed me to revisit the idea and quickly create an API endpoint and a lightweight app I could share with anyone.


Beyond RGB: Interactive Exploration of NEON's Hyperspectral Data

· 3 min read
Guillermo Ponce
Research Specialist @ University of Arizona

As a research specialist focused on remote sensing applications in semi-arid rangelands, I'm constantly seeking tools that can enhance our ability to process and analyze large-scale geospatial data. The excitement of discovering new platforms that streamline complex workflows never gets old, especially when dealing with the massive datasets typical in remote sensing research.

My journey with Fused began unexpectedly through the "Minds Behind Maps" podcast, where host Maxime Lenormand interviewed Sina Kashuk, Co-Founder and CEO of Fused (see episode). The conversation sparked my curiosity, leading me to explore Fused's community examples and documentation. After joining their waitlist and receiving access, I knew exactly how I wanted to test it: an interactive tool for exploring NEON's Airborne Observation Platform (AOP) data.

How DigitalTwinSim Models Wireless Networks with DuckDB, IBIS, and Fused

· 3 min read
Sameer Lalwani
Co-Founder @ DigitalTwinSim

Sameer, co-founder of DigitalTwinSim, leads the development of advanced geospatial analysis tools to support the telecom industry in strategic network planning. DigitalTwinSim specializes in using high-resolution data to optimize the placement of network towers, ensuring reliable, high-speed connectivity.

In this blog post, Sameer shares how he leverages Ibis with a DuckDB backend, and Fused to model wireless networks at high resolution. This approach enables him to quickly generate network coverage models for his clients. He explains and shares a Fused UDF that processes data in an H3 grid to evaluate optimal locations for network towers.

Check out Sameer's UDF on Workbench.

The Fastest Way to Download Foursquare's new POI Dataset

· One min read
Max Lenormand
Developer Advocate @ Fused
Sina Kashuk
CEO @ Fused

Foursquare just released an open dataset of over 100M global places of interest.

We at Fused have partitioned these points into easily accessible GeoParquet files and hosted them on Source Cooperative.

On top of that, we've built a publicly available User Defined Function (UDF) that anyone can use to view the data and download it as GeoJSON, all from the browser.


Try it out for yourself!

You don't need to log in or create an account to access the Foursquare POI points.

How I Got Started Making Maps with Python and SQL

· 4 min read
Stephen Kent
Data Engineering and Visualization

I am a self-taught developer and data enthusiast. I first came across the spatial data community when I saw a Matt Forrest video on LinkedIn where he demonstrated how to visualize buildings from the Vida Combined Building Footprints dataset with DuckDB. Immediately I thought: what if you could see all the buildings in a country, say, Egypt? I set out to do just that and made this map with DuckDB and Datashader.


Buildings in Egypt.

Discovering NYC Chronotypes with Fused

· 3 min read
Elizabeth Cutrone
Director of Data Science @ Precisely

🎥 Watch the Webinar recording associated with this blog post here.

Neighborhoods within a city have consistent characteristics but often have ill-defined boundaries. Some neighborhoods are more similar than others even though they're not nearby. Understanding these local boundaries and the demographics, dynamics and behaviors of different areas affects a wide range of business applications, including advertising, site selection, business analytics, and many more.

DuckDB, Fused, and your data warehouse

· 3 min read
Stefano Bourscheid
Facilitating Engineer @ GLS

🎥 Watch the Webinar recording associated with this blog post here.

GLS (General Logistics Systems) is an international parcel delivery service provider, primarily operating in Europe and North America. To stay ahead in the fast-paced logistics industry, GLS launched GLS Studio, an innovation lab aimed at optimising and modernising its depots and processes through cutting-edge technology.

Stefano co-founded GLS Studio to build the next generation of data-driven products. In this post, he shares how GLS Studio uses Fused to drive efficiency and innovation in parcel delivery.

In this blog post, Stefano shows how his team powers GLS's ParcelPlanner app, which helps GLS evaluate delivery routes efficiently. The app uses Fused to query Snowflake and serve H3-partitioned geospatial data to the frontend, which is powered by Honeycomb Maps and DuckDB WASM.

The Strength in Weak Data Part 2: Zonal Statistics

· 3 min read
Kristin Scholten
Data Scientist @ Nationwide

🎥 Watch the Webinar recording associated with this blog post here.

A raster, a vector, and an array walk into a bar…

OK, I will spare you the corny jokes.

But seriously, I was facing a problem with these three data types when I approached Fused. It felt impossible to join this information together in a meaningful way. Fortunately, I was quickly proven wrong with the power of UDFs. Let me catch you up.

Analyzing traffic speeds from 100 billion drive records

· 5 min read
Christopher Kyed
Data Scientist @ Pacific Spatial

🎥 Watch the Webinar recording associated with this blog post here.

Over the last few decades, it has become increasingly evident that passenger vehicles are by far the most dangerous way to travel. As a result, it has become more and more important to find an efficient and effective method to predict traffic risk. However, predicting traffic accidents and where they are likely to occur is a very complex problem, with large amounts of data being needed for most meaningful predictions.

At Pacific Spatial Solutions, we are currently trying to tackle this problem by training a machine learning model to predict road and intersection risk across Japan. Because we are predicting traffic risk at a national level, the data we use needs to cover the same area.

Creating cloud-free composite HLS imagery with Fused

· 4 min read
Marie Hoeger
Staff Software Engineer @ Pachama
Plinio Guzman
Founding Engineer @ Fused

🎥 Watch the Webinar recording associated with this blog post here.

High-quality satellite imagery is essential to assess the carbon impact of nature-based forest conservation and restoration projects [1]. However, getting that high-quality imagery is uniquely difficult in the areas that need carbon financing the most: tropical forests. Persistent cloud cover there often renders optical imagery unusable and creates data gaps.


Example composites highlight how the HLS-L30 product alone can have gaps when attempting to make a seasonal composite, as there are fewer cloud-free observations.
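To see why fewer cloud-free observations lead to gaps, here is a minimal sketch of a seasonal composite in plain Python. It assumes a per-pixel median over cloud-free values, which may differ from the method used in the post; `seasonal_composite` is a hypothetical helper, the reflectance values are made up, and `None` marks a pixel masked as cloudy.

```python
from statistics import median


def seasonal_composite(scenes):
    """Per-pixel median over cloud-free observations.

    scenes: list of scenes, each a list of pixel values where None means cloudy.
    Returns one value per pixel, or None where no scene had a clear view.
    """
    n_pixels = len(scenes[0])
    composite = []
    for i in range(n_pixels):
        clear = [scene[i] for scene in scenes if scene[i] is not None]
        composite.append(median(clear) if clear else None)
    return composite


# Three made-up scenes over the same three pixels; the last pixel is always cloudy.
scenes = [
    [0.10, None, None],
    [0.30, 0.20, None],
    [0.20, None, None],
]

composite = seasonal_composite(scenes)
gaps = sum(value is None for value in composite)
print(composite, f"gaps: {gaps}")
```

Adding scenes from a second sensor to the same list gives each pixel more chances of a clear view, so fewer pixels end up as gaps.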

The Strength in Weak Data Part 1: Navigating the NetCDF

· 3 min read
Kristin Scholten
Data Scientist @ Nationwide

🎥 Watch the Webinar recording associated with this blog post here.

Ever tried to make sense of the myriad file types in spatial data science and felt like you've wandered into a linguistic labyrinth? Trust me, you're not alone. As a data scientist who's spent more time wrangling datasets than I care to admit, I thought I'd take a casual stroll down memory lane with an old high school friend: regression models. Just a simple plot of actual vs. predicted, right? But when spatial data's involved, you can't just sit back and relax; you've got to keep one eye on the geometries.

I'm currently working on an agricultural project, and growing up on a farm gives me a personal stake in this. This blog illustrates my solution to the geometry debacle. I'll first take you to the area where I grew up: Lyon County.

Enrich your dataset with GERS and create a Tile server

· 3 min read
Jennings Anderson
Software Engineer @ Meta
Plinio Guzman
Founding Engineer @ Fused

🎥 Watch the Webinar recording associated with this blog post here.

Overture is an open data project that publishes interoperable map datasets. It aims to foster an ecosystem of developers creating downstream map services around its data products. Fused emerged as a solution to enrich Overture datasets on the fly and serve them with XYZ Tile endpoints.

This clip shows how coverage expands in (top right) Astoria when I add building heights from the NSI dataset (as num_story * 3) to Overture buildings.
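The `num_story * 3` rule can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the UDF from the post: the record layout, the `id` and `height` keys, and the `enrich_heights` helper are hypothetical, and the sample buildings are made up. The factor of 3 comes from the post (presumably meters per story, though the unit is not stated here).

```python
STORY_HEIGHT = 3  # factor from the post; the unit is an assumption


def enrich_heights(buildings, nsi_stories):
    """Fill missing building heights from a story count, leaving known heights alone."""
    enriched = []
    for building in buildings:
        height = building.get("height")
        stories = nsi_stories.get(building["id"])
        if height is None and stories is not None:
            height = stories * STORY_HEIGHT
        enriched.append({**building, "height": height})
    return enriched


def coverage(buildings):
    """Share of buildings that have a height value."""
    return sum(b["height"] is not None for b in buildings) / len(buildings)


# Made-up buildings: only one has a height before enrichment.
buildings = [
    {"id": "a", "height": 12.0},
    {"id": "b", "height": None},
    {"id": "c", "height": None},
    {"id": "d", "height": None},
]
nsi_stories = {"b": 2, "c": 5}  # hypothetical story counts keyed by building id

enriched = enrich_heights(buildings, nsi_stories)
print(coverage(buildings), "->", coverage(enriched))
```

Keeping existing heights and only filling the gaps means the enrichment can raise coverage without overwriting values that are already present.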

Six ways to use Fused

· 4 min read
Daniel Jahn
Platform Engineer @ Sylvera

🎥 Watch the Webinar recording associated with this blog post here.

Fused is a powerful and versatile tool that can do nearly anything with just Python. Its versatility is its strength, but it is also an obstacle. It's easy to walk away wondering: what, concretely, can Fused do for me?

Here are six concrete ways you can use Fused today.

Summarizing building energy ratings

· One min read
Isaac Brodsky
CTO @ Fused

In this video tutorial, I show a complete data app workflow in Fused. Starting with exploring the data in Fused, the tutorial walks through developing a UDF to serve the data, and then a Fused App to share results.

With Fused, this whole workflow takes just minutes from beginning to end. Fused helps me visualize the data at every step, iterate on my analytical logic, and finally publish a dashboard.

ML-less global vegetation segmentation at scale

· 4 min read
Kevin Lacaille
Senior Software Engineer @ Spexi

🎥 Watch the Webinar recording associated with this blog post here.

In an era where data-driven decisions are vital, accurate and scalable vegetation analysis plays a crucial role across various industries, from environmental monitoring to urban planning. While AI and machine learning have transformed image analysis, they often bring complexities and resource demands that aren't always practical for large-scale, real-time applications.

How Pachama creates maps on-the-fly with Fused

· 4 min read
Andrew Campbell
Senior Software Engineer @ Pachama
Plinio Guzman
Founding Engineer @ Fused

🎥 Watch the Webinar recording associated with this blog post here.

Pachama is a technology company that harnesses satellite data and AI to empower companies to confidently invest in nature. The engineering team at Pachama created a Land Suitability Tool to help landowners and project developers qualify parcels of land to implement carbon projects. They turned to Fused to simplify their data workflows.

Geospatial workflows of any size

· One min read
Isaac Brodsky
CTO @ Fused
Matt Forrest
Field CTO @ CARTO

Isaac Brodsky, the CTO of Fused, delved into the power of Fused during a LinkedIn live session with Matt Forrest. They discussed the contrast of Python vs. SQL for data analytics and the advantages of serverless geospatial processing, and showcased a live demo of the UDF Builder. During the demo, Isaac created a User Defined Function to visualize Overture building footprints that are within a certain proximity of water.

DuckDB + Fused: Fly beyond the serverless horizon

· 6 min read
Sina Kashuk
CEO @ Fused
Isaac Brodsky
CTO @ Fused

The combination of Fused serverless operations and DuckDB offers blazing-fast data processing. Fused embraced Python to create serverless User Defined Functions (UDFs). Now, with the help of DuckDB, Fused enables developers to leverage the ease and familiarity of SQL in these functions, without compromising performance and parallelism.