Fusion of In-Situ Network and Auxiliary Information: A Probabilistic Approach

PhD Thesis Defense Presentation

Gabriel Oduori

University College Dublin

Tuesday, 21 April 2026

Overview

Setting the Scene
  • Problem Space
  • Gaps
  • Research Questions
  • Data Sources
  • Modelling Approach
Contributions
  • RQ0: Systematic Review
  • RQ1: FusionGP
  • RQ2: GAM–SSM
  • RQ3: Transfer Learning
  • RQ4: Uncertainty Quantification
Closing
  • Conclusions
  • Future Work

Problem Space

Urban air quality monitoring requires the integration of large, heterogeneous datasets, yet existing data fusion methods are limited in their ability to process such diverse inputs while also providing robust uncertainty quantification.
🕸️
Spatial & Temporal Dependencies
Jointly represent complex dependencies across space and time
🔍
Interpretable Structure
Model outputs must be explainable to scientists and policymakers
📊
Rigorous Uncertainty Quantification
Propagate and communicate uncertainty end-to-end
⚙️
Computational Tractability
Scalable and deployable in real urban settings
This thesis proposes a probabilistic approach to address all four requirements simultaneously

Gaps in the State of the Art

Gap 1: Probabilistic fusion is rare
Most methods produce point estimates; uncertainty is either ignored or treated as an afterthought. Few probabilistic frameworks exist for fusing multiple heterogeneous sources.
→ RQ1
Gap 2: Temporal dynamics overlooked
Most models are static spatial snapshots. Time-varying pollutant behaviour — diurnal cycles, traffic peaks, seasonal trends — is rarely modelled explicitly.
→ RQ2
Gap 3: Models do not transfer
Models are trained and validated in a single city. Transferability to new cities or sparse-sensor environments is rarely tested or reported.
→ RQ3
Gap 4: Uncertainty never reaches the decision-maker
Even where uncertainty is estimated, it is rarely communicated in formats that support policy or public health decisions.
→ RQ4

From Gaps to Research Questions

Central Research Question

How can probabilistic models be designed to fuse heterogeneous air quality data (in-situ sensors, satellite observations, and spatial covariates) for accurate and uncertainty-aware estimation of pollutant concentrations?

RQ1: How can probabilistic models fuse satellite and spatial data for accurate NO₂ mapping?
RQ2: How can Land Use Regression (LUR) be embedded in a State Space Model to capture temporal dynamics?
RQ3: How transferable are probabilistic air quality models across cities and deployments?
RQ4: How can uncertainty quantification improve estimation reliability and support decision-making?

Methodology — Data Sources

🏭
In-Situ Networks
High accuracy
Low density
🛰️
Satellite (TROPOMI)
Global coverage
Coarse resolution
🚗
Traffic Volume
Primary NO₂ emission proxy
🗺️
OSM Geodata
Land use & spatial structure
Approach Uncertainty Multi-source Temporal Transferable
LUR Partial Limited
Kriging Partial
Deep Learning Limited
This thesis

Methodology — Probabilistic Modelling

Gaussian Processes
  • Non-parametric Bayesian framework
  • Principled uncertainty quantification
  • Flexible kernel design for spatial correlation
  • Treats each source as a noisy observation of a latent field
State Space Models
  • Dynamic latent state representation
  • Kalman filter / smoother for inference
  • Natural for temporal evolution of pollution fields
  • Embeds Land Use Regression as the spatial observation model
Why probabilistic? Every data source in this thesis is a noisy, incomplete observation of the same underlying truth — the actual NO₂ concentration field. Only a probabilistic framework propagates that uncertainty honestly end-to-end.
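The idea that every source is a noisy observation of one latent field can be sketched as Gaussian Process regression with a separate noise variance per observation. This is a minimal one-dimensional illustration under stated assumptions, not the FusionGP implementation: it assumes a squared-exponential kernel, a zero prior mean after subtracting the data mean, point-support observations (a real TROPOMI pixel averages over an area), and hypothetical NO₂ values and noise variances. The names `rbf_kernel` and `gp_fuse` are illustrative only.

```python
import numpy as np


def rbf_kernel(xa, xb, variance=1.0, lengthscale=1.0):
    """Squared-exponential covariance between two 1-D arrays of locations."""
    d = xa[:, None] - xb[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def gp_fuse(x_obs, y_obs, noise_var, x_new, variance=1.0, lengthscale=1.0):
    """Posterior mean and variance of a zero-mean latent field f at x_new,
    given y_i = f(x_i) + e_i with independent e_i ~ N(0, noise_var[i])."""
    K = rbf_kernel(x_obs, x_obs, variance, lengthscale) + np.diag(noise_var)
    K_s = rbf_kernel(x_new, x_obs, variance, lengthscale)
    L = np.linalg.cholesky(K)                       # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = variance - np.sum(v ** 2, axis=0)         # k(x, x) - k_s K^-1 k_s^T
    return mean, var


# Hypothetical example: 2 precise reference monitors, 6 noisy satellite-derived values
x_sensor, y_sensor = np.array([0.0, 5.0]), np.array([40.0, 25.0])
x_sat = np.linspace(0.0, 5.0, 6)
y_sat = np.array([36.0, 35.0, 33.0, 30.0, 28.0, 27.0])

x_all = np.concatenate([x_sensor, x_sat])
y_all = np.concatenate([y_sensor, y_sat])
noise = np.concatenate([np.full(2, 1.0), np.full(6, 16.0)])   # assumed (µg/m³)²

offset = y_all.mean()                               # simple centring for the zero-mean prior
x_grid = np.linspace(0.0, 5.0, 51)                  # arbitrary distance units
mean, var = gp_fuse(x_all, y_all - offset, noise, x_grid, variance=25.0, lengthscale=2.0)
mean += offset
```

Because each observation carries its own noise variance, the posterior variance near a reference monitor falls far below the satellite noise level, while between monitors it is shaped mainly by the noisier satellite values.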

Contributions Overview

Principal contributions to probabilistic air quality modelling:
Methodological
  1. A probabilistic GP fusion framework integrating satellite, spatial, and in-situ data (RQ1)
  2. A hybrid GAM–State Space model embedding LUR for spatio-temporal NO₂ prediction (RQ2)
Applied
  1. A transfer learning approach for generalising probabilistic models across cities and sensor deployments (RQ3)
  2. A UQ framework propagating estimation uncertainty through the full fusion pipeline (RQ4)
These four contributions form a coherent end-to-end pipeline: from data fusion → temporal dynamics → transferability → uncertainty communication. Each builds on the previous. Each is a paper.

RQ0: Systematic Review

Method: Systematic Review of 80+ papers across probabilistic and deterministic fusion methods

Key finding: No unified probabilistic framework exists — deterministic methods dominate, uncertainty is largely ignored

Title: Data Fusion for Low-Cost Sensors: Systematic Review
Venue: Information Fusion
🟢 Published

RQ1: FusionGP

Method: Gaussian Process framework integrating TROPOMI satellite observations and LUR covariates with in-situ reference data

Key finding: High-resolution probabilistic NO₂ maps with calibrated uncertainty estimates at every location

Title: FusionGP: Probabilistic Data Fusion for High-Resolution Urban Air Quality Mapping
Venue: Computers, Environment and Urban Systems
🔵 Manuscript submitted and returned

RQ2: Hybrid GAM–SSM

Method: Hybrid Generalised Additive Model combined with State Space formulation embedding LUR

Key finding: Dynamic prediction with principled uncertainty; the model updates as observations arrive. Currently being revised with wind-sector-based covariates.

Title: Hybrid Generalized Additive-State Space Modelling for Urban NO₂ Prediction: Integrating Spatial and Temporal Dynamics
Venue: Environmental Modelling & Software
🟡 Under revision
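How a state space model "updates as observations arrive" can be illustrated with a scalar Kalman filter. The sketch below assumes a deliberately simple structure that is not the thesis's GAM–SSM: a fixed GAM/LUR prediction `offset_t` (treated as known) plus a latent random-walk correction `a_t`, with hypothetical noise variances `q` and `r`. Missing readings (NaN) skip the update step, so the uncertainty grows during a sensor outage. The function name `local_level_filter` and the diurnal offset are illustrative only.

```python
import numpy as np


def local_level_filter(y, offset, q, r, m0=0.0, p0=10.0):
    """Kalman filter for y_t = offset_t + a_t + eps_t, a_t = a_{t-1} + eta_t,
    with eps_t ~ N(0, r) and eta_t ~ N(0, q). NaN in y means no reading at t.
    Returns the filtered mean of offset_t + a_t and the filtered variance of a_t."""
    n = len(y)
    mean, var = np.empty(n), np.empty(n)
    m, p = m0, p0
    for t in range(n):
        p = p + q                          # predict: random walk, uncertainty grows
        if not np.isnan(y[t]):
            k = p / (p + r)                # Kalman gain
            m = m + k * (y[t] - offset[t] - m)
            p = (1.0 - k) * p              # update: uncertainty shrinks
        mean[t], var[t] = offset[t] + m, p
    return mean, var


# Hypothetical hourly series: a diurnal GAM/LUR prediction plus an unmodelled bias
hours = np.arange(24)
gam_offset = 30.0 + 10.0 * np.sin(2 * np.pi * (hours - 8) / 24)
rng = np.random.default_rng(0)
y = gam_offset + 3.0 + rng.normal(0.0, 2.0, size=24)
y[10:14] = np.nan                          # simulated sensor outage, hours 10 to 13
filtered, filt_var = local_level_filter(y, gam_offset, q=0.5, r=4.0)
```

During the outage the filtered variance rises by `q` each hour and drops again as soon as a reading arrives; the spatial part of the model enters only through the offset in this simplified form.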

RQ3: Transfer Learning

Method: Transfer learning applied to pre-trained probabilistic fusion models across different urban deployments

Key finding: Probabilistic structure transfers well even when city morphology differs — reducing city-specific training data requirements substantially

Title: Transfer Learning for Generalizing Air Quality Models
Venue: Applied Soft Computing
🟡 Under major revision
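One simple way a probabilistic model can carry knowledge between cities is to reuse the posterior from a data-rich source city as the prior in a sparse target city. The sketch below shows this for a conjugate Bayesian linear regression (an LUR-style model with known noise variance). It is an assumed illustration, not the transfer method of the RQ3 paper; the inflation factor that widens the source posterior to allow for differences between cities, the synthetic data, and all names are hypothetical.

```python
import numpy as np


def blr_posterior(X, y, noise_var, prior_mean, prior_cov):
    """Conjugate Gaussian posterior for w in y = X w + e, e ~ N(0, noise_var * I)."""
    prior_prec = np.linalg.inv(prior_cov)
    post_prec = prior_prec + X.T @ X / noise_var
    post_cov = np.linalg.inv(post_prec)
    post_mean = post_cov @ (prior_prec @ prior_mean + X.T @ y / noise_var)
    return post_mean, post_cov


rng = np.random.default_rng(1)


def make_city(n, w):
    """Synthetic city: intercept plus two standardised covariates, noise sd 2."""
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    return X, X @ w + rng.normal(0.0, 2.0, size=n)


w_source = np.array([20.0, 0.8, -0.5])            # hypothetical coefficients
w_target = w_source + np.array([3.0, 0.1, 0.0])   # target city differs somewhat
X_src, y_src = make_city(500, w_source)           # data-rich source city
X_tgt, y_tgt = make_city(8, w_target)             # sparse target deployment

vague_mean, vague_cov = np.zeros(3), 1e4 * np.eye(3)
m_src, c_src = blr_posterior(X_src, y_src, 4.0, vague_mean, vague_cov)

inflate = 10.0                                    # hypothetical widening for city differences
m_tl, c_tl = blr_posterior(X_tgt, y_tgt, 4.0, m_src, inflate * c_src)    # transferred prior
m_cold, c_cold = blr_posterior(X_tgt, y_tgt, 4.0, vague_mean, vague_cov)  # target data only
```

With only eight target readings, the transferred prior gives a tighter posterior than fitting the target data alone; the inflation factor controls how strongly the source city is trusted.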

RQ4: Uncertainty Quantification

Method: Gaussian Surrogate Uncertainty Quantification Model — propagating sensor noise, retrieval error, and model uncertainty end-to-end

Key finding: Source-decomposed uncertainty enables transparent, decision-relevant communication of what the model does and does not know

Title: Uncertainty Quantification from Air Quality Fusion Models
Venue: To be confirmed
🔵 In preparation
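For intuition on source-decomposed uncertainty, consider the simplest case: a weighted combination of estimates of the same quantity from several sources whose errors are independent, zero-mean and Gaussian. The variance of the combination then splits exactly into one term per source, w_i² σ_i². This is an assumed linear-Gaussian illustration, not the Gaussian Surrogate UQ model itself; the three sources, their error variances and the inverse-variance weights are hypothetical.

```python
import numpy as np


def decompose_variance(weights, source_var):
    """Variance of sum_i w_i * e_i for independent zero-mean errors e_i with
    variances source_var[i]; returns the total and each source's fractional share."""
    w = np.asarray(weights, dtype=float)
    s2 = np.asarray(source_var, dtype=float)
    contrib = w ** 2 * s2
    total = contrib.sum()
    return total, contrib / total


sources = ["reference sensor", "satellite retrieval", "LUR model"]
error_var = np.array([1.0, 9.0, 4.0])             # hypothetical error variances, (µg/m³)²
w = (1.0 / error_var) / np.sum(1.0 / error_var)   # inverse-variance (minimum-variance) weights
total, share = decompose_variance(w, error_var)
for name, s in zip(sources, share):
    print(f"{name}: {s:.1%} of total variance {total:.3f}")
```

With inverse-variance weights each source's share of the variance equals its weight, so the most precise source both dominates the estimate and accounts for most of the remaining uncertainty.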

Progress Update: January–April 2026

Status key: Published / Complete · Submitted · Under Revision · In Preparation
  • Thesis: Submitted
  • RQ0 Systematic Review: Published (Information Fusion)
  • RQ1 FusionGP: Finalising → Submitted (CEUS)
  • RQ2 Hybrid GAM–SSM: Under revision; methodology updated with wind-sector covariates
  • RQ3 Transfer Learning: Major revision (Applied Soft Computing)
  • RQ4 Uncertainty Quantification: In preparation
  • 🎤 Viva: 21 April

Conclusions

Central Research Question — Answered
Probabilistic models can be designed to fuse heterogeneous air quality data — delivering accurate, uncertainty-aware, and transferable NO₂ estimates.
Research Question | Key Finding | Status
RQ0: Systematic Review | Unified probabilistic framework absent from literature | 🟢 Published
RQ1: Probabilistic Fusion | FusionGP: calibrated spatial uncertainty maps | 🔵 Submitted
RQ2: Spatio-temporal LUR | Hybrid GAM–SSM captures temporal evolution | 🟡 In revision
RQ3: Transferability | Probabilistic structure transfers across cities | 🟡 Under review
RQ4: Uncertainty | Source-decomposed uncertainty supports decisions | 🔵 In prep

Future Work

Near-term Extensions
  • Extend FusionGP beyond NO₂
  • Incorporate real-time sensor streams for online model updating
Longer-term Vision
  • Replace hand-crafted LUR covariates with learned urban spatial embeddings from satellite imagery and OSM
  • Pre-train probabilistically across many cities → true foundation model for air quality
🧠 Spatial Intelligence
AI systems that understand physical space — moving beyond hand-crafted covariates to learned urban representations (Fei-Fei Li's paradigm applied to environmental sensing)
🎯 Probabilistic Spatial Intelligence
Spatial Intelligence provides rich representation · Probabilistic modelling provides the inference · Together: uncertainty-aware predictions that transfer across any city
🌍 Impact
Every city deserves accurate air quality estimates, not just cities with dense reference networks. This framework works toward that goal.

Thank You



A unified probabilistic framework for heterogeneous air quality data fusion — accurate, uncertainty-aware, and transferable across urban environments.


Gabriel Oduori · University College Dublin

gabriel.oduori@ucdconnect.ie


Supervised by Assistant Professor Chiara Cocco and Professor Francesco Pilla