Deploying machine-learning pipelines in real-world systems is challenging due to data distribution drift and inherent instability. Conventional machine-learning models typically rely on fixed weights optimized for a specific training distribution, which leads to performance degradation when exposed to unseen and noisy data in practice. To address this limitation, this group project develops a framework that supports online learning and introduces an evaluation benchmark that more closely reflects real-world operating conditions.
Specifically, the project consists of two main components:
1. A pipeline built around a pretrained Graph Neural Network (GNN) to estimate unknown hydraulic measurements from a limited set of sensors deployed across a water distribution network. This component focuses on implementing a Test-Time Training strategy that adapts model weights using only incoming test inputs.
2. A benchmarking platform that simulates real-world steady-state snapshots, incorporating hydraulic measurements such as pressure and demand, together with network topology, across multiple water distribution systems. The benchmark is designed to evaluate the robustness and adaptability of machine-learning pipelines under what-if analyses and out-of-distribution conditions.
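To make the idea of adapting weights from test inputs alone more concrete, the following is a minimal sketch and not the project's pipeline. It assumes a toy stand-in for the pretrained GNN with a single weight `w`, a hypothetical self-supervised task (hide each sensor and predict it from neighboring sensors), and made-up normalized readings. All names and values are illustrative.

```python
def masked_reconstruction(w, readings, neighbors):
    """Self-supervised loss and gradient: hide each sensor and predict it
    as w times the mean of its neighboring sensors."""
    loss, grad, count = 0.0, 0.0, 0
    for node, value in readings.items():
        nbr = [readings[n] for n in neighbors[node] if n in readings]
        if not nbr:
            continue  # no neighboring sensor to predict from
        mean_nbr = sum(nbr) / len(nbr)
        err = w * mean_nbr - value
        loss += err ** 2
        grad += 2.0 * err * mean_nbr  # derivative of err^2 with respect to w
        count += 1
    return loss / count, grad / count


def adapt_at_test_time(w, readings, neighbors, lr=0.1, steps=50):
    """Gradient descent on the self-supervised loss; no labels are used."""
    for _ in range(steps):
        _, grad = masked_reconstruction(w, readings, neighbors)
        w -= lr * grad
    return w


def estimate(w, node, readings, neighbors):
    """Estimate an unmonitored node from its neighboring sensors."""
    nbr = [readings[n] for n in neighbors[node] if n in readings]
    return w * sum(nbr) / len(nbr)


# Toy path network 0-1-2-3; node 3 has no sensor (illustrative values).
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
readings = {0: 1.0, 1: 0.9, 2: 0.8}
w_pretrained = 1.2  # weight fitted on a different (training) distribution

loss_before, _ = masked_reconstruction(w_pretrained, readings, neighbors)
w_adapted = adapt_at_test_time(w_pretrained, readings, neighbors)
loss_after, _ = masked_reconstruction(w_adapted, readings, neighbors)
estimate_node3 = estimate(w_adapted, 3, readings, neighbors)
```

In a real pipeline the scalar `w` would be replaced by the GNN parameters and the update by an optimizer step, but the structure stays the same: the loss is computed from test inputs only.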
References: Test-Time Training with Self-Supervision for Generalization under Distribution Shifts.
Multivariate State Estimation in Drinking Water Distribution Networks
Monitoring water distribution networks plays a central role in ensuring safe drinking water delivery to millions of urban residents. Traditionally, this task relies on physics-based mathematical simulations; however, such models require a large number of parameters and frequent recalibration to remain consistent with sensor measurements. As an alternative, recent studies have proposed data-driven approaches based on Graph Neural Networks (GNNs), which leverage pressure measurements from a limited set of sensors at known locations to infer pressure values at unmonitored nodes in the network. Building on this idea, the project extends the existing univariate method to a multivariate framework that jointly estimates multiple hydraulic quantities, including pressure, demand, flow rate, and head loss, among others. The candidate is expected to have a solid foundation in machine learning and proficiency in one of the deep-learning frameworks (PyTorch, TensorFlow).
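One way to picture the multivariate setting is as a node-feature table in which each quantity is paired with a flag marking whether it was measured. The sketch below is only an illustration under assumptions: the quantity names, node labels, values, and the zero-fill plus flag encoding are hypothetical choices, not the encoding used in the referenced work.

```python
QUANTITIES = ("pressure", "demand")  # hypothetical subset of quantities


def build_node_features(nodes, measurements):
    """Return one row per node: [value or 0.0, observed flag] per quantity."""
    rows = []
    for node in nodes:
        row = []
        for quantity in QUANTITIES:
            value = measurements.get(node, {}).get(quantity)
            row.append(0.0 if value is None else float(value))
            row.append(0.0 if value is None else 1.0)  # 1.0 means measured
        rows.append(row)
    return rows


# Illustrative data: J2 has no sensor, J3 only measures pressure.
nodes = ["J1", "J2", "J3"]
measurements = {
    "J1": {"pressure": 31.5, "demand": 0.8},
    "J3": {"pressure": 28.0},
}
features = build_node_features(nodes, measurements)
```

A model trained on such inputs would be asked to fill in every entry whose flag is 0.0, which is what distinguishes the multivariate task from pressure-only estimation.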
Reference: Graph Neural Networks for Pressure Estimation in Water Distribution Systems.
Graph Neural Networks (GNNs) have emerged as a promising approach for processing graph-structured systems. GNNs rely on a message-passing mechanism that updates node features using information from neighboring nodes. However, this mechanism is often associated with several issues, most notably over-smoothing, a phenomenon in which GNNs encode similar representations for all nodes in the graph. Over-smoothing hinders scalability by constraining these models to shallow depths. This work explores a recursive approach that virtually extends the number of layers while measuring the impact of over-smoothing in this specific setting. The new approach is validated in the water domain. Students interested in joining this project should have a foundation in machine learning and be familiar with one of the deep-learning frameworks (PyTorch, TensorFlow).
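Over-smoothing can be observed even without learned weights. The sketch below is a simplified illustration, not the proposed method: it assumes a plain mean-aggregation step (with a self-loop) on a four-node path graph and reuses that same step repeatedly, which mimics a recursive, weight-shared layer. The `spread` measure (maximum minus minimum feature) is a hypothetical stand-in for a proper over-smoothing metric.

```python
def aggregate(features, neighbors):
    """One message-passing step: average each node with its neighbors."""
    return [
        (features[i] + sum(features[j] for j in nbrs)) / (1 + len(nbrs))
        for i, nbrs in enumerate(neighbors)
    ]


def spread(features):
    """Crude over-smoothing indicator: gap between extreme node features."""
    return max(features) - min(features)


# Path graph 0-1-2-3 with one distinctive node feature.
neighbors = [[1], [0, 2], [1, 3], [2]]
features = [1.0, 0.0, 0.0, 0.0]

spreads = [spread(features)]
for _ in range(20):
    features = aggregate(features, neighbors)  # the same step is reused
    spreads.append(spread(features))
```

The spread shrinks as the step is applied more often, so node features become nearly indistinguishable. A trained GNN adds weights and nonlinearities, so its behavior will differ in detail.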
References: Less is More: Recursive Reasoning with Tiny Networks. Assessing the performances and transferability of graph neural network metamodels for water distribution systems. Hierarchical Reasoning Model.
Have you ever managed a large dataset? This project provides an opportunity to handle a dataset with over 8,000 downloads each month. You will reorganize the dataset by task, document it thoroughly, and create a user-friendly interface and leaderboard. The project also involves working with HPC clusters, Hugging Face libraries, and GitHub Pages for documentation. Basic Python skills and familiarity with Linux commands are required.
Water researchers have relied on simulations to monitor the behavior of Water Distribution Networks. These simulations require a comprehensive set of parameters, such as elevation, demand, and pipe diameters, to determine hydraulic states accurately. Gathering these parameters is labor-intensive and time-consuming, which poses a significant challenge. But what if we could reverse the process and let AI infer the missing pieces? Building on this idea, the project explores an innovative approach: leveraging data-driven deep learning methods to predict initial input conditions based on available output states. As a researcher on this project, the candidate will select and train a cutting-edge Graph Neural Network on a large-scale dataset. The resulting model should be able to predict initial conditions while respecting structural and physical constraints. The candidate will submit the implementation code and a report detailing the problem and the proposed solution. The ideal candidate should have a background in machine learning and be familiar with at least one deep-learning framework.
References: Truong, Huy, et al. "DiTEC-WDN: A Large-Scale Dataset of Hydraulic Scenarios across Multiple Water Distribution Networks." (2025).
Can we train a Neural Network with Forward-Forward “harmoniously”?
Backpropagation (BP) is the de facto approach to training neural network models. Nevertheless, it is biologically implausible and requires complete knowledge of the computation (i.e., it tracks the entire flow of information from the start to the end of a model) to perform a backward pass. An alternative approach called Forward-Forward (FF) replaces the backward pass with an additional forward pass and updates the model weights in an unsupervised fashion. In particular, FF performs one forward pass on positive inputs and one on negative inputs, and uses the difference between the two resulting sets of activations at each layer of the network to compute the loss and update the weights. The project studies the behavior of FF under two different losses: (1) cross-entropy and (2) harmonic loss. It is also valuable to study the relationship between harmonic loss and FF in terms of distance metrics or geometric properties in an embedding space. As deliverables, the candidate should submit a detailed report and the implementation code. As primary requirements, the candidate should be familiar with one of the deep-learning frameworks and have experience in setting up machine-learning experiments.
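The layer-local nature of FF can be illustrated with a small sketch. It assumes the goodness formulation from the cited Hinton paper, where the goodness of a layer is the sum of its squared activations and a threshold `theta` separates positive from negative inputs. The weights, inputs, and threshold below are made-up values, and no weight update is shown.

```python
import math


def relu_layer(weights, x):
    """Forward pass of a single fully connected ReLU layer."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def goodness(activations):
    """Goodness of a layer: sum of squared activations."""
    return sum(a * a for a in activations)


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def ff_layer_loss(weights, x_pos, x_neg, theta=1.0):
    """Local loss: push positive goodness above theta, negative below it."""
    g_pos = goodness(relu_layer(weights, x_pos))
    g_neg = goodness(relu_layer(weights, x_neg))
    return softplus(theta - g_pos) + softplus(g_neg - theta)


# Illustrative layer and inputs (not taken from any experiment).
weights = [[1.0, 0.0], [0.0, 1.0]]
x_pos, x_neg = [2.0, 1.0], [0.1, 0.2]

loss_separated = ff_layer_loss(weights, x_pos, x_neg)
loss_swapped = ff_layer_loss(weights, x_neg, x_pos)
```

Because the loss depends only on one layer's activations, each layer can be trained without a backward pass through the rest of the network. Swapping the positive and negative inputs yields a much larger loss, which is the signal a weight update would act on.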
References: Hinton, Geoffrey. "The forward-forward algorithm: Some preliminary investigations." *arXiv preprint arXiv:2212.13345* (2022). Baek, David D., et al. "Harmonic Loss Trains Interpretable AI Models." *arXiv preprint arXiv:2502.01628* (2025).