Reliable anomaly detection in water distribution networks requires diverse and well-structured data covering both normal and faulty operating conditions. In this project, the student will design and implement an automated simulation and data-generation pipeline that produces time-series data (e.g. pressure, flow, demand) for water networks under a variety of scenarios, including leaks and sensor malfunctions. The work focuses on wrapping and orchestrating existing simulation tools (such as EPANET/WNTR), systematically varying configurations and fault parameters, and organising outputs into reproducible, machine-learning-ready datasets. The project is well suited for Bachelor or Master students with a background in computer science, data science, or AI, who are comfortable programming in Python and willing to independently learn domain-specific tools through documentation and examples. Prior knowledge of water networks is not required.
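A first building block of such a pipeline is a reproducible scenario grid. The sketch below is illustrative only and assumes a few things: the network file names, junction IDs and sensor-fault labels are hypothetical, and the "simulation" step itself is left out. It shows how configuration and fault parameters can be varied systematically, with each scenario given a seed derived from its parameters so that regenerating the dataset yields identical runs.

```python
import hashlib
import itertools
import json
from dataclasses import asdict, dataclass
from typing import Iterator, List, Optional


@dataclass(frozen=True)
class Scenario:
    """One simulation run: network, injected faults, and a reproducible seed."""
    scenario_id: str
    network: str               # e.g. path to an EPANET .inp file (hypothetical names below)
    leak_node: Optional[str]   # None means "no leak"
    leak_area_m2: float        # leak orifice area; 0.0 when there is no leak
    sensor_fault: str          # assumed fault labels, e.g. "none", "bias", "drift", "stuck"
    seed: int


def scenario_grid(networks: List[str], leak_nodes: List[Optional[str]],
                  leak_areas: List[float], sensor_faults: List[str],
                  base_seed: int = 42) -> Iterator[Scenario]:
    """Enumerate the Cartesian product of configuration and fault parameters."""
    for net, node, area, fault in itertools.product(networks, leak_nodes, leak_areas, sensor_faults):
        if (node is None) != (area == 0.0):
            continue  # skip inconsistent combinations (leak area without location, or vice versa)
        key = json.dumps([net, node, area, fault, base_seed])
        digest = hashlib.sha256(key.encode()).hexdigest()
        # The seed is derived from the parameters, so re-running the grid reproduces it exactly.
        yield Scenario(scenario_id=digest[:12], network=net, leak_node=node,
                       leak_area_m2=area, sensor_fault=fault, seed=int(digest[:8], 16))


if __name__ == "__main__":
    grid = list(scenario_grid(["net_a.inp"], [None, "J1", "J2"], [0.0, 0.01], ["none", "bias"]))
    print(len(grid))                  # 6 consistent scenarios out of 12 combinations
    print(json.dumps(asdict(grid[0])))
```

Each `Scenario` would then be passed to a wrapper around the simulator. In WNTR, leaks are typically added through a junction's `add_leak` method and simulated with the pressure-dependent `WNTRSimulator`, but the exact signatures should be checked against the current WNTR documentation. Writing the grid out as a manifest (for example JSON lines) alongside the generated time series keeps every dataset traceable to its parameters.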
References: DiTEC-WDN Dataset; DiTEC-WDN: A Large-Scale Dataset of Hydraulic Scenarios across Multiple Water Distribution Networks; LeakG3PD: A Python Generator and Simulated Water Distribution System Dataset; EPANET; WNTR
Evaluating Server-Based and Serverless Deployment Strategies for Machine Learning Prediction Workloads in KServe
KServe is an open-source, Kubernetes-native framework for deploying machine learning inference services. It supports both server-based deployments using standard Kubernetes resources and serverless deployments using Knative, which enable request-driven autoscaling and scale-to-zero. This project aims to evaluate the system-level performance (latency, throughput, resource usage) of server-based and serverless deployment strategies for classical machine learning prediction workloads using KServe. The study focuses on CPU-only inference services based on scikit-learn and XGBoost models. In the first phase, representative machine learning prediction models will be trained to generate inference workloads. In the second phase, these models will be deployed using KServe under two configurations: (i) Kubernetes-based deployments with Horizontal Pod Autoscaling (HPA), and (ii) Knative-based serverless deployments with request-driven autoscaling. Controlled query workloads will be generated to simulate different traffic patterns. The evaluation will focus on latency, throughput, autoscaling responsiveness, CPU and memory utilization, and cold-start overhead. The results will highlight the trade-offs between performance, scalability, and resource efficiency in server-based and serverless ML serving environments, and show how each deployment strategy can be adapted to the type of workload.
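The workload-generation and measurement side can be sketched independently of KServe. The block below is a minimal sketch under stated assumptions: it only produces request timestamps for three illustrative traffic patterns (the "burst" shape is an arbitrary choice) and summarizes recorded latencies; actually sending each request to an InferenceService endpoint, and collecting CPU/memory metrics, is left out.

```python
import math
import random
from typing import Dict, List


def arrival_times(pattern: str, rate_rps: float, duration_s: float, seed: int = 0) -> List[float]:
    """Request timestamps (seconds) for one of three illustrative traffic patterns."""
    if pattern == "constant":
        # Evenly spaced requests at exactly rate_rps.
        return [i / rate_rps for i in range(int(duration_s * rate_rps))]
    rng = random.Random(seed)  # fixed seed -> identical workload on every run
    times: List[float] = []
    t = 0.0
    while True:
        if pattern == "poisson":
            rate = rate_rps
        elif pattern == "burst":
            # Assumed burst shape: 5x the base rate during the first second of every 10 s
            # window (approximate: the rate is chosen at the previous arrival time).
            rate = rate_rps * (5.0 if (t % 10.0) < 1.0 else 1.0)
        else:
            raise ValueError(f"unknown pattern: {pattern}")
        t += rng.expovariate(rate)  # exponential inter-arrival times
        if t >= duration_s:
            return times
        times.append(t)


def percentile(sorted_vals: List[float], p: float) -> float:
    """Nearest-rank percentile (0 < p <= 100) of a non-empty, already sorted list."""
    k = max(1, math.ceil(p * len(sorted_vals) / 100.0))
    return sorted_vals[k - 1]


def summarise(latencies_ms: List[float], duration_s: float) -> Dict[str, float]:
    """Throughput and tail latencies for one experiment run."""
    s = sorted(latencies_ms)
    return {
        "throughput_rps": len(s) / duration_s,
        "p50_ms": percentile(s, 50),
        "p95_ms": percentile(s, 95),
        "p99_ms": percentile(s, 99),
    }
```

Replaying the same seeded timestamps against both the HPA-based and the Knative-based deployment keeps the comparison controlled; cold-start overhead shows up as the first latencies after an idle period, so it is worth reporting those separately from the steady-state percentiles.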
References: Clipper: A Low-Latency Online Prediction Serving System; SOCK: Rapid Task Provisioning with Serverless-Optimized Containers; SelfTune: Tuning Cluster Managers; Horizontal Pod Autoscaling; Knative Technical Overview; KServe Documentation
Estimating Inference Latency of Deep Learning Models Using Roofline Analysis
Accurate estimation of inference latency is critical for meeting service-level objectives (SLOs) in large language model (LLM) serving systems. While classical ML prediction methods can be used for this estimation task, their accuracy depends heavily on the choice of input features. Analytical performance models such as Roofline analysis, on the other hand, provide a hardware-aware upper bound on achievable performance; however, their applicability to latency estimation remains an open question. This project investigates how Roofline analysis can be integrated with ML prediction methods to improve the estimation of end-to-end inference latency of LLM queries on a single GPU. A small set of representative LLMs will be selected, and inference latency will be measured under controlled conditions (sequence length, batch size). Roofline-related metrics, such as arithmetic intensity and memory bandwidth utilization, will be collected using GPU profiling tools. These metrics will be used to estimate processing time and to build regression models that predict end-to-end inference latency. The evaluation will analyze prediction error, sensitivity to model size and input length, and the limitations of Roofline-based estimation.
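To make the Roofline idea concrete: with arithmetic intensity AI = FLOPs / bytes moved, attainable throughput is min(peak FLOP/s, AI × memory bandwidth), so the Roofline lower bound on execution time is max(FLOPs / peak, bytes / bandwidth). The sketch below assumes hypothetical hardware numbers and a deliberately crude per-token decode cost for a dense decoder-only LLM (about 2 FLOPs per parameter per sequence, weights read once per step, KV-cache traffic ignored); it then calibrates the Roofline estimate against measured latencies with a one-feature least-squares fit, as the simplest possible instance of combining the two approaches.

```python
from typing import List, Tuple


def roofline_time_s(flops: float, bytes_moved: float, peak_flops: float, mem_bw: float) -> float:
    """Roofline lower bound on execution time.

    Attainable throughput = min(peak_flops, AI * mem_bw) with AI = flops / bytes_moved,
    hence time = flops / attainable = max(flops / peak_flops, bytes_moved / mem_bw).
    """
    return max(flops / peak_flops, bytes_moved / mem_bw)


def decode_step_cost(n_params: float, batch: int, bytes_per_param: float = 2.0) -> Tuple[float, float]:
    """Crude per-token decode cost (assumption): 2 FLOPs per parameter per sequence,
    weights read once per step in a 2-byte format, KV-cache traffic ignored."""
    return 2.0 * n_params * batch, bytes_per_param * n_params


def fit_linear(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares y ~ a * x + b, e.g. measured latency vs Roofline estimate.
    Requires at least two distinct x values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx
```

For a hypothetical 7B-parameter model at batch size 1, `decode_step_cost` gives an arithmetic intensity of 1 FLOP/byte, so on a GPU with (assumed) 100 TFLOP/s and 1 TB/s the step is memory-bound at about 14 ms. In the project, the regression would use profiled metrics and more features rather than this single analytical estimate.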
References: Predicting LLM Inference Latency: A Roofline-Driven ML Method
An Online, Continuous, Self-Adaptive Pipeline for Water Distribution Network State Estimation
Deploying machine-learning pipelines in real-world systems is challenging due to data distribution drift and inherent instability. Conventional machine-learning models typically rely on fixed weights optimized for a specific training distribution, which leads to performance degradation when exposed to unseen and noisy data in practice. To address this limitation, this group project develops a framework that supports online learning and introduces an evaluation benchmark that more closely reflects real-world operating conditions.
Specifically, the project consists of two main components:
1. A pipeline built around a pretrained Graph Neural Network (GNN) to estimate unknown hydraulic measurements from a limited set of sensors deployed across a water distribution network. This component focuses on implementing a Test-Time Training strategy that adapts model weights using only incoming test inputs.
2. A benchmarking platform that simulates real-world steady-state snapshots, incorporating hydraulic measurements such as pressure, demand, and network topology across multiple water distribution systems. The benchmark is designed to evaluate the robustness and adaptability of machine-learning pipelines under what-if analyses and out-of-distribution conditions.
References: Test-Time Training with Self-Supervision for Generalization under Distribution Shifts.
Multivariate State Estimation in Drinking Water Distribution Networks
Monitoring water distribution networks plays a central role in ensuring safe drinking water delivery to millions of residents in urban areas. Traditionally, this task relies on physics-based mathematical simulations; however, such models require a large number of parameters and frequent recalibration to remain consistent with sensor measurements. As an alternative, recent studies have proposed data-driven approaches based on Graph Neural Networks (GNNs), which use pressure measurements from a limited set of sensors at known locations to infer pressure values at unmonitored nodes in the network. Building on this idea, the project extends the existing univariate method to a multivariate framework that jointly estimates multiple hydraulic quantities, including pressure, demand, flow rate, and head loss. The candidate is expected to have a foundation in machine learning and proficiency in one of the major deep learning frameworks (PyTorch, TensorFlow).
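One practical issue when moving from univariate to multivariate estimation is that the quantities live on different scales and even on different graph elements (pressure and demand on nodes, flow rate and head loss on pipes). A minimal sketch of a joint training objective, under assumptions that are not taken from the original method: each quantity's error is standardized by a per-quantity scale (e.g. its training-set standard deviation), only masked (unobserved) entries contribute, and the per-quantity losses are combined with optional weights.

```python
from typing import Dict, List, Optional


def multivariate_masked_loss(
    pred: Dict[str, List[float]],
    target: Dict[str, List[float]],
    mask: Dict[str, List[bool]],
    scale: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted mean of per-quantity MSEs in standardized units, over masked entries only.

    mask[q][i] is True where entry i of quantity q was hidden from the model and must be
    estimated. Lists may differ in length per quantity (node vs edge quantities).
    """
    total, weight_sum = 0.0, 0.0
    for q, true_vals in target.items():
        errs = [((p - t) / scale[q]) ** 2
                for p, t, m in zip(pred[q], true_vals, mask[q]) if m]
        if not errs:
            continue  # nothing to estimate for this quantity in this snapshot
        w = 1.0 if weights is None else weights.get(q, 1.0)
        total += w * sum(errs) / len(errs)
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0
```

Without the per-quantity scaling, pressure errors in metres would dominate flow errors in m³/s; the weights are a further knob for trading off quantities and are a design choice the project would need to evaluate.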
Reference: Graph Neural Networks for Pressure Estimation in Water Distribution Systems.
Foundation models have become a game-changer in several fields due to their strong generalization capabilities after some form of model adaptation, with fine-tuning being the most common approach. In this project, we aim to evaluate the effectiveness of Low-Rank Adaptation (LoRA) methods in terms of model performance, model size, and memory usage. While conventional full fine-tuning often yields high accuracy, LoRA can represent a more sustainable yet still effective alternative for model adaptation.
In this project, the student will implement a LoRA-based approach to adapt a pre-trained GNN-based model to new, unseen target datasets in the context of Water Distribution Networks (WDNs). The pre-trained model has been trained on several WDNs for pressure reconstruction, and the goal is to adapt it to make predictions on unseen WDN topologies with different operating conditions. The LoRA-based adaptation will be compared against a conventional full fine-tuning approach.
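For readers new to LoRA, the core mechanism fits in a few lines. A frozen weight matrix W is augmented with a trainable low-rank update, giving y = W x + (alpha / r) · B (A x), where A is r × d_in, B is d_out × r, and B is initialized to zero so that training starts from the pre-trained behaviour. The pure-Python sketch below is only illustrative: in the project the adapters would be attached to (some of) the linear layers of the pre-trained GNN, and which layers to adapt, and with which rank, is part of what would be studied.

```python
import random
from typing import List, Optional

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """y = W x + (alpha / r) * B (A x); W stays frozen, only A and B are trained."""

    def __init__(self, weight: Matrix, r: int, alpha: float, seed: int = 0):
        rng = random.Random(seed)
        self.weight = weight                       # frozen pre-trained weights, d_out x d_in
        d_out, d_in = len(weight), len(weight[0])
        self.scaling = alpha / r
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]  # r x d_in
        self.B = [[0.0] * r for _ in range(d_out)]  # d_out x r, zero init: adapter starts as a no-op

    def forward(self, x: List[float]) -> List[float]:
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scaling * d for b, d in zip(base, delta)]


def trainable_params(d_in: int, d_out: int, r: Optional[int] = None) -> int:
    """Full fine-tuning trains d_out * d_in weights per layer; LoRA trains r * (d_in + d_out)."""
    return d_out * d_in if r is None else r * (d_in + d_out)
```

For a 256 × 256 layer, full fine-tuning updates 65,536 weights while a rank-8 adapter updates 4,096, which is where the savings in trainable parameters and optimizer memory come from; whether accuracy holds up on unseen WDN topologies is the empirical question of the project.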
Water distribution networks (WDNs) can naturally be represented as graphs, where nodal pressure dynamics depend on both hydraulic conditions and network topology. Recent studies have demonstrated that standard deep learning models, such as multilayer perceptrons and convolutional neural networks, can achieve high pressure prediction accuracy when augmented with node centrality metrics (e.g., degree, betweenness, and closeness). These metrics implicitly encode structural information without explicitly modeling graph connectivity. However, despite promising results, such approaches have not yet been systematically compared with Graph Neural Networks (GNNs), which explicitly learn relational dependencies and are considered the natural baseline for topology-aware learning in WDNs.
In this project, the student will investigate the extent to which centrality-aware feature engineering can serve as an alternative to GNN-based models for pressure prediction in WDNs. The project will involve a comparative study of centrality-aware neural networks and GNN architectures applied to benchmark water distribution networks under varying demand patterns and operating conditions. The analysis will focus on prediction performance, robustness to topology variations, generalization across different networks, and computational complexity.
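The centrality-aware baseline rests on per-node features computed once from the topology. The sketch below computes degree, closeness and betweenness (Brandes' algorithm) on an unweighted, undirected graph and concatenates them with a node's demand; the feature layout is hypothetical, and using hop counts rather than pipe lengths or hydraulic resistances as distances is a simplifying assumption. In practice a library such as networkx (`degree_centrality`, `closeness_centrality`, `betweenness_centrality`, which normalize by default) would likely be used instead.

```python
from collections import deque
from typing import Dict, List

Graph = Dict[int, List[int]]  # undirected adjacency list


def bfs_distances(graph: Graph, source: int) -> Dict[int, int]:
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness(graph: Graph) -> Dict[int, float]:
    """(reachable nodes - 1) / sum of hop distances; 0.0 for isolated nodes."""
    out = {}
    for v in graph:
        d = bfs_distances(graph, v)
        total = sum(d.values())
        out[v] = (len(d) - 1) / total if total > 0 else 0.0
    return out


def betweenness(graph: Graph) -> Dict[int, float]:
    """Brandes' algorithm, unweighted and undirected, unnormalized."""
    bc = {v: 0.0 for v in graph}
    for s in graph:
        stack: List[int] = []
        preds: Dict[int, List[int]] = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}   # number of shortest paths from s
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:  # accumulate dependencies in order of decreasing distance
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2.0 for v, b in bc.items()}  # each undirected pair was counted twice


def centrality_features(graph: Graph, demand: Dict[int, float]) -> Dict[int, List[float]]:
    """Hypothetical per-node input vector: [demand, degree, closeness, betweenness]."""
    clo, btw = closeness(graph), betweenness(graph)
    return {v: [demand[v], float(len(graph[v])), clo[v], btw[v]] for v in graph}
```

These vectors can be fed to an MLP node by node, whereas a GNN would receive only demand as a node feature plus the adjacency structure; this is exactly the implicit-versus-explicit contrast the comparison is meant to probe.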
The project scope will be adjusted according to the study level. Bachelor-level projects will focus on reproducing existing baselines and conducting a controlled comparison between centrality-aware models and a single GNN architecture. Master-level projects may extend the analysis to include cross-network transfer learning, partial observability scenarios, and ablation studies to identify which aspects of network topology are captured by centrality metrics versus explicit graph representations. The outcomes of this project aim to clarify the practical trade-offs between implicit and explicit topology encoding in machine learning models for water infrastructure systems.
Reference: (Not exhaustive) Centrality-Aware Machine Learning for Water Network Pressure Prediction.
A Systematic Review of Graph Neural Network Applications in Wastewater Systems
Graph Neural Networks (GNNs) have shown strong performance on graph-structured data. Wastewater systems can be represented as network topologies, which makes them a natural fit for GNNs. Existing studies apply GNNs to a variety of tasks, including state estimation, forecasting, anomaly detection, and sensor placement. However, the literature remains scattered, and a consolidated overview of methods, datasets, and research gaps is still lacking. In this project, the student will conduct a systematic literature review of GNN-based approaches in wastewater systems following a structured review protocol (e.g., PRISMA). The student will identify, screen, and categorize relevant studies based on application domain, learning task, data sources (real vs. synthetic), and evaluation settings. Special attention will be given to limitations related to sensor availability, uncertainty handling, and generalization across networks. The outcome of the project is a structured taxonomy and comparative analysis of existing GNN-based wastewater studies, highlighting open challenges and opportunities for future research, particularly in sensor placement and digital twin development. Project depth will be adjusted to BS or MS level.
References: Graph Neural Network Empowers Intelligent Education: A Systematic Review From an Application Perspective; The PRISMA 2020 Statement: An Updated Guideline for Reporting Systematic Reviews; A Comprehensive Survey on Graph Neural Networks.
Uncertainty Quantification in GNN-based Water Level Estimation
Graph Neural Networks (GNNs) have shown strong performance in estimating hydraulic states in partially observed wastewater networks. However, most existing approaches focus on point predictions and provide limited insight into prediction reliability, which is crucial for decision-making. In this project, the student will (1) conduct a literature review of conformal prediction (CP) for graph-structured data, emphasizing the challenges of applying CP to graphs (e.g., exchangeability issues in node/edge settings and dependence within graph neighborhoods), and (2) implement a split conformal prediction pipeline on top of a pre-trained GNN for water level estimation in wastewater networks. Recent work such as Conformalized GNNs (CF-GNN) and conformal prediction sets for GNNs provides practical designs and evaluation protocols for graph settings. The student will evaluate uncertainty quality using empirical coverage and interval width across sensor-masking ratios and analyze how uncertainty varies under sparse sensor conditions. Project depth will be adjusted to BS or MS level.
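Split conformal prediction for regression is simple to state. Compute absolute residuals of the pre-trained model on a held-out calibration set of n points, take the ⌈(n+1)(1−α)⌉-th smallest residual as q̂, and report [ŷ − q̂, ŷ + q̂] for each test prediction. If calibration and test points are exchangeable, this gives marginal coverage of at least 1−α; whether that holds for masked nodes in a network is exactly the issue the literature review should examine. The sketch below works on plain lists of predictions and targets, so the GNN itself and the grouping by sensor-masking ratio are assumed to happen elsewhere.

```python
import math
from typing import List, Sequence, Tuple


def conformal_quantile(cal_pred: Sequence[float], cal_true: Sequence[float], alpha: float) -> float:
    """q_hat = ceil((n + 1)(1 - alpha))-th smallest absolute calibration residual."""
    scores = sorted(abs(p - t) for p, t in zip(cal_pred, cal_true))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha: the interval is unbounded
    return scores[k - 1]


def intervals(test_pred: Sequence[float], q_hat: float) -> List[Tuple[float, float]]:
    return [(p - q_hat, p + q_hat) for p in test_pred]


def coverage_and_width(ivals: List[Tuple[float, float]], test_true: Sequence[float]) -> Tuple[float, float]:
    """Empirical coverage and mean interval width, the two metrics named above."""
    covered = sum(lo <= y <= hi for (lo, hi), y in zip(ivals, test_true))
    width = sum(hi - lo for lo, hi in ivals) / len(ivals)
    return covered / len(ivals), width
```

Running `coverage_and_width` separately for each sensor-masking ratio shows whether coverage stays near the nominal 1−α as observations become sparser, and how much wider the intervals must become to achieve it.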
Organisations are increasingly concerned with environmental sustainability for various reasons (e.g., legislative, economic, ecological, or social). Quantifying sustainability performance across different dimensions is necessary for fulfilling legislative requirements and evaluating improvement efforts. In this project you will extend existing model checking approaches so that they can deal with business process models in which key environmental indicators have been attached to tasks and subprocesses along with possible target values for those indicators that should be enforced during process execution.
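As a heavily simplified illustration of the kind of property involved: if every task carries an indicator value (say kg CO2e) and a target bounds the total per process execution, a model checker must confirm that no possible execution exceeds the target. The sketch below handles only acyclic models with exclusive choices by enumerating all start-to-end paths. Real model checking explores the state space of the process, including parallel branches and loops, and checks temporal-logic formulas. The task names and values are hypothetical.

```python
from typing import Dict, List


def all_paths(flow: Dict[str, List[str]], start: str, end: str) -> List[List[str]]:
    """Every start-to-end path of an acyclic process graph (exclusive branches only)."""
    if start == end:
        return [[end]]
    return [[start] + rest for nxt in flow.get(start, []) for rest in all_paths(flow, nxt, end)]


def violations(flow: Dict[str, List[str]], indicator: Dict[str, float],
               start: str, end: str, target: float) -> List[List[str]]:
    """Paths whose accumulated indicator value exceeds the target.

    An empty result means the property 'on every execution path, the total stays
    within the target' holds for this (simplified) model.
    """
    return [p for p in all_paths(flow, start, end)
            if sum(indicator.get(task, 0.0) for task in p) > target]


if __name__ == "__main__":
    flow = {"start": ["truck", "rail"], "truck": ["deliver"], "rail": ["deliver"], "deliver": ["end"]}
    co2 = {"truck": 12.0, "rail": 4.0, "deliver": 1.0}  # hypothetical kg CO2e per task
    print(violations(flow, co2, "start", "end", target=10.0))
```

Each returned path doubles as a counterexample, which is the same kind of diagnostic output model checkers provide when a property fails.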
Runtime Compliance Checking for Camunda 8
Supervisors: Heerko Groefsema, Michel Medema
Date: 2025-12-05
Type: bachelor-project/master-internship/master-project
Organisations are increasingly concerned with environmental sustainability for various reasons (e.g., legislative, economic, ecological, or social). Quantifying sustainability performance across different dimensions is necessary for fulfilling legislative requirements and evaluating improvement efforts. In this project you will integrate an existing compliance checking tool into the Camunda 8 platform.
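Runtime compliance checking differs from design-time verification in that violations must be reported while process instances are still running. A minimal sketch, assuming that completed-activity events can be obtained from the engine as (instance id, activity) pairs in arrival order (how that feed is obtained from Camunda 8 is left open here) and using a single hypothetical precedence rule rather than the rule language of the existing compliance checking tool:

```python
from typing import Iterable, Iterator, Set, Tuple


def precedence_monitor(events: Iterable[Tuple[str, str]], before: str, after: str) -> Iterator[str]:
    """Yield a process-instance id as soon as `after` completes without `before` having completed first."""
    satisfied: Set[str] = set()   # instances in which `before` has already completed
    reported: Set[str] = set()    # report each violating instance only once
    for instance_id, activity in events:
        if activity == before:
            satisfied.add(instance_id)
        elif activity == after and instance_id not in satisfied and instance_id not in reported:
            reported.add(instance_id)
            yield instance_id


if __name__ == "__main__":
    stream = [("p1", "approve"), ("p1", "pay"), ("p2", "pay"), ("p2", "approve"), ("p2", "pay")]
    for violating in precedence_monitor(stream, before="approve", after="pay"):
        print("violation in", violating)
```

Because the monitor is a generator over an event stream, it can run alongside the engine and flag `p2` at the moment the offending event arrives, rather than after the instance has finished.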
Verification of Security and Privacy concepts in BPMN Choreography diagrams
Where process models define the flow of activities of participants, choreographies describe the interactions between participants. Within such interactions, the security- and privacy-related concepts of separation of duties and division of knowledge are important. The former specifies that no single person has the privileges to misuse the system, whether by error or through fraudulent behavior, while the latter requires that no single person holds total knowledge, so that the knowledge cannot be abused. The open questions are how such concepts can be specified and what kind of model is required to verify them. In this project we ask the student to devise an approach to formally specify and verify these concepts given a BPMN Choreography Diagram.
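To make the two concepts tangible, the sketch below shows one possible, deliberately naive encoding: separation of duties as pairs of tasks that must be performed by different participants, and division of knowledge as a set of information parts that no single participant may receive in full. It checks a static assignment of tasks and received messages; it is not a verification approach for choreographies (it ignores, for example, which interactions actually occur on a given execution path), and all participant, task and message names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def sod_violations(performer: Dict[str, str],
                   sod_pairs: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Separation of duties: each listed pair of tasks needs two different participants.
    Returns (task1, task2, participant) for every pair performed by the same participant."""
    return [(t1, t2, performer[t1]) for t1, t2 in sod_pairs
            if t1 in performer and t2 in performer and performer[t1] == performer[t2]]


def dok_violations(received: Dict[str, Set[str]], secret_parts: Set[str]) -> List[str]:
    """Division of knowledge: participants that receive every part of the protected information."""
    return sorted(p for p, msgs in received.items() if secret_parts <= msgs)
```

For instance, if Alice both performs `request` and `approve` while these form a separation-of-duties pair, `sod_violations` reports that pair; if a Bank participant receives both the `iban` and `amount` parts of a protected record, `dok_violations` reports the Bank. A formal treatment would lift such checks to all behaviours allowed by the choreography.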
References: OMG. Business Process Model and Notation (BPMN) Version 2.0, 2011; Pullonen, Pille & Matulevičius, Raimundas & Bogdanov, Dan. (2017). PE-BPMN: Privacy-Enhanced Business Process Model and Notation. 40-56; BPMVerification package
The practice of checking the conformance of business process models has had a major impact on industry through the insight it provides into the process flows of businesses. Conformance checking entails matching an event log (which details events of past executions) against a business process model (which details the prescribed process flow) through a so-called alignment. Any deviation from the prescribed process flow is detected and reported. Generally, alignments are obtained by matching the so-called token replay of process models (e.g., Petri nets) against events in logs. Our Transition Graphs are also obtained from token replays, but offer more insight into parallel executions than regular Reachability Graphs. We are therefore interested in the applicability of obtaining alignments using Transition Graphs, especially when matched against event logs that include lifecycle events and thus provide data on parallel executions. In this project we ask the student to implement and evaluate the applicability of such an approach.
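For orientation, the cost structure of an alignment can be shown on the smallest possible case. With standard unit costs, a synchronous move (log and model agree) costs 0, while a move on log only or on model only costs 1. Aligning a trace against one fixed model run is then a dynamic-programming problem. The sketch below assumes the model's runs have already been enumerated (e.g., from a reachability graph of a small, acyclic model) and picks the cheapest one; practical alignment algorithms instead search the synchronous product of model and trace (e.g., with A*), because enumerating runs explodes with parallelism, where every interleaving becomes a separate run.

```python
from typing import List, Sequence, Tuple

Move = Tuple[str, str]  # (log label or ">>", model label or ">>")


def align(trace: Sequence[str], run: Sequence[str]) -> Tuple[int, List[Move]]:
    """Optimal alignment of a trace with a single model run under unit move costs."""
    n, m = len(trace), len(run)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 or j == 0:
                cost[i][j] = i + j  # only log moves or only model moves remain
            else:
                best = min(cost[i - 1][j], cost[i][j - 1]) + 1
                if trace[i - 1] == run[j - 1]:
                    best = min(best, cost[i - 1][j - 1])  # synchronous move, cost 0
                cost[i][j] = best
    moves: List[Move] = []  # backtrack from the bottom-right corner
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and trace[i - 1] == run[j - 1] and cost[i][j] == cost[i - 1][j - 1]:
            moves.append((trace[i - 1], run[j - 1])); i -= 1; j -= 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            moves.append((trace[i - 1], ">>")); i -= 1
        else:
            moves.append((">>", run[j - 1])); j -= 1
    return cost[n][m], moves[::-1]


def best_alignment(trace: Sequence[str], runs: List[Sequence[str]]) -> Tuple[int, List[Move]]:
    """Cheapest alignment over a finite set of enumerated model runs."""
    return min((align(trace, r) for r in runs), key=lambda cm: cm[0])
```

For example, aligning the trace a, c, b against the single run a, b, c costs 2 (one log move and one model move), while adding the interleaving a, c, b as a second run brings the optimal cost to 0. This illustrates why richer information about parallel executions, such as lifecycle events, matters for alignments.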
References: H. Groefsema, N.R.T.P. van Beest, and M. Aiello (2016) A Formal Model for Compliance Verification of Service Compositions. IEEE Transactions on Service Computing; Carmona, Josep, et al. (2018). Conformance Checking. Switzerland: Springer; BPMVerification package