Dilek Düştegör

Vertical Federated Learning Framework

Supervisors: Revin Alief, Dilek Düştegör, Alexander Lazovik
Date: 2026-01-28
Type: master-project/master-internship
Description:

(Requirement: availability for six months for a full-time internship at TNO)
Big Data and Data Science (AI & ML) are increasingly popular topics because of the advantages they can bring to companies. Data analysis often runs in long-running processes or even in always-online streaming processes, and it is almost always subject to constraints imposed by users, business requirements, hardware, and the platforms on which the analysis runs. At TNO we are developing solutions for a vertical federated learning framework that separates concerns between local models, which analyze local data, and a central model, which learns from many local models and updates them when necessary. We have already applied horizontal federated learning in multiple domains, such as energy and industry. Vertical Federated Learning (VFL) enables multiple parties to collaboratively train a machine learning model over vertically distributed datasets without leaking private data.
Internship Role and Responsibilities:
- Investigate and experiment with the vertical federated learning approach and apply it to the energy or industry domain, developing a scalable federated learning platform based on state-of-the-art approaches.
- Evaluate real-world scenarios and benchmark data.
- Research the state of the art in heterogeneous edge computing and in federated learning frameworks and scenarios.
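
As a rough illustration of the vertical setting described above, the sketch below splits the features of the same samples across two parties. Each party keeps its own weights and shares only a per-sample partial score; a coordinator sums the scores and returns a shared error signal. The names `Party` and `train_round`, the toy data, and the choice of plain logistic regression are assumptions made for illustration only. A real VFL system would additionally protect the exchanged values (for example with encryption or secure aggregation), which is omitted here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class Party:
    """Holds one vertical slice of the features (hypothetical helper)."""

    def __init__(self, features):
        self.features = features                    # rows: samples, columns: local features
        self.weights = [0.0] * len(features[0])

    def partial_scores(self):
        # Only these per-sample scalars leave the party, never the raw features.
        return [sum(w * x for w, x in zip(self.weights, row)) for row in self.features]

    def update(self, errors, lr):
        # Logistic-loss gradient for the local weights, driven by the shared error signal.
        n = len(errors)
        for j in range(len(self.weights)):
            grad = sum(e * row[j] for e, row in zip(errors, self.features)) / n
            self.weights[j] -= lr * grad

def train_round(parties, labels, lr=0.5):
    """One coordinator round: sum partial scores, compute errors, update every party."""
    partial = [party.partial_scores() for party in parties]
    totals = [sum(values) for values in zip(*partial)]
    preds = [sigmoid(t) for t in totals]
    loss = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for p, y in zip(preds, labels)) / len(labels)
    errors = [p - y for p, y in zip(preds, labels)]
    for party in parties:
        party.update(errors, lr)
    return loss

# Toy data: the same four samples, with features split between two parties.
party_a = Party([[1.0], [2.0], [-1.0], [-2.0]])
party_b = Party([[0.5, 1.0], [1.0, 0.0], [-0.5, -1.0], [-1.0, 0.0]])
labels = [1, 1, 0, 0]

losses = [train_round([party_a, party_b], labels) for _ in range(50)]
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

In this sketch the labels sit with the coordinator; which party holds the labels, and how the error signal is shielded, are design decisions the project itself would have to make.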

Building a Simulation Pipeline for Anomalous Time-Series Data in Water Networks

Supervisors: Samer Ahmed, Dilek Düştegör
Date: 2026-01-09
Type: bachelor-project/master-project/bachelor-internship/master-internship
Description:

Reliable anomaly detection in water distribution networks requires diverse and well-structured data covering both normal and faulty operating conditions. In this project, the student will design and implement an automated simulation and data-generation pipeline that produces time-series data (e.g. pressure, flow, demand) for water networks under a variety of scenarios, including leaks and sensor malfunctions. The work focuses on wrapping and orchestrating existing simulation tools (such as EPANET/WNTR), systematically varying configurations and fault parameters, and organising outputs into reproducible, machine-learning-ready datasets. The project is well suited for Bachelor or Master students with a background in computer science, data science, or AI, who are comfortable programming in Python and willing to independently learn domain-specific tools through documentation and examples. Prior knowledge of water networks is not required. References:
- DiTEC-WDN Dataset
- DiTEC-WDN: A Large-Scale Dataset of Hydraulic Scenarios across Multiple Water Distribution Networks
- LeakG3PD: A Python Generator and Simulated Water Distribution System Dataset
- EPANET
- WNTR
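
The systematic variation of configurations and fault parameters mentioned in this project can be sketched as a scenario grid. The block below only enumerates reproducible scenario descriptions; it does not call EPANET or WNTR. The field names (`leak_node`, `leak_area_m2`, `sensor_fault`), the example values, and the helper `build_scenarios` are hypothetical placeholders, not part of any existing tool.

```python
import hashlib
import itertools
import json

def build_scenarios(grid, base_seed=0):
    """Expand a parameter grid into scenario dicts with stable IDs and seeds."""
    keys = sorted(grid)
    scenarios = []
    for index, values in enumerate(itertools.product(*(grid[k] for k in keys))):
        params = dict(zip(keys, values))
        # Hash of the canonical JSON form: the ID depends only on the parameters.
        canonical = json.dumps(params, sort_keys=True).encode("utf-8")
        scenario_id = hashlib.sha256(canonical).hexdigest()[:12]
        scenarios.append({"id": scenario_id, "seed": base_seed + index, "params": params})
    return scenarios

# Hypothetical fault parameters; None stands for a leak-free run.
grid = {
    "leak_node": [None, "J12", "J47"],
    "leak_area_m2": [0.0005, 0.002],
    "sensor_fault": ["none", "drift", "stuck"],
}

scenarios = build_scenarios(grid)
print(len(scenarios), "scenarios")
print(json.dumps(scenarios[0], indent=2))
```

Each scenario dict could then be handed to a simulator wrapper and stored next to its outputs, so that every generated time series can be traced back to the exact configuration and seed that produced it. A real pipeline would likely also filter out redundant combinations, such as a leak area paired with no leak node.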

An Online, Continuous, Self-Adaptive Pipeline for Water Distribution Network State Estimation

Supervisors: Huy Truong, Dilek Düştegör
Date: 2026-01-09
Type: bachelor-project/master-project/master-internship
Description:

Deploying machine-learning pipelines in real-world systems is challenging due to data distribution drift and inherent instability. Conventional machine-learning models typically rely on fixed weights optimized for a specific training distribution, which leads to performance degradation when exposed to unseen and noisy data in practice. To address this limitation, this group project develops a framework that supports online learning and introduces an evaluation benchmark that more closely reflects real-world operating conditions. Specifically, the project consists of two main components:
1. A pipeline built around a pretrained Graph Neural Network (GNN) to estimate unknown hydraulic measurements from a limited set of sensors deployed across a water distribution network. This component focuses on implementing a Test-Time Training strategy that adapts model weights using only incoming test inputs.
2. A benchmarking platform that simulates real-world steady-state snapshots, incorporating hydraulic measurements such as pressure and demand, together with network topology, across multiple water distribution systems. The benchmark is designed to evaluate the robustness and adaptability of machine-learning pipelines under what-if analyses and out-of-distribution conditions.
References:
Test-Time Training with Self-Supervision for Generalization under Distribution Shifts.

Graph Reasoning Models

Supervisors: Huy Truong, Dilek Düştegör
Date: 2026-01-09
Type: master-project/master-internship
Description:

Graph Neural Networks (GNNs) have emerged as promising approaches for processing graph-based systems. GNNs leverage a message-passing mechanism to update node features given neighborhood information. However, this mechanism often comes with several issues, particularly over-smoothing, a phenomenon in which GNNs encode similar representations for all nodes in the graph. This hinders scalability and constrains these models to shallow depths. This work explores a recursive approach that virtually extends the number of layers while measuring the impact of over-smoothing in this specific setting. The new approach is validated in the context of the water domain. Students interested in joining this project should have basic machine-learning knowledge and be familiar with at least one deep-learning framework (PyTorch, TensorFlow). References:
- Less is More: Recursive Reasoning with Tiny Networks.
- Assessing the performances and transferability of graph neural network metamodels for water distribution systems.
- Hierarchical Reasoning Model.
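
To make the over-smoothing effect concrete, the sketch below applies one parameter-free mean-aggregation step repeatedly on a small path graph and tracks how far apart the node features remain. The aggregation step is a stand-in for a message-passing layer that is reused recursively; the graph, the features, and the helper names are illustrative assumptions, and the models referenced in this project are learned and more elaborate.

```python
def mean_aggregate(features, neighbors):
    """One message-passing step: each node averages itself and its neighbors."""
    updated = []
    for node, value in enumerate(features):
        group = [value] + [features[n] for n in neighbors[node]]
        updated.append(sum(group) / len(group))
    return updated

def spread(features):
    """Gap between the largest and smallest node feature."""
    return max(features) - min(features)

# Path graph 0-1-2-3-4 with a single distinctive node at the end.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
features = [0.0, 0.0, 0.0, 0.0, 1.0]

spreads = [spread(features)]
for _ in range(30):
    features = mean_aggregate(features, neighbors)  # the same step is reused every time
    spreads.append(spread(features))

for depth in (0, 1, 5, 30):
    print(f"depth {depth:2d}: spread = {spreads[depth]:.4f}")
```

The spread never grows from one step to the next, which is the behavior that makes deep or recursive stacks of such layers collapse node representations towards a common value.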

Centrality-Aware Learning vs Graph Neural Networks for State Estimation in Wastewater Systems

Supervisors: Revin Naufal Alief, Dilek Düştegör
Date: 2026-01-05
Type: bachelor-project/student-colloquium/master-internship/master-project
Description:

Wastewater systems can naturally be represented as graphs, where nodal pressure dynamics depend on both hydraulic conditions and network topology. Recent studies have demonstrated that standard deep learning models, such as multilayer perceptrons and convolutional neural networks, can achieve high pressure prediction accuracy when augmented with node centrality metrics (e.g., degree, betweenness, and closeness). These metrics implicitly encode structural information without explicitly modeling graph connectivity. However, despite promising results, such approaches have not yet been systematically compared with Graph Neural Networks (GNNs), which explicitly learn relational dependencies and are considered the natural baseline for topology-aware learning in water distribution networks (WDNs). In this project, the student will investigate the extent to which centrality-aware feature engineering can serve as an alternative to GNN-based models for pressure prediction in WDNs. The project will involve a comparative study of centrality-aware neural networks and GNN architectures applied to benchmark water distribution networks under varying demand patterns and operating conditions. The analysis will focus on prediction performance, robustness to topology variations, generalization across different networks, and computational complexity. The project scope will be adjusted according to the study level. Bachelor-level projects will focus on reproducing existing baselines and conducting a controlled comparison between centrality-aware models and a single GNN architecture. Master-level projects may extend the analysis to include cross-network transfer learning, partial observability scenarios, and ablation studies to identify which aspects of network topology are captured by centrality metrics versus explicit graph representations. The outcomes of this project aim to clarify the practical trade-offs between implicit and explicit topology encoding in machine learning models for water infrastructure systems.
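
As a minimal sketch of what centrality-aware feature engineering means here, the block below computes degree and closeness centrality for a toy network in plain Python and appends them to a per-node input feature. The five-node graph, the demand values, and the helper names are assumptions for illustration; betweenness is left out for brevity, and libraries such as NetworkX provide all three metrics.

```python
from collections import deque

def hop_distances(adjacency, source):
    """Breadth-first search: number of hops from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def centrality_features(adjacency):
    """Degree and closeness centrality per node (assumes a connected graph)."""
    n = len(adjacency)
    feats = {}
    for node in adjacency:
        degree = len(adjacency[node]) / (n - 1)
        closeness = (n - 1) / sum(hop_distances(adjacency, node).values())
        feats[node] = (degree, closeness)
    return feats

# Toy network: B is a junction, A, C and E are dead ends.
adjacency = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B"], "D": ["B", "E"], "E": ["D"]}
demand = {"A": 1.2, "B": 0.0, "C": 0.8, "D": 0.5, "E": 1.5}  # hypothetical raw feature

feats = centrality_features(adjacency)
# Each row could be fed to an MLP: [raw feature, degree centrality, closeness centrality].
augmented = {node: [demand[node], *feats[node]] for node in adjacency}
for node, row in augmented.items():
    print(node, [round(v, 3) for v in row])
```

A GNN would instead receive the adjacency structure itself, which is the implicit-versus-explicit distinction this project sets out to compare.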
Reference: (Not exhaustive)
Centrality-Aware Machine Learning for Water Network Pressure Prediction.

A Systematic Review of Graph Neural Network Applications in Wastewater Systems

Supervisors: Revin Naufal Alief, Dilek Düştegör
Date: 2026-01-05
Type: bachelor-project/student-colloquium/master-internship/master-project
Description:

Graph Neural Networks (GNNs) have shown strong performance on graph-structured data. Wastewater systems can be represented as a network topology, which makes them well suited to GNNs. Existing studies apply GNNs to a variety of tasks, including state estimation, forecasting, anomaly detection, and sensor placement. However, the literature remains scattered, and a comprehensive overview of methods, datasets, and research gaps is still lacking. In this project, the student will conduct a systematic literature review of GNN-based approaches in wastewater systems following a structured review protocol (e.g., PRISMA). The student will identify, screen, and categorize relevant studies based on application domain, learning task, data sources (real vs. synthetic), and evaluation settings. Special attention will be given to limitations related to sensor availability, uncertainty handling, and generalization across networks. The outcome of the project is a structured taxonomy and comparative analysis of existing GNN-based wastewater studies, highlighting open challenges and opportunities for future research, particularly in sensor placement and digital twin development. Project depth will be adjusted to BS or MS level. References:
Graph Neural Network Empowers Intelligent Education: A Systematic Review From an Application Perspective. The PRISMA 2020 Statement: An Updated Guideline for Reporting Systematic Reviews. A Comprehensive Survey on Graph Neural Networks.

Uncertainty Quantification in GNN-based Water Level Estimation

Supervisors: Revin Naufal Alief, Dilek Düştegör
Date: 2026-01-05
Type: bachelor-project/student-colloquium/master-internship/master-project
Description:

Graph Neural Networks (GNNs) have shown strong performance in estimating hydraulic states in partially observed wastewater networks. However, most existing approaches focus on point predictions and provide limited insight into prediction reliability, which is crucial for decision-making. In this project, the student will (1) conduct a literature review on conformal prediction (CP) for graph-structured data, emphasizing the challenges of applying CP to graphs (e.g., exchangeability issues in node/edge settings and dependence in graph neighborhoods), and (2) implement a split conformal prediction pipeline on top of a pre-trained GNN for water level estimation in wastewater networks. Recent work, such as Conformalized GNNs (CF-GNN) and conformal prediction sets for GNNs, provides practical designs and evaluation protocols for graph settings. The student will evaluate uncertainty quality using empirical coverage and interval width across sensor-masking ratios and analyze how uncertainty varies under sparse sensor conditions. Project depth will be adjusted to BS or MS level.
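
A minimal sketch of split conformal prediction for regression, using absolute residuals as nonconformity scores, may help fix ideas. The predictions stand in for the outputs of a pre-trained GNN; all numbers are made-up water levels in centimetres, the helper names are hypothetical, and the sketch assumes calibration and test points are exchangeable, which is exactly the assumption the project description flags as problematic on graphs.

```python
import math

def conformal_quantile(scores, alpha):
    """Finite-sample quantile: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return float("inf")  # too few calibration points for this alpha
    return sorted(scores)[rank - 1]

def split_conformal(cal_pred, cal_true, test_pred, alpha):
    """Turn point predictions into intervals using held-out calibration residuals."""
    scores = [abs(y - p) for p, y in zip(cal_pred, cal_true)]
    q = conformal_quantile(scores, alpha)
    return [(p - q, p + q) for p in test_pred]

def coverage_and_width(intervals, test_true):
    """Empirical coverage and mean interval width on the test points."""
    hits = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, test_true))
    mean_width = sum(hi - lo for lo, hi in intervals) / len(intervals)
    return hits / len(test_true), mean_width

# Made-up calibration and test data (cm).
cal_pred = [50, 52, 47, 60, 55, 49, 58, 51, 53, 56, 48]
cal_true = [51, 50, 47, 63, 54, 53, 58, 46, 52, 58, 49]
test_pred = [50, 54, 57, 61]
test_true = [52, 54, 62, 60]

intervals = split_conformal(cal_pred, cal_true, test_pred, alpha=0.25)
coverage, width = coverage_and_width(intervals, test_true)
print(intervals)
print(f"coverage = {coverage:.2f}, mean width = {width:.1f} cm")
```

Repeating the same evaluation for several sensor-masking ratios would give the coverage and width curves mentioned above.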

References: Uncertainty Quantification over Graph with Conformalized Graph Neural Networks. Non-exchangeable Conformal Prediction for Temporal Graph Neural Networks.

Sparse Sensor Placement for Graph-Based State Estimation

Supervisors: Revin Naufal Alief, Dilek Düştegör
Date: 2026-01-05
Type: bachelor-project/student-colloquium/master-internship/master-project
Description:

This project studies sparse sensor placement in large networked systems to support accurate state estimation using graph neural networks (GNNs). The focus is on identifying critical sensor locations using graph-theoretic properties and invariants, and evaluating their impact on learning-based estimation performance. Students will implement and compare sensor placement strategies and assess their effectiveness under limited sensing. Project depth will be adjusted to BS or MS level. References:
- Graph Neural Networks for Sensor Placement: A Proof of Concept towards a Digital Twin of Water Distribution Systems.
- INSPIRE-GNN: Intelligent Sensor Placement to Improve Sparse Bicycling Network Prediction via Reinforcement Learning Boosted Graph Neural Networks.
- Graph Neural Networks for Evaluating the Reliability and Resilience of Infrastructure Systems: A Systematic Review of Models, Applications, and Future Directions.
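
As a rough sketch of how two placement strategies for this project could be compared, the block below places k sensors either at the highest-degree nodes or with a greedy farthest-first heuristic, and scores each placement by the worst-case hop distance from any node to its nearest sensor. Both strategies, the proxy score, the toy graph, and all function names are assumptions chosen for illustration; the project itself evaluates placements through GNN estimation performance, which this sketch does not model.

```python
from collections import deque

def hop_distances(adjacency, source):
    """Breadth-first search: number of hops from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def nearest_sensor_distance(adjacency, sensors):
    """Hop distance from each node to its closest sensor (assumes a connected graph)."""
    maps = [hop_distances(adjacency, s) for s in sensors]
    return {node: min(m[node] for m in maps) for node in adjacency}

def place_by_degree(adjacency, k):
    """Pick the k highest-degree nodes, breaking ties by node label."""
    return sorted(adjacency, key=lambda n: (-len(adjacency[n]), n))[:k]

def place_farthest_first(adjacency, k):
    """Start at the top-degree node, then keep adding the worst-covered node."""
    sensors = place_by_degree(adjacency, 1)
    while len(sensors) < k:
        nearest = nearest_sensor_distance(adjacency, sensors)
        sensors.append(max(sorted(adjacency), key=lambda n: nearest[n]))
    return sensors

def worst_case_distance(adjacency, sensors):
    return max(nearest_sensor_distance(adjacency, sensors).values())

# Toy network: a junction at node 1 followed by a long branch towards node 7.
adjacency = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1, 4], 4: [3, 5], 5: [4, 6], 6: [5, 7], 7: [6]}

by_degree = place_by_degree(adjacency, 2)
by_spread = place_farthest_first(adjacency, 2)
print("degree:", by_degree, "worst distance:", worst_case_distance(adjacency, by_degree))
print("farthest-first:", by_spread, "worst distance:", worst_case_distance(adjacency, by_spread))
```

On this toy graph the degree-based placement clusters both sensors near the junction, while the farthest-first placement also covers the long branch.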

Wastewater Systems Benchmark Dataset Development

Supervisors: Revin Naufal Alief, Dilek Düştegör
Date: 2026-01-05
Type: bachelor-project/student-colloquium/master-internship/master-project
Description:

This project aims to develop an academic benchmark dataset for wastewater systems to support reproducible research. The work includes a literature review, followed by dataset creation through merging existing data or simulating diverse operating scenarios. A key outcome is a modular, automated data-generation pipeline with clear documentation, suitable for data-driven modeling and analysis tasks. References:
- Large-Scale Multipurpose Benchmark Datasets for Assessing Data-Driven Deep Learning Approaches for Water Distribution Networks.
- Benchmarking Dataset for Leak Detection and Localization in Water Distribution Systems.
- LeakDB: A Benchmark Dataset for Leakage Diagnosis in Water Distribution Networks.

Node masking in Graph Neural Networks

Supervisors: Huy Truong, Dilek Düştegör
Date: 2025-01-21
Type: bachelor-project
Description:

Working with real-world data often leads to missing-information problems, which can negatively affect the performance of deep learning models. However, when applied deliberately, hiding information can boost the expressiveness of Graph Neural Network (GNN) models in node representation learning through a technique known as node masking: arbitrary nodal features in a graph are hidden, and the GNN is trained to recover the missing parts. The student can explore diverse masking strategies, such as zero masking, random node replacement, mean-neighbor substitution, shared learnable embedding, and nodal permutation. These options should be compared and evaluated on a graph reconstruction task applied to a water distribution network. This study will focus on finding a generative technique that effectively enhances the performance of GNN models in semi-supervised transductive learning. Students interested in joining this project should have a machine-learning background and experience with a deep-learning framework.
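
Three of the masking strategies listed above can be sketched on scalar node features as follows. The path graph, the pressure values, the masked set, and the function names are illustrative assumptions; the learnable-embedding and permutation variants, as well as the GNN that is trained to recover the hidden values, are not shown.

```python
import random

def zero_mask(features, masked):
    """Replace masked node features with zero."""
    return [0.0 if i in masked else x for i, x in enumerate(features)]

def mean_neighbor_mask(features, masked, neighbors):
    """Replace masked node features with the mean of their unmasked neighbors."""
    out = []
    for i, x in enumerate(features):
        if i in masked:
            visible = [features[n] for n in neighbors[i] if n not in masked]
            out.append(sum(visible) / len(visible) if visible else 0.0)
        else:
            out.append(x)
    return out

def random_replacement_mask(features, masked, rng):
    """Replace masked node features with the feature of a random unmasked node."""
    donors = [i for i in range(len(features)) if i not in masked]
    return [features[rng.choice(donors)] if i in masked else x
            for i, x in enumerate(features)]

# Path graph 0-1-2-3-4 with made-up nodal pressures.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
pressures = [30.0, 28.0, 27.0, 25.0, 24.0]
masked = {1, 3}

zeroed = zero_mask(pressures, masked)
averaged = mean_neighbor_mask(pressures, masked, neighbors)
replaced = random_replacement_mask(pressures, masked, random.Random(0))
print(zeroed)
print(averaged)
print(replaced)
```

In a reconstruction task, each masked input would be passed to the GNN and the loss computed only on the masked nodes against the original `pressures`.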
References:
- Hou, Zhenyu, Xiao Liu, Yuxiao Dong, Chunjie Wang, and Jie Tang. "GraphMAE: Self-Supervised Masked Graph Autoencoders." arXiv preprint arXiv:2205.10803 (2022).
- Abboud, Ralph, Ismail Ilkan Ceylan, Martin Grohe, and Thomas Lukasiewicz. "The surprising power of graph neural networks with random node initialization." arXiv preprint arXiv:2010.01179 (2020).
- Hajgató, Gergely, Bálint Gyires-Tóth, and György Paál. "Reconstructing nodal pressures in water distribution systems with graph neural networks." arXiv preprint arXiv:2104.13619 (2021).
- He, Kaiming, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. "Masked autoencoders are scalable vision learners." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16000-16009. 2022.