Huy Truong

Dataset Management

Supervisors: Huy Truong, Andrés Tello
Date: 2025-08-25
Type: bi
Description:

Have you ever managed a large dataset? This project provides an opportunity to handle a dataset with over 8,000 downloads each month. You will reorganize the dataset by task, document it thoroughly, and create a user-friendly interface and leaderboard. The project also involves working with HPC clusters, Hugging Face libraries, and GitHub Pages for documentation. Basic Python skills and familiarity with Linux commands are required.

DiTEC project - Unsupervised Learning for Customer Profiles in Water Distribution Networks

Supervisors: Huy Truong, Dilek Düştegör
Date: 2025-01-21
Type: bachelor
Description:

Researchers studying drinking water distribution networks often rely on large-scale synthesized datasets. However, the simulations that generate these datasets currently provide only limited metadata, which reduces dataset accessibility. This missing metadata, including the customer profile at each node in the network, plays a crucial role in classifying customer types and estimating their demand, particularly during peak seasons. To address this gap, the student could apply unsupervised algorithms such as K-means, DBSCAN, or K-NN-based approaches to identify and retrieve these customer profiles. This project requires a candidate with a solid background in Machine Learning and an interest in building robust data pipelines. The resulting pipeline will eventually be used to extract the missing metadata for a large-scale dataset, enabling water experts to analyze water networks and benchmark customer behavior efficiently.
References:
Tello, A., Truong, H., Lazovik, A., & Degeler, V. (2024). Large-scale multipurpose benchmark datasets for assessing data-driven deep learning approaches for water distribution networks. Engineering Proceedings, 69(1), 50. https://doi.org/10.3390/engproc2024069050.
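As a rough illustration of the clustering step, the sketch below groups synthetic 24-hour demand profiles with a minimal pure-Python k-means. The two customer types (morning-peak vs. evening-peak) and all parameter values are invented for the example; a real pipeline would run on simulation output instead.

```python
import random

def kmeans(profiles, k, iters=50):
    """Minimal k-means over hourly demand profiles (lists of floats).

    Centroids are initialised deterministically with evenly spaced
    profiles, which is enough for this illustration (assumes k >= 2).
    """
    idx = [i * (len(profiles) - 1) // (k - 1) for i in range(k)]
    centroids = [profiles[i] for i in idx]
    for _ in range(iters):
        # Assign each profile to its nearest centroid (squared distance).
        clusters = [[] for _ in range(k)]
        for p in profiles:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[j].append(p)
        # Recompute each centroid as the column-wise mean of its cluster.
        new = [[sum(col) / len(m) for col in zip(*m)] if m else centroids[j]
               for j, m in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids, clusters

# Synthetic customer profiles: unit demand during peak hours, low base load otherwise.
random.seed(1)
def profile(peak):
    return [1.0 + random.gauss(0, 0.05) if h in peak else 0.2 for h in range(24)]

morning = [profile(range(6, 10)) for _ in range(10)]   # morning-peak customers
evening = [profile(range(18, 22)) for _ in range(10)]  # evening-peak customers
centroids, clusters = kmeans(morning + evening, k=2)
```

With well-separated profiles like these, the two recovered centroids correspond to the two customer types; on real data one would tune k (or switch to DBSCAN, which needs no k) and validate the clusters against known demand patterns.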

Node masking in Graph Neural Networks

Supervisors: Huy Truong, Dilek Düştegör
Date: 2025-01-21
Type: bachelor
Description:

Working with real-world data often leads to missing-information problems, which can degrade the performance of deep learning models. Applied deliberately, however, missingness can boost the expressiveness of Graph Neural Network (GNN) models in node representation learning through a technique known as Node Masking: arbitrary nodal features in a graph are hidden, and the GNN is trained to recover the missing parts. The student can explore diverse masking strategies, such as zero masking, random node replacement, mean-neighbor substitution, shared learnable embeddings, and nodal permutation. These options should be compared and evaluated on a graph reconstruction task applied to a water distribution network. The study will focus on finding a generative technique that effectively enhances the performance of GNN models in semi-supervised transductive learning. Students interested in joining this project should have a machine-learning background and experience with a deep-learning framework.
References:
Hou, Zhenyu, Xiao Liu, Yuxiao Dong, Chunjie Wang, and Jie Tang. "GraphMAE: Self-Supervised Masked Graph Autoencoders." arXiv preprint arXiv:2205.10803 (2022).
Abboud, Ralph, Ismail Ilkan Ceylan, Martin Grohe, and Thomas Lukasiewicz. "The surprising power of graph neural networks with random node initialization." arXiv preprint arXiv:2010.01179 (2020).
Hajgató, Gergely, Bálint Gyires-Tóth, and György Paál. "Reconstructing nodal pressures in water distribution systems with graph neural networks." arXiv preprint arXiv:2104.13619 (2021).
He, Kaiming, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. "Masked autoencoders are scalable vision learners." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16000-16009. 2022.
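To make the strategies concrete, here is a minimal pure-Python sketch of three of the masking options named above (zero masking, random node replacement, mean-neighbor substitution). Node features are plain lists and the adjacency is a dict, both invented for the example; shared learnable embeddings and nodal permutation would need a trainable model and are omitted.

```python
import random

def zero_mask(x, masked):
    """Zero masking: replace the features of masked nodes with zeros."""
    return [[0.0] * len(x[i]) if i in masked else x[i] for i in range(len(x))]

def random_replace(x, masked, rng):
    """Random node replacement: swap in the features of a random node."""
    return [list(rng.choice(x)) if i in masked else x[i] for i in range(len(x))]

def mean_neighbor(x, masked, adj):
    """Mean-neighbor substitution: replace with the mean of the neighbours."""
    out = []
    for i, feats in enumerate(x):
        if i in masked and adj.get(i):
            nbrs = [x[j] for j in adj[i]]
            out.append([sum(col) / len(nbrs) for col in zip(*nbrs)])
        else:
            out.append(feats)
    return out

# Toy graph: node 0 is connected to nodes 1 and 2.
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
adj = {0: [1, 2], 1: [0], 2: [0]}
```

In the reconstruction task, a GNN would receive the masked features together with the graph structure and be trained to recover the original features at the masked nodes.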

Multimodality Graph Foundation Models

Supervisors: Huy Truong
Date: 2025-01-21
Type: colloquium
Description:

Test-Time Training

Supervisors: Huy Truong
Date: 2025-01-21
Type: colloquium
Description:

DiTEC project - Building a collection of Graph Self-Supervised Learning tasks

Supervisors: Huy Truong
Date: 2025-04-03
Type: internship
Description:

Self-supervised learning (SSL) has shown great potential in enhancing the capabilities of large foundation models, but its application to graph modalities remains underexplored. This project investigates popular SSL tasks at the node, link, and graph levels, as well as more complex graph representation learning approaches. As a researcher on the project, the candidate will develop a framework that enables users to train deep learning models on these tasks using independent datasets. The final deliverables will include the implementation code and a report detailing the problem and the proposed solution. The ideal candidate should have a background in machine learning and experience with at least one deep learning framework.
References:
Liu, Yixin, et al. "Graph self-supervised learning: A survey." IEEE Transactions on Knowledge and Data Engineering 35.6 (2022): 5879-5900.
Wu, Lirong, et al. "Self-supervised learning on graphs: Contrastive, generative, or predictive." IEEE Transactions on Knowledge and Data Engineering 35.4 (2021): 4216-4235.
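As one concrete link-level example, the sketch below sets up a self-supervised link-prediction task by hiding a fraction of the edges and sampling negative (non-edge) pairs. The function name and the path-graph example are invented for illustration, and the model that scores the pairs is left out.

```python
import random

def make_link_ssl_task(edges, num_nodes, mask_ratio=0.2, seed=0):
    """Build a self-supervised link-prediction task.

    A fraction of the edges is hidden; the model sees only the visible
    edges and is trained to score the hidden positives above the
    sampled negatives.
    """
    rng = random.Random(seed)
    edges = list(edges)
    rng.shuffle(edges)
    n_hide = max(1, int(mask_ratio * len(edges)))
    hidden, visible = edges[:n_hide], edges[n_hide:]
    forbidden = set(edges)
    negatives = []
    while len(negatives) < n_hide:
        u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
        if u != v and (u, v) not in forbidden and (v, u) not in forbidden:
            negatives.append((u, v))
    return visible, hidden, negatives

# Path graph 0-1-2-3-4-5: hide one edge and sample one negative pair.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
visible, hidden, negatives = make_link_ssl_task(edges, num_nodes=6)
```

Node-level tasks (feature masking) and graph-level tasks (contrastive views of whole graphs) would follow the same pattern: a corruption step that creates the supervision signal, and a model trained to undo or distinguish it.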

DiTEC project - Inverse problem in Water Distribution Networks

Supervisors: Huy Truong
Date: 2025-04-03
Type: internship
Description:

Water researchers rely on simulations to monitor the behavior of Water Distribution Networks. These simulations require a comprehensive set of parameters, such as elevation, demand, and pipe diameters, to determine hydraulic states accurately. Gathering these parameters increases labor cost and time consumption and therefore poses a significant challenge. But what if we could reverse the process and let AI infer the missing pieces? Building on this idea, the project explores an innovative approach: leveraging data-driven deep learning methods to predict the initial input conditions from the available output states. As a researcher on this project, the candidate will select and train a cutting-edge Graph Neural Network on a massive dataset. The resulting model should predict initial conditions while respecting structural and physical constraints. The candidate will submit the implementation code and a report detailing the problem and the proposed solution. The ideal candidate should have a background in machine learning and be familiar with at least one deep-learning framework.
References:
Truong, Huy, et al. "DiTEC-WDN: A Large-Scale Dataset of Hydraulic Scenarios across Multiple Water Distribution Networks." (2025).
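The inverse setup can be illustrated without any graph machinery: in the sketch below, a toy linear forward model stands in for the hydraulic solver, and a linear model trained by gradient descent plays the role of the GNN, learning to map observed states back to the input parameters. All names and numbers are invented for the example.

```python
import random

def simulate(params):
    """Toy stand-in for a hydraulic solver: input parameters -> observed states."""
    e, d = params                   # e.g. elevation and demand
    return (e + 2.0 * d, e - d)     # two observed "pressures"

# Generate (state, params) training pairs by running the forward model.
rng = random.Random(0)
data = []
for _ in range(200):
    p = (rng.uniform(-1, 1), rng.uniform(-1, 1))
    data.append((simulate(p), p))

# Fit a linear inverse model params ~ W @ state + b by gradient descent on MSE.
W = [[0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0]
lr = 0.05
for _ in range(500):
    gW = [[0.0, 0.0], [0.0, 0.0]]
    gb = [0.0, 0.0]
    for s, p in data:
        pred = [W[i][0] * s[0] + W[i][1] * s[1] + b[i] for i in range(2)]
        for i in range(2):
            err = 2 * (pred[i] - p[i]) / len(data)   # d(MSE)/d(pred_i)
            gW[i][0] += err * s[0]
            gW[i][1] += err * s[1]
            gb[i] += err
    for i in range(2):
        W[i][0] -= lr * gW[i][0]
        W[i][1] -= lr * gW[i][1]
        b[i] -= lr * gb[i]

# Recover unseen input parameters from their simulated states alone.
s_test = simulate((0.5, -0.3))
recovered = [W[i][0] * s_test[0] + W[i][1] * s_test[1] + b[i] for i in range(2)]
```

In the actual project, the forward model is a nonlinear hydraulic simulator and the inverse model is a GNN whose predictions must additionally respect structural and physical constraints.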

DiTEC project - Bio-inspired Water Network Design

Supervisors: Huy Truong
Date: 2025-04-03
Type: internship
Description:

Designing Water Distribution Networks (WDNs) is a complex, labor-intensive, and time-consuming process. To alleviate this, the project aims to automate the design using Evolution Strategies (ES). In particular, these algorithms search for and optimize values of hydraulic parameters, such as nodal elevation, pump speed, and pipe length, to construct a complete simulation configuration. The configuration should satisfy local, structural, and physical restrictions (i.e., multi-objective optimization). As a researcher on this project, the candidate will explore an ES framework to develop the optimization algorithm and apply it to the water distribution domain. As such, the candidate should be familiar with machine-learning experiments. As deliverables, the candidate should submit a report and the implementation code that generates optimized configurations. These configurations will help water researchers simulate, analyze, and understand a WDN's behavior and enhance the monitoring capability of these systems in practice.
References:
Gad, Ahmed Fawzy. "PyGAD: An intuitive genetic algorithm Python library." Multimedia Tools and Applications 83.20 (2024): 58029-58042.
Toklu, Nihat Engin, et al. "EvoTorch: Scalable evolutionary computation in Python." arXiv preprint arXiv:2302.12600 (2023).
Lange, Robert Tjarko. "evosax: JAX-based evolution strategies." URL http://github.com/RobertTLange/evosax (2022).
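As a taste of the ES approach, here is a minimal (1+1)-ES with a 1/5-style step-size rule that tunes two design variables against a made-up cost-plus-penalty objective. The "pressure" surrogate, the constraint value, and all constants are invented stand-ins for real hydraulic terms.

```python
import random

def fitness(params):
    """Toy design objective: construction cost plus a quadratic penalty
    for violating a minimum-'pressure' constraint (invented surrogate)."""
    diameter, pump_speed = params
    cost = diameter ** 2 + 0.5 * pump_speed        # material + energy cost
    pressure = 3.0 * diameter + pump_speed         # crude pressure surrogate
    penalty = max(0.0, 25.0 - pressure) ** 2       # require pressure >= 25
    return cost + 10.0 * penalty

def one_plus_one_es(x, sigma=0.5, iters=4000, seed=0):
    """(1+1)-ES: mutate, keep the child if it is no worse, and adapt the
    mutation step size (grow on success, shrink on failure)."""
    rng = random.Random(seed)
    fx = fitness(x)
    for _ in range(iters):
        child = [xi + sigma * rng.gauss(0, 1) for xi in x]
        fc = fitness(child)
        if fc <= fx:
            x, fx = child, fc
            sigma *= 1.1    # success: widen the search
        else:
            sigma *= 0.97   # failure: narrow the search
    return x, fx

best, score = one_plus_one_es([1.0, 1.0])
```

The referenced libraries (PyGAD, EvoTorch, evosax) provide production-grade population-based versions of such strategies; a true multi-objective formulation would replace the single penalised objective with a Pareto front over cost and constraint terms.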

Can we train a Neural Network with Forward-Forward “harmoniously”?

Supervisors: Huy Truong
Date: 2025-04-03
Type: internship
Description:

Backpropagation (BP) is the de facto approach to training neural network models. Nevertheless, it is biologically implausible and requires complete knowledge (i.e., tracking the entire flow of information from the start to the end of a model) to perform a backward pass. An alternative approach called Forward-Forward (FF) replaces the backward pass with an additional forward pass and updates the model weights in an unsupervised fashion. In particular, FF performs forward passes on positive and negative inputs, respectively, and uses the difference between the two activation versions at each layer of the neural network to compute the loss and update the weights. This project studies the behavior of FF under different losses: (1) cross-entropy and (2) harmonic loss. It is also valuable to study the relationship between harmonic loss and FF in terms of distance metrics or geometric properties of the embedding space. As a deliverable, the candidate should submit a detailed report and implementation code. As primary requirements, the candidate should be familiar with one of the deep learning frameworks and have experience in setting up machine learning experiments.
References:
Hinton, Geoffrey. "The forward-forward algorithm: Some preliminary investigations." arXiv preprint arXiv:2212.13345 (2022).
Baek, David D., et al. "Harmonic Loss Trains Interpretable AI Models." arXiv preprint arXiv:2502.01628 (2025).
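To fix ideas, the sketch below computes the layer-local FF objective on a single fully connected layer and improves it with finite-difference gradient steps; no backward pass through a network is needed, which is the point of FF. The tiny inputs, threshold, and initial weights are invented for the illustration, and the harmonic-loss variant studied in the project would replace `ff_loss`.

```python
import math

def layer_forward(W, x):
    """One fully connected layer with ReLU (layer normalisation omitted)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W]

def goodness(h):
    """FF 'goodness' of a layer: sum of squared activations."""
    return sum(a * a for a in h)

def ff_loss(W, pos, neg, theta=2.0):
    """Layer-local FF objective: push the goodness of the positive input
    above the threshold theta and that of the negative input below it."""
    g_pos = goodness(layer_forward(W, pos))
    g_neg = goodness(layer_forward(W, neg))
    return (math.log(1 + math.exp(-(g_pos - theta)))
            + math.log(1 + math.exp(g_neg - theta)))

def grad_step(W, pos, neg, lr=0.01, eps=1e-4):
    """One finite-difference gradient step on the local loss.

    Because the loss depends only on this layer's own weights, each layer
    can be trained in isolation -- no global backward pass is required.
    """
    new = [row[:] for row in W]
    for i in range(len(W)):
        for j in range(len(W[i])):
            W[i][j] += eps
            up = ff_loss(W, pos, neg)
            W[i][j] -= 2 * eps
            down = ff_loss(W, pos, neg)
            W[i][j] += eps                      # restore the weight
            new[i][j] -= lr * (up - down) / (2 * eps)
    return new

pos, neg = [1.0, 0.5], [0.5, 1.0]               # positive / negative samples
W = [[0.3, 0.1], [0.1, 0.3], [-0.2, 0.2]]       # 3 hidden units, 2 inputs
before = ff_loss(W, pos, neg)
for _ in range(50):
    W = grad_step(W, pos, neg)
after = ff_loss(W, pos, neg)
```

A deep-learning framework would compute the same per-layer gradients analytically; the finite-difference step here just keeps the sketch free of autograd machinery.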