In collaboration with an industrial partner, we have access to sensor data collected from various machines. We need a curious student to examine the dataset from multiple angles and see whether data mining / data science techniques can identify any patterns. This is an ideal project if you want to learn about data mining / data science techniques.
Mining sales data to identify patterns (with industrial partner)
In collaboration with an industrial partner, we have access to their sales data. We need a curious student to examine the dataset from multiple angles and see whether data mining / data science techniques can identify any patterns. This is an ideal project if you want to learn about data mining / data science techniques.
Automated Dataset Generator for Wastewater System Simulations
Automating dataset generation is essential for streamlining the preprocessing pipeline, enabling faster iterations and ensuring scalability as wastewater systems become increasingly complex. This project involves designing and implementing a program to automate dataset generation for wastewater simulations. The program will take inputs such as simulation parameters, geographic data, and network structures (e.g., pipe diameters and node connections) and output datasets compatible with Graph Neural Network (GNN) models. The student will focus on creating a flexible, user-friendly interface for defining input parameters, ensuring compatibility with graph-based models and simulation outputs, and testing the tool across multiple simulation scenarios. The expected outcome is a reusable dataset generation tool that significantly enhances the efficiency of wastewater simulation workflows.
References: Infoworks ICM Exchange; Infoworks Ruby Scripts
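As a rough illustration of the conversion step in the dataset generator project, the sketch below turns a pipe list and per-node simulation outputs into the index-based arrays that graph-based models typically consume. The `Pipe` schema, the `build_graph_sample` helper, and the choice of pipe diameter as the only edge feature are assumptions made for this example; they are not taken from the project or from the simulation software.

```python
from dataclasses import dataclass


@dataclass
class Pipe:
    """One pipe in the network (hypothetical schema, for illustration only)."""
    upstream: str
    downstream: str
    diameter_m: float


def build_graph_sample(pipes, node_levels):
    """Turn a pipe list and per-node simulation outputs into graph arrays.

    Returns a dict with:
      node_ids   - node names; list position is the integer node index
      edge_index - two parallel lists [sources, targets], both directions
      edge_attr  - one feature row per directed edge (pipe diameter)
      y          - one target value per node (e.g. simulated water level)
    """
    node_ids = sorted({p.upstream for p in pipes} | {p.downstream for p in pipes})
    index = {name: i for i, name in enumerate(node_ids)}

    # Fail early if the simulation output does not cover every node.
    missing = [n for n in node_ids if n not in node_levels]
    if missing:
        raise ValueError(f"no simulation output for nodes: {missing}")

    sources, targets, edge_attr = [], [], []
    for p in pipes:
        u, v = index[p.upstream], index[p.downstream]
        # Store each pipe in both directions so message passing is symmetric.
        sources += [u, v]
        targets += [v, u]
        edge_attr += [[p.diameter_m], [p.diameter_m]]

    return {
        "node_ids": node_ids,
        "edge_index": [sources, targets],
        "edge_attr": edge_attr,
        "y": [node_levels[n] for n in node_ids],
    }


# Assumed toy network: three nodes connected in a line by two pipes.
pipes = [Pipe("A", "B", 0.30), Pipe("B", "C", 0.45)]
sample = build_graph_sample(pipes, {"A": 1.2, "B": 0.9, "C": 0.4})
print(sample["edge_index"])
```

The two-row `edge_index` layout is a common convention in GNN libraries, so a dictionary like this could be wrapped in a library-specific data object with little extra work. A real tool would also need to read the simulation exports and handle many scenarios per network.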
Optimizing Graph Neural Networks for Water Level Estimation
Researchers studying drinking water distribution networks often rely on large-scale synthesized datasets. However, the simulation that currently generates these datasets cannot readily provide certain metadata that would make the datasets more accessible. This missing metadata, including the customer profile at each node in the network, plays a crucial role in classifying customer types and estimating their demand, particularly during peak seasons. To address this gap, the student could apply unsupervised clustering algorithms such as K-means or DBSCAN to identify and retrieve these customer profiles. This project requires a candidate with a solid background in Machine Learning and an interest in building robust data pipelines. These pipelines will eventually be used to extract the missing metadata for a large-scale dataset, enabling water experts to analyze water networks and benchmark customer behavior efficiently.
References: Tello, A., Truong, H., Lazovik, A., & Degeler, V. (2024). Large-scale multipurpose benchmark datasets for assessing data-driven deep learning approaches for water distribution networks. Engineering Proceedings, 69(1), 50. https://doi.org/10.3390/engproc2024069050.
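To make the clustering idea in the customer profile project concrete, here is a minimal sketch of K-means (Lloyd's algorithm) applied to per-node demand profiles. The node names, the four-value demand vectors (night, morning, midday, evening), and the choice of two clusters are all invented for illustration; the actual dataset and feature definition are not described in the project text.

```python
import random


def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means (Lloyd's algorithm) on equal-length lists of floats."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical normalised demand per node: [night, morning, midday, evening].
profiles = {
    "n1": [0.1, 0.9, 0.3, 0.8],  # morning and evening peaks
    "n2": [0.2, 0.8, 0.2, 0.9],
    "n3": [0.1, 0.2, 0.9, 0.3],  # midday peak
    "n4": [0.0, 0.3, 1.0, 0.2],
}
labels, centroids = kmeans(list(profiles.values()), k=2)

# Group node names by the cluster they were assigned to.
groups = {}
for node, label in zip(profiles, labels):
    groups.setdefault(label, []).append(node)
print(groups)
```

K-means needs the number of clusters up front, which is rarely known for customer types. DBSCAN, the other option named in the project description, avoids that choice but requires tuning a distance threshold instead.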