This project aims to conduct a pilot study that uses Large Language Models (LLMs) to extract and analyze data on the historical tortoiseshell trade from the Dutch East India Company (VOC) archives. The intern is expected to build an LLM solution to extract all information related to marine life, including, but not limited to, location, quantity, date, and type. The solution involves handling large machine-translated files, serving an LLM, prompt engineering, and documenting the process. This is part of a larger project in which the extracted data will be analyzed within the framework of historical ecology, focusing on quantities, temporal patterns, and geographic distribution (in collaboration with Willemien de Kock (Faculty of Arts) and Emin Tatar (CIT)).
References: 5 million scans of the VOC archives, online and searchable.
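As an illustration of what such an extraction step could look like, the sketch below sends one page of machine-translated text to a locally served, OpenAI-compatible LLM endpoint and asks for structured JSON records. The endpoint URL, model name, and record schema are illustrative assumptions and not part of the project specification.

```python
# Minimal sketch: prompting a locally served, OpenAI-compatible LLM to extract
# structured records from a machine-translated archive page. The endpoint,
# model name, and JSON schema below are placeholder assumptions.
import json
import requests

PROMPT_TEMPLATE = """You are extracting mentions of marine life from machine-translated
VOC archive text. Return a JSON list; each item must have the fields
"species", "quantity", "unit", "date", and "location". Use null for unknown values.

Text:
{page_text}
"""

def extract_records(page_text: str,
                    endpoint: str = "http://localhost:8000/v1/chat/completions") -> list[dict]:
    """Send one page of translated text to the model and parse the JSON reply."""
    response = requests.post(
        endpoint,
        json={
            "model": "local-llm",  # placeholder model identifier
            "messages": [{"role": "user",
                          "content": PROMPT_TEMPLATE.format(page_text=page_text)}],
            "temperature": 0,
        },
        timeout=120,
    )
    response.raise_for_status()
    content = response.json()["choices"][0]["message"]["content"]
    return json.loads(content)

# Example usage (requires a running inference server; the file name is illustrative):
# records = extract_records(open("translated_page.txt", encoding="utf-8").read())
```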
DiTEC project - Building a collection of Graph Self-Supervised Learning tasks
Water researchers have relied on simulations to monitor the behavior of Water Distribution Networks. These simulations require a comprehensive set of parameters, such as elevation, demand, and pipe diameters, to determine hydraulic states accurately. Gathering these parameters increases labor cost and time consumption and therefore poses a significant challenge. But what if we could reverse the process and let AI infer the missing pieces? Building on this idea, the project explores an innovative approach: leveraging data-driven deep learning methods to predict initial input conditions from the available output states. As a researcher on this project, the candidate will select and train a cutting-edge Graph Neural Network on a massive dataset. The resulting model should be able to predict initial conditions while respecting structural and physical constraints. The candidate will submit the implementation code and a report detailing the problem and the proposed solution. The ideal candidate has a background in machine learning and is familiar with at least one deep-learning framework.
References: Truong, Huy, et al. "DiTEC-WDN: A Large-Scale Dataset of Hydraulic Scenarios across Multiple Water Distribution Networks." (2025).
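To make the setup concrete, the following is a minimal sketch that assumes the node features are observed hydraulic states and the targets are unknown per-node input parameters (e.g., demand). The architecture (two GCN layers in PyTorch Geometric), feature dimensions, and toy graph are illustrative assumptions, not the project's prescribed design.

```python
# Minimal sketch: a graph neural network that maps observed hydraulic states
# (node features) back to unknown input parameters (node targets) on a water
# distribution network. Feature/target choices and sizes are assumptions made
# for illustration; the actual setup depends on the DiTEC-WDN scenarios used.
import torch
from torch import nn
from torch_geometric.nn import GCNConv
from torch_geometric.data import Data

class StateToParamGNN(nn.Module):
    def __init__(self, in_dim: int, hidden_dim: int, out_dim: int):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, hidden_dim)
        self.head = nn.Linear(hidden_dim, out_dim)  # per-node parameter estimate

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.conv1(x, edge_index))
        h = torch.relu(self.conv2(h, edge_index))
        return self.head(h)

# Toy graph: 4 junctions, 2 observed state features per node (e.g., pressure
# and flow summaries), and 1 unknown parameter (e.g., demand) to recover.
edge_index = torch.tensor([[0, 1, 1, 2, 2, 3], [1, 0, 2, 1, 3, 2]])
data = Data(x=torch.randn(4, 2), y=torch.randn(4, 1), edge_index=edge_index)

model = StateToParamGNN(in_dim=2, hidden_dim=32, out_dim=1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(data.x, data.edge_index), data.y)
    loss.backward()
    optimizer.step()
```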
Backpropagation (BP) is the de facto approach to training neural network models. Nevertheless, it is biologically implausible and requires complete knowledge of the model (i.e., it tracks the entire flow of information from the start to the end of the model) to perform a backward pass. An alternative approach, Forward-Forward (FF), replaces the backward pass with an additional forward pass and updates the model weights in an unsupervised fashion. In particular, FF performs separate forward passes on positive and negative inputs and uses the difference between the two resulting activations at each layer of the network to compute the loss and update the weights. This project studies the behavior of FF under different losses: (1) cross-entropy and (2) harmonic loss. It is also valuable to study the relationship between harmonic loss and FF in terms of distance metrics or geometric properties of the embedding space. As a deliverable, the candidate should submit a detailed report and the implementation code. As primary requirements, the candidate should be familiar with at least one deep learning framework and have experience setting up machine learning experiments.
References: Hinton, Geoffrey. "The forward-forward algorithm: Some preliminary investigations." arXiv preprint arXiv:2212.13345 (2022). Baek, David D., et al. "Harmonic Loss Trains Interpretable AI Models." arXiv preprint arXiv:2502.01628 (2025).
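As a concrete starting point, the sketch below implements a single FF layer trained locally with the goodness-based logistic loss from Hinton (2022); the harmonic-loss variant to be investigated in this project is not shown, and the layer sizes, threshold, and normalization are illustrative assumptions.

```python
# Minimal sketch of a Forward-Forward layer trained with the "goodness"
# objective from Hinton (2022): push the sum of squared activations above a
# threshold for positive inputs and below it for negative inputs. The
# dimensions, threshold, and logistic formulation are illustrative choices.
import torch
from torch import nn

class FFLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, threshold: float = 2.0, lr: float = 1e-3):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.threshold = threshold
        self.opt = torch.optim.Adam(self.parameters(), lr=lr)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize the input so only the direction of the previous layer's
        # activity is passed on, as suggested in the FF paper.
        x = x / (x.norm(dim=1, keepdim=True) + 1e-8)
        return torch.relu(self.linear(x))

    def train_step(self, x_pos: torch.Tensor, x_neg: torch.Tensor) -> float:
        g_pos = self.forward(x_pos).pow(2).sum(dim=1)  # goodness of positive data
        g_neg = self.forward(x_neg).pow(2).sum(dim=1)  # goodness of negative data
        # Logistic loss: high goodness for positives, low for negatives.
        loss = torch.log1p(torch.exp(torch.cat([
            self.threshold - g_pos,
            g_neg - self.threshold,
        ]))).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        return loss.item()

# Toy usage: two stacked layers, each trained with its own local loss.
layers = [FFLayer(784, 256), FFLayer(256, 256)]
x_pos, x_neg = torch.rand(32, 784), torch.rand(32, 784)
for layer in layers:
    layer.train_step(x_pos, x_neg)
    # Detach so no gradient flows between layers (no global backward pass).
    x_pos, x_neg = layer(x_pos).detach(), layer(x_neg).detach()
```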
Leveraging Structural Similarity for Performance Estimation of Deep Learning Training Jobs