DL Jobs Generator
Supervisors:
Kawsar Haghshenas,
Mahmoud Alasmar
Date: 2024-11-01
Type: bi
Description:
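As an illustration of the topic, a minimal sketch of what a synthetic DL job trace generator might look like: Poisson arrivals, a skewed GPU-demand distribution, and heavy-tailed durations. All distributions, parameter values, and names here are illustrative assumptions, not taken from any specific trace.

```python
# Hypothetical sketch: synthetic DL job trace generator.
import random
from dataclasses import dataclass

@dataclass
class Job:
    job_id: int
    submit_time: float  # seconds since trace start
    num_gpus: int       # requested GPUs
    duration: float     # seconds of GPU time

def generate_trace(n_jobs, arrival_rate=0.01, seed=0):
    rng = random.Random(seed)
    t, jobs = 0.0, []
    for i in range(n_jobs):
        t += rng.expovariate(arrival_rate)     # exponential inter-arrivals (Poisson process)
        gpus = rng.choice([1, 1, 1, 2, 4, 8])  # most jobs request few GPUs
        dur = rng.lognormvariate(8.0, 1.5)     # heavy-tailed durations
        jobs.append(Job(i, t, gpus, dur))
    return jobs

for job in generate_trace(3):
    print(job)
```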
Implementation of Hardware Monitoring APIs
Supervisors:
Kawsar Haghshenas,
Mahmoud Alasmar
Date: 2024-11-01
Type: bi
Description:
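One plausible starting point for this topic, assuming NVIDIA GPUs, is a polling loop over NVML via the pynvml bindings; the sampled fields and one-second interval below are illustrative choices, not a fixed design.

```python
# Sketch of a minimal GPU monitoring loop using NVIDIA's NVML (pynvml).
import time
import pynvml

pynvml.nvmlInit()
try:
    handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
               for i in range(pynvml.nvmlDeviceGetCount())]
    for _ in range(5):                                       # five samples
        for i, h in enumerate(handles):
            util = pynvml.nvmlDeviceGetUtilizationRates(h)   # percent busy
            mem = pynvml.nvmlDeviceGetMemoryInfo(h)          # bytes
            power_w = pynvml.nvmlDeviceGetPowerUsage(h) / 1000  # mW -> W
            print(f"gpu{i}: util={util.gpu}% "
                  f"mem={mem.used / 2**30:.1f}GiB power={power_w:.0f}W")
        time.sleep(1.0)
finally:
    pynvml.nvmlShutdown()
```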
Benchmarking AI Workloads on GPU Cluster
Supervisors:
Kawsar Haghshenas,
Mahmoud Alasmar
Date: 2025-01-21
Type: bachelor
Description:
References:
Gao, Wanling, et al. "Data Motifs: A Lens Towards Fully Understanding Big Data and AI Workloads." Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques (PACT), 2018. https://arxiv.org/abs/1808.08512
Xiao, Wencong, et al. "Gandiva: Introspective Cluster Scheduling for Deep Learning." 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), 2018. https://www.usenix.org/conference/osdi18/presentation/xiao
Yang, Charlene, et al. "Hierarchical Roofline Performance Analysis for Deep Learning Applications." Intelligent Computing: Proceedings of the 2021 Computing Conference, Volume 2, Springer, 2021. https://arxiv.org/abs/2009.05257
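In the spirit of the hierarchical roofline analysis of Yang et al. above, a toy roofline bound: attainable throughput is the minimum of peak compute and bandwidth times arithmetic intensity. The peak numbers below are placeholders (roughly A100 FP32 figures); substitute the specs of the cluster's actual GPUs.

```python
# Toy roofline model: attainable TFLOP/s for one kernel.
def attainable_tflops(flops, bytes_moved,
                      peak_tflops=19.5,    # illustrative FP32 peak
                      peak_bw_tbs=1.555):  # illustrative HBM bandwidth, TB/s
    intensity = flops / bytes_moved        # FLOPs per byte
    return min(peak_tflops, intensity * peak_bw_tbs)

# Example: 2e12 FLOPs over 4e10 bytes -> intensity 50 FLOP/B,
# compute-bound on these peaks, so the bound is 19.5 TFLOP/s.
print(attainable_tflops(2e12, 4e10))
```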
Cluster Scheduling for DLT Workloads
Supervisors:
Kawsar Haghshenas,
Mahmoud Alasmar
Date: 2025-01-27
Type: colloquium
Description:
References:
Xiao, Wencong, et al. "Gandiva: Introspective Cluster Scheduling for Deep Learning." 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), 2018, pp. 595-610.
Weng, Q., et al. "Beware of Fragmentation: Scheduling GPU-Sharing Workloads with Fragmentation Gradient Descent." 2023 USENIX Annual Technical Conference (USENIX ATC 23), 2023, pp. 995-1008.
Zhang, X., et al. "Rubick: Exploiting Job Reconfigurability for Deep Learning Cluster Scheduling." arXiv preprint arXiv:2408.08586, 2024.
Lai, F., et al. "ModelKeeper: Accelerating DNN Training via Automated Training Warmup." 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23), 2023, pp. 769-785.
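For orientation, a minimal first-fit placement baseline, the kind of naive policy that introspective schedulers like Gandiva and the fragmentation-aware work above improve upon. The cluster shape (4 nodes of 8 GPUs) is an illustrative assumption.

```python
# Sketch: first-fit GPU placement over a list of per-job GPU demands.
def first_fit(demands, nodes=4, gpus_per_node=8):
    """Place each demand on the first node with room; None if no node fits."""
    free = [gpus_per_node] * nodes
    placement = []
    for d in demands:
        node = next((n for n in range(nodes) if free[n] >= d), None)
        if node is not None:
            free[node] -= d
        placement.append(node)
    return placement

# Jobs asking for 8, 4, 4, 2, 16 GPUs; the 16-GPU job cannot fit.
print(first_fit([8, 4, 4, 2, 16]))  # -> [0, 1, 1, 2, None]
```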
Estimating Deep Learning GPU Memory Consumption
Supervisors:
Kawsar Haghshenas,
Mahmoud Alasmar
Date: 2023-12-11
Type: colloquium
Description:
Gao, Yanjie, et al. "Estimating GPU Memory Consumption of Deep Learning Models." Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), 2020, pp. 1342-1352.
Liu, Haiyi, et al. "TBEM: Testing-Based GPU-Memory Consumption Estimation for Deep Learning." IEEE Access, vol. 10, 2022, pp. 39674-39680.
Bai, Lu, et al. "DNNAbacus: Toward Accurate Computational Cost Prediction for Deep Neural Networks." arXiv preprint arXiv:2205.12095, 2022.
Gao, Yanjie, et al. "Runtime Performance Prediction for Deep Learning Models with Graph Neural Network." IEEE/ACM 45th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP), 2023, pp. 368-380.
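A back-of-the-envelope estimator in the spirit of the works above: training memory is roughly weights + gradients + optimizer state + activations. The Adam factor (two extra copies) and the zero default for activations are rule-of-thumb assumptions, not a measured model like those in the references.

```python
# Sketch: rough training-memory estimate from parameter count.
def estimate_training_gib(n_params,
                          bytes_per_param=4,    # fp32
                          optimizer_copies=2,   # Adam: m and v
                          activation_bytes=0.0):
    weights = n_params * bytes_per_param
    grads = n_params * bytes_per_param
    opt_state = n_params * bytes_per_param * optimizer_copies
    return (weights + grads + opt_state + activation_bytes) / 2**30

# Example: a 350M-parameter model in fp32 with Adam needs about
# 350e6 * 4 B * (1 + 1 + 2) ~= 5.2 GiB before activations.
print(f"{estimate_training_gib(350e6):.1f} GiB")
```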