Master of Applied Informatics
2021 - 2024
University of Göttingen, Germany
Master's Thesis:
- Investigated ONNX as a unified middleware for bridging fragmented AI ecosystems (PyTorch, TensorFlow), evaluating its portability and interoperability across the full AI development lifecycle.
- Built a Transformer encoder from scratch in PyTorch to classify newly submitted tickets in the GWDG support system and predict the most suitable technical supporter.
- Developed a data pipeline covering dialogue extraction and concatenation, stopword removal and stemming, statistical word-length analysis with Gaussian-fitted cutoff thresholds, word embedding, and multi-strategy dataset splitting.
- Systematically explored hyperparameters (learning rate, batch size, number of attention heads, number of encoder layers) over 300 epochs on the GWDG cluster with A100 GPUs; the optimal configuration was batch size 1, learning rate 5e-7, 1 head, and 4 layers.
- Achieved 65% accuracy in recognizing ticket owners among 144 candidates from full conversations, and 22% accuracy when predicting solely from the first submitted question.
- Exported the model in five formats (PyTorch state parameters, whole model, checkpoint, TorchScript-based ONNX, and Dynamo-based ONNX) and retrained it in Python, C, C++, Rust, and JavaScript ONNX Runtime environments on both GPU and CPU (a sketch of the two export paths follows this list).
- Found that the TorchScript-based export outperformed the Dynamo-based export in Python but showed the reverse pattern in the other languages, and that Rust exhibited over 2x the inference latency of C/C++ and JavaScript due to FFI overhead.
- Improved retraining accuracy to 70% (owner recognition) and 25% (candidate prediction) by reconfiguring the learning rate through ONNX Runtime's exposed API, surpassing the original PyTorch baseline.
- Proposed a distributed ONNX-based framework supporting (hyper-)federated learning across heterogeneous devices and programming languages as a direct outcome of the portability and interoperability findings.
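A minimal sketch of the two export paths plus a Python ONNX Runtime check, assuming a hypothetical stand-in encoder; the model definition, tensor names, and shapes below are illustrative, not the thesis code:

```python
import torch
import torch.nn as nn
import onnxruntime as ort

VOCAB, MAX_LEN, CLASSES = 10_000, 128, 144  # 144 candidate supporters

class TicketEncoder(nn.Module):
    """Stand-in for the thesis model: embedding + Transformer encoder
    (1 attention head, 4 layers, matching the reported optimal config)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, 64)
        layer = nn.TransformerEncoderLayer(d_model=64, nhead=1, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(64, CLASSES)  # one logit per candidate

    def forward(self, tokens):
        return self.head(self.encoder(self.embed(tokens)).mean(dim=1))

model = TicketEncoder().eval()
dummy = torch.randint(0, VOCAB, (1, MAX_LEN))  # a ticket as token IDs

# Path 1: TorchScript-based exporter
torch.onnx.export(model, (dummy,), "encoder_ts.onnx",
                  input_names=["tokens"], output_names=["logits"])

# Path 2: Dynamo-based exporter (PyTorch 2.x;
# newer releases expose it as torch.onnx.export(..., dynamo=True))
torch.onnx.dynamo_export(model, dummy).save("encoder_dynamo.onnx")

# The same file loads in any ONNX Runtime language binding; here the
# Python one (swap in CUDAExecutionProvider for GPU inference).
sess = ort.InferenceSession("encoder_ts.onnx",
                            providers=["CPUExecutionProvider"])
(logits,) = sess.run(None, {"tokens": dummy.numpy()})
print("predicted supporter:", logits.argmax(-1))
```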
Research Intern
Oct 2023 - May 2024
- Implemented the ResNet model family in pure Golang without external AI framework dependencies and deployed the compiled binaries on the GWDG cluster in Singularity containers with GPU support; benchmarked model performance and investigated distributed learning strategies across multiple nodes (a framework-free residual-block sketch appears after this list).
- Ran HPC benchmarks including IO500, HPL, HPCG, and STREAM to assess system performance; implemented the miniBUDE benchmark in OpenMP, Julia, CUDA, OpenACC, and OpenCL on both CPU and GPU architectures (a STREAM triad sketch follows the list).
- Managed clusters, carried out performance analysis, and developed parallel computing solutions with MPI and CUDA; designed and implemented a distributed learning system in Golang that uses MPI for inter-process communication (see the allreduce sketch after this list).
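The ResNet implementation itself was in Go and is not reproduced here; as a rough illustration of what "no AI framework" entails, a residual block written with NumPy alone (all names, weights, and shapes hypothetical):

```python
import numpy as np

def conv3x3(x, w):
    """Naive 3x3 'same' convolution; x: (H, W, C_in), w: (3, 3, C_in, C_out)."""
    h, wd, _ = x.shape
    pad = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.empty((h, wd, w.shape[-1]))
    for i in range(h):
        for j in range(wd):
            patch = pad[i:i + 3, j:j + 3, :]            # 3x3 receptive field
            out[i, j] = np.tensordot(patch, w, axes=3)  # contract all 3 axes
    return out

def residual_block(x, w1, w2):
    """conv -> ReLU -> conv, plus the identity shortcut that defines ResNet."""
    y = np.maximum(conv3x3(x, w1), 0.0)
    return np.maximum(conv3x3(y, w2) + x, 0.0)  # skip connection

x = np.random.randn(8, 8, 16)
w1 = np.random.randn(3, 3, 16, 16) * 0.1
w2 = np.random.randn(3, 3, 16, 16) * 0.1
print(residual_block(x, w1, w2).shape)  # (8, 8, 16)
```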
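What STREAM measures fits in a few lines; a NumPy rendition of its triad kernel, indicative only since NumPy allocates a temporary for the scaled array:

```python
import time
import numpy as np

# STREAM 'triad' kernel: a[i] = b[i] + scalar * c[i], a memory-bandwidth probe.
N = 20_000_000                       # ~160 MB per float64 array
b, c, a = np.ones(N), np.ones(N), np.empty(N)
scalar = 3.0

t0 = time.perf_counter()
np.add(b, scalar * c, out=a)         # reads b and c, writes a
dt = time.perf_counter() - t0

bytes_moved = 3 * N * 8              # STREAM convention: three arrays touched
print(f"triad bandwidth: {bytes_moved / dt / 1e9:.1f} GB/s")
```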
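The gradient-averaging pattern at the core of such a distributed learner, sketched with mpi4py rather than the Go MPI bindings used in the actual system:

```python
# Run with, e.g.: mpirun -n 4 python allreduce_sketch.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Each rank computes gradients on its own data shard (dummy values here).
local_grad = np.full(10, float(rank))

# Sum the gradients across ranks, then average: every rank ends up
# with the same global gradient and can apply an identical update.
global_grad = np.empty_like(local_grad)
comm.Allreduce(local_grad, global_grad, op=MPI.SUM)
global_grad /= size

if rank == 0:
    print("averaged gradient:", global_grad[:3])
```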