Jakob Johannes Bauer

Research keywords: NLP, AI in Education, Evaluations, Alignment, Reinforcement Learning

Bio:

Hi, I am Jakob Johannes Bauer, and I moved to Edinburgh in 2025 to begin my PhD in Responsible Natural Language Processing, funded by UKRI. Before moving here, I did my MSc in Computer Science at ETH Zurich, where I specialized in Machine Intelligence and Data Management with a focus on NLP. My master's thesis explored reinforcement learning for interpretability, and I carried out several research projects in the NLP & Education Lab.

Before that, I studied Software and Information Engineering at the Vienna University of Technology, where I graduated with distinction. My bachelor's thesis focused on combining symbolic AI and NLP to generalize visual question answering on graph-related problems.

Alongside my studies, I gained four years of part-time industry experience in fine-tuning and deploying large language models, as well as in full-stack development. I have also completed two internships, contributing to start-ups applying LLMs in real estate and aviation. I received the Huawei Seeds for the Future award and was part of category-winning teams at the Datathon and HackZurich events.

I love scouting, snowboarding, and cozy books, and I have been involved in volunteer work throughout my life.

PhD research:

My vision is to design reliable AI systems that empower teachers as well as students.

Through my involvement with the Scouts, I experienced first-hand the challenges that mainstream LLM tools create in the educational sector. Instead of assisting students during their studies, these models often shortcut the learning experience. This motivated me to research how to build language technologies that genuinely support, rather than undermine, the human learning experience.

My current focus is on evaluation heuristics. Current methods fail to capture the real-world educational and social impact of text, so my research aims to create consistent and interpretable evaluation frameworks that better reflect how language models and humans interact in practice. I’ve seen many papers in the NLP & Education domain use other LLMs to rate educational value, which hinders training towards systems that truly optimize for learning.

Supervisors: Mirella Lapata