Language as Data (winter term)
This course introduces students to the characteristics of language data and the associated challenges for representation learning. Natural language is a powerful and creative means of communication: it follows rules only to make exceptions, and it evolves over time and from domain to domain. Language signals are highly ambiguous, and the form-meaning mapping can only be interpreted in context. Students acquire the conceptual knowledge to analyze structure in language data and to understand the methodological assumptions underlying representation learning in large language models.
Language Modeling Research and Evaluation (LaMoRE, winter term)
New language models are released almost every month these days. In the accompanying technical reports, the quality of these models is evaluated on hundreds of datasets and languages. But what do these averaged numbers mean, and what can we infer about a model's strengths and weaknesses? This course mixes theoretical discussions of evaluation concepts, practical sessions focused on data and model analysis, and invited talks by guest researchers who share their perspectives on what language models can and cannot (yet) do and how to measure it. For this course, you do not need to know the technical details of language modeling architectures, but you do need to bring a general interest in language modeling research and a willingness to do fine-grained data analysis.
Advanced Natural Language Processing (every semester)
In this seminar, we review and discuss recent advances in the field. Students need to have foundational knowledge in natural language processing and/or machine learning to be able to read and understand state-of-the-art research papers. Every semester, we will focus on a methodological subtopic.
Winter term 24/25: In neural models, information processing is widely distributed across neurons. To adapt models to new tasks and domains, modular solutions can help us encapsulate processes and better control the information flow. Modularity is particularly relevant for transfer learning, for example in multilingual models. You can choose to focus either (a) on the methodological aspects of modularity or (b) on multilingual models and cross-lingual transfer.
Interpretability & Bias in Machine Learning (summer term)
In this advanced lecture, we learn to better understand the strengths and weaknesses of machine learning models. ML models influence societal processes when they are used to predict highly complex scenarios for political decisions, medical advice, scientific discovery, or educational funding. In these high-stakes scenarios, we need to be transparent about the conditions and values that are under- or misrepresented in the model, because they remain invisible when the model identifies patterns and predicts trends. Students are introduced to a range of interpretability methods that can provide partial insights into the information encoded in a model. They learn to pay attention to different types of biases that arise during the development and application of ML models and practice strategies for discovering, quantifying, and countering undesirable bias.
In the future, we also plan to offer seminars on data science with cognitive signals and on educational language technology.
Theses
We are always looking for motivated students to work with us. If you are interested, reach out to claplab@uni-goettingen.de.
Have a look at our list of potential thesis topics or pitch your own idea based on a recent paper or a shared task.