Motivation
Linguistic diversity is one of the fundamental rights of the European Union, and we are optimistic that cross-lingual AI models can play an important role in facilitating it. Cross-lingual models based on neural networks are trained on terabytes of data and have recently achieved large performance gains. As the computationally expensive training of such models can only be afforded by large companies, the evaluation of cross-lingual models is driven by commercial incentives and focuses on average quantitative performance across more than a hundred languages. The intricacies of application scenarios for low-resource languages or economically less significant purposes are largely overlooked, and individual differences between users are underestimated. When we want to use cross-lingual models for human-centered scenarios such as cognitive modeling, language education, or use cases in the digital humanities, we quickly encounter their limitations.
In this workshop, we bring together leading scholars from linguistics, cognitive science, and computer science to develop a more diverse and human-centered perspective on cross-lingual models. We aim to integrate typological theories about differences between language families, cognitive models of multilingual processing, and computational approaches to increasing diversity in language technology.
Important Details
Date: March 26th to March 27th, 2025
Location: Göttingen, Germany
Participation in the workshop is free of charge, but the number of participants is limited.
Registration is open (see below).
Program Highlights
Keynote Speakers
We are thrilled to announce two keynote presentations:
- Debora Nozza (Bocconi University, Milano)
  Topic: Subjectivity in NLP and Cross-Lingual Hate Speech
  Debora Nozza will present her ERC-funded project focusing on subjectivity in NLP and discuss her research on cross-lingual hate speech detection.
- Yuval Pinter (Ben-Gurion University of the Negev)
  Topic: Challenges in Tokenization Across Languages
  Yuval Pinter will explore tokenization challenges in multiple languages and their impact on what models can learn.
Interactive Sessions
- Poster Session – A great opportunity for participants to present their work and receive feedback.
- Idea Pitches – Have an idea but need expertise? Join focused small-group discussions to exchange knowledge.
- Workshop Dinner – Continuing our tradition of fostering personal interaction, we will host a social dinner on Wednesday evening.
Posters
- Zebulon Goriely: From babble to words: Pre-training language models on continuous streams of phonemes
- Bastian Bunzeck: Small Language Models Also Work With Small Vocabularies: Probing the Linguistic Abilities of Grapheme- and Phoneme-Based Baby Llamas
- Iza Škrjanec: Evaluating Cognitive Plausibility in Multilingual Tokenization Strategies: Insights from Language Models and Reading Times
- Francesca Padovani: What is the real benefit of using Child Directed Language for Language Modeling?
- David Reich: The Pupil Becomes the Master: Eye-Tracking Feedback for Tuning LLMs
- Marianne de Heer Kloots: Investigating language learning trajectories in self-supervised speech models
- Kathy Hämmerl: Understanding Cross-Lingual Alignment
- Vera Neplenbroek: Cross-Lingual Transfer of Debiasing and Detoxification in Multilingual LLMs: An Extensive Investigation
- Siyao Peng: VariErr NLI: Separating Annotation Error from Human Label Variation
- Verena Blaschke: What Do Dialect Speakers Want? A Survey of Attitudes Towards Language Technology for German Dialects
- Franziska Weeber: Cross-Lingual Political Biases in Multilingual Large Language Models
- Jesujoba O. Alabi: AfriHuBERT: A self-supervised speech representation model for African languages
- Suchir Salhan: Less is More: Pre-Training Cross-Lingual Small-Scale Language Models with Cognitively-Plausible Curriculum Learning Strategies
- Akari Haga: BabyLM Challenge: Exploring the Effect of Variation Sets on Language Model Training Efficiency
- Jaap Jumelet: Extending BLiMP To Many Languages
- Yevgen Matusevych: Evaluating Cross-Lingual Transfer from English and Chinese
Workshop Schedule
Wednesday, March 26, 2025
13:30 - 14:00 | Opening
14:00 - 15:00 | Debora Nozza: Subjectivity in Cross-Lingual NLP
15:00 - 15:30 | Coffee Break ☕
15:30 - 16:00 | Lucie Flek: Stereotypes in Multilingual Models
16:00 - 18:00 | Poster Session
18:00 | Walking Tour 🚶
19:00 | Workshop Dinner 🍽
Thursday, March 27, 2025
08:30 - 09:00 | Walk-in Coffee ☕
09:00 - 10:00 | Yuval Pinter: Cross-Lingual Challenges in Tokenization
10:00 - 10:30 | Idea Pitches: The Question
10:30 - 10:50 | Coffee Break + Group Formation
10:50 - 12:00 | Group Work Session 1: Finding Common Ground
12:00 - 13:00 | Lunch 🍽
13:00 - 13:30 | Nivedita Mani: Cognitive Models of L1 Acquisition
13:30 - 14:15 | Group Work Session 2: Preparing Pitches
14:15 - 14:55 | Group Pitches: The Project to Find the Answer
14:55 - 15:00 | Closing Remarks
Organizing Committee
Lisa Beinborn (University of Goettingen)
Richard Diehl Martinez (University of Cambridge)
Urja Khurana (Vrije Universiteit Amsterdam)
Eva Beck (University of Goettingen)
Contact
If you have questions about the program, you can send an e-mail to humanclaim@googlegroups.com.
Curious about previous editions? 2024, 2023.