Motivation
Linguistic diversity is one of the fundamental rights of the European Union, and we are optimistic that cross-lingual AI models can play an important role in supporting it. Cross-lingual models based on neural networks are trained on terabytes of data and have recently achieved large performance gains. Because the computationally expensive training of such models can only be afforded by large companies, their evaluation is driven by commercial incentives and focuses on average quantitative performance across more than a hundred languages. The intricacies of application scenarios for low-resource languages or economically insignificant purposes are largely overlooked, and individual differences between users are underestimated. When we want to use cross-lingual models for human-centered scenarios such as cognitive modeling, language education, or use cases in the digital humanities, we quickly run into their limitations.
In this workshop, we want to bring together leading scholars from linguistics, cognitive science, and computer science to develop a more diverse and human-centered perspective on cross-lingual models. Our goal is to integrate typological theories about differences between language families, cognitive models of multilingual processing, and computational approaches toward increasing diversity in language technology.
Important Details & Registration
Start: 🕒 Wednesday, March 26th, 2025, 1:30 PM 📍 Startraum, Göttingen
End: 🕒 Thursday, March 27th, 2025, 3:00 PM 📍 Tagungszentrum an der Sternwarte, Göttingen
Participation in the workshop is free of charge, but the number of participants is limited.
We only have a few open spots left. Register quickly by sending an e-mail with your full name to evailse.beck@stud.uni-goettingen.de.
Please note that food and drinks can only be guaranteed for those who registered earlier through the main system.
Keynote Speakers
We are thrilled to announce two keynote presentations: Debora Nozza on subjectivity in cross-lingual NLP and Yuval Pinter on cross-lingual challenges in tokenization (see the program below).
Program
Wednesday, March 26, 2025
| Time | Session |
| --- | --- |
| 13:30 - 13:50 | Opening |
| 13:50 - 14:50 | Debora Nozza: Subjectivity in Cross-Lingual NLP |
| 14:50 - 15:00 | Social Clustering |
| 15:00 - 15:30 | Lucie Flek: Stereotypes in Multilingual Models |
| 15:30 - 16:00 | Break |
| 16:00 - 18:00 | Poster Session |
Thursday, March 27, 2025
| Time | Session |
| --- | --- |
| 08:30 - 09:00 | Walk-in Coffee ☕ |
| 09:00 - 10:00 | Yuval Pinter: Cross-Lingual Challenges in Tokenization |
| 10:00 - 10:30 | Idea Pitches: The Question |
| 10:30 - 10:50 | Coffee Break + Group Formation |
| 10:50 - 12:00 | Group Work Session 1: Finding Common Ground |
| 12:00 - 13:00 | Lunch 🍽 |
| 13:00 - 13:30 | Nivedita Mani: Cognitive Models of L1 Acquisition |
| 13:30 - 14:15 | Group Work Session 2: Preparing Pitches |
| 14:15 - 14:55 | Group Pitches: The Project to Find the Answer |
| 14:55 - 15:00 | Closing Remarks |
Posters
- Zebulon Goriely: From babble to words: Pre-training language models on continuous streams of phonemes
- Bastian Bunzeck: Small Language Models Also Work With Small Vocabularies: Probing the Linguistic Abilities of Grapheme- and Phoneme-Based Baby Llamas
- Iza Škrjanec: Evaluating Cognitive Plausibility in Multilingual Tokenization Strategies: Insights from Language Models and Reading Times
- Francesca Padovani: What is the real benefit of using Child Directed Language for Language Modeling?
- David Reich: The Pupil Becomes the Master: Eye-Tracking Feedback for Tuning LLMs
- Marianne de Heer Kloots: Investigating language learning trajectories in self-supervised speech models
- Kathy Hämmerl: Understanding Cross-Lingual Alignment
- Vera Neplenbroek: Cross-Lingual Transfer of Debiasing and Detoxification in Multilingual LLMs: An Extensive Investigation
- Siyao Peng: VariErr NLI: Separating Annotation Error from Human Label Variation
- Verena Blaschke: What Do Dialect Speakers Want? A Survey of Attitudes Towards Language Technology for German Dialects
- Franziska Weeber: Cross-Lingual Political Biases in Multilingual Large Language Models
- Jesujoba O. Alabi: AfriHuBERT: A self-supervised speech representation model for African languages
- Suchir Salhan: Less is More: Pre-Training Cross-Lingual Small-Scale Language Models with Cognitively-Plausible Curriculum Learning Strategies
- Akari Haga: BabyLM Challenge: Exploring the Effect of Variation Sets on Language Model Training Efficiency
- Jaap Jumelet: Extending BLiMP To Many Languages
- Yevgen Matusevych: Evaluating Cross-Lingual Transfer from English and Chinese
- Jan Batzner: GermanPartiesQA: Benchmarking Commercial Large Language Models for Political Bias and Sycophancy
Organizing Committee
Contact
If you have questions about the program, you can send an e-mail to humanclaim@googlegroups.com.
Curious about previous editions? Visit the websites for HumanCLAIM 2023 and HumanCLAIM 2024.
Funding
The workshop is funded by zukunft.niedersachsen, the joint science funding program of the Lower Saxony Ministry of Science and Culture and the Volkswagen Foundation.