Motivation
Linguistic diversity is one of the fundamental rights of the European Union, and we are optimistic that cross-lingual AI models can play an important role in facilitating it. Cross-lingual models based on neural networks are trained on terabytes of data and have recently achieved large performance gains. As the computationally expensive training of such models can only be afforded by large companies, the evaluation of cross-lingual models is driven by commercial incentives and focuses on average quantitative performance across more than a hundred languages. The intricacies of application scenarios for low-resource languages or economically less significant purposes are largely overlooked, and individual differences between users are underestimated. When we want to use cross-lingual models for human-centered scenarios such as cognitive modeling, language education, or use cases in the digital humanities, we quickly encounter their limitations.
In this workshop, we bring together leading scholars from linguistics, cognitive science, and computer science to develop a more diverse and human-centered perspective on cross-lingual models. We aim to integrate typological theories about differences between language families, cognitive models of multilingual processing, and computational approaches to increasing diversity in language technology.
Important Details
Date: March 26th to March 27th, 2025
Location: Göttingen, Germany
Participation in the workshop is free of charge, but the number of participants is limited.
Registration is open (see below).
Program Highlights
Keynote Speakers
We are thrilled to announce two keynote presentations:
- Debora Nozza (Bocconi University, Milano)
  Topic: Subjectivity in NLP and Cross-Lingual Hate Speech
  Debora Nozza will present her ERC-funded project focusing on subjectivity in NLP and discuss her research on cross-lingual hate speech detection.
- Yuval Pinter (Ben-Gurion University of the Negev)
  Topic: Challenges in Tokenization Across Languages
  Yuval Pinter will explore tokenization challenges in multiple languages and their impact on what models can learn.
Interactive Sessions
- Poster Session – A great opportunity for participants to present their work and receive feedback.
- Idea Pitches – Have an idea but need expertise? Join focused small-group discussions to exchange knowledge.
- Workshop Dinner – Continuing our tradition of fostering personal interaction, we will host a social dinner on Wednesday evening.
Posters
- Zebulon Goriely: From babble to words: Pre-training language models on continuous streams of phonemes
- Bastian Bunzeck: Small Language Models Also Work With Small Vocabularies: Probing the Linguistic Abilities of Grapheme- and Phoneme-Based Baby Llamas
- Iza Škrjanec: Evaluating Cognitive Plausibility in Multilingual Tokenization Strategies: Insights from Language Models and Reading Times
- Francesca Padovani: What is the real benefit of using Child Directed Language for Language Modeling?
- David Reich: The Pupil Becomes the Master: Eye-Tracking Feedback for Tuning LLMs
- Marianne de Heer Kloots: Investigating language learning trajectories in self-supervised speech models
- Kathy Hämmerl: Understanding Cross-Lingual Alignment
- Vera Neplenbroek: Cross-Lingual Transfer of Debiasing and Detoxification in Multilingual LLMs: An Extensive Investigation
- Siyao Peng: VariErr NLI: Separating Annotation Error from Human Label Variation
- Verena Blaschke: What Do Dialect Speakers Want? A Survey of Attitudes Towards Language Technology for German Dialects
- Franziska Weeber: Cross-Lingual Political Biases in Multilingual Large Language Models
- Jesujoba O. Alabi: AfriHuBERT: A self-supervised speech representation model for African languages
- Suchir Salhan: Less is More: Pre-Training Cross-Lingual Small-Scale Language Models with Cognitively-Plausible Curriculum Learning Strategies
- Akari Haga: BabyLM Challenge: Exploring the Effect of Variation Sets on Language Model Training Efficiency
- Jaap Jumelet: Extending BLiMP To Many Languages
- Yevgen Matusevych: Evaluating Cross-Lingual Transfer from English and Chinese
Workshop Schedule
Wednesday, March 26, 2025
13:30 - 14:00 | Opening
14:00 - 15:00 | Debora Nozza: Subjectivity in Cross-Lingual NLP
15:00 - 15:30 | Coffee Break ☕
15:30 - 16:00 | Lucie Flek: Stereotypes in Multilingual Models
16:00 - 18:00 | Poster Session
18:00 | Walking Tour 🚶
19:00 | Workshop Dinner 🍽
Thursday, March 27, 2025
08:30 - 09:00 | Walk-in Coffee ☕
09:00 - 10:00 | Yuval Pinter: Cross-Lingual Challenges in Tokenization
10:00 - 10:30 | Idea Pitches: The Question
10:30 - 10:50 | Coffee Break + Group Formation
10:50 - 12:00 | Group Work Session 1: Finding Common Ground
12:00 - 13:00 | Lunch 🍽
13:00 - 13:30 | Nivedita Mani: Cognitive Models of L1 Acquisition
13:30 - 14:15 | Group Work Session 2: Preparing Pitches
14:15 - 14:55 | Group Pitches: The Project to Find the Answer
14:55 - 15:00 | Closing Remarks
Organizing Committee
Lisa Beinborn (University of Goettingen)
Richard Diehl Martinez (University of Cambridge)
Urja Khurana (Vrije Universiteit Amsterdam)
Eva Beck (University of Goettingen)
Contact
If you have questions about the program, you can send an e-mail to humanclaim@googlegroups.com.
Curious about previous editions? 2024, 2023.