HumanCLAIM Workshop

Motivation

Respect for linguistic diversity is a fundamental principle of the European Union, and we are optimistic that cross-lingual AI models can play an important role in fostering it. Cross-lingual models based on neural networks are trained on terabytes of data and have recently achieved large performance gains. Because only large companies can afford the computationally expensive training of such models, the evaluation of cross-lingual models is driven by commercial incentives and focuses on average quantitative performance across more than a hundred languages. The intricacies of application scenarios for low-resource languages or for purposes of little economic value are largely overlooked, and individual differences between users are underestimated. When we want to use cross-lingual models for human-centered scenarios such as cognitive modeling, language education, or use cases in the digital humanities, we quickly encounter their limitations.

In this workshop, we want to bring together leading scholars from linguistics, cognitive science, and computer science to develop a more diverse and human-centered perspective on cross-lingual models. We want to integrate typological theories about differences between language families, cognitive models of multilingual processing, and computational approaches toward increasing diversity in language technology.

Participation in the workshop is free of charge, but the number of participants is limited. Registration is open.

Program

January 11th (Thursday), 2024, 9.30-16.30
Location: Interactive workspace 3D@VU

9.00-9.30 Coffee Walk-In, Catching up
9.30-9.40 Opening (Lisa Beinborn)
9.40-10.00 Richard Diehl Martinez: Lessons learned from the BabyLM challenge
10.00-10.15 Social Clustering: Find your position in representational space
10.15-11.00 Miryam de Lhoneux: Fairness in Multilingual NLP
11.00-11.45 Arianna Bisazza: Can modern LMs be truly polyglot? Language learnability & inequalities in NLP
12.00-14.00 Poster Session and light lunch
14.00-14.15 Pitch: Cross-Lingual Representational Units
14.15-15.45 Flying Paper Rotation
15.45-16.00 Coffee break
16.00-16.30 Presentation of Findings
16.30-17.30 Feierabendbier (after-work drinks) and clean-up
from 17.30 onwards: jointly moving to dinner location

Posters

Rochelle Choenni: How do languages influence each other? Studying cross-lingual data sharing during LLM fine-tuning
Sandro Pezzelle: A Psycholinguistic Analysis of BERT’s Representations of Compounds
Carina Kauf: A Better Way to Do Masked Language Model Scoring
Gabriele Sarti: Quantifying the Plausibility of Context Reliance in Neural Machine Translation
Marcel Fekete: Cross-lingual Differences in Subnetworks for Syntactic Phenomena
Deborah Jakobi: MultiplEYE: Enabling multilingual eye-tracking data collection for human and machine language processing research
Andrea Horbach and Marie Bexte: Crosslingual Content Scoring in Five Languages
Lea Krause and Urja Khurana: Confidently Wrong: Exploring the Calibration and Expression of (Un)certainty of Large Language Models in a Multilingual Setting
Elja Meijer: Exploring the Application of NLP in Narrative Patterns of Adult Attachment
Benjamin Minixhofer: CompoundPiece: Evaluating and Improving Decompounding Performance of Language Models
Lisa Beinborn: Analyzing Cognitive Plausibility of Subword Tokenization
Jinbiao Yang: Unsupervised Text Segmentation Predicts Eye Fixations During Reading
Jirui Qi: Cross-Lingual Consistency of Factual Knowledge in Multilingual Language Models
Kushal Tatariya: Emotion Classification for Code-Mixed Data (Hinglish)
Yuqing Zhang: Endowing Neural Language Learners with Human-like Biases (A Case Study on Dependency Link Minimization)
Wessel Poelman: A Call for Consistency in Reporting Typological Diversity
Zeb Goriely: POS Smoothing - Rare Word Representation Learning by Smoothing Over Word Classes

—

Organizing Committee

Lisa Beinborn (Vrije Universiteit Amsterdam)
Richard Diehl Martinez (University of Cambridge)
Urja Khurana (Vrije Universiteit Amsterdam)

The workshop is sponsored by the Network Institute.

Contact

If you have questions about the program, you can send an e-mail to humanclaim@googlegroups.com.