Motivation
Linguistic diversity is one of the fundamental rights of the European Union and we are optimistic that cross-lingual AI models can play an important role in facilitating it. Cross-lingual models based on neural networks are trained on terabytes of data and have recently achieved large performance gains. As the computationally expensive training of such models can only be afforded by large companies, the evaluation of cross-lingual models is driven by commercial incentives and focuses on the average quantitative performance across more than a hundred languages. The intricacies of application scenarios for low-resource languages or economically insignificant purposes are largely overlooked, and individual differences between users are underestimated. When we want to use cross-lingual models for human-centered scenarios such as cognitive modeling, language education, or use cases in the digital humanities, we quickly encounter their limitations.
In this workshop, we want to bring together leading scholars from linguistics, cognitive science, and computer science to develop a more diverse and human-centered perspective on cross-lingual models. We want to integrate typological theories about differences between language families, cognitive models of multilingual processing, and computational approaches toward increasing diversity in language technology.
Participation in the workshop is free of charge, but the number of participants is limited. Registration is open.
Program
Thursday, January 11th, 2024, 9.30-16.30
Location: Interactive workspace 3D@VU
9.00-9.30 Coffee Walk-In, Catching up
9.30-9.40 Opening (Lisa Beinborn)
9.40-10.00 Richard Diehl Martinez: Lessons learned from the BabyLM challenge
10.00-10.15 Social Clustering: Find your position in representational space
10.15-11.00 Miryam de Lhoneux: Fairness in Multilingual NLP
11.00-11.45 Arianna Bisazza: Can modern LMs be truly polyglot? Language learnability & inequalities in NLP
12.00-14.00 Poster Session and light lunch
14.00-14.15 Pitch: Cross-Lingual Representational Units
14.15-15.45 Flying Paper Rotation
15.45-16.00 Coffee break
16.00-16.30 Presentation of Findings
16.30-17.30 After-work drinks (Feierabendbier) and Clean-up
From 17.30 onwards: Moving together to the dinner location
Posters
- Rochelle Choenni: How do languages influence each other? Studying cross-lingual data sharing during LLM fine-tuning
- Sandro Pezzelle: A Psycholinguistic Analysis of BERT’s Representations of Compounds
- Carina Kauf: A Better Way to Do Masked Language Model Scoring
- Gabriele Sarti: Quantifying the Plausibility of Context Reliance in Neural Machine Translation
- Marcel Fekete: Cross-lingual Differences in Subnetworks for Syntactic Phenomena
- Deborah Jakobi: MultiplEYE: Enabling multilingual eye-tracking data collection for human and machine language processing research
- Andrea Horbach and Marie Bexte: Crosslingual Content Scoring in Five Languages
- Lea Krause and Urja Khurana: Confidently Wrong: Exploring the Calibration and Expression of (Un)Certainty of Large Language Models in a Multilingual Setting
- Elja Meijer: Exploring the Application of NLP in Narrative Patterns of Adult Attachment
- Benjamin Minixhofer: CompoundPiece: Evaluating and Improving Decompounding Performance of Language Models
- Lisa Beinborn: Analyzing Cognitive Plausibility of Subword Tokenization
- Jinbiao Yang: Unsupervised Text Segmentation Predicts Eye Fixations During Reading
- Jirui Qi: Cross-Lingual Consistency of Factual Knowledge in Multilingual Language Models
- Kushal Tatariya: Emotion Classification for Code-Mixed Data (Hinglish)
- Yuqing Zhang: Endowing Neural Language Learners with Human-like Biases (A Case Study on Dependency Link Minimization)
- Wessel Poelman: A Call for Consistency in Reporting Typological Diversity
- Zeb Goriely: POS Smoothing - Rare Word Representation Learning by Smoothing Over Word Classes
Organizing Committee
Lisa Beinborn (Vrije Universiteit Amsterdam)
Richard Diehl Martinez (University of Cambridge)
Urja Khurana (Vrije Universiteit Amsterdam)
The workshop is sponsored by the Network Institute.
Contact
If you have questions about the program, you can send an e-mail to humanclaim@googlegroups.com.