CILVR Seminar: Towards Language Technology for a Truly Multilingual World?

Speaker: Ivan Vulić

Location: 60 Fifth Avenue, 7th Floor Open Space
Videoconference link: https://nyu.zoom.us/j/97986595706

Date: Tuesday, May 3, 2022

Language technology tools such as Google Translate or virtual assistants (Siri, Alexa) were, not many years ago, part of our collective sci-fi-inspired imagination. Today, they are an essential driver of the digital AI transformation, used by hundreds of millions of people. A key challenge in multilingual NLP and IR is developing general language-independent architectures that will be equally applicable to any language. However, this ambition is hindered by 1) the large variation in structural and semantic properties of the world’s languages, and 2) the scarcity of raw and task data for many languages, tasks, and application domains. As a consequence, existing language technology is still largely limited to a handful of resource-rich languages, leaving the vast majority of the world’s 7,000+ languages and their speakers behind and thus amplifying the problem of the “digital language divide”. In this talk, I will discuss the importance of addressing multilingualism and of bringing language technology to minor and low-resource languages and communities as well. I will introduce a range of recent techniques, breakthroughs, and lessons learned that aim to deal with such large cross-language variation and low-data learning regimes. I will also demonstrate that low-resource languages, despite very positive research trends and results achieved in recent years, still lag behind major languages in terms of performance, resources, overall representation in NLP/IR research, and other key aspects, and I will outline several crucial challenges for future research in this area.