The learnability consequences and sources of Zipfian distributions

Speaker: Inbal Arnon

Location: 60 Fifth Avenue, 7th floor open space

Date: Monday, September 23, 2024

While the world’s languages differ in many respects, they share certain commonalities that can provide crucial insight into our shared cognition and how it impacts language structure. In this talk, I explore the learnability consequences and sources of one of the most striking of these commonalities: the way word frequencies are distributed. Across languages, word frequencies follow a Zipfian distribution, showing a power-law relation between a word’s frequency and its rank. The source of this distribution has been heavily debated, with ongoing controversy about what it can tell us about language. Here, I propose that such distributions confer a learnability advantage, which enhances language learning in children and creates a cognitive pressure to maintain similarly skewed distributions over time. In the first part of the talk, I will examine the learnability consequences of Zipfian distributions, focusing on their greater unigram predictability. In the second part, I will explore the learnability sources of Zipfian distributions, asking whether learning biases can help explain why such distributions are so common in language. I will present joint work with Prof. Simon Kirby suggesting that this foundational distributional property of language can emerge through cultural transmission characterised by whole-to-part learning. I will end by discussing the implications of these findings for large language models.