Speaker: Barend Beekhuizen
Title: Learning relational meanings from situated caregiver-child interaction: A computational approach
The difficulty of learning the relational meanings of words like verbs and prepositions has long been acknowledged (Gentner 1978; Gleitman 1990). This acquisition problem has been explored using human subjects (Hirsh-Pasek & Golinkoff 2006 and papers therein) and computational experiments (Siskind 1996; Alishahi & Stevenson 2008), and substantive progress has been made in understanding the acquisition of relational meaning. However, the nature of the available relational meaning in both approaches is to some extent artificial: in lab settings, noise and variation are controlled and limited, while computational models often do not take the actual situational context into account (exceptions being Fleischman & Roy 2005 and Frank et al. 2009). In this talk, we discuss the acquisition problem using situational contexts from natural, interactional data and computational modeling techniques. We investigate the sources of the learning difficulty and discuss information that is known to affect the process. We believe this combination of situational data and computational techniques presents an important methodological direction for the (cognitive) linguistic enterprise, as it lets us closely approximate the source of the meaning.
On the basis of natural data (video recordings of caregiver-child dyads playing a game), we first present the magnitude of the problem. In learning the mapping between a linguistic item L and a meaning M that is grounded in a part of a situation, the learner faces three (related) subproblems. First, the meaning M may be absent from a situation co-occurring with L. Second, the meaning M may be present in many situations that do not co-occur with L. Finally, other aspects of the situation, relating to other, irrelevant meanings, are often systematically present in the situations co-occurring with L.
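The three subproblems can be made concrete with a small count over utterance-situation pairs. The sketch below is purely illustrative: the toy data, the feature labels and the function name `subproblem_stats` are assumptions made for this example, not the coding scheme or the measures used in the talk.

```python
from collections import Counter

# Hypothetical toy data: each item pairs the words of an utterance with the
# set of meaning features observable in the co-occurring situation.
observations = [
    ({"put", "block"}, {"PUT", "BLOCK", "HAND"}),
    ({"put", "it", "in"}, {"BLOCK", "HAND", "IN"}),  # "put" said, PUT not observed
    ({"look"}, {"PUT", "BLOCK", "HAND"}),            # PUT observed, "put" not said
    ({"nice"}, {"HAND"}),
]


def subproblem_stats(observations, word, meaning):
    """Quantify the three subproblems for one (word, meaning) pair.

    Assumes the word occurs in at least one utterance and is missing
    from at least one other (no guard against empty groups).
    """
    with_word = [s for w, s in observations if word in w]
    without_word = [s for w, s in observations if word not in w]

    # Subproblem 1: share of situations with the word where the meaning is absent.
    absent_with_word = sum(meaning not in s for s in with_word) / len(with_word)

    # Subproblem 2: share of situations without the word where the meaning is present.
    present_without_word = sum(meaning in s for s in without_word) / len(without_word)

    # Subproblem 3: other features that accompany the word at least as often
    # as the target meaning does.
    counts = Counter(f for s in with_word for f in s)
    competitors = sorted(
        f for f, c in counts.items() if f != meaning and c >= counts[meaning]
    )
    return absent_with_word, present_without_word, competitors


stats = subproblem_stats(observations, "put", "PUT")
```

In this toy set, the target meaning is missing from half of the situations in which the word is uttered, occurs in half of those in which it is not, and is matched or outnumbered by three irrelevant features.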
Next, we describe a computational model of cross-situational word learning (Fazly et al. 2010), which has been shown to perform well on natural language data with synthetic meanings. Using the natural, situated data, we find that when the model's only source of information is the set of situations holding at the moment of speech, it learns little about the meanings of either nouns or verbs. However, the child has more sources of information at its disposal, and we discuss the effects of these. Here we consider the child's insight into typical social interactions (in this case, game-related intentions, Tomasello 2001), the emergent distributional knowledge of word classes (Fazly & Alishahi 2010; Mintz 2003) and the selective attention to different aspects of the perceived situation (Alishahi et al. 2012; Nematzadeh et al. 2012). Combining these, we arrive at a usage-based computational learner that uses cues from different domains, in line with the approach suggested by Hollich et al. (2000). Taking a computational modeling approach and using natural linguistic and situational data, we can show the extent to which each cue plays a role in learning different sorts of meaning (referring to objects, their properties, static relations and behavioural actions), thus extending our understanding of the driving factors behind the acquisition of word meanings.
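To give a rough idea of how an incremental cross-situational learner operates, here is a minimal sketch loosely in the spirit of alignment-based models such as Fazly et al. (2010). It is a simplification and not the model used in the talk: the class name, the smoothing constants and the toy data are all assumptions made for illustration, and none of the additional cues discussed above (intentions, word classes, attention) are included.

```python
class CrossSituationalLearner:
    """Toy incremental learner: words compete to explain each observed meaning."""

    def __init__(self, smoothing=1e-5, n_meanings=100):
        # Both constants are illustrative assumptions, not published settings.
        self.smoothing = smoothing
        self.n_meanings = n_meanings  # assumed upper bound on distinct meanings
        self.assoc = {}               # word -> {meaning: accumulated alignment}

    def prob(self, meaning, word):
        """Smoothed estimate of p(meaning | word) from accumulated alignments."""
        scores = self.assoc.get(word, {})
        total = sum(scores.values())
        return (scores.get(meaning, 0.0) + self.smoothing) / (
            total + self.n_meanings * self.smoothing
        )

    def observe(self, words, meanings):
        """Process one utterance (words) paired with one situation (meanings)."""
        updates = []
        for m in meanings:
            # Each word's share of meaning m is proportional to its current
            # p(m | word), normalised over the words in the utterance.
            denom = sum(self.prob(m, w) for w in words)
            for w in words:
                updates.append((w, m, self.prob(m, w) / denom))
        # Apply all updates only after alignments were computed with the old state.
        for w, m, a in updates:
            word_scores = self.assoc.setdefault(w, {})
            word_scores[m] = word_scores.get(m, 0.0) + a


# Hypothetical usage with made-up utterance-situation pairs.
learner = CrossSituationalLearner()
for words, meanings in [
    (["look", "ball"], ["LOOK", "BALL"]),
    (["ball", "red"], ["BALL", "RED"]),
    (["look", "dog"], ["LOOK", "DOG"]),
]:
    learner.observe(words, meanings)
```

With this clean toy input, recurring word-meaning pairs quickly come to dominate. In natural situated data, the three subproblems described earlier weaken exactly this co-occurrence signal, which is why additional cues are needed.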