Cortex xxx (2015) 1–7


Special issue: Editorial

Prediction in speech and language processing

Alessandro Tavano a,b,* and Mathias Scharinger a,b

a Institute of Psychology, University of Leipzig, Leipzig, Germany
b Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany

One of the main tenets of current theories of brain function is that sensory evidence alone is not sufficient to explain perceptual processing and decision-making, and, more generally, to correctly infer the causal structure of the world (cf. Friston, 2005; Gregory, 1980). This shortfall cannot simply be attributed to noise during information transduction or to information loss along the neural pathways involved. Rather, a percept is assumed to result from the integration of the available sensory evidence with prior assumptions gathered from diverse sources such as contextual stimulus statistics (rule extraction), long-term memory, or implicit Gestalt principles (Bar, 2009; Friston, 2005; Summerfield & de Lange, 2014; Wacongne et al., 2011; Winkler, Denham, & Nelken, 2009). This idea is developed within the framework of predictive coding. Predictive coding refers to a theory, or a family of theories, that sees neural responses as determined by the interaction of sensory evidence with statistical priors at each cortical level, as modeled by hierarchical Bayesian inference (Friston, 2005; Mumford, 1999; Rao & Ballard, 1999). Predictive coding aims at explaining perception and behavior as well as the generation and updating of beliefs and movement patterns (Dayan, Hinton, Neal, & Zemel, 1995; Friston, 2008; Friston, Kilner, & Harrison, 2006; Lee & Mumford, 2003). In this view, percepts (whether they are experimental tone sequences, Gabor patches, Telemann's chaconnes, or complex auditory and social objects such as a request for information in a noisy urban environment) can be conceived of as hypotheses about what caused a specific sensory input. The brain is assumed to compute inferences about the most likely sensory environment in an effort to minimize prediction error, which occurs when bottom-up sensory evidence does not match top-down predictions or priors (Friston, 2005).
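As a schematic illustration of this view (a textbook-style sketch, not the specific model of any study cited here), hierarchical Bayesian inference combines bottom-up evidence with a top-down prior to yield the posterior belief about a hypothesized cause $v$ given sensory input $u$:

```latex
% Bayesian integration of sensory evidence with a prior:
P(v \mid u) \;\propto\; \underbrace{P(u \mid v)}_{\text{sensory evidence}}
                        \,\underbrace{P(v)}_{\text{prior}}

% In predictive coding schemes, each level of the hierarchy minimizes
% a prediction error: the mismatch between the input u and the
% top-down prediction g(v) generated from the current hypothesis v:
\varepsilon \;=\; u - g(v)
```

Perception then corresponds to settling on the hypothesis $v$ that keeps the prediction error $\varepsilon$ small, which is the sense in which percepts can be regarded as hypotheses about what caused a specific sensory input.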
It follows that perceptual hypotheses are continuously tested and updated, relating lower-level sensory areas to higher-level cognitive areas. The idea that perceptual processes operate by bridging sensation with prior expectations can be traced back to Helmholtz's suggestion that perceptions subconsciously draw on information from long-term memory (Helmholtz, 1909). Notably, the inclusion of priors can overcome the information processing bottleneck of our senses. Our sensory environment is structured in an extremely rich and complex way, such that sensory information processing cannot possibly be based on every imaginable detail. Top-down priors help select the level of precision relevant for the task at hand (cf. Nahum, Nelken, & Ahissar, 2010; Purves, Monson, Sundararajan, & Wojtach, 2014; Skipper, 2014), ultimately determining perceptual decision-making (Summerfield & de Lange, 2014; Summerfield et al., 2006). An implicit understanding of predictive processes has commonly been assumed in traditional psycho- and neurolinguistic priming paradigms (Kotz, Cappa, von Cramon, & Friederici, 2002; Sass, Krach, Sachs, & Kircher, 2009), in which long-term memory traces for words are pre-activated by a specific context (e.g., a meaning-related preceding word or utterance). In a similar vein, sentences can provide a predictive context for their unfolding in time: for instance, sentence-closing words can be anticipated by semantic/sentential context (Kutas, DeLong, & Smith, 2011; Kutas & Hillyard, 1984; Lau, Holcomb, & Kuperberg, 2013; Van Petten & Luka, 2012). Pre-activation of memory traces is also related to frequency effects in speech and language processing (Connine, Titone, & Wang, 1993). Frequency effects refer to the faster activation and retrieval of words with a high frequency of occurrence compared to those with a low frequency of occurrence. Frequency effects become particularly apparent when examining native speakers of a language, who are more familiar with their own vocabulary than with that of a non-native language. They can better rely on top-down inferences than a non-native

* Corresponding author. Present address: Max Planck Institute for Empirical Aesthetics, Grüneburgweg 14, 60322 Frankfurt am Main, Germany.
E-mail addresses: [email protected], [email protected] (A. Tavano).
http://dx.doi.org/10.1016/j.cortex.2015.05.001
0010-9452/© 2015 Elsevier Ltd. All rights reserved.

Please cite this article in press as: Tavano, A., & Scharinger, M., Prediction in speech and language processing, Cortex (2015), http://dx.doi.org/10.1016/j.cortex.2015.05.001



speaker such that, in conditions where bottom-up sensory evidence is compromised (e.g., in speech-in-noise tasks), performance is better in the native than in the non-native language (Golestani, Rosen, & Scott, 2009). However, only recently have speech and language processes been explicitly analyzed within a predictive coding framework (e.g., Bendixen, Scharinger, Strauß, & Obleser, 2014; Li, Lu, & Zhao, 2014; Rothermich & Kotz, 2013; Scharinger, Bendixen, Trujillo-Barreto, & Obleser, 2012; Sohoglu, Peelle, Carlyon, & Davis, 2012). A predictive coding approach seems particularly relevant for speech and language comprehension (Gagnepain, Henson, & Davis, 2012). Predictive processes facilitate the recognition of rapid speech transition dynamics that provide access to word traces stored in long-term memory (Tavano et al., 2012). More generally, both the dynamic acoustic changes in speech patterns and the combinatorial power of sentence and utterance formation clearly require powerful feed-forward mechanisms to efficiently process upcoming speech sounds and words in real time. Indeed, humans can effortlessly process an average of 300 syllables per minute (Greenberg, Hitchcock, & Chang, 2003), readily deal with missing material in the input (Petkov & Sutter, 2011; Zimmerer, Scharinger, & Reetz, 2014), and are able to predict utterance durations ahead of time by means of prosodic information (e.g., the fundamental frequency of speech sounds), as proposed by Klatt (1976) and modeled by Goubanova and King (2008). Signal-related predictions operate at the earliest stages of neural processing (Baess, Widmann, Roye, Schröger, & Jacobsen, 2009; Bendixen, SanMiguel, & Schröger, 2012), but active inferencing goes well beyond a pure facilitation of stimulus detection/recognition and perceptual decision-making (Kok, Rahnev, Jehee, Lau, & de Lange, 2012; Summerfield & de Lange, 2014; Summerfield, Trittschuh, Monti, Mesulam, & Egner, 2008).
For example, Kiebel, Daunizeau, and Friston (2009) showed how predictions link smaller, faster lower-order elements (phonemes) to form larger, slower higher-order units (syllables), providing a schematic description of implicit hierarchy building via perceptual inferencing (i.e., the updating of priors at fast time scales). Furthermore, the extraction of larger units can exploit the sensitivity of simple physiological processes, such as repetition suppression, to changes in stimulus statistics (Summerfield et al., 2008), so that two successive stimulus tokens are processed as one if the repeated token is predictable (Tavano, Widmann, Bendixen, Trujillo-Barreto, & Schröger, 2014).

A principled distinction can be drawn between pre-activating neural structures to process the content or identity of highly expected, specific linguistic units (e.g., syllabic transitions, word meaning; cf. Gagnepain et al., 2012) and aligning in phase with regard to their predicted onset time (e.g., reflecting the statistical distribution of word boundaries; cf. Giraud & Poeppel, 2012). This Special Issue brings together researchers and experts from the fields of linguistics, cognitive neuroscience, and biological psychology with the goal of distinguishing the effects of inherently predictive neural mechanisms (knowing "when") from the effects of stimulus-driven probabilistic expectancies (knowing "what next"). Inherently predictive mechanisms track the quasi-periodic nature of speech segments over time and account for the

point in time of the next salient stimulus or stimulus feature, while variations in stimulus probabilities can be used to anticipate the identity of upcoming events (Schwartze, Tavano, Schröger, & Kotz, 2012; Tavano et al., 2014). For instance, the quasi-periodic opening and closing of the mouth during articulation, together with the concurrent movements of the articulators (e.g., the lips), are informative about when a given speech feature is likely to occur (Arnal, Morillon, Kell, & Giraud, 2009; Rothermich & Kotz, 2013). Cortical oscillations, instantiated by the synchronous firing of neuronal assemblies, can phase-lock to the slow dynamics of the speech envelope (i.e., the syllabic rate) during the attentive learning of new stimuli (Luo & Poeppel, 2007), suggesting that oscillatory dynamics could be considered a cortical correlate of perceptual learning, that is, the updating of priors at slow time scales (Friston, 2005). Inherent temporal predictions are perhaps best reflected in the modulation of brain processes entrained to frequencies at or near the stimulus rate (Arnal et al., 2014; Lakatos, Karmos, Mehta, Ulbert, & Schroeder, 2008; Lakatos et al., 2013). Specifically, the alignment of neuronal processing to the spectrotemporal characteristics of speech may favor the selection of the more relevant parts of the speech stream, which co-occur with the peak of oscillatory activity, while less relevant information is attenuated or inhibited (Arnal & Giraud, 2012; Arnal, Wyart, & Giraud, 2011; Giraud et al., 2007; Giraud & Poeppel, 2012). Recent work also highlights the importance of subcortical and subtentorial brain structures for both temporal and feature-based aspects of predictive processing (Malmierca, Anderson, & Antunes, 2015; Schwartze & Kotz, 2013).
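The notion of phase-locking to the speech envelope can be made concrete with a toy computation. The sketch below is a minimal illustration only (the signals and parameters are invented, and this is not the analysis pipeline of any study cited here): it computes a phase-locking value (PLV) between a simulated 4 Hz "syllabic-rate" envelope and an oscillation that either tracks the envelope with a fixed lag or drifts at a slightly different frequency.

```python
import numpy as np

def phase_locking_value(x, y):
    """Phase-locking value (PLV) between two signals.

    PLV = |mean(exp(i * (phi_x - phi_y)))|, ranging from 0
    (no consistent phase relation) to 1 (perfect phase alignment).
    Instantaneous phase is obtained from the analytic signal,
    computed here via the FFT-based Hilbert transform.
    """
    def inst_phase(sig):
        n = len(sig)
        spectrum = np.fft.fft(sig)
        h = np.zeros(n)
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
        if n % 2 == 0:
            h[n // 2] = 1.0
        analytic = np.fft.ifft(spectrum * h)
        return np.angle(analytic)

    dphi = inst_phase(x) - inst_phase(y)
    return np.abs(np.mean(np.exp(1j * dphi)))

# Simulated example: a 4 Hz "syllabic-rate" envelope, one oscillation
# that tracks it with a fixed phase lag, and one that drifts because
# it runs at a slightly detuned frequency.
fs = 200.0                                    # sampling rate (Hz)
t = np.arange(0, 10, 1 / fs)                  # 10 s of signal
envelope = np.cos(2 * np.pi * 4.0 * t)
locked = np.cos(2 * np.pi * 4.0 * t - 0.5)    # entrained: fixed lag
drifting = np.cos(2 * np.pi * 4.7 * t)        # detuned: phase drifts

print(phase_locking_value(envelope, locked))    # close to 1
print(phase_locking_value(envelope, drifting))  # much lower
```

A PLV near 1 indicates the consistent phase relation expected of an oscillation entrained to the envelope, whereas the drifting oscillation yields a value near 0; measures of this family are what entrainment analyses of cortical speech tracking quantify.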
The contributions included in this Special Issue demonstrate the explanatory power of predictive processes at all levels of the linguistic hierarchy (speech sounds, words, sentences), paralleled by the hierarchical architecture of the brain and the directional message-passing between processing levels (e.g., low-level sensory and high-level cognitive levels). The articles present experimental data from healthy and impaired human populations, acquired using cutting-edge brain imaging techniques, among them functional magnetic resonance imaging (fMRI), electroencephalography (EEG), and magnetoencephalography (MEG), as well as interruptive methods such as repetitive transcranial magnetic stimulation (rTMS) and transcranial direct current stimulation (tDCS). Section 1 subsumes papers that specifically test timing-related and content-related linguistic predictions. In a sentence such as "Sherry had to read lips because she was…" (from Block & Baldwin, 2010), the final word is most likely "deaf": its meaning can be predicted from the sentence context, and its word class (adjective) follows from the verb ("was"). The comprehension benefit stemming from a highly predictable final word, termed the predictability gain, becomes particularly relevant in situations where bottom-up sensory evidence is sub-optimal, as with spectrally degraded stimuli. Previous research has suggested that the angular gyrus in human parietal cortex supports this predictability gain (Obleser & Kotz, 2010). The study by Hartwigsen, Golombek, and Obleser (in this issue) demonstrates for the first time that the left angular gyrus plays a causal role in determining a predictability gain during sentence comprehension. In their study with acoustically degraded but highly predictable sentences of the type illustrated above, the authors temporarily



disrupted processing in the left angular gyrus using rTMS and observed a reduction of the predictability gain. In other words, disrupting the left angular gyrus impaired the comprehension of predictable sentences, strongly suggesting that this region supplies predictive top-down information in situations of compromised bottom-up sensory signals. Predictability gains may also stem from regular alternations of stressed and unstressed syllables during sentence comprehension. In their article, Kotz and Schmidt-Kassow (in this issue) examined the neural bases of such a temporal predictability gain. The authors presented sentences with rhythmic and/or syntactic expectancy violations to patients with selective structural damage to the basal ganglia and to a healthy control group. There is evidence that the basal ganglia interact with the thalamus and sensory cortical areas, pacing stimulus perception at long time intervals (
