Phonetics & Phonology Forum is a weekly talk and discussion series featuring presentations on all aspects of phonology and phonetics.
It is well known that temporal patterns in speech occur on multiple timescales, but it is less clear how patterns on different timescales interact with each other. Speech gestures normally occur on a relatively fast timescale. Previous work has shown that there exists a gestural c-center effect in complex syllable onsets (e.g. [spa]), whereby initiation of tongue blade and lip movements associated with [s] and [p] are equally displaced in opposite directions from initiation of the tongue body movement associated with the vowel [Browman, C. and Goldstein, L., Phonetica 45, 140-155 (1988)]. In contrast, speech rhythms occur on a relatively slower timescale. In metronome-driven phrase repetition tasks, rhythmic timing is more variable for higher-order target ratios of intervals between stressed syllables and phrases [Cummins, F. and Port, R., Journal of Phonetics 26, 145-171 (1998)].
An experiment was conducted to investigate how the above gestural and rhythmic patterns interact. Gestural kinematics were recorded using electromagnetic articulometry during a repetition task in which subjects uttered the phrase "take on a spa" to two-beat metronome rhythms of varying difficulty. Significantly greater variability in relative timing of tongue and lower lip movements associated with [s] and [p] was observed with the more difficult and variable target rhythms, which demonstrates that rhythmic and gestural systems interact in a non-trivial way. This means that--in terms of an analogy--the "speech box step" is performed with more temporal variability during the more difficult "speech waltz" than during the "speech rumba".
A dynamical model will be presented that is capable of simulating the observed covariability patterns. Building upon previous work from other researchers, the model treats phrase, foot, syllable, and gestural systems as limit-cycle oscillators, which synchronize through multi-frequency phase-coupling. In the presence of noise, stronger coupling between rhythmic systems results in lower intergestural variability. The model shows how hierarchical temporal patterns involving prosodic and gestural structure can be usefully conceptualized with dynamical systems.
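The relationship between coupling strength and variability can be illustrated with a toy simulation. The sketch below is not the model presented in the talk: it assumes just two noisy phase oscillators, a slow "foot-like" one and a fast "syllable-like" one locked in a 2:1 frequency ratio, and the frequencies, noise level, coupling values, and the function name `relative_phase_variability` are all hypothetical choices made for illustration. It reports the circular variance of the generalized relative phase, which should shrink as coupling grows.

```python
import math
import random


def relative_phase_variability(coupling, noise_sd=1.0, seconds=200.0, dt=0.005, seed=0):
    """Circular variance of the 2:1 relative phase of two noisy phase oscillators."""
    rng = random.Random(seed)
    omega_foot = 2.0 * math.pi * 1.0  # assumed slow oscillator, 1 Hz
    omega_syll = 2.0 * omega_foot     # assumed fast oscillator, 2 Hz
    theta_foot = 0.0
    theta_syll = 0.0
    sum_cos = 0.0
    sum_sin = 0.0
    steps = int(seconds / dt)
    noise_scale = noise_sd * math.sqrt(dt)  # Euler-Maruyama noise scaling
    for _ in range(steps):
        # Generalized relative phase for a 2:1 frequency relationship.
        psi = 2.0 * theta_foot - theta_syll
        # Both phases descend the coupling potential V = -coupling * cos(psi).
        theta_foot += (omega_foot - 2.0 * coupling * math.sin(psi)) * dt \
            + noise_scale * rng.gauss(0.0, 1.0)
        theta_syll += (omega_syll + coupling * math.sin(psi)) * dt \
            + noise_scale * rng.gauss(0.0, 1.0)
        sum_cos += math.cos(psi)
        sum_sin += math.sin(psi)
    # Mean resultant length near 1 means tight phase locking.
    resultant = math.hypot(sum_cos, sum_sin) / steps
    return 1.0 - resultant


if __name__ == "__main__":
    for k in (0.5, 5.0):
        print(f"coupling={k}: circular variance={relative_phase_variability(k):.3f}")
```

With these assumed settings the strongly coupled pair shows the smaller circular variance, in line with the qualitative claim that stronger coupling under noise yields lower timing variability.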
The talk presents a study investigating the electrophysiological signature of categorical perception (CP) of lexical tones. Mandarin level and rising tones and their nonspeech pitch counterparts are used. Two time regions of mismatch negativity (MMN), corresponding to the level and rising portions of the pitch contours, are identified. Only the first MMN region is modulated by deviant type. It is concluded that 1) a CP effect for lexical tones is observed in the pre-attentive stage, mainly from the level portion; 2) in the same stage, the brain responds differently to lexical tones than to nonspeech pitches; 3) we attribute all of the observed differences to language experience, but further studies are needed.
Drawing on evidence from (1) listeners' tolerance for wide phonetic variability in speech perception, (2) the gestural nonidentity of speech sounds, and (3) the language specificity of linguistic phonetics, I will argue that phonetic coherence is based in the linguistic equivalence rather than a (hypothetical) gestural or auditory equivalence of speech sounds. Further, consideration of talkers' and listeners' sensitivity to context suggests that the linguistic frame of reference for speech is highly adaptable and is sensitive to the contexts in which speech is produced.
Previous studies have examined the role that L2 word form plays in production and recall abilities (deGroot, 2006; Ellis and Beaton, 1993). This talk will focus on whether phonological form also affects L2 learners' ability to learn meanings. In particular, we ask whether hard-to-pronounce words, defined as having phones/phone combinations not present in the learner's native language, are more difficult to learn meanings for, and further, if learnability differences are due to interference from difficulty with production or more general representational difficulties.
We exposed participants to Polish word-novel object pairs, some easy- and some hard-to-pronounce, and tested their ability to match words with their meanings. Results showed that upon initial testing, only participants who repeated words aloud showed more difficulty with hard-to-pronounce words. Further experiments showed that this effect may result from forcing learners to attend to the difficulty of these word forms, as the effect could be reproduced with two other means of drawing attention to the words, subvocal repetition and hearing another English speaker repeat the words. In a final experiment, participants were engaged in an articulatory suppression task during learning; these participants also showed more difficulty learning meanings for hard-to-pronounce words, demonstrating that this effect cannot simply be attributed to interference from producing difficult words. The results of this study suggest that more difficult phonological forms lead to weaker representations of the word form, which is more difficult to link with meaning in memory.
It is generally recognized that front-articulated stops are more compatible with voicing than back-articulated stops because the larger oral volume and compliance of front stops allow for longer trans-glottal airflow. Phonological patterns suggest that retroflex stops may be an exception to this pattern and may be more compatible with voicing than their dental/alveolar counterparts. An experiment was conducted in which an artificial leak was created in three speakers, and the effect of the abrupt closure of this leak on voicing was measured for [b d ɖ g]. The results of this experiment show that [b d g] follow the pattern of longer voicing for more front articulations. The retroflex was an exception: for two speakers, voicing for [ɖ] persisted longer than for [d], and for one speaker, [ɖ] voicing persisted longer than for [b d]. We believe that the greater surface area presented by the concave shape of the tongue during retroflexes, as compared to dentals/alveolars, allows for greater passive cavity expansion (i.e. compliance) and is a possible explanation of the observed pattern.
In addition to the differences found during the prolonged steady-state stops, another factor may favor a longer voicing in natural connected speech: some forward movement of the tongue apex during the retroflex articulation. That is, the movement over time of retroflex sounds may by itself change vocal tract volume during the stop closure.
Ao is a Tibeto-Burman language spoken in Nagaland (far north-eastern India). In addition to presenting our current hypotheses about the segment inventory, phonological processes, and the tonal system, we will discuss the following issues in more depth: the status of word-final glottal stop, VOT distinctions, syllable structure, and variability in the pronunciation of vowels.
A strict reading of exemplar-based models of speech perception and production (e.g. Goldinger 1998) assumes phonetic convergence is a natural cognitive reflex. At first pass, Pickering and Garrod (2004) also adopt the view that convergent speech behavior at all levels (phonetic, syntactic, lexical) is automatic and without social consideration. Research supporting the automatic view has found that phonetic convergence interacts with speaker-independent factors like response latency and word frequency (Goldinger 1998). Recently, Nielsen (2007) found an interaction between convergence and abstract phonological knowledge. On the other hand, Communication Accommodation Theory (CAT; Giles & Coupland 1991) has always maintained that convergent and divergent speech behavior arises from talkers wanting to reinforce socially meaningful differences. Bourhis and Giles (1977) find that convergent and divergent speech patterns predictably occur in certain social situations. Within a more experimental tradition, Namy et al. (2002) find that female participants converged more than male participants in a shadowing experiment. Pardo (2006), however, found male participants to converge more than female participants in a collaborative map task. These results suggest that some aspects of phonetic convergence are affected by social factors.
In this talk I report on an experiment designed to replicate Bourhis & Giles (1977) when solely looking at phonetic behavior. Participants who were native speakers of New Zealand English were either insulted (the negative condition) or flattered (the positive condition) by a speaker of Australian English in the midst of completing a word repetition task. The stimuli consisted of words involved in the ear/air merger and monophthongs from the lexical sets KIT, DRESS, TRAP, BARN, THOUGHT, and STRUT. After the speech production task, participants completed an implicit association task (IAT; Greenwald et al., 1998) that examined implicit biases towards Australia. Thus far, the speech behavior in the monophthongs has been analyzed. Talkers imitated the Australian only in the DRESS and TRAP vowels in the shadowing block. These two vowel sets involve some of the greatest differences between Australian and New Zealand English, but do not correspond to the most salient accent differences (Bayaard, 2000). The experimental condition did not influence the results. Participants' scores on the IAT did, however, correlate with the degree of convergence. Participants were most likely to imitate the Australian talker if their IAT score reflected a positive association with Australia. This correlation was apparent in both the shadowed and post-task blocks. The results of this experiment have implications for how social cognition should be reflected in exemplar theories of speech production and perception.
Cross-linguistically, geminates tend to occur in certain kinds of phonetic environments: e.g. intervocalically, and after a short stressed vowel (Thurgood 1993). They also show cross-linguistic preferences in manner of articulation: low-sonority and voiceless geminates are encountered more frequently than voiced and sonorant geminates (Podesva 2000, 2002). These principles are categorical in some languages, while in others they emerge as statistical tendencies. In Russian, geminates adjacent to stressed vowels are less likely to degeminate than those adjacent to unstressed vowels. Among geminates adjacent to stressed vowels, the ones preceded by stress are less likely to degeminate than the ones followed by stress. Consonants degeminate less often if they are word-initial compared to word-final. Intervocalic geminates are more protected from degemination than pre-consonantal geminates. Among different manners of articulation, stops and fricatives degeminate less often than nasals and liquids (Kasatkin and Choj 1999, Dmitrieva 2007).
Such uniformity in geminate distribution and typology across languages warrants an explanation. In this study, I explore the hypothesis that certain phonetic environments and manners of articulation provide articulatory and perceptual advantages for the realization and preservation of the contrast between geminates and singletons. Previous researchers have suggested that a lower ratio between geminate and singleton duration may lead to potential contrast neutralization (Aoyama and Reid 2006, Blevins 2004). I propose that it is not only the relative durations of the geminates and singletons that determine the quality of the contrast, but also the location of the perceptual boundary between the two categories. Based on experimental evidence from Russian, I show that just as the durations of geminates and singletons vary depending on the phonetic environment and the type of consonant, so does the location of the perceptual boundary. Moreover, the placement of the perceptual boundary in relation to the average duration of the geminates and singletons is crucial in predicting the amount of articulatory effort necessary to realize the contrast and the likelihood of a perceptually driven neutralization of the contrast. The prediction that geminates that are favored cross-linguistically will emerge as the easiest to maintain and the least susceptible to neutralization was supported in the majority of the cases. The results show that this method of evaluating the quality of the contrast is more reliable than the previously proposed geminate/singleton ratio and can provide an explanation for certain cross-linguistic universals in geminate distribution and typology.
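To make the contrast between a duration ratio and a boundary-based measure concrete, here is a minimal sketch. The abstract does not say how the perceptual boundary was estimated, so the 50% crossover by linear interpolation used below is an assumption, and all durations, response proportions, and function names are invented for illustration only.

```python
def crossover_boundary(durations_ms, prop_geminate):
    """Duration at which 'geminate' responses cross 50% (linear interpolation)."""
    points = list(zip(durations_ms, prop_geminate))
    for (d0, p0), (d1, p1) in zip(points, points[1:]):
        if p0 < 0.5 <= p1:
            return d0 + (0.5 - p0) * (d1 - d0) / (p1 - p0)
    raise ValueError("responses never cross 50%")


def boundary_margins(boundary_ms, singleton_mean_ms, geminate_mean_ms):
    """Distance from each category's mean production to the perceptual boundary."""
    return boundary_ms - singleton_mean_ms, geminate_mean_ms - boundary_ms


if __name__ == "__main__":
    # Hypothetical identification data along a consonant-duration continuum.
    durations = [80, 100, 120, 140, 160, 180]
    responses = [0.02, 0.10, 0.30, 0.70, 0.95, 1.00]
    boundary = crossover_boundary(durations, responses)
    # Hypothetical mean produced durations for the two categories.
    singleton_margin, geminate_margin = boundary_margins(boundary, 90.0, 170.0)
    print(boundary, singleton_margin, geminate_margin)
```

Under this reading, two contrasts with the same geminate/singleton ratio could still differ in how close the geminate mean sits to the boundary, which is the kind of difference a ratio alone would not capture.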
Lenition is commonly understood as an articulatorily driven phenomenon, with lenited forms involving less articulatory effort than unlenited forms. Although intuitively plausible, this line of reasoning remains somewhat speculative because of the lack of experimental methods that are able to test articulatory effort directly. In addition, other potential sources of phonetic grounding for lenition (such as perception and neutralization avoidance) have been neglected. This paper examines the spirantization of intervocalic voiced stops in the light of these latter two types of phonetic pressure.
The perceptual experiment reported here tests whether the relative perceptibility of the segments involved in spirantization can account for the typology of lenition, given the framework of the P-map. The results show that perception is able to explain some, but not all, of the typological facts. Data from a cross-linguistic survey of lenition show that a language's segment inventory interacts with its potential to spirantize, but that the effects of inventory are not enough to fill the gaps left by a perceptual account. I conclude that articulation may very well play a role in lenition, especially in the tendencies left unexplained by perceptual or systemic pressures. However, to the extent that perception *does* explain patterns of lenition, we must obtain more direct evidence of the articulatory effort involved before positing it as an explanation for the same facts.
What is the role of the higher CNS (cortex, cerebellum, thalamus, basal ganglia) in controlling speech? Recent models of non-speech motor control would suggest the higher CNS monitors and controls the dynamic state of the articulators, i.e., articulator positions, velocities, or any other information needed to predict their future behavior in the current task. Such state information is not directly or instantly available from sensory feedback, and is instead hypothesized to be estimated within the higher CNS by a prediction/correction process where it (a) predicts the articulators' next state based on efference copy of the previous articulatory controls, (b) compares incoming sensory feedback with feedback expected from the predicted state, and (c) uses the difference to correct its state prediction. It then generates new articulator controls based on this corrected state estimate. Models based in this way on state feedback control (SFC) have been especially successful at predicting the ways people flexibly optimize their movements for different tasks, but they have received little attention thus far in speech motor control research. In this talk, I will describe an SFC model of speech motor control and compare it with Guenther's well-known DIVA model of speech motor control.
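The prediction/correction loop in steps (a) to (c) can be sketched with a deliberately simple, one-dimensional toy. This is not the SFC model described in the talk: the single "articulator position", the integrator dynamics, the fixed gains, the noise level, and the function name `simulate_sfc` are all assumptions chosen only to show the shape of the loop.

```python
import random


def simulate_sfc(target=1.0, steps=100, control_gain=0.3, correction_gain=0.5,
                 sensory_noise_sd=0.05, seed=0):
    """Toy state-feedback loop; returns (true position, internal estimate)."""
    rng = random.Random(seed)
    position = 0.0  # true articulator state, never read directly by the controller
    estimate = 0.5  # internal state estimate, deliberately wrong at the start
    for _ in range(steps):
        # New control is generated from the current (corrected) state estimate.
        control = control_gain * (target - estimate)
        # The articulator moves in response to the control.
        position += control
        # (a) Predict the next state from an efference copy of the control.
        predicted = estimate + control
        # (b) Compare noisy incoming feedback with the feedback the prediction implies.
        feedback = position + rng.gauss(0.0, sensory_noise_sd)
        prediction_error = feedback - predicted
        # (c) Use the difference to correct the state prediction.
        estimate = predicted + correction_gain * prediction_error
    return position, estimate


if __name__ == "__main__":
    final_position, final_estimate = simulate_sfc()
    print(f"position={final_position:.3f} estimate={final_estimate:.3f}")
```

Note that the controller only ever acts on `estimate`; sensory feedback enters solely through the correction step, which is the structural point of the prediction/correction description.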
In this study, two children's speech data are examined for the development of phonological neighborhoods in the third year of life. The analysis shows that neighborhood density increases over time, but it is not necessarily the case that children acquire words from dense neighborhoods in the adult lexicon first. Moreover, in the initial stage of acquisition, words that are added to the lexicon are from denser neighborhoods than words that are already acquired, but after a certain stage, as the backbone of the lexicon is formed, the trend is reversed. Our analysis suggests that several different forces are at play in early acquisition, leading the development of phonological neighborhoods in different directions.
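For readers unfamiliar with the measure, neighborhood density is commonly computed as the number of words in a lexicon that differ from a target word by a single phoneme substitution, deletion, or addition. The abstract does not state which definition the study used, so the sketch below assumes this common one; the tiny lexicon and its rough transcriptions are invented for illustration.

```python
def is_neighbor(a, b):
    """True if phoneme tuples a and b differ by one substitution, deletion, or addition."""
    if a == b:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) == 1:
        longer, shorter = (a, b) if len(a) > len(b) else (b, a)
        # Deleting exactly one phoneme from the longer word must yield the shorter one.
        return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))
    return False


def neighborhood_density(word, lexicon):
    """Number of words in the lexicon that are neighbors of the given word."""
    return sum(is_neighbor(word, other) for other in lexicon)


# Hypothetical toy lexicon, each word a tuple of phoneme symbols.
TOY_LEXICON = {
    "cat": ("k", "ae", "t"),
    "bat": ("b", "ae", "t"),
    "hat": ("h", "ae", "t"),
    "at": ("ae", "t"),
    "cap": ("k", "ae", "p"),
    "dog": ("d", "ao", "g"),
}

if __name__ == "__main__":
    for label, phones in TOY_LEXICON.items():
        print(label, neighborhood_density(phones, TOY_LEXICON.values()))
```

Running the same count over a child's lexicon at successive ages, and over an adult lexicon, is one way the developmental comparison described above could be set up.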
This presentation investigates the contribution of phonetic information to motor planning through a speech adaptation experiment. As they spoke, subjects heard their voices through a pair of headphones. The experiment proceeded in four stages: (1) subjects' auditory feedback was unaltered; (2) subjects' feedback was slowly altered up to a set maximum; (3) feedback alteration was held at that maximum shift; (4) subjects' feedback was returned to normal. Previous work demonstrates that when auditory feedback is shifted, subjects change their speech to oppose the feedback shift (e.g., Burnett, Freedland, Larson & Hain, 1998; Houde & Jordan, 2002; Purcell & Munhall, 2006).
Here, we explore the relative importance of F0, F1, and F2 to the representation of 'head'. Subjects participated in three speech adaptation experiments -- one for each formant -- on three different days. Formants were altered in randomized order, and all data was analyzed relative to a control condition where the subject went through the experiment without auditory feedback alteration.
Several trends emerge from the data. First, compensation is never complete. A subject might lower his or her F1 by 50 Hz in response to a total F1 feedback increase of 100 Hz. Second, compensation is more complete for small feedback shifts than for large feedback shifts, even when shifts are smaller than the baseline vowel space. Third, compensation is more complete for F1 shifts than for F2 shifts. Fourth, speakers appear to be tracking formant ratios rather than absolute values of formants.
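The notion of incomplete compensation can be expressed as a simple proportion. The helper below is a hypothetical illustration rather than the analysis used in the study: it assumes compensation is measured as the change in production, sign-flipped, divided by the size of the feedback shift, so that 1.0 would mean the shift is fully cancelled.

```python
def compensation_fraction(production_change_hz, feedback_shift_hz):
    """Fraction of a feedback shift that the speaker opposes (1.0 = full compensation)."""
    if feedback_shift_hz == 0:
        raise ValueError("feedback shift must be non-zero")
    # Compensation opposes the shift, so flip the sign of the production change.
    return -production_change_hz / feedback_shift_hz


if __name__ == "__main__":
    # The example from the text: F1 lowered by 50 Hz against a +100 Hz shift.
    print(compensation_fraction(-50.0, 100.0))  # prints 0.5
```

A negative value from this helper would indicate a change that follows the shift rather than opposing it.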
These results support theories of speech production that incorporate both acoustic and sensorimotor feedback. Because auditory feedback is altered while motor feedback is not, feedback from these two sources can conflict. For small shifts in auditory feedback, the amount of potential conflict is small and the normal motor feedback does not affect compensation. But for large shifts in auditory feedback, the amount of conflict is large. Abnormal acoustic feedback pushes the articulatory system to compensate, and normal motor feedback pushes the articulatory system to remain in its current configuration, damping the compensatory response.
Many previous researchers have examined factors influencing the way in which phonological categories from a first (L1) and second (L2) language are linked, particularly in loanword adaptation and L2 acquisition (e.g. Hyman 1970, Flege 1987, Best 1994, Kang 2003, Kenstowicz 2003, Peperkamp and Dupoux 2003, Broselow 2004, LaCharité and Paradis 2005, Yip 2006, Heffernan 2007). One question that remains is how cross-linguistic "equivalence classification" between categories occurs when it is not just one L2 sound, but two L2 sounds that stand to be assimilated to an L1 sound. One L2 sound can be assimilated, but what about the other one? In this case learners must create at least one new category if they are to preserve a contrast between the two L2 sounds.
I address the question of novel L2 category formation by examining the acquisition of the three-way Korean laryngeal contrast among lenis, fortis, and aspirated stops by 27 native speakers of American English having no previous experience with the language. Here I report results from an imitation experiment in which learners repeated a two-dimensional continuum of Korean syllables differing in the primary cues to the Korean laryngeal contrast (cf. Kim 2004, among many others): voice onset time (VOT) and fundamental frequency (f0) onset. How quickly do learners develop the extra category they need to effect a three-way laryngeal contrast?
Acoustic analyses of speakers' productions indicate that after three weeks (= 39 hours) of immersion instruction, novice learners generally still command only two categories (short-lag VOT and long-lag VOT), while native speakers make use of three categories: short-lag VOT (fortis); medium/long-lag VOT + low f0 onset (lenis); and long-lag VOT + high f0 onset (aspirated). However, the most striking difference between these two groups is that for novice learners, productions within each cluster of responses show much better correspondence to the parameters of the stimuli than is the case for native speakers: without strong categorical scaffolding for L2, novice learners continue to attend to L2 speech at a phonetically detailed level, while native speakers are predisposed to simply categorize what they hear and disregard phonemically irrelevant aspects of the signal.
These results are consistent with one of the fundamental postulates of Flege's (1995) Speech Learning Model – that adults retain, rather than lose, the perceptual mechanisms used in learning their L1 sound system. Where novice learners in this study seem to come up short is in the move from perception of details to abstraction to categories. These findings suggest that, left to its own devices, L2 category development is rather slow, and that explicit phonetic instruction is probably required to achieve the formation of a new L2 category that is similar in structure to that of native speakers (cf. Catford and Pisoni 1970).
Participants in Linguistics 210 will present articles from the July, 2008 issue of Journal of Phonetics, which is a special issue titled "Phonetic Studies of North American Indigenous Languages", edited by Joyce McDonough and Doug Whalen.