Phonetics & Phonology Forum is a weekly talk and discussion series featuring presentations on all aspects of phonology and phonetics.
This study addresses the issue of individual variation in mental representations of speech sounds and considers how this variation might contribute to sound change. Ohala (1981) proposed that hypo-corrective misperception could occur when a listener fails to employ compensation for coarticulation. However, findings from later studies (Beddor and Krakow, 1999; Harrington et al., 2008) suggest that there could be yet another source of 'apparent' hypo-correction: individual variation in category boundaries and compensation norms.
Against this background, Kataoka (2009) reported, among other findings, that speakers of American English exhibited systematic variation in their identification of vowels taken from /i/-/u/ continua: a group of listeners ('Fronters') who had the /i/-/u/ category boundary closer to the /i/-end than the rest of the listeners ('Backers') in one condition consistently had it there in the other three conditions.
In this talk I will present the results of a repetition experiment with the same listeners, in which they were asked to listen to a vowel stimulus from the /i/-/u/ continuum, presented either in the [d_t] or [b_p] context or in isolation, and to repeat the vowel. The results show that: 1) the listeners repeated the vowel more faithfully when the stimuli were presented in isolation than in [C_C] contexts; 2) for ambiguous vowel stimuli, repeated vowels had lower mean F2 when stimuli were in the [d_t] context than in the [b_p] context; and 3) the Fronters' category boundary was closer to the /i/-end than the Backers' boundary (for /bVp/ stimuli but not for /dVt/ stimuli). However, in all cases the differences were small and none reached statistical significance. Plans for re-running the experiment with improved stimuli and the theoretical implications will be discussed.
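For readers unfamiliar with how a category boundary is located on a continuum, the following is a minimal sketch. It assumes the boundary is defined as the 50% crossover of the identification function and is found by linear interpolation between adjacent continuum steps; the abstract does not say how the boundaries were actually estimated, and the function name and the response proportions are hypothetical.

```python
def crossover(steps, prop_i, criterion=0.5):
    """Return the continuum position where the proportion of /i/ responses
    falls through `criterion`, using linear interpolation.

    steps  -- continuum step values, ordered from the /i/-end to the /u/-end
    prop_i -- proportion of /i/ responses at each step (assumed to decrease)
    Returns None if the identification function never crosses the criterion.
    """
    pairs = list(zip(steps, prop_i))
    for (x0, p0), (x1, p1) in zip(pairs, pairs[1:]):
        if p0 >= criterion > p1:
            # Fraction of the way from x0 to x1 at which p equals the criterion.
            return x0 + (p0 - criterion) / (p0 - p1) * (x1 - x0)
    return None


# Hypothetical identification data for two listeners on a 5-step continuum.
steps = [1, 2, 3, 4, 5]
fronter = [1.0, 0.7, 0.3, 0.1, 0.0]   # boundary nearer the /i/-end
backer = [1.0, 0.9, 0.6, 0.2, 0.0]    # boundary nearer the /u/-end

print("Fronter boundary:", crossover(steps, fronter))
print("Backer boundary:", crossover(steps, backer))
```

Under this definition, a listener whose boundary value is smaller has a boundary closer to the /i/-end. Fitting a logistic function to the responses would be a common alternative to interpolation.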
In the early 1980s I made the claim that Gokana, an Ogoni (Niger-Congo) language of Nigeria, does not organize its consonants and vowels into syllables. This was a radical and, in principle, unwelcome position, given the centrality of the syllable in almost all phonological work at the time. Still, as Dick Hayward pointed out many years later, my extensive treatment of Gokana barely caused a ripple (and I may be exaggerating):
Hyman's account of the Nigerian language Gokana and in particular his well-argued claim that Gokana represents a case where invocation of the syllable buys nothing insightful for explaining the phonology of the language should have disturbed profoundly the settled orthodoxy surrounding the universality of the syllable. That a vowel (the quintessential syllable nucleus) is not guaranteed syllable membership is a very strong proposal, but one has little sense that it has attracted overmuch comment.…In my view it would be unfortunate if Gokana were to be regarded simply as an interesting oddity, rather than as the limiting case in a clinal situation in which many languages may participate to some degree in the course of their phonologies. (Hayward 1997:78)
While no one responded to the claim of no syllables in Gokana, the proposal of Hyman (1983, 1985) to establish moras as a central building block in phonology did gain currency, and was particularly welcomed by specialists of Japanese, long viewed as exclusively moraic in its prosodic structure. Since that time, work on the syllable has gone in opposite directions: While Kubozono (2003) has presented evidence that the syllable may in fact play a role in Japanese, scholars such as Steriade (1999) and Blevins (2003) have argued that the syllable is less needed elsewhere, e.g. to account for phonotactic constraints and, more recently, certain rhythmic effects (Steriade 2009). It seems that the status of the syllable is thus once again up for grabs, as has been the case in its rocky "on-again, off-again" past.
In this talk I take a new look at the Gokana facts and my original claim to ask the question in my title, motivated in part by a possible indirect (but ambiguous) piece of evidence which I have now discovered 25+ years later (I'm slow sometimes!). The talk will end by situating the issue within the context of recent discussions of universals vs. diversity (Evans & Levinson 2009), with my claim that Gokana and English are at the opposite ends of the "clinal situation" which Hayward suspected in the above quote.
Blevins, Juliette. 2003. The independent nature of phonotactic constraints: an alternative to syllable-based approaches. In Caroline Féry & Ruben van de Vijver (eds), The syllable in optimality theory, 375-403. Cambridge University Press.
Evans, Nicholas & Stephen C. Levinson. 2009. The myth of language universals: Language diversity and its importance for cognitive science. Behavioral and Brain Sciences 32.429-492 (including commentaries and response).
Hayward, R. J. 1997. External sandhi in the Saho noun phrase. Afrikanistische Arbeitspapiere 50.53-80. Institut für Afrikanistik, Universität zu Köln.
Hyman, Larry M. 1983. Are there syllables in Gokana? In Jonathan Kaye, Hilda Koopman, Dominique Sportiche & André Dugas (eds), Current approaches to African linguistics (vol. 2), 171-179. Dordrecht: Foris.
Hyman, Larry M. 1985. A theory of phonological weight, 26-32. Dordrecht: Foris. (Reprinted, 2003, Stanford: CSLI).
Kubozono, Haruo. 2003. The syllable as a unit of prosodic organization in Japanese. In Caroline Féry & Ruben van de Vijver (eds), The syllable in optimality theory, 92-122. Cambridge University Press.
Steriade, Donca. 1999. Alternatives to syllable-based accounts of consonantal phonotactics. In Osamu Fujimura, Brian Joseph & Bohumil Palek (eds), Proceedings of LP 1998, v.1, 205-246. Prague: Charles University and Karolinum Press.
Steriade, Donca. 2009. Units of representation for linguistic rhythm. Sapir Lecture, LSA Linguistics Institute, UC Berkeley, July 6, 2009.
Little is known about the acoustics or the articulatory gestures underlying children's early phonological and morphological representations, and how these develop over time. Yet this could help inform our understanding of when and how knowledge of language emerges. In this talk I present preliminary findings on several aspects of this issue. The first part of the talk shows that mothers make systematic differences in their use of some landmark cues to voicing and place contrasts, and that these diminish over time, suggesting early enhancement of acoustic cues to feature contrasts in early child-directed speech. Children exhibited some of the same feature contrasts as mothers at 1;6, though others were not fully acquired by 2;6. The second part of the talk explores these issues further with respect to 2-year-olds' acquisition of the plural morpheme. Here we find that acoustic cues to the morpheme are largely intact, despite cluster simplification, especially in utterance-medial contexts. Finally, we present preliminary data from ultrasound recordings of a 2-year-old, showing that the articulatory gestures for lexical vs. morphemic final clusters differ, despite the content of the cluster being the same. These findings are discussed in terms of some of the articulatory/planning mechanisms that may underlie children’s early grammars, and the implications for understanding the nature of children’s phonological and morphological development more generally.
Across the world's languages, /θ/ is a rare phoneme, occurring in fewer than 5% of languages (UPSID, Maddieson & Precoda 1990). In addition to its rarity cross-linguistically, it is a volatile sound in English, often undergoing stopping to [t] or fronting to [f], particularly in coda position (Dubois & Horvath 1998, Wells 1982). The vulnerability of /θ/ has been claimed to be due to articulatory difficulty (Wells 1982) and to its perceptual similarity with /f/ (Labov et al. 1968). Articulatory difficulty is an unlikely reason, as Edwards & Beckman (in press) find that in Greek, where /θ/ occurs more frequently, it is acquired earlier. However, /f/ and /θ/ are highly perceptually confusable (Miller & Nicely 1955). This is likely due to their spectral similarity (Tabain 1998), and consequently listeners rely heavily on formant transitions to identify them (Harris 1985); listeners have been found to make use of semantic and visual information as well (Jongman et al. 2003). This latter aspect may help account for a notable asymmetry: the sound change /θ/ > /f/ is common, while /f/ > /θ/ is rare or nonexistent. We propose this is due to asymmetries in visual cues and cross-talker cue variability in /θ/ production. Support for this hypothesis comes from a series of four experiments that explored the weighting of audio and visual cues in /f/ and /θ/ identification using stimuli from multiple talkers.
The stimuli consisted of recordings of ten talkers (five male) producing /f/ and /θ/ in CV, VC, and VCV contexts in /i a u/ environments. Audio and video were recorded separately, but in the same session. Participants were assigned to one of four conditions: an Audio Condition (A; n = 27), a Video Condition (V; n = 16), an Audio-Video Condition (AV; n = 16), or an Audio Cross-spliced Condition (AC; n = 16). In each, they participated in a 2AFC classification task blocked by talker. Sensitivity (d') was calculated on the identification data, and analyses using reaction time were conducted on the correct responses. To briefly summarize the results, /u/ increased sensitivity to the contrast, likely due to lip-rounding exaggerating the differences in formant transitions. In the V Condition, /u/ led to decreased sensitivity, likely because the lip-rounding obscured the visibility of the tongue gesture. In the AV Condition, /u/ was again facilitative; this indicates that listeners weighted the auditory cues more highly than the visual cues in identification. A trained phonetician coded the presence vs. absence of a visible tongue gesture for each token. This measure was highly correlated with sensitivity in the V Condition (Spearman = 29.2, p < 0.01, rho = 0.82), but was not significant for the AV Condition. These results suggest that listeners use visual cues when necessary, but that they weight auditory cues more highly when discriminating speech sounds. Overall, our findings indicate that /θ/ identification is more variable, both in visual and audio-only conditions, which may contribute to its volatility across time.
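As an illustration of the sensitivity measure mentioned above, here is a minimal sketch of a d' computation for a two-response identification task. It assumes the standard formula d' = z(H) - z(FA), treating a "/θ/" response to a /θ/ stimulus as a hit and a "/θ/" response to an /f/ stimulus as a false alarm, and it applies a log-linear correction so that perfect scores stay finite. The abstract does not state which correction or variant of d' was used, and the counts below are hypothetical.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    A log-linear correction (add 0.5 to each count, 1 to each total) keeps
    the rates strictly between 0 and 1. This correction is an assumption,
    not something taken from the abstract.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts for one listener and one talker:
# 40 of 50 /θ/ tokens labelled "/θ/", 10 of 50 /f/ tokens labelled "/θ/".
print(round(d_prime(40, 10, 10, 40), 2))
```

A value near zero means the listener could not tell the two fricatives apart; larger values mean better discrimination, independent of any bias toward one response.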
Speakers encountering long-lag voice onset time (VOT) for the first time in their second language (L2) produce VOTs between their L1 and L2 values. Native-like long-lag productions are conditioned by speaker proficiency factors such as age of acquisition and experience, showing significant production differences between late bilinguals, early bilinguals, and native L2 speakers. Thus far, analyses have focused on mean VOTs across speaker groups. The current study investigates the full distributional properties of VOT in bilinguals (e.g. variance, skewness) in addition to means, providing a more informative picture of bilingual acquisition. VOT production data were collected from French-English bilinguals (age of English onset 0-15 years) and submitted to a distribution-based analysis. Results show that while speaker groups differ predictably in mean VOT, there is a surprising amount of overlap between speaker groups, as well as behavioral differences other than mean VOT. These new behavioral differences lead to new hypotheses about the representation of target L2 values. Ultimately, these results may fill in some of the blanks between perception and production, and suggest, for example, that differences in perceived accent and comprehensibility diverge due to the degree of overlap with native L2 VOTs.
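To make concrete what looking beyond the mean involves, the sketch below computes the mean, sample variance, and skewness of a set of VOT measurements. It is only an illustration: the abstract does not specify the analysis actually used, the function name is hypothetical, and the VOT values (in ms) are invented for the example.

```python
from statistics import fmean, variance


def describe_vot(vots_ms):
    """Return (mean, sample variance, skewness) for a list of VOTs in ms.

    Skewness is the moment-based coefficient g1 = m3 / m2**1.5, where m2 and
    m3 are the second and third central moments of the sample.
    """
    n = len(vots_ms)
    mean = fmean(vots_ms)
    m2 = sum((v - mean) ** 2 for v in vots_ms) / n
    m3 = sum((v - mean) ** 3 for v in vots_ms) / n
    skew = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return mean, variance(vots_ms), skew


# Invented English /t/ VOTs (ms) for two hypothetical speakers with similar
# means but differently shaped distributions.
speaker_a = [52, 55, 58, 60, 62, 65, 68]
speaker_b = [45, 47, 48, 50, 52, 80, 98]

for label, data in (("A", speaker_a), ("B", speaker_b)):
    mean, var, skew = describe_vot(data)
    print(f"{label}: mean={mean:.1f} var={var:.1f} skew={skew:.2f}")
```

Two speakers can share a mean VOT while differing sharply in spread and asymmetry, which is the kind of difference a means-only comparison would miss.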
Rapid phonetic and phonological change is a pervasive characteristic of endangered languages, and is especially prevalent when these languages are primarily learned by adults. There are three proposed accounts for change: transfer effects from learners’ first languages, internally motivated regularization towards universally unmarked features, and intensification of socially salient features. Transfer effects are expected because they are commonly seen in second language (L2) learning, and may be especially prominent in endangered languages, where learners have little access to fluent speakers and few opportunities for feedback. However, Cook (1995) contends that transfer is not the primary cause for change in endangered languages, but argues instead that changes are internally motivated and result in a process of grammatical regularization. Both of these accounts are acknowledged by Wolfram (2002), who makes the additional claim that external social factors have an effect on endangered languages, with some socially salient features intensified by language learners.
This paper presents research examining these accounts in the context of the possible effects of adult learning on Oregon Northern Paiute, an endangered Uto-Aztecan language of the Western Numic branch. In addition to demonstrating support for the three accounts described above, the findings suggest a fourth possibility for language change: transfer of socially salient features from another unrelated but geographically close endangered language. Fluent speakers' reactions to non-speakers' productions are also presented and discussed in an examination of which changes will likely result in perceptibly accented speech.
Cook, E. (1995). Is there convergence in language death? Evidence from Chipewyan and Stoney. Journal of Linguistic Anthropology, 5(2), 217-231.
Wolfram, W. (2002). Language death and dying. In J. K. Chambers, P. Trudgill, & N. Schilling-Estes (Eds.), The handbook of language variation and change (pp. 764-787). Malden, MA: Blackwell Publishers.
Much research in second-language (L2) speech has investigated how L2 learners acquire laryngeal categories that differ from the laryngeal categories of their first language (L1), but most of this work has concentrated on languages with two laryngeal categories that differ between L1 and L2 in terms of the same primary cue: voice onset time, or VOT (French and English: Caramazza et al. 1973, Flege 1987; Spanish and English: Flege & Eefting 1988; Italian and English: Flege et al. 1995; Portuguese and English: Major 1996). In the present study, I examine how L2 learners come to produce a laryngeal contrast that requires the use of a second phonetic dimension in addition to VOT—namely, the three-way Korean laryngeal contrast among lenis, fortis, and aspirated stops, which in initial position differ primarily in terms of VOT and fundamental frequency (F0) onset (cf. Han & Weitzman 1970, Kim 2004, inter alia). How do learners use (or not use) onset F0 in conjunction with VOT to realize this three-way contrast?
In a five-week longitudinal study, 26 adult native speakers of American English taking intensive beginning Korean classes in South Korea completed a reading task in Korean in which they pronounced word-initial lenis, fortis, and aspirated stops in a low vowel context. Results of acoustic analyses show that while most learners are eventually successful at producing a full three-way contrast, there is wide variation in the way in which they produce it. Whereas learners’ Korean teachers generally produce the contrast in a manner consistent with the literature on Korean (which shows a trading relation between VOT and F0 for the lenis and aspirated stops), learners themselves come to produce the contrast in several different ways. One group (n=9) ends up distinguishing the three categories by producing two one-way contrasts—some learners contrasting lenis and fortis stops in terms of F0 and lenis and aspirated stops in terms of VOT, and other learners distinguishing between lenis and fortis stops on the basis of VOT and between lenis and aspirated stops on the basis of F0. In another group (n=6), learners do not use F0, and instead contrast the three laryngeal categories solely in terms of VOT. Other learners show still different patterns of contrast realization, occasionally using both dimensions at the same time.
In this paper, I describe the range of variation in phonetic spaces that learners construct for this novel laryngeal contrast, show how these differ from the results of cross-linguistic perception studies on English speakers hearing Korean (cf. Francis & Nusbaum 2002, Schmidt 2007), and conclude that a perseverative kind of "equivalence classification" (Flege 1987, 1995) plays a large role in how learners (given no explicit instruction on how to produce the L2 laryngeal contrast) link L2 laryngeal categories to L1 laryngeal categories.
Caramazza, A., Yeni-Komshian, G. H., Zurif, E. B., & Carbone, E. (1973). The acquisition of a new phonological contrast: The case of stop consonants in French-English bilinguals. Journal of the Acoustical Society of America, 54, 421–428.
Flege, J. E. (1987). The production of "new" and "similar" phones in a foreign language: Evidence for the effect of equivalence classification. Journal of Phonetics, 15, 47–65.
Flege, J. E. (1995). Second language speech learning: Theory, findings, and problems. In Speech Perception and Linguistic Experience: Theoretical and Methodological Issues in Cross-Language Speech Perception, edited by W. Strange, 233–272. Baltimore, MD: York.
Flege, J. E. & Eefting, W. (1988). Imitation of a VOT continuum by native speakers of English and Spanish: Evidence for phonetic category formation. Journal of the Acoustical Society of America, 83, 729–740.
Flege, J. E., Munro, M. J., & MacKay, I. R. A. (1995). Effects of age of second-language learning on the production of English consonants. Speech Communication, 16, 1–26.
Francis, A. L. & Nusbaum, H. C. (2002). Selective attention and the acquisition of new phonetic categories. Journal of Experimental Psychology: Human Perception and Performance, 28, 349–366.
Han, M. S. & Weitzman, R. S. (1970). Acoustic features of Korean /P, T, K/, /p, t, k/, and /ph, th, kh/. Phonetica, 22, 112–128.
Kim, M. (2004). Correlation between VOT and F0 in the perception of Korean stops and affricates. In Proceedings of the 8th International Conference on Spoken Language Processing (INTERSPEECH-2004), 49–52. Jeju Island, Korea: International Speech Communication Association.
Major, R. C. (1996). L2 acquisition, L1 loss, and the critical period hypothesis. In Second-Language Speech: Structure and Process, edited by A. James & J. Leather, 147–159. Berlin: Mouton de Gruyter.
Schmidt, A. M. (2007). Cross-language consonant identification: English and Korean. In Language Experience in Second Language Speech Learning: In Honor of James Emil Flege, edited by O.-S. Bohn & M. J. Munro, Chapter 11, 185–200. Amsterdam, The Netherlands: John Benjamins Publishing.
Language-specific patterns of gestural coordination have been claimed to be organized around achieving PARALLEL TRANSMISSION, by "choosing" optimal systemic gestural overlap settings that maximize overlap while maintaining recoverability of the overlapped speech gestures (Liberman et al. 1967, Mattingly 1981, Wright 1996). While parallel transmission has largely been discussed as a synchronic goal of language structure, like all substantive constraints it could also logically emerge through conditions on sound change. In this talk I argue for the latter source, based on evidence from overlap and assimilation of labial codas in Korean, a case that I claim involves a synchronically non-optimal balance between overlap and gestural recoverability. I argue that, as in other cases of unnatural processes in phonology, the principle behind both the exceptions and the rule becomes clear when considered in a diachronic context.
In Korean, the issue of non-optimal gestural overlap settings comes down to the differential behavior of codas in /p.k/ and /k.p/ clusters: labial codas are highly overlapped and optionally assimilate in /p.k/, but dorsal codas maintain low overlap and do not assimilate in /k.p/ (Son 2008). Hypothesizing that this pattern was driven by a general diachronic pressure towards greater overlap in CC clusters, I ran a series of experiments to investigate the perceptual effects of overlap variability in /p.k/ and /k.p/. The results indicate that labial VC formant transitions are quite stable and recoverable across all levels of overlap in /p.k/, but dorsal transitions are "quantal", becoming abruptly labial-like at moderate levels of overlap in /k.p/. I propose that the Korean pattern can be explained by a constraint against sound changes that are perceptually non-gradual, thus resisting the overlap drive in /k.p/. In /p.k/, however, shortening closure durations lead gradually to confusability with singleton /k/, feeding labial coda assimilation in spite of strong VC formant transitions.
In order to minimize speech errors, talkers use feedback from multiple sources to make online adjustments to their articulatory plans. It is possible to induce such adjustments experimentally by altering the auditory feedback that the speech motor control system receives. Speakers are known to compensate for even subtle alterations in F0, F1, and F2.
Individual variation in responses to auditory feedback shifts suggests that top-down knowledge from a speaker's phonological and lexical systems might be involved in the otherwise low-level adjustment process. This study compares compensatory responses to altered auditory feedback in two parts of the vowel space in order to measure the effects of phonological and lexical neighbors on speech motor control.
Rather than being a report of research done (aside from library research), I intend this talk to be something of a discussion of research that is still in the planning stages, and to invite critical comments from the Phorumanians. There is evidence that one way voiced stops can overcome the aerodynamic voicing constraint (AVC) is to enlarge the pharynx, exposing more compliant surface area to the impinging oral pressure and thus keeping the oral pressure sufficiently below subglottal pressure. This can have the effect diachronically of causing following vowels with some pharyngeal constriction to shift to front vowels (in the N. Sarawakan languages, in Cambodian, and, arguably, in earlier Armenian via Adjarian’s Law). In other languages, voiced stops have induced ATR (advanced tongue root) harmony (e.g., in certain Akan languages). ATR, like front rounding, vowel nasalization, voice quality, etc., is, I would maintain, one of the “second class features” (okay, I need a better term) that only arise distinctively as a result of the phonologization of a secondary cue from “first class features”, like voicing or nasal consonants. The implication is that all ATR harmony had to arise from something else, and I speculate that this was voiced stops. In some languages, ATR harmony is so ancient that it may not be possible to verify this. I will also present some speculations on how the link between voiced stops and ATR could have implications for the reconstruction of the Proto-Indo-European stop system.
Typologies of sound change have mainly drawn either a two-way distinction between articulatorily and perceptually grounded changes (neogrammarians, Bloomfield, Kiparsky) or a three-way distinction among perceptual confusion, hypocorrective changes, and hypercorrective changes (Ohala, Blevins). We seek to develop a typology of asymmetric sound change patterns based on biases emerging from speech production and perception: biases in motor planning, gestural mechanics (including gestural overlap and gestural blend), and perceptual planning; we suggest that most asymmetric sound change types emerge from biases in motor planning and gestural mechanics. Finally, we sketch some features of a theory linking speech production and perception biases to the emergence of new speech norms.