Phonetics & Phonology Forum is a weekly talk and discussion series featuring presentations on all aspects of phonology and phonetics.
Selectively processing task-relevant stimuli while ignoring irrelevant stimuli is critical for producing goal-directed behavior. To understand the mechanisms involved in selective attention, we used electrocorticography in epilepsy patients to track the spatiotemporal pattern of event-related high-gamma cortical activation during a phonological target detection task. Simultaneous monitoring of multiple areas in the lateral hemisphere revealed a highly ordered but overlapping temporal progression of phasic activity across the lateral cortical surface in the following sequence: 1) speech-sound-specific sensory processing in the posterior superior temporal gyrus (STG) and superior ventral premotor cortex, 2) task-dependent processing in ventrolateral prefrontal cortex (PFC), 3) action planning in the inferior parietal lobule and ventral premotor cortex, and 4) motor response execution and proprioception in the hand sensorimotor cortex. STG activation was modestly greater for target stimuli during active behavior than during passive listening, representing sensory selectivity under general attentional modulation. In contrast, PFC was mostly target-selective and highly enhanced by specific task demands, supporting a role in guiding behaviorally relevant processing. These results demonstrate the utility of high-gamma cortical activity as a powerful tool for evaluating the sensory, cognitive, and motor processes underlying everyday goal-directed human behavior.
The f0 peak sometimes occurs after the syllable with which it is associated, and peak alignment varies depending on several factors, such as the lexical tone target, neighboring tones, and focus. This study investigates the effect of weak tones on the alignment of f0 peaks for three tone types (H, M, and R) of South Kyungsang Korean, spoken in the southeastern part of Korea. When these tone types were followed by one- or two-syllable unstressed suffixes, R was found to have the maximum amount of peak delay and M the minimum: the peak came in the second syllable following an R-toned syllable, but in the syllable immediately following an H-toned syllable, and no peak delay was found for M. Thus, it is argued that the tone alternation patterns in suffixed words are not random; rather, they systematically reflect the phonetic implementation of each tonal target. For example, the peak falls in the final portion of the syllable in R, so it takes more time for the peak to be fully realized. This effect is clearly implemented when the following tone is weak, as in suffixed words, while it is hardly realized in word-final position, as in unsuffixed words.
A network of speech motor areas in frontal cortex can be identified based on lesion-deficit correlations, cortical stimulation studies, and functional neuroimaging. In this talk I will describe this network and present some recent studies on the roles of these regions in speech perception and production.
In the first part of the talk, I will discuss the controversial idea that speech motor areas play a role in speech perception. I will present two functional MRI studies and one repetitive transcranial magnetic stimulation study which support this hypothesis. Based on a review of the literature, it appears that the role for motor regions may be limited to perception in "degraded" acoustic conditions, which is facilitated by an additional top-down source of information (i.e. motor representations).
In the second part of the talk, I will present our current work in progress on speech production. We are attempting to identify brain areas that are differentially recruited as articulatory complexity increases. Our goal is to study the functional status of these regions in patients with apraxia of speech due to neurodegenerative conditions or stroke.
This will be a trial run of a presentation I will make in Madrid on March 3 as part of Fonhispania 2009, where six of us have been invited to a workshop entitled "New Approaches to the Phonetics-Phonology Interface" to inaugurate a new phonetics and phonology laboratory.
As can be seen from the list of invitees, a range of views will be represented. My own talk will focus on the symbiosis of phonetics and phonology, but start from the following observations concerning the current state of phonology, which I describe as:
* diverse, disjointed, unclear boundaries, disparate goals
* a lot of introspection, stock-taking, critiques, healthy diversity of views and agendas
* largely oriented towards the surface due to optimality theory and technology
* cutting-edge research tends to be experimental, instrumental, quantitative, computational
* increasing rejection of the basic concepts and methodologies of the structuralist-generative heritage, ultimately denying that phonology is anything like what we used to think
Given the above trends, one could justifiably ask: What does a traditional phonologist such as myself have to offer a new phonetics-phonology laboratory?
After establishing what is meant by "traditional phonology" I consider two questions concerning the phonetic vs. phonological properties of pre-vocalic nasal + voiced consonants (NDV):
* the question of why NDV devoices to NTV in the Sotho-Tswana subgroup of Bantu, where an in-progress phonetic investigation by Maria Josep Solé et al. on Shekgalagari confirms this allegedly "anti-phonetic" process.
* the question of why NDV has variable behavior with respect to pitch-lowering effects in various African languages, sometimes patterning with the voiced obstruents /b, d, g/ as F0-depressors, sometimes not.
The discussion of these examples will establish that both traditional phonological and phonetic analyses are necessary to resolve such questions. While traditional phonology has centered around the development and application of theories and methodologies to help gain insight into the nature of phonological systems as they function in a grammar, many who deny structural phonology do so either because of differing agendas or because they wish to look at speech sounds at a different "level". One question I raise is whether phonologists have become too literal in approaching linguistics as a branch of cognitive science. I intend for my talk to be provocative, but also to present the above two real examples concerning NDV, where rigorous (structural) phonological and (instrumental) phonetic analyses must cooperate if we wish to understand what is a possible phonological system--and why.
Listeners' identification of speech sounds is influenced by both the perceived and the expected characteristics of surrounding sounds. For example, a vowel ambiguous between /i/ and /e/ is heard more often as /e/ when the precursor sentence has low F1, but as /i/ when the precursor has high F1 (Ladefoged & Broadbent 1957), and a greater degree of vowel undershoot is perceptually accepted in fast speech than in slow speech (Lindblom & Studdert-Kennedy 1967). Later, Ohala and Feder (1994) showed that American listeners judge a vowel stimulus ambiguous between /i/ and /u/ more frequently as /u/ in an alveolar context than in a bilabial context, and do so both when the context is provided as an acoustic signal and when it is cognitively "restored" after the acoustic signal for the context has been replaced with white noise. In this talk I will report the results of a perception experiment with native speakers of American English, aiming to extend Ohala & Feder's study with additional measures to reveal the locus of perceptual compensation in the human speech processing system.
In the experiment, listeners identified words containing a vowel test sound from a /bip/-/bup/ continuum and a /dit/-/dut/ continuum. The stimuli were presented under the following conditions: 1) with or without a precursor phrase before each word, and 2) at a fast, medium, or slow speech rate. The continuum's phoneme boundary shifted in a manner consistent with perceptual compensation for the alveolar context that causes fronting of /u/, but not to the same magnitude across conditions. The boundary shift was non-significant when the stimuli were presented in isolation, but became significant when the same stimuli were presented with a precursor phrase. Further, a greater boundary shift was observed when the speech rate of the whole sentence (precursor and target stimulus) was increased. In addition, for an ambiguous stimulus, faster reaction times (RTs) were observed in the alveolar context than in the bilabial context in the majority of conditions.
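For a concrete sense of how such a boundary shift can be quantified, here is a minimal sketch, not the study's actual analysis: it fits a logistic psychometric function to identification proportions and locates its 50% crossover. The continuum steps and proportions below are invented for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, x0, k):
    """Psychometric function: probability of a /u/ response at step x."""
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

def boundary(steps, p_u):
    """Fit the logistic and return the 50% crossover (the phoneme boundary)."""
    (x0, k), _ = curve_fit(logistic, steps, p_u, p0=[np.mean(steps), 1.0])
    return x0

# Hypothetical proportions of /u/ responses along a 7-step continuum.
steps = np.arange(1, 8)
p_bilabial = np.array([0.02, 0.05, 0.15, 0.50, 0.85, 0.95, 0.98])
p_alveolar = np.array([0.02, 0.04, 0.08, 0.25, 0.60, 0.90, 0.97])

# A positive difference means the boundary sits later in the alveolar
# context, i.e. more of the continuum is heard as /i/, consistent with
# compensation for the fronting effect of alveolars on /u/.
shift = boundary(steps, p_alveolar) - boundary(steps, p_bilabial)
```

Comparing crossover points fitted per condition in this way is one standard operationalization of "degree of boundary shift."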
That perceptual compensation occurs but varies in degree across conditions suggests that listeners may use both cognitively based categorical compensation and mechanically based gradient compensation. Further, the fairly consistent consonant effect on RT seems to indicate that perceptual compensation may facilitate phoneme decision-making as well as shift the category boundary. The results of the current study shed some light on how human listeners benefit both from bottom-up processing, using the surrounding acoustic signals, and from top-down processing, using the acoustic images of speech sounds stored in long-term memory. The theoretical implications regarding the relationship between speech production and speech perception will be discussed.
Spontaneous phonetic imitation – the phenomenon whereby interacting talkers come to sound more similar – may be an important mechanism in dialect convergence and historical sound change. Recent research has been concerned with whether spontaneous imitation is an automatic (and hence unavoidable) process, or whether it is mediated by social factors (e.g., Giles & Coupland, 1991; Goldinger, 1998; Pickering & Garrod, 2004; Pardo, 2006). In this talk I briefly present the "clean" results from a project that investigates phonetic imitation of vowels. The results show that talkers converge on the first and second formants of the model talker in the task, but that not all vowels are imitated to a significant degree. In this study of American English, only the low vowels /a/ and /æ/ exhibit strong convergence effects.
What will make this talk different from previous versions is that I will also reveal the ugly data from the crossed audio-visual conditions, where the Black talker's image is presented with the White talker's voice and vice versa. The pattern of imitation in these conditions does not reflect participants' behavior in the natural conditions. In the White voice/Black image condition, participants exhibit imitation earlier than in the other conditions, and the degree of imitation lessens across shadowing blocks (as opposed to increasing across blocks in the natural conditions). In the Black voice/White image condition, participants diverge in the first shadowing block, only slowly coming to imitate after subsequent exposure. It is my hope that Phorum audience members will be familiar with audio-visual work that has produced *funny* results, so that these data can be properly interpreted and understood.
A commonly used approach to investigating how speech is planned is to analyze how controlled variables affect the reaction time to initiate an utterance. A much less commonly used approach is to analyze how quickly speech can be halted. I present preliminary results of an experiment that investigates how quickly speakers can stop speaking in mid-utterance. Two main questions that I address are (1) does the stress of an upcoming syllable influence the time it takes to stop speaking and (2) does the rhythmicity of a sentence influence stop-RT?
A review of some rhythmic properties of speech that I have examined over the years. I will play a number of short clips of rhythmically spoken speech that I have collected, show how my own thinking has evolved, and make some suggestions for younger researchers.
It has been shown extensively in the word perception literature that the existence of similar-sounding words (i.e. phonological neighbors) in the lexicon inhibits auditory recognition of the target word (Luce & Pisoni 1998). However, the effect of phonological neighborhood density on word production is less well investigated. Previous studies have proposed two accounts. One is the facilitation account: similar-sounding neighbors contribute to the activation of the target word and thus facilitate production (Vitevitch 1997, 2002). The other is the hyperarticulation account: speakers hyperarticulate words from dense neighborhoods for the sake of listeners (Wright 1997, Munson & Solomon 2004, Scarborough 2002).
The current work explores the effect of neighborhood density on spontaneous speech production, using the Buckeye corpus. The variable under investigation is word duration in CVC monomorphemic content words. The two existing accounts make opposite predictions for duration (facilitation -> shortening; hyperarticulation -> lengthening). A mixed-effects model is built to test the effect of neighborhood size and average neighbor frequency on word duration, while controlling for other linguistic and nonlinguistic factors. Current results show that neighborhood size has a robust facilitative effect on word duration – words with more neighbors are produced faster than those with fewer neighbors. Furthermore, comparison of neighborhood measures calculated from different dictionaries reveals that when more realistic word frequency measures are used, the effect of average neighbor frequency also reaches significance, in the same direction. Together these results provide evidence for the facilitation account: the more neighbors a word has, and the more frequent those neighbors are, the faster the word is produced.
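As an illustration of the direction of the reported effect, and not of the study's actual model, the sketch below simulates toy durations with a built-in facilitative effect and recovers the fixed-effect slopes by ordinary least squares; the real analysis is a mixed-effects model over the Buckeye data with additional controls, and all variable names and values here are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
nsize = rng.integers(0, 25, n)        # number of phonological neighbors
logfreq = rng.normal(8.0, 2.0, n)     # log mean neighbor frequency
# Simulated log duration: more neighbors and more frequent neighbors
# both shorten the word (the facilitation account's prediction).
logdur = -1.0 - 0.01 * nsize - 0.03 * logfreq + rng.normal(0.0, 0.05, n)

# Fixed-effects-only fit (the study itself adds random effects, e.g.
# for speaker and word): ordinary least squares.
X = np.column_stack([np.ones(n), nsize, logfreq])
beta, *_ = np.linalg.lstsq(X, logdur, rcond=None)
# beta[1] and beta[2] should both come out negative, mirroring the
# reported shortening effects of neighborhood size and neighbor frequency.
```

Random effects matter for corpus data because durations from the same speaker or word are not independent; the OLS sketch only shows how the fixed-effect slopes encode the facilitation prediction.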
Several Yue dialects – Cantonese, Bobai, and Xinyi – have a process whereby a rising tone replaces the lexical tone of the head noun to derive diminutive forms, referred to in the literature as Pinjam (changed tone). Chao (1947) and Benedict (1942) noticed that, in Cantonese, this derived rising tone has a slightly longer duration than the lexical rising tone. The same phenomenon is observed in Bobai and Xinyi, where the derived rising tones are longer than the lexical rising tone (Wang 1932, Ye & Tang 1982). We conducted a phonetic study to validate these earlier reports, and found that the average duration of the lexical rising tone is 0.256 seconds, whereas the average duration of the derived rising tone is 0.518 seconds. Even the longest lexical rising tone is on average shorter than the shortest derived rising tone.
Establishing a correspondence between the Mandarin diminutive suffix [-ɻ] and the Cantonese high rising Pinjam, Chao (1959) used the mora to describe this additional length, suggesting that the Cantonese mora is a suffix taking the form of a high tone rather than of sound segments. This conjecture, capable of explaining the additional length associated with the Pinjam, runs contrary to current theories, according to which tones, being suprasegmental objects, have no temporal basis of their own. How can this paradox be resolved?
Following O'Melia (1939) and Whitaker (1956), according to whom the additional length compensates for the elided diminutive suffix [ɲ] in Bobai, a more conservative dialect than Cantonese, we claim that tones, rather than vowels, lengthen in order to fill the vacuum left by the elision of the neighboring syllable. A conjecture based on segmental compensatory lengthening encounters one problem: if the additional tonal duration were explained by compensatory lengthening of vowels, no change in length would be expected in closed syllables. However, the additional length is observed in both open and closed syllables in Pinjam. If we posit instead that the codas in Yue are moraic, we encounter another problem: it is difficult to understand why, in the case of the three entering tones, tonal duration is shortened by final stops in closed syllables in Cantonese, whereas the same final stops are capable of bearing a long rising tone in Pinjam. Only one possibility remains: it is the tone that lengthens under syllable elision, not the vowel. In other words, the vowel lengthens under the pressure of the tone, not the tone under the pressure of the vowel.
Positional augmentation constraints are potentially neutralizing, since they call for a prominent property to be realized in a strong position (e.g., "If stressed, then heavy"), and since that property therefore cannot contrast with its absence (e.g., no contrast for vowel length in stressed syllables) (Smith 2005). Because augmentation constraints make no counterbalancing demands of weak positions, the same contrast which they neutralize in strong positions can be maintained in weak positions.
This sort of pattern – which I will refer to as Strong-Position Neutralization, or SPN – is either altogether absent or exceedingly rare, and one of my central goals is to account for this.
My proposal involves two formal components, one related to markedness and one to faithfulness, each based on a very general idea:
1. No constraint demands augmentation only.
2. Contrasts are generally more likely to survive in prominent positions than in weak, because contrast cues are bolstered by prominence correlates.
The first idea gives rise to the formal proposal that all markedness constraints linking prominence to its phonetic correlates are biconditionals; e.g., "If and only if a syllable is prominent, it is relatively long." Thus a demand for augmentation is also a demand for a corresponding reduction. This builds on Liberman and Prince's (1977) notion that stress is defined only relationally in a local domain: the strong is defined only relative to its weak neighbors.
The second idea gives rise to the formal proposal that all cue-based faithfulness constraints for a given feature outrank all general faithfulness constraints for the same feature. This means that, e.g., DEP-µ/INTENSE, "In a relatively intense syllable, don't add a mora," outranks MAX-µ, "Don't remove a mora," despite the fact that they monitor different types of correspondence. This gives preference to moraic contrasts in strong positions (i.e., positions of relatively high intensity).
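The effect of such a ranking can be sketched with a toy OT evaluator. Everything below is hypothetical illustration, not the talk's analysis: "STRESS<->LONG" stands in for the biconditional markedness constraint, it is assumed (for the sketch only) to outrank both faithfulness constraints, and the violation counts are invented. Under this ranking, the candidate that shortens the weak syllable beats the one that augments the strong syllable, so neutralization lands in the weak position rather than the strong one.

```python
def eval_ot(candidates, ranking):
    """Classic OT evaluation: the winner has the lexicographically
    smallest violation profile under the constraint ranking."""
    return min(candidates, key=lambda c: tuple(c[con] for con in ranking))

ranking = ["STRESS<->LONG", "DEP-mora/INTENSE", "MAX-mora"]
candidates = [
    # Add a mora to the stressed syllable (augmentation):
    {"form": "lengthen-strong", "STRESS<->LONG": 0, "DEP-mora/INTENSE": 1, "MAX-mora": 0},
    # Remove a mora from the weak syllable (reduction):
    {"form": "shorten-weak", "STRESS<->LONG": 0, "DEP-mora/INTENSE": 0, "MAX-mora": 1},
    # Fully faithful candidate violates the biconditional:
    {"form": "faithful", "STRESS<->LONG": 1, "DEP-mora/INTENSE": 0, "MAX-mora": 0},
]
winner = eval_ot(candidates, ranking)["form"]  # "shorten-weak" wins
```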
With both formal components in place, SPN does not emerge as a prediction, while other attested patterns of positional neutralization do. The components of the theory also make other interesting predictions, which I believe are borne out.
Liberman, Mark, and Alan Prince. 1977. On stress and linguistic rhythm. Linguistic Inquiry 8: 249-336.
Smith, Jennifer. 2005. Phonological augmentation in prominent positions. Outstanding Dissertations in Linguistics. New York and London: Routledge.
The progression of regional vowel shifts has been of central concern in sociophonetic research (Labov, Yaeger & Steiner 1972; Labov, Ash & Boberg 2006). Researchers' aim has been to determine which linguistic and social constraints propel or inhibit particular sections of a given change-in-progress. For example, Labov (2001) has argued that speaker ethnicity interacts with the adoption of sound change: White speakers further the advancement of vocalic chain shifts while non-White speakers do not.
In the first half of this talk I will consider these issues with respect to the California Vowel Shift (Eckert 2008). Based on sociolinguistic interviews and semi-ethnographic fieldwork in a neighborhood of San Francisco, I analyze vowel production patterns across speakers of varying ethnicities. The data collected thus far indicate that Asian Americans are not only indistinguishable from their White counterparts for some vowels (the fronting of /u/ and /o/), but are, in fact, leading in broader regional sound changes (the merger of /a/ and /ɔ/). However, these results are not surprising for San Francisco, where the social history and current demographics suggest that equating regional sound change with White speech patterns is inappropriate. Rather, since regional variation is inextricably tied to social variation, the particular social constraints on sound change must be determined with respect to a given community.
Building on this initial analysis, I will then discuss one of the most important linguistic constraints inhibiting all of these back vowel shifts: the presence of a following /l/. These data present an interesting situation for the analysis of coda-/l/ syllables, because /l/ is undergoing variable vocalization to a semi-vowel, and vocalization appears to be favored by preceding back vowels. Vocalization is also favored by Asian Americans, suggesting a heritage-language substrate effect. While the potential impact of vocalization on the progression of the California Vowel Shift remains to be seen, particularly given the notorious methodological challenges in accurately measuring degree of vocalization, its interaction with ethnicity suggests some probable outcomes.
The earliest research on "interlanguage" phonologies was based on two related assumptions: the existence of a critical period for language acquisition, and unidirectionality of cross-language influence (going from the first language, L1, to the second language, L2). Recent research has challenged both of these assumptions. Some (e.g. Flege 1987b) have pointed out numerous problems with the enterprise of proving that a critical period exists. Furthermore, there is mounting evidence that cross-language influence can, in fact, go from L2 to L1. Flege (1987a) and Sancier and Fowler (1997), for example, show that voice onset time in L1 voiceless stops shifts towards the phonetic norm of L2 voiceless stops when L1/L2 speakers are immersed in an L2 environment for an extended period of time. In the framework of Flege's (1995) Speech Learning Model, this change in L1 arises from an "equivalence classification" of similar L1 and L2 sounds that ties them to the same higher-level category, thereby allowing both sounds to be affected by input in L1 or L2.
However, we still know very little about how and when equivalence classification occurs. Nearly all of the work in this area examines the pronunciation of highly proficient bilinguals after they have spent a long time in an L2 environment and, moreover, focuses on L1/L2 pairs that share the same alphabet. This makes it unclear how generalizable the results are to learners at lower levels of proficiency (who normally constitute the majority of the population of adult L2 learners) and to cases of contact between languages that do not overtly equate similar sounds via identical orthography.
Thus, I delve deeper into the nature and time course of L1 phonetic drift by examining the very first weeks of 20 native English speakers' immersion in a Korean language environment. In a weekly elicited production task, these adult acquirers of Korean read the same series of Korean and English words, and acoustic measurements of voice onset time (VOT) and fundamental frequency (f0) onset were taken on their productions of 60 words beginning with stop consonants. These data indicate that learning Korean stops affects the production of English stops in as little as one week. In the case of English voiced stops (which in initial position resemble Korean fortis stops in having a short VOT, but differ in having a lower f0 onset), a repeated-measures ANOVA shows no main effect of time on VOT, but does show a main effect of time on f0 onset. In the case of English voiceless stops (which resemble Korean aspirated stops in having a long VOT and high f0 onset, though not as long or high as the Korean stops), a repeated-measures ANOVA shows a main effect of time on both VOT and f0 onset.
In both of these cases, the pattern of change in the English sounds approximates the characteristics of the Korean sounds to which the English sounds are most phonetically similar. English voiced stops do not change significantly in VOT over time, since they are already similar to Korean fortis stops in this respect; however, their f0 onset rises in approximation to the elevated f0 onset typical of the Korean fortis category. Meanwhile, English voiceless stops become longer in VOT and higher in f0 onset in approximation to the Korean aspirated stops. These results indicate that L1 phonological categories are much more malleable than previously imagined, subject to phonetic drift on a timescale of weeks rather than months or years – even when there is no clear orthographic correspondence between the L1 and L2 sounds. Given that in other tasks most of the learners in this study show command of only two (English) laryngeal categories during this time period, these findings suggest that the equivalence classification that gives rise to this phonetic drift may be rather low-level in nature.
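To make the within-speaker comparison concrete, here is a minimal sketch of testing a VOT shift between two sessions with a paired t-test, the two-session special case of the repeated-measures design; the study itself runs repeated-measures ANOVAs across all weekly sessions, and all numbers below are invented.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical VOT (ms) for English voiceless stops from the same 20
# speakers at week 1 and week 5 of Korean immersion; the drift toward
# the longer Korean aspirated-stop VOT is simulated here as +8 ms.
week1 = rng.normal(70.0, 10.0, 20)
week5 = week1 + 8.0 + rng.normal(0.0, 4.0, 20)

# Paired t-test on the within-speaker differences.
t, p = stats.ttest_rel(week5, week1)
# A positive t with small p indicates a reliable upward VOT shift.
```

Pairing by speaker is what gives the design its power: it removes between-speaker baseline differences in VOT and tests only the within-speaker change over time.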