Phonetics & Phonology Forum is a weekly talk and discussion series featuring presentations on all aspects of phonology and phonetics.
The acoustic signal of speech is rich in temporal and frequency patterns. These power fluctuations in time and frequency are called modulations. Spoken words remain intelligible after drastic degradations in either time or frequency. To fully understand the perception of speech, and to be able to reduce the speech signal to essential components, we need to completely characterize how modulations in amplitude and frequency contribute together to the comprehensibility of speech. Hallmark research has distorted speech in time and frequency, but the manipulations have been described only in terms of one domain or the other, without quantifying the remaining and missing portions of the signal. We used a novel sound filtering technique to systematically investigate the spectrotemporal modulations that are crucial for understanding speech. Our conceptually new filtering procedure operates within a framework that completely describes the spectral information left intact by temporal smearing, and the temporal information left after smearing in frequency. Both the modulation-filtering approach and the resulting characterization of speech could be used to reduce the bandwidth of speech while best preserving intelligibility. They could potentially change the fundamental terms by which researchers characterize communication signals.
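The abstract above does not spell out the filtering procedure, so the following is only a minimal sketch of one common way to operate on spectrotemporal modulations. It assumes the modulation spectrum is the 2D Fourier transform of a log spectrogram and low-pass filters it along both axes. The function name `modulation_lowpass`, its parameters, and the toy spectrogram are all hypothetical and are not taken from the talk. Resynthesizing audio from a filtered spectrogram would need further steps that are not shown.

```python
import numpy as np

def modulation_lowpass(log_spec, frame_rate_hz, bins_per_khz,
                       max_temporal_hz, max_spectral_cyc_per_khz):
    """Low-pass a (frequency x time) log spectrogram in the modulation domain.

    Assumption: modulations are defined via the 2D FFT of the log spectrogram.
    """
    n_freq, n_time = log_spec.shape
    # Axis 0 becomes spectral modulation, axis 1 becomes temporal modulation.
    mod_spectrum = np.fft.fft2(log_spec)
    spectral_mod = np.fft.fftfreq(n_freq, d=1.0 / bins_per_khz)   # cycles/kHz
    temporal_mod = np.fft.fftfreq(n_time, d=1.0 / frame_rate_hz)  # Hz
    # Keep only modulations at or below both cutoffs (symmetric in sign).
    keep = ((np.abs(spectral_mod)[:, None] <= max_spectral_cyc_per_khz)
            & (np.abs(temporal_mod)[None, :] <= max_temporal_hz))
    return np.real(np.fft.ifft2(mod_spectrum * keep))

# Toy input: 16 frequency bins, 100 frames at 100 frames/s, with a slow (4 Hz)
# and a fast (30 Hz) temporal modulation that is identical in every bin.
t = np.arange(100) / 100.0
slow = np.cos(2 * np.pi * 4 * t)
fast = np.cos(2 * np.pi * 30 * t)
toy_spec = np.tile(slow + fast, (16, 1))

# Remove temporal modulations above 10 Hz; the spectral cutoff keeps everything.
filtered = modulation_lowpass(toy_spec, 100.0, 4.0, 10.0, 2.0)
```

In this toy case the 30 Hz component is removed and the 4 Hz component survives, which is the sense in which such a filter states exactly which modulations remain and which are missing.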
This talk will describe research done within the framework of an undergraduate honors thesis at UC Berkeley, under the advisorship of Professor Keith Johnson. This study examines the importance of F0 in the process of phonetic convergence using an immediate-repetition or "shadowing" task. Previous research has suggested that F0 facilitates the transmission of social information that individuals can use to assess their social orientation with regard to a talker (Gregory et al., 1991, 1996, 2001). Social theories of accommodation assert that this process mediates a subconscious decision to converge (imitate) or diverge in speech. My data supported this hypothesis: participants who shadowed a talker whose F0 had been high-pass filtered imitated less than participants who shadowed the talker's full range of speech. In addition, several compelling interactions between gender and imitation point to the need for a new, integrative model that allows for social processes to intervene in an exemplar-based perception-production link.
The purpose of this discussion is twofold: 1) to draw together some themes that have emerged in recent work in linguistics, especially from interfaces between phonetics and phonology in sociolinguistics, and 2) to argue that we are in fact facing a state of what Kuhn (1962), in his study The Structure of Scientific Revolutions, called "crisis science." What impact have quantitative and experimental phonetic and sociolinguistic approaches had on the production of this crisis? These themes will be synthesized under an exemplar-theoretic framework in which I posit Frequency, Recency, Expectation, Social saliency, Clustering, and Agency as the primary components that can help bring psycholinguistic exemplar theory more in line with the sociolinguistic literature. Throughout we consider phonological, phonetic, and sociolinguistic implications for communities of speakers, as well as for the changing paradigms in our field.
In speech perception there is a controversial perceptual phenomenon called compensation for coarticulation. In this phenomenon listeners label speech as if they are subtracting coarticulation caused by a neighboring segment. For example, when an /r/ precedes a segment that is ambiguous between /d/ and /g/ listeners report more /d/'s than they do when an /l/ precedes. This perceptually subtracts the retracting effect of the preceding /r/. Some types of sound change may be due to perceptual parsing such as this.
However, compensation for coarticulation is controversial in the psycholinguistics literature because some see it as evidence for a "gesture recovery" model of speech perception (Mann, 1980), while others (Lotto & Kluender, 1998) have suggested that the effects are due to a low-level frequency contrast effect in auditory perception. This latter "auditory contrast" theory gives a quite different view of how speech perception influences sound change.
I will present experimental results that (1) replicate a famous case of compensation for coarticulation (Mann, 1980), (2) extend the effect to much more natural sounding stimuli (where the effect is much weaker even though the frequency contrasts are comparable), and (3) show that compensation for coarticulation is influenced by the perceived context even when the speech signal is unchanged (extending Ohala & Feder's 1994 study). The overall implication of this work is that compensation for coarticulation must involve at least some gestural or linguistic component, and therefore that the "auditory contrast" theory is insufficient.
Lotto, A.J. & Kluender, K.R. (1998) General contrast effects in speech perception: Effect of preceding liquid on stop consonant identification. Perception and Psychophysics 60, 602-619.
Mann, V.A. (1980) Influence of preceding liquid on stop-consonant perception. Perception and Psychophysics 28, 407-412.
Ohala, J.J. & Feder, D. (1994) Listeners' identification of speech sounds is influenced by adjacent "restored" phonemes. Phonetica 51, 111-118.
In this talk, we will present our preliminary findings on the phonology of Imbabura Quichua, a dialect of Quechua spoken in Ecuador. This work is ongoing and still evolving as we elicit new data; consequently, we will be presenting our best hypotheses to date, along with some discussion of the theoretical and practical difficulties involved in collecting phonetic and phonological data from scratch.
The theory of tone merger in Northern Chinese dialects was first proposed by Wang (1980) and further developed by Lien (1986); a common characteristic is the migration of IIb (Yangshang) into III (Qu). The present work aims to provide an update on the current state of tone merger in Northern Chinese, with a special focus on Dalian, a less well-known Mandarin dialect spoken in Liaoning province in Northeast China.
According to Song (1963), four lexical tones are observed in citation form, i.e. 312, 34, 213 and 53 (henceforth Old Dalian). Our data, obtained from a young female speaker of Dalian (henceforth Modern Dalian), suggest an inventory of three lexical tones, i.e. 51, 35 and 213. The lexical tone 312 in Old Dalian, derived from Ia (Yinping), is merging with the falling contour tone, derived from III (Qu). This tendency is consistent with some dialects spoken in the neighboring Shandong province, where a reduced inventory of three tones has become increasingly frequent over the last decade.
However, the tone merger in Modern Dalian is incomplete on two grounds. On the one hand, a slight phonetic difference is observed between these two falling tones: both of them have similar F0 values, but the falling contour derived from Ia (Yinping) has a longer duration compared with the falling contour derived from III (Qu). Nevertheless, the speaker judges the contours to be the same. On the other hand, the underlying contrasts of these two contours surface in tone sandhi contexts, such that the lexical tone 312 of Old Dalian emerges in combination forms in Modern Dalian. A phonological analysis will be proposed to account for the apparently complex tone sandhi rules in Modern Dalian.
It is known that vertical displacement of the larynx changes the size of the oropharyngeal cavity and that such changes affect the oral pressure build-up during stop sounds (Rothenberg 1968; Ohala 1983). A number of studies have observed the relationship between larynx height and stop voicing, with the larynx tending to be lower for voiced than for voiceless stops in Swedish, American English, French, and Thai (e.g., Ewan and Krones 1974). The present study examines the relation between larynx movement during closure and oral pressure build-up in pulmonic and non-pulmonic stops, and its consequences for voicing maintenance. A number of phonological patterns involving prolonged voicing during stops and historical implosivization of voiced stops may be related to adjustments in the timing and rate of larynx movement gestures.
High-speed video recording with subsequent automatic image processing was used to track the vertical and horizontal movement of the larynx in two male American English speakers (trained phoneticians) during the production of utterance-initial and intervocalic fully voiced stops, and intervocalic voiceless and nasal stops, long stops, and implosives in the context of high and low vowels. Oral pressure and acoustic data were collected simultaneously. Measures of larynx displacement (along the diagonal plane) were related to oral pressure values, and amplitude of voicing in the different consonant types and contexts.
Speech categories are multi-dimensional concepts in that many cues are available to a listener for the categorization of a given contrast. These dimensions must be properly attended to and weighted by listeners (Raphael 2005). This paper presents results from a series of perceptual learning experiments exploring the role of selective attention in dimensional learning and generalization in non-speech stimuli. Specifically, listeners were trained to categorize different regions of a stimulus space using one or two dimensions. They were then asked to categorize novel stimuli in an adjacent region to assess transfer.
Results demonstrate that trained dimensions were preferred for categorizing novel sounds and that any bias towards a dimension significantly increased reliance on that dimension, regardless of its relevance to the task. Interestingly, while most listeners who were trained to categorize using both dimensions in a specific relationship did indeed tend to use both dimensions when categorizing novel stimuli, the relationship between the dimensions did not transfer. In terms of speech categories, these results suggest that listeners will attempt to use learned cues in new contexts, but that the specific relations between cues may be less adaptable.
This project examines the production and perception of long-distance coarticulation in ASL. Five signers were filmed while signing ASL sentences and were additionally outfitted with motion-capture sensors, via which the three-dimensional coordinates of key points of the signer's body (e.g. the palm of each hand) could be recorded during the course of the signing of each sentence. Evidence of coarticulatory effects of one sign on another was found across up to three intervening signs, though these effects were generally weaker than those found in analogous spoken-language studies (Magen, 1997; Grosvald, 2009). This difference appears to be due in part to greater variability among these signers in their articulatory behavior, relative to that of users of spoken language.
A perception experiment using stimuli derived from filmed excerpts of the production experiment showed that both deaf signers and hearing non-signers were sensitive to these coarticulatory effects, though again to a lesser degree than has been found for listeners in spoken-language studies. One possible explanation for this difference is that the visual modality offers direct perceptual access to the relevant articulators, so perceivers of spoken languages might rely more than sign language users on extra cues such as coarticulatory information.
Phonological neighborhood density (PND) refers to the number of words that are phonologically similar to a given word (e.g. cat and kit are phonological neighbors of each other). Previous research has shown that, unlike other frequency measures, PND has different effects on comprehension and production. High PND has an inhibitory effect on comprehension but a facilitatory effect on production (see Dell & Gordon, 2003, for a review). My work focuses on the effect of PND on spontaneous speech production. In this talk, I will first briefly review my previous findings with PND and word duration. Then I will present current results with PND and vowel production, followed by a discussion of possible interpretations that integrate results from both the duration study and the vowel study.
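As an illustration of the PND measure, here is a minimal sketch that counts neighbors in a toy lexicon. It assumes the widely used one-phoneme criterion (two words are neighbors if they differ by a single substitution, insertion, or deletion); the abstract itself does not state which similarity criterion was used. The function names, the transcriptions, and the six-word lexicon are hypothetical.

```python
def is_neighbor(a, b):
    """True if phoneme tuples a and b differ by exactly one
    substitution, insertion, or deletion (assumed criterion)."""
    if a == b:
        return False
    if len(a) == len(b):
        # Same length: exactly one position may differ (substitution).
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    # Lengths differ by one: deleting one phoneme from the longer
    # form must yield the shorter form.
    return any(longer[:i] + longer[i + 1:] == shorter
               for i in range(len(longer)))

def neighborhood_density(word, lexicon):
    """Number of other lexicon entries that are neighbors of `word`."""
    target = lexicon[word]
    return sum(is_neighbor(target, phones)
               for other, phones in lexicon.items() if other != word)

# Hypothetical toy lexicon with rough ARPAbet-style transcriptions.
toy_lexicon = {
    "cat":  ("k", "ae", "t"),
    "kit":  ("k", "ih", "t"),
    "cap":  ("k", "ae", "p"),
    "at":   ("ae", "t"),
    "cats": ("k", "ae", "t", "s"),
    "dog":  ("d", "ao", "g"),
}

print(neighborhood_density("cat", toy_lexicon))  # kit, cap, at, cats
```

In this toy lexicon "cat" has four neighbors and "dog" has none. Real PND counts depend on the lexicon and transcription system used.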
This presentation describes and applies a sociolinguistic variationist model to the phenomenon vernacularly known as "stuttering." I expand on Herbert Landar's (1961) previously unexplored hypothesis of stuttering forms as special morphemes by modeling these forms using Morphological Doubling Theory (Inkelas and Zoll 2005). The descriptive model proposed here diverges from the conventional, prescriptive ideologies endorsed and constituted by research in speech‐language pathology that relegates this kind of duplication to evidence of cognitive disorder, rather than creativity and variation (Le Page and Tabouret‐Keller 1985). In the descriptive model, I define variational duplication as a morphological process in conversation, or the language of turn and sequence (Ford, Fox, and Thompson 2002), that appears in both plain and expressive morphology (Zwicky and Pullum 1987). Moreover, similar to aggressive reduplication (Zuraw 2002), variational duplication produces constructions that are morpho‐semantically driven duplication outputs; however, this process does not participate in inflection or derivation, but sociolinguistic variation (i.e., two forms, one semantic meaning).
Using the variationist perspective, I re‐analyze previous linguistic data that speech pathologists have uncovered with regard to these forms. However, I diverge from the interpretations of these previous scholars. Whereas they see the findings as evidence for disability and deformation of the same underlying input, I interpret the findings as morphologically‐conditioned phonology and variation by way of a theory of emergent grammar (Hopper 1987, 1988). In sum, I offer the variationist perspective as an alternative and integrated model that relies on inductively grounded observations, as opposed to pre‐existing principles of what is and is not language. Moreover, this model takes seriously the interwoven nature of grammar and interaction (Ochs, Schegloff, and Thompson 1996) and concludes that a study of variational duplication must also contend with properties of interaction. I conclude the talk with future goals that such a model must account for, as part of a formal theory of phonological variation (Anttila 2002), and offer some methodological suggestions for future studies taking the variationist perspective.
Neutralization limits the amount of phonetic distinctness among morphemes, though, in doing so, increases the number of cues to morpheme boundaries. In this talk I summarize some early insights into these "boundary signals" (Trubetzkoy's, and Firth's), as well as more recent work on so-called "transitional probabilities".
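To make the notion of "transitional probabilities" concrete, here is a minimal sketch of one common formulation: the forward transitional probability of unit y given unit x is the count of the sequence xy divided by the count of x with any successor. The function name and the toy syllable sequences are hypothetical and are not drawn from the talk.

```python
from collections import Counter

def transitional_probabilities(sequences):
    """Forward transitional probability P(y | x) for adjacent units."""
    unit_counts = Counter()   # how often x occurs with some successor
    pair_counts = Counter()   # how often x is immediately followed by y
    for seq in sequences:
        for x, y in zip(seq, seq[1:]):
            unit_counts[x] += 1
            pair_counts[(x, y)] += 1
    return {pair: n / unit_counts[pair[0]] for pair, n in pair_counts.items()}

# Hypothetical syllable streams built from three made-up "words":
# pre-ty, ba-by, dog-gy.
toy_streams = [
    ["pre", "ty", "ba", "by"],
    ["pre", "ty", "dog", "gy"],
    ["ba", "by", "dog", "gy"],
    ["ba", "by", "pre", "ty"],
]

tp = transitional_probabilities(toy_streams)
```

In this toy data the word-internal transition `("pre", "ty")` has probability 1.0, while transitions that span a word boundary, such as `("ty", "ba")`, have probability 0.5. A dip of this kind is what can serve as a statistical cue to a boundary.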
Effects of rhythmic structure on English word order have been found in psycholinguistic studies of production, studies of historical change and variation, and theoretical studies of phonology-syntax interactions. Yet, as observed by Rosenbach (2005), much work on English word order has emphasized syntactic complexity, which can be efficiently measured by simply counting graphemic words, and which has been the subject of influential theories of language processing. Several recent multivariable studies of alternative word order choices with English genitive constructions have only used the word count measure on spoken language corpora, where one might expect direct effects of rhythmic structure on vocal communication. And, despite the growing recent interest in prosodic effects on syntactic constituent order, the relationship of phonological and non-phonological factors has yet to be fully explored. This study therefore seeks to fill a methodological gap by comparing rhythmic factors with known semantic, syntactic, and informational predictors in the genitive alternation in spoken English. We consider two important questions: (1) How good are rhythmic properties (metrical weight, clash, lapse) as predictors of construction choice in spoken language? and (2) How important are rhythmic factors relative to semantic, syntactic, and informational predictors?
Using an annotated database of spoken genitive constructions from the Treebank Switchboard corpus, we examined the effect of weak-strong stress alternation across constituent boundaries (i.e., possessor/possessum) on the genitive alternation using logistic regression models. Findings demonstrate that rhythm plays a significant role in determining genitive construction choice. The overall effect of rhythm on the model, however, still remains smaller than that of other known syntactic and semantic predictors. Hence, the importance of both non-phonological and phonological—especially rhythm-based—factors together cannot be discounted in any model of spoken word order alternations and in theories of variation, change, and language processing built around the relative ordering of syntactic constituents.