The Berkeley Phonetics & Phonology Forum (Phorum) is a weekly talk and discussion series featuring presentations on all aspects of phonology and phonetics.
Speakers use auditory feedback to monitor their own speech, making adjustments to keep their output on target. This feedback-based control may occur at a relatively low level, based on pure acoustic input, or it may occur at a higher level, after language-dependent transforms take place in auditory cortex. The two experiments discussed here tested the influence of perceptual categories on auditory feedback control. Linguistic influences were assessed using real-time auditory perturbation to induce differences between what is expected and what is heard. The results demonstrate that auditory feedback control is sensitive to linguistic contrasts learned through auditory experience.
One of the fundamental processes of speech perception is a contextual normalization process in which segments are "parsed" so that the effects of coarticulation are reduced or eliminated. For example, when consonants are said in sequential order (e.g. the [ld] in "tall dot" or the [rg] in "tar got") the tongue positions for the consonants interact with each other. This "coarticulation" is undone in speech perception by a process that is called compensation for coarticulation. The basis of this process is a source of much controversy in the speech perception literature. The studies that I report in this talk probe the compensation for coarticulation process in two ways. The first set of experiments examines the role of top-down expectations, finding that the compensation effect is produced when people think they hear the context, whether the context is present or not. This dissociation of the context effect from any acoustic stimulus parameter indicates that at least a portion of the compensation effect is driven by expectations. The second set of experiments examines the role of articulatory detail in the compensation effect, finding that the compensation effect is driven at least partly by detection of particular tongue configurations. This set of experiments looked at perception in context of the "retroflex" and "bunched" variants of English "r" and found that this low-level articulatory parameter influences perceptual boundaries. The overall picture that emerges is one of listeners who make use of fine-grained articulatory expectations during speech perception.
Automatic speech recognition (ASR) is a computer technology that performs speech-to-text processing, using systems that integrate linguistic knowledge with statistical methods that learn from data. This talk will describe the modern statistical approach to artificial intelligence, highlighting specific ways in which language structure is present in ASR systems, along with some analysis and discussion of the suitability of these engineering assumptions. For example, characteristics of human auditory sensitivity are encoded in a front-end signal processing module, which uses normalization and adaptation techniques to provide robustness against speaker variability and environmental conditions. An ASR system's acoustic model -- as well as other components -- requires a phonemic pronunciation dictionary to define the composition of words; evidence suggests this phonemic idealization is fundamentally flawed, although in practice it remains more effective than the alternatives. Rather than presenting speech technology in the familiar context of transcription, I think this audience would appreciate exposure to a lesser-known application: HMM-based forced alignment. This automatic procedure can provide an accurate word-level segmentation of transcribed speech, and could enable convenient indexing for large collections of audio or video recordings.
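The core of forced alignment can be sketched in a few lines: given a known sequence of units (here, phone states) and per-frame acoustic scores, a Viterbi search finds the best frame-to-unit segmentation. The toy below is a minimal illustration of that idea, not real ASR software; the function name, the two-phone model, and the score matrix are all invented for the example.

```python
# Toy forced alignment: a left-to-right Viterbi search that assigns each
# acoustic frame to one phone state, given a fixed phone sequence.
import numpy as np

def force_align(log_likes):
    """log_likes[t, s]: log-likelihood of frame t under phone state s.
    States are visited left-to-right, starting in state 0 and ending in
    the final state. Returns the best state index for each frame."""
    T, S = log_likes.shape
    trellis = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    trellis[0, 0] = log_likes[0, 0]
    for t in range(1, T):
        for s in range(S):
            # either stay in the same phone, or advance from the previous one
            lo = max(s - 1, 0)
            prev = trellis[t - 1, lo:s + 1]
            trellis[t, s] = prev.max() + log_likes[t, s]
            back[t, s] = int(np.argmax(prev)) + lo
    # backtrace from the final state
    path = [S - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    return [int(s) for s in path[::-1]]

# Frames 0-1 score well under phone 0, frames 2-4 under phone 1
scores = np.log(np.array([[0.9, 0.1],
                          [0.8, 0.2],
                          [0.2, 0.8],
                          [0.1, 0.9],
                          [0.1, 0.9]]))
print(force_align(scores))  # -> [0, 0, 1, 1, 1]
```

Real aligners replace the invented score matrix with HMM acoustic-model likelihoods over a phone sequence expanded from the transcript, and read off word boundaries from the phone segmentation.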
This Phorum will be an open discussion of an experimental design that is currently in development. The experiment aims to investigate the right ear advantage (REA) and left ear advantage (LEA) for segmental/prosodic processing in the context of a discrimination task. REA/LEA phenomena arise from biased projections from the right and left subcortical auditory systems to the contralateral hemispheres. Segmental speech processing has been associated with the LH/REA and prosodic processing with the RH/LEA. The discussion aims to elicit feedback and brainstorming on issues in task design and stimulus construction relevant to the experiment.
Preschoolers in the UC Berkeley child development program come from highly diverse linguistic backgrounds. Of the children tested so far in the present study, approximately half could be considered monolingual English speakers. The non-monolinguals are evenly split between children who speak English and another language at home, and children who speak no English at home.
In this talk, I will compare results from these three groups of children on two speech processing tasks (one perception- and one production-oriented) in a first pass at examining the effects of amount of home English exposure on learning to process English words and sounds. While more subjects are still needed, the results so far suggest that the developmental path toward adult-like speech processing likely varies significantly for children simultaneously acquiring multiple languages.
Spoken words vary in their degree of acoustic prominence, or intelligibility. Discourse-given or predictable words tend to be reduced; new or unpredictable words tend to be acoustically prominent (e.g. Bell et al. 2009; Fowler & Housum 1987). An unresolved question is whether this variation results from audience design or purely speaker-internal constraints. I consider two possible versions of the audience design view: 1) speakers use acoustic reduction in order to mark the discourse status of referents, which is defined with respect to a shared, common-ground discourse model, and 2) speakers model the addressee's comprehension needs, and provide more explicit acoustic input when the word or referent is difficult to identify, such as when it is discourse-new or unpredictable. I present the results of three experiments that test these ideas, and conclude that while audience design has some impact on acoustic reduction, it is not in the ways suggested by either of these accounts. Instead, acoustic reduction is primarily driven by speaker-internal constraints on planning and production.
This will be a workshop-like talk on the application of STRUCTURE to linguistic data.
STRUCTURE is a model for explaining the distribution of features in a sampled population in terms of an arbitrary number of "ancestral populations". Each specimen is explained as inheriting its features from one or more ancestral populations. Though originally developed for studying population genetics, STRUCTURE can conceivably be applied to clustering and classification problems in historical linguistics and dialectology, particularly when contact-induced borrowing, not internal development, is the primary source of linguistic change.
I will discuss the statistical underpinnings of the model in some detail, and will review Reesink et al.'s use of STRUCTURE in classifying the languages of Southeast Asia and Oceania. With Tammy Stark, I will also present some preliminary results from applying STRUCTURE to the sound inventories of 300+ South American languages.
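The admixture idea behind STRUCTURE can be illustrated with its likelihood: each binary feature of a specimen (here, presence of a sound in a language's inventory) is drawn from one of K ancestral populations, chosen according to that specimen's admixture proportions. The sketch below is not the real STRUCTURE software, and every frequency and proportion in it is invented for illustration.

```python
# Sketch of a STRUCTURE-style admixture likelihood over binary features.
import numpy as np

def admixture_loglike(features, q, p):
    """features: 0/1 vector of F features for one language.
    q: admixture proportions over K ancestral populations (sums to 1).
    p: K x F matrix, p[k, f] = frequency of feature f in population k."""
    # P(feature f = x) = sum_k q[k] * p[k, f]^x * (1 - p[k, f])^(1 - x)
    probs = q @ np.where(features, p, 1 - p)
    return float(np.log(probs).sum())

# Two hypothetical ancestral populations and a 4-feature inventory
p = np.array([[0.9, 0.8, 0.1, 0.2],   # population A
              [0.1, 0.2, 0.9, 0.8]])  # population B
inventory = np.array([1, 1, 1, 0])    # features present in the language

# This inventory mixes traits of both populations, so an even admixture
# fits it better than pure descent from population A
print(admixture_loglike(inventory, np.array([0.5, 0.5]), p))
print(admixture_loglike(inventory, np.array([1.0, 0.0]), p))
```

STRUCTURE itself infers the proportions q and the population frequencies p jointly by MCMC; the point here is only how "inheriting features from one or more ancestral populations" translates into a mixture likelihood.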
Word-final devoicing is a claimed example of domain generalization, a shift in a sound pattern from a larger prosodic domain to a smaller one: it derives diachronically first from the deterioration of voicing utterance-finally, and then is generalized to the end of all words. However, it has never been established that people perform domain generalization. We report on three artificial language learning experiments in which subjects were exposed to a final devoicing pattern only in utterance-final position, and then were tested on whether they apply the pattern to new utterances and utterance-medial words. Results of two of the experiments support domain generalization. We discuss implications of domain generalization for phonological theory.
Debuccalization is a type of sound alternation or sound change that involves lenition of a consonant to a laryngeal consonant. Although it is often discussed in the literature as a subtype of lenition, it is unclear if debuccalization is a unified phenomenon. Any attempt to unify these various debuccalization processes must be able to account for the fact that the same segment weakens to different laryngeals in different languages (e.g. /k/ → [ʔ] in Indonesian, /k/ → [h] in Florentine Italian fast speech).
One possible explanation for the variation in debuccalization involves neutralization avoidance. It is plausible that neutralization causes difficulty in rule learning and processing, and for that reason neutralization avoidance may be a part of the grammar. An artificial grammar experiment was performed to investigate this possible effect. The artificial grammar includes a debuccalization rule modeled on Florentine Italian. Two versions of the same basic language were created — the phoneme inventories were manipulated to make the rule non-neutralizing for Language A, but neutralizing for Language B. Preliminary results of the experiment are presented.