Quantitative Methods in Linguistics
Keith Johnson
Department of Linguistics
University of California, Berkeley
December, 2006
Introduction
Increasingly, linguists handle quantitative data in their
research. Phoneticians, sociolinguists, psycholinguists and
computational linguists deal in numbers and have for decades. Now
phonologists, syntacticians and historical linguists are also finding
that their research involves quantitative methods. Consequently,
mastery of quantitative methods is becoming a vital component of
linguistic training.
This book has two foci. First, we will introduce
and discuss general strategies and methods of quantitative analysis as
they apply in several subdisciplines in linguistics. Second, the book
provides detailed instruction in practical aspects of handling
quantitative linguistic data, using a particular statistical package
(R) to discover patterns in quantitative data and to test linguistic
hypotheses. After two introductory chapters, the book is divided into
chapters by subdiscipline of linguistics, though the methods
presented in any one chapter are likely to be relevant in almost any
other subdiscipline. So, though the t-test is presented in the phonetics
chapter, for example, you would certainly want to make use of t-tests
in psycholinguistics as well.
Table of Contents
Front matter - acknowledgements, design of the book, table of contents.
1. Fundamentals of
quantitative analysis -- Observations, distributions, central
tendency, variability.
2. Patterns and tests --
Probability density functions, statistical inference/hypothesis
testing, correlation.
3. Phonetics -- Two-sample and
paired t-tests, multiple regression, principal components
analysis.
4. Psycholinguistics --
Analysis of variance, between groups and within groups factors,
repeated measures.
5. Sociolinguistics --
Chi-squared, logistic regression.
6. Historical Linguistics --
Lexicostatistics, language similarity, clustering, and
multidimensional scaling.
7. Syntax -- Magnitude estimation,
linear mixed effects models, mixed effects logistic regression.
References
Appendix 1: Getting started with R
Appendix 2: Data sets and scripts used in examples and exercises.
Useful links
Download the R statistical package from the R Project homepage - follow
the link CRAN to the download mirror sites.
While you are at the R Project site, be sure to look at the manuals page as a
starting point for documentation.
I found "Notes
on the use of R for psychology experiments and questionnaires" by
Jonathan Baron and Yuelin Li to be particularly useful.
Get Isidore Dyen's Indo-European language similarity matrix and
read about the Dyen, Kruskal & Black lexicostatistical work at their
project page.
Find out more about the use of magnitude estimation in syntactic
research at the web_exp
project at the University of Edinburgh.
The National Institute of Standards and Technology has a useful e-Handbook of
Statistical Methods.
A note about software
One concern in using a book that devotes space to teaching a
particular software package is that software changes at a
relatively rapid pace.
In this book, I chose to focus on a software package (called "R") that
is distributed under the GNU General Public License. This means that the
software is maintained and developed by a user community and is
distributed not for profit (students can get it on their home
computers at no charge). It is serious software. R is an implementation
of the S language, which was originally developed at AT&T Bell Labs, and
it is used extensively in medical research,
engineering, and science. This is significant because GNU software
(like Unix, Java, C, Perl, etc.) is more stable than commercially
available software - revisions of the software come out because the
user community needs changes, not because the company needs
cash. There are also a number of electronic discussion lists and
manuals covering various specific techniques using R. You'll find
these resources at the R project
web page.
Biographical note
I have a PhD in Linguistics from Ohio State University (1988) and was
a post-doctoral fellow at Indiana University in Psychology, and at
UCLA in Linguistics. I was on the research staff in the medical
school at the University of Alabama, Birmingham for a year before
moving back to Ohio State to teach linguistics (1994-2004). In 2005 I
joined the Department of Linguistics at UC Berkeley, where I currently
direct the Phonology Lab.
I have edited two volumes of research papers in phonetics and
phonology (Johnson & Mullennix, 1997 and Hume & Johnson, 2001 - both
published with Academic Press). Early in my career I was an editor for
the 3rd edition of Language Files, an introductory linguistics
text published by the Department of Linguistics at Ohio State
(Bissantz & Johnson, 1985), and I have always had an interest in what
makes for a good textbook. My Acoustic and Auditory Phonetics
(2002) seems to have been a success thanks to input from many students
and fellow teachers, and I was pleased with how that book helped open
up acoustic phonetics to phonologists who might not have otherwise
taught acoustics in their courses. I hope that a book on quantitative
methods may help to move linguistics as a discipline toward greater
explicitness and rigor.