*Some of this material is based upon work done under the joint auspices of the Centre National de la Recherche Scientifique (CNRS), Paris, through its Laboratoire des Langues et Civilisations à Tradition Orale (LACITO) and the Sino-Tibetan Etymological Dictionary and Thesaurus (STEDT) Project, supported by the National Science Foundation under Grant Nos. BNS-867726 and FD-92-09841 and by the Division of Research Programs of the National Endowment for the Humanities, an independent federal agency, under Grant Nos. RT-20789-87 and RT-21420.

We would like to thank Dan Jurafsky, Jim Matisoff, Boyd Michailovsky, and the three reviewers for their insightful comments. Any errors or omissions which remain are completely our responsibility.

[1]The term `allofamy,' due to Matisoff (1978), refers to the relationship `among the various individual members of the same word-family.' English royal and regal, borrowed from French and Latin respectively, are both ultimately traceable to the same PIE root *reg-, and so are co-allofams in Modern English (Matisoff 1978:16-18, Matisoff 1992:160). A word family might contain both native words and words borrowed from related languages; the borrowings may be recent or ancient.

[2]RE is written in SPITBOL, a dialect of SNOBOL4. The current implementation is specific to 80386 and higher microcomputers running MS-DOS. A C++ version is planned.

[3]Upstream in the sense of time. We had originally described the temporal directions of the program as backward and forward. The opposition of upstream and downstream, suggested to us by Professor John Hewson, one of the developers of the first "Electronic Neogrammarian," [Hewson 1973] is much more intuitive.

[4]In fact, the situation is slightly more complicated than is shown here: there are two other possible reconstructions and another possible cognate set which are not shown because of space considerations. This example is discussed in more detail in section 5.1.

[5]The authors of RE developed this technique independently, and later discovered this methodologically similar computer project on Proto-Algonkian.

[6]The term REFLEX will be reserved for describing a complete modern form which is the regular descendant of some protoform. OUTCOME will be used for the regular descendant of a protoconstituent.

[7]There is a great deal more to say about specificity and the complexity of the environmental constraints, so much that it merits a separate and rather lengthy discussion. As currently implemented in RE, context must be stated in terms of immediately adjacent constituents (remote context cannot be used). Also, the context must be stated in terms of constituents (i.e. atoms), or lists of constituents: regular expressions and other possible definitions are not supported. Specificity is measured in a straightforward way: correspondence rules with no context have low specificity (specificity = 0). Rules with a one-sided context have specificity 1. Rules with a contextualizing element on both sides have specificity 2. Only integer specificities are supported.
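
The specificity measure described in this note can be illustrated with a minimal sketch. It is written in Python rather than RE's SPITBOL, and the `Rule` class, its field names, and the sample constituents are hypothetical, introduced only for illustration; this is not RE's actual code or data.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical correspondence rule: a protoconstituent and its outcome.

    A context is a list (here, a set) of constituents that may stand
    immediately to the left or right; None means "no context stated".
    """
    proto: str
    outcome: str
    left: Optional[FrozenSet[str]] = None
    right: Optional[FrozenSet[str]] = None


def specificity(rule: Rule) -> int:
    """0 = no context, 1 = one-sided context, 2 = context on both sides."""
    return int(rule.left is not None) + int(rule.right is not None)


def context_matches(rule: Rule, left_neighbor: str, right_neighbor: str) -> bool:
    """Check only the immediately adjacent constituents (no remote context)."""
    left_ok = rule.left is None or left_neighbor in rule.left
    right_ok = rule.right is None or right_neighbor in rule.right
    return left_ok and right_ok


# Invented sample rules, not taken from any real table of correspondences.
no_context = Rule("a", "a")
one_sided = Rule("a", "e", right=frozenset({"i", "y"}))
two_sided = Rule("a", "o", left=frozenset({"k"}), right=frozenset({"u"}))
```

Under these assumptions, `specificity` returns 0, 1, and 2 for the three sample rules, and it can only ever return an integer, in keeping with the restriction stated above.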

[8]The cover symbol Ó is used to permit the upstream reconstruction of Tukche forms in which the tone of the modern form is not precisely known. In the downstream direction, however, it licenses the generation of two possible reflexes.

[9]In this case the Taglung form by itself is sufficient to determine the `proper' reconstruction: if the Syang form were not available, the Taglung form would break the tie between the other competing reconstructions (ìglin, ìgi>=, and ìgin). It is usually difficult, however, to pick out such decisive lexical items from a list of words.

[10]For a discussion of set-covering and NP-complete problems, see, for example, Ralston and Reilly 1993:938-941.

[11]Merger refers to the diachronic process by which the distinction between two (or more) phonemes is lost. Words which were minimal pairs on the basis of this distinction become homophones. Split refers to the process by which a phoneme becomes two (usually due to some modification in the context).

[12]The X which occurs in the Taglung form is a cover symbol (meaning "unspecified tone") and is used when the tone of the form is unknown. This allows RE to reconstruct the form under any of the tones. If this cover symbol were left out, RE would reconstruct this form without a prototone (permitted by the canon), and the form would fail to form a set with other forms which do have the tone specified.

[13]It might be possible to apply the results of some recent research in the area, for example Wordnet (Miller 1990), to part of the problem. Indeed, the "semantic formulas" developed for RE are similar structurally and conceptually to the "synsets" of Wordnet. Ultimately, any solution would have to be sensitive not only to synchronic relationships in a single language (like Wordnet) but also to semantic shifts (both universal and language-specific) and the possibility of several different glossing metalanguages (in this case both French and English are used).

[14]In cases where a set is eliminated as a result of becoming a subset of another set, the reconstructions of the set being eliminated may have to be merged into the larger set.
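
The subset elimination described in this note can be sketched as follows. This is a minimal illustration in Python, not RE's actual procedure: the function name `absorb_subsets`, the representation of a cognate set as a pair (members, reconstructions), and the choice to merge into the first superset found are all assumptions made for the example.

```python
def absorb_subsets(cognate_sets):
    """Drop every set that is a subset of a kept set, merging its reconstructions.

    cognate_sets: iterable of (members, reconstructions) pairs, where
    members is a set of reflex identifiers and reconstructions is a set
    of protoforms that generated the set. Both are hypothetical shapes.
    """
    kept = []
    # Visit larger sets first so that a superset is kept before its subsets.
    for members, recons in sorted(cognate_sets, key=lambda s: len(s[0]), reverse=True):
        members = frozenset(members)
        for kept_members, kept_recons in kept:
            if members <= kept_members:
                # Eliminated set: carry its reconstructions over to the larger set.
                kept_recons |= set(recons)
                break
        else:
            kept.append((members, set(recons)))
    return kept


# Invented identifiers, standing in for reflexes and protoforms.
sets = [
    ({"r1", "r2"}, {"*B"}),
    ({"r1", "r2", "r3"}, {"*A"}),
    ({"r4"}, {"*C"}),
]
result = absorb_subsets(sets)
```

Here the set {r1, r2} is eliminated and its reconstruction *B is merged into the larger set, which ends up with both *A and *B. Whether merging is appropriate in a given case is a linguistic decision that this sketch does not attempt to model.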

[15]Using semantic formulas such as those defined in (7), for example, creates precise cognate sets which are composed only of reflexes which are assured to be semantically compatible (though some likely candidates might be eliminated when the semantic formula is incomplete). Using semantic formulas such as those defined in (8) would remove semantically incompatible reflexes, but leave those for which semantic compatibility is unspecified.

[16]Allofamy, the relationship between words in a word-family, is described in more detail in section 1.0, especially in the footnote to Figure 1.

[17]For example, English have and Latin habēre.