
Distributed for Center for the Study of Language and Information

The Significance of Word Lists

Statistical Tests for Investigating Historical Connections Between Languages

Similar words for similar concepts turn up in many widely scattered languages. Some linguists insist that this is simply due to chance, while others claim that many, if not all, of the world’s languages are descended from a single prehistoric language. Yet neither position in this strident controversy has been analyzed or supported with statistics. New computerized statistical techniques can help determine whether words in different languages have an ancestral connection. These flexible techniques are explained, broken into steps, and illustrated in a way that makes the necessary principles accessible to linguists with no background in statistics.

This methodology measures the probabilistic significance of sound correspondences between short word lists. Many rules of thumb that linguists invoke to rule out chance resemblance, such as multilateral comparison and emphasizing grammar over vocabulary, are shown to actually decrease the power of quantitative tests. While the procedures presented here are straightforward, the author also details the extensive linguistic work needed to produce word lists that will not yield nonsensical results. Examples analyze the 200 words in 8 languages that are enumerated and detailed in an appendix.
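To give a rough sense of what testing the significance of sound correspondences between two word lists can look like, here is a minimal Python sketch of a permutation test. It is an illustration under stated assumptions, not the book's procedure: the description above does not specify the exact statistic or test, so the scoring function (`recurrence_score`, a sum of squared counts of initial-sound pairings), the use of only the first character of each word, and the word lists `WORDS_A` and `WORDS_B` (invented toy forms, not data from any real language) are all hypothetical.

```python
import random
from collections import Counter

# Invented toy word lists (NOT real language data). Position i in each
# list stands for the same concept in hypothetical languages A and B.
WORDS_A = ["pata", "piro", "pelu", "tami", "toku", "tesa", "kari", "kona"]
WORDS_B = ["fada", "firu", "felo", "dami", "dogu", "desa", "hari", "hona"]

def recurrence_score(pairs):
    """Sum of squared counts of (initial sound in A, initial sound in B).

    Assumed statistic: it grows when the same pairing of initial
    sounds recurs across many concepts.
    """
    counts = Counter((a[0], b[0]) for a, b in pairs)
    return sum(n * n for n in counts.values())

def permutation_p_value(list_a, list_b, trials=10_000, seed=0):
    """Estimate how often chance pairings score at least as high as the real one."""
    rng = random.Random(seed)
    observed = recurrence_score(zip(list_a, list_b))
    shuffled = list(list_b)
    hits = 0
    for _ in range(trials):
        # Shuffling breaks the link between concepts, which simulates
        # what the score looks like when only chance is at work.
        rng.shuffle(shuffled)
        if recurrence_score(zip(list_a, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)

if __name__ == "__main__":
    print("observed score:", recurrence_score(zip(WORDS_A, WORDS_B)))
    print("estimated p-value:", permutation_p_value(WORDS_A, WORDS_B))
```

A small estimated p-value means that few random re-pairings of the two lists produce correspondences as regular as the observed ones. Real applications depend heavily on how the word lists are prepared, which is the linguistic work the description above emphasizes.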

Table of Contents

1. Introduction
2. Statistical Methodology
3. Significance Testing
4. Tests in Different Environments
5. Size of the Word Lists
6. Precision and Lumping
7. Nonarbitrary Vocabulary
8. Historical Connection vs. Relatedness
9. Language-Internal Cognates
10. Recurrence Metrics
11. Conclusions
Appendix: Word Lists
