Distribution of English syllables in e-books of Project Gutenberg and the evolution of syllable number in two subcorpora

Authors: Guo, Shesen*; Zhang, Ganzhou; Zhai, Run; Song, Zehua
Source: Digital Scholarship in the Humanities, 2015, 30(3): 344-353.
DOI: 10.1093/llc/fqu013

Abstract

Zipf's law (the principle of least effort) was proposed on the basis of empirical observation and statistical measurement of word probability and rank. We hypothesized that the mean number of syllables per word type in the texts or speeches of writers and speakers follows a certain pattern or regularity. This work reports the distribution of the mean number of syllables per word across all available e-books on the Project Gutenberg Web site. We used three dictionaries and built a rule-based algorithm to compute the mean number of syllables per word type in each e-book, and we observed the distribution of the words with the largest number of syllables in those books. Regression analysis yielded a linear equation relating the length of a word type to its number of syllables. We also identified the pattern of historical evolution of the mean number of syllables per word type in two subcorpora, which tentatively indicates that the mean number of syllables per word type tends to decrease over time.
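
The pipeline described in the abstract (dictionary lookup, a rule-based fallback, a mean over word types, and a linear fit of syllable count against word length) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the three-entry `SYLLABLE_DICT` stands in for the three dictionaries the authors used, the vowel-group heuristic is a common simplification and not the authors' rule set, and the function names are hypothetical.

```python
import re

# Hypothetical stand-in for the three dictionaries used in the paper.
SYLLABLE_DICT = {"the": 1, "people": 2, "every": 2}

VOWEL_GROUP = re.compile(r"[aeiouy]+")


def count_syllables(word):
    """Look the word up first; otherwise fall back to a crude heuristic."""
    w = word.lower()
    if w in SYLLABLE_DICT:
        return SYLLABLE_DICT[w]
    # Heuristic (an assumption): one syllable per run of vowel letters.
    n = len(VOWEL_GROUP.findall(w))
    # Drop a final silent "e" ("make"), but keep a final "-le" ("table").
    if n > 1 and w.endswith("e") and not w.endswith("le") and w[-2] not in "aeiouy":
        n -= 1
    return max(n, 1)


def word_types(text):
    """Distinct lowercase alphabetic tokens (word types) in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def mean_syllables_per_type(text):
    """Mean syllable count over word types, not over running tokens."""
    types = word_types(text)
    if not types:
        raise ValueError("text contains no alphabetic words")
    return sum(count_syllables(t) for t in types) / len(types)


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def length_vs_syllables(text):
    """Fit syllable count against letter count over the text's word types."""
    types = sorted(word_types(text))
    return fit_line([len(t) for t in types], [count_syllables(t) for t in types])
```

Calling `mean_syllables_per_type` on the plain text of one e-book gives that book's value, and `length_vs_syllables` gives an intercept and slope for the same book. The fit requires word types of at least two different lengths, and the coefficients from this sketch are not expected to match those reported in the paper.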

Full text