A Statistical Study on Chinese Word and Character Usage in Literatures from the Tang Dynasty to the Present

Authors: Chen Qinghua*; Guo Jinzhong; Liu Yufan
Source: Journal of Quantitative Linguistics, 2012, 19(3): 232-248.
DOI: 10.1080/09296174.2012.685305

Abstract

In this paper, we carried out a statistical analysis of Chinese corpora from the Tang, Song, Yuan, Ming and Qing Dynasties, as well as from modern times. We found that, although character and word frequencies change over time, the word frequency always follows the Zipf-Mandelbrot law $p(r) = C(r + r_0)^{-\beta}$, while the character frequency follows the Menzerath-Altmann law $P(r) = A e^{-ar} r^{-b}$. In the character frequency distribution, the exponential component strengthens and the power-law component weakens over time. We also found that more and more compound words have been created since the Tang Dynasty. Single-character words are spread unevenly over the whole word frequency distribution, with more of them concentrated in the earlier period and decaying exponentially.
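
The two functional forms named in the abstract can be made concrete with a minimal Python sketch. It assumes that r denotes frequency rank (1 for the most frequent item), as is usual for rank-frequency laws. The function names and all parameter values below are hypothetical and chosen for illustration; they are not taken from the paper, and the sketch does not reproduce the authors' fitting procedure.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return (token, relative frequency) pairs, most frequent first (rank 1)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return [(tok, n / total) for tok, n in counts.most_common()]


def zipf_mandelbrot(r, C, r0, beta):
    """Zipf-Mandelbrot form p(r) = C * (r + r0)^(-beta)."""
    return C * (r + r0) ** (-beta)


def exp_power(r, A, a, b):
    """Exponential-times-power-law form P(r) = A * exp(-a r) * r^(-b)."""
    return A * math.exp(-a * r) * r ** (-b)


def normalizing_constant(n_ranks, r0, beta):
    """C chosen so that p(1) + ... + p(n_ranks) = 1."""
    return 1.0 / sum((r + r0) ** (-beta) for r in range(1, n_ranks + 1))


# Illustrative parameter values only; they are not taken from the paper.
C = normalizing_constant(1000, r0=2.0, beta=1.1)
word_probs = [zipf_mandelbrot(r, C, 2.0, 1.1) for r in range(1, 1001)]
char_weights = [exp_power(r, 1.0, 0.002, 0.5) for r in range(1, 1001)]
```

In the second form, setting a = 0 leaves a pure power law and setting b = 0 leaves a pure exponential, so a larger a relative to b corresponds to the stronger exponential behaviour that the abstract reports for later periods. An empirical curve to compare against either form could be obtained by passing a list of characters or segmented words to `rank_frequency`.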

Full text