A paradoxical property of the monkey book

Authors: Bernhardsson Sebastian*; Baek Seung Ki; Minnhagen Petter
Source: Journal of Statistical Mechanics: Theory and Experiment, 2011, P07013.
DOI: 10.1088/1742-5468/2011/07/P07013

Abstract

A 'monkey book' is a book consisting of a random sequence of letters and blanks, where a group of letters surrounded by two blanks is defined as a word. We compare the word-distribution statistics of a monkey book with those of real books. The statistics for the monkey book are shown to be quite distinct from those of a typical real book. In particular, the monkey book obeys Heaps' power law to an extraordinarily good approximation, in contrast to the word distributions for real books, which deviate from Heaps' law in a characteristic way. This discrepancy is traced to the different properties of a 'spiked' distribution and its smooth envelope. The somewhat counter-intuitive conclusion is that a 'monkey book' obeys Heaps' power law precisely because its word-frequency distribution is not a smooth power law, contrary to the expectation based on simple mathematical arguments that if one is a power law, so is the other.
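The construction described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' procedure: the function names (`monkey_book`, `heaps_curve`, `loglog_slope`), the four-letter alphabet, the blank probability of 0.2, the equiprobable letters, and the text length are all assumptions chosen for the example. It generates a random text, splits it into words at blanks, and records the number of distinct words as a function of the total number of words, which is the quantity Heaps' law describes.

```python
import math
import random
from collections import Counter


def monkey_book(n_chars, alphabet="abcd", p_blank=0.2, seed=0):
    """Return the words of a random text of n_chars characters.

    Assumption for this sketch: each character is a blank with
    probability p_blank, otherwise a letter drawn uniformly from alphabet.
    """
    rng = random.Random(seed)
    chars = [
        " " if rng.random() < p_blank else rng.choice(alphabet)
        for _ in range(n_chars)
    ]
    # split() treats runs of blanks as one separator, so no empty words appear.
    # The first and last words may be bounded by the text edge, not a blank.
    return "".join(chars).split()


def heaps_curve(words):
    """curve[m - 1] is the number of distinct words among the first m words."""
    seen = set()
    curve = []
    for word in words:
        seen.add(word)
        curve.append(len(seen))
    return curve


def loglog_slope(curve, i, j):
    """Crude log-log slope of the curve between word counts i + 1 and j + 1."""
    return (math.log(curve[j]) - math.log(curve[i])) / (
        math.log(j + 1) - math.log(i + 1)
    )


words = monkey_book(200_000)
curve = heaps_curve(words)
freq = Counter(words)  # word-frequency distribution of the same text
slope = loglog_slope(curve, len(curve) // 100, len(curve) - 1)

print("total words:", len(words))
print("distinct words:", curve[-1])
print("rough Heaps exponent estimate:", round(slope, 3))
```

With equiprobable letters, every word of a given length has the same probability of occurring, so the values in `freq` cluster around one level per word length. Plotting `curve` on log-log axes, and comparing it with the same curve computed from the words of a real text, is one way to explore the contrast the abstract describes.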

  • Publication date: 2011-7