A model of language inflection graphs

作者:Fuks Henryk*; Farzad Babak; Cao Yi
来源:International Journal of Modern Physics C, 2014, 25(6): 1450013.
DOI:10.1142/S0129183114500132

摘要

Inflection graphs are highly complex networks representing relationships between inflectional forms of words in human languages. For so-called synthetic languages, such as Latin or Polish, they have particularly interesting structure due to the abundance of inflectional forms. We construct the simplest form of inflection graphs, namely a bipartite graph in which one group of vertices corresponds to dictionary headwords and the other group to inflected forms encountered in a given text. We, then, study projection of this graph on the set of headwords. The projection decomposes into a large number of connected components, to be called word groups. Distribution of sizes of word group exhibits some remarkable properties, resembling cluster distribution in a lattice percolation near the critical point. We propose a simple model which produces graphs of this type, reproducing the desired component distribution and other topological features.

  • 出版日期2014-6

全文