A computational grammar for Persian based on GPSG

作者:Bahrani Mohammad*; Sameti Hossein; Manshadi Mehdi Hafezi
来源:Language Resources and Evaluation, 2011, 45(4): 387-408.
DOI:10.1007/s10579-011-9144-1

摘要

In this paper, we present our attempts to design and implement a large-coverage computational grammar for the Persian language based on the Generalized Phrase Structured Grammar (GPSG) model. This grammatical model was developed for continuous speech recognition (CSR) applications, but is suitable for other applications that need the syntactic analysis of Persian. In this work, we investigate various syntactic structures relevant to the modern Persian language, and then describe these structures according to a phrase structure model. Noun (N), Verb (V), Adjective (ADJ), Adverb (ADV), and Preposition (P) are considered basic syntactic categories, and X-bar theory is used to define Noun phrases, Verb phrases, Adjective phrases, Adverbial phrases, and Prepositional phrases. However, we have to extend Noun phrase levels in X-bar theory to four levels due to certain complexities in the structure of Noun phrases in the Persian language. A set of 120 grammatical rules for describing different phrase structures of Persian is extracted, and a few instances of the rules are presented in this paper. These rules cover the major syntactic structures of the modern Persian language. For evaluation, the obtained grammatical model is utilized in a bottom-up chart parser for parsing 100 Persian sentences. Our grammatical model can take 89 sentences into account. Incorporating this grammar in a Persian CSR system leads to a 31% reduction in word error rate.

  • 出版日期2011-12