Annotation Schemes for Constructing Uyghur Named Entity Relation Corpus

Authors: Abiderexiti Kahaerjiang; Maimaiti Maihemuti; Yibulayin Tuergen; Wumaier Aishan*
Source: International Conference on Asian Language Processing (IALP), 2016-11-21 to 2016-11-23.
DOI:10.1109/IALP.2016.7875945

Abstract

Uyghur is a minority language in China and one of the official languages of the Xinjiang Uyghur Autonomous Region. Approximately 10 million people use Uyghur in their daily lives, and it is also used regularly on the internet. However, the lack of a Uyghur named-entity relation corpus constrains Uyghur-language extraction applications. In this paper, we propose annotation specifications for Uyghur named entities and Uyghur named-entity relations, based on existing guidelines and experience from other languages, to support Uyghur corpus construction. Using raw text sampled from Uyghur-language websites, we conducted a small experiment with an annotation tool to assess the practicality of our annotation schemes. After review, we conclude that this method has practical future applications for other resource-poor minority languages of the world. These schemes will provide a basis for further studies of entity relation corpus construction.