A feature-based approach to better automatic treebank conversion

Zhu Muhua; Zhu Jingbo<sup>*</sup>; Wang Huizhen

doi:10.1007/s10579-013-9234-3

Abstract

In constituency parsing, there exist multiple human-labeled treebanks that are built on non-overlapping text samples and follow different annotation standards. Because annotating parse trees by hand is extremely costly, it is desirable to automatically convert one treebank (the source treebank) to the annotation standard of another treebank of interest (the target treebank). Conversion results can be manually corrected to obtain higher-quality annotations, or they can be used directly as additional training data for building syntactic parsers. To perform automatic treebank conversion, we divide constituency parses into two levels, part-of-speech (POS) tags and syntactic structure (bracketing structures and constituent labels), and convert each level separately with a feature-based approach. The basic idea of the approach is to encode the original annotations of the source treebank as guide features during the conversion process. Experiments on two Chinese treebanks show that our approach converts POS tags and syntactic structures with accuracies of 96.6% and 84.8%, respectively, which are the best reported results on this task.
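The guide-feature idea for the POS level can be illustrated with a minimal sketch. It assumes a tagger that predicts the target-standard tag of each word from a list of string features, and it simply adds features built from the tag the word already carries in the source treebank. The function name `pos_features`, the feature templates, and the example tags are hypothetical and chosen for illustration only; they are not the feature set or tag inventory used in the paper.

```python
from typing import List


def pos_features(words: List[str], source_tags: List[str], i: int,
                 use_guide: bool = True) -> List[str]:
    """Build string features for predicting the target-standard POS tag of words[i].

    Hypothetical templates: three word-context features, plus two guide
    features derived from the source-treebank tag of the same word.
    """
    prev_word = words[i - 1] if i > 0 else "<s>"
    next_word = words[i + 1] if i + 1 < len(words) else "</s>"

    # Ordinary tagging features that ignore the source annotation.
    feats = [
        f"w={words[i]}",
        f"w-1={prev_word}",
        f"w+1={next_word}",
    ]

    if use_guide:
        # Guide features: expose the source-standard tag to the model,
        # alone and conjoined with the current word.
        src = source_tags[i]
        feats.append(f"src={src}")
        feats.append(f"src+w={src}|{words[i]}")
    return feats


# Illustrative sentence with made-up source-standard tags.
words = ["我", "爱", "北京"]
source_tags = ["r", "v", "ns"]
features = pos_features(words, source_tags, 1)
baseline = pos_features(words, source_tags, 1, use_guide=False)
```

With `use_guide=False` the function returns only the word-context features, which makes it easy to compare a plain tagger against one that also sees the source annotation.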

Publication date: 2013-12
Affiliation: Northeastern University (东北大学)
