Automatic Extraction of Apparent Semantic Structure from Text Contents of a Structural Calculation Document

Authors: Kim Bong Geun; Il Park Sang; Kim Hyo Jin; Lee Sang Ho*
Source: Journal of Computing in Civil Engineering, 2010, 24(3): 313-324.
DOI: 10.1061/(ASCE)CP.1943-5487.0000047

Abstract

A generic method for the automatic extraction of apparent semantic document structure from a structural calculation document was proposed in this paper. The method consists of two processes: extracting subtitles and classifying depth levels of the subtitles. The subtitles become tree nodes of the apparent semantic structure. A context model of technical documents was built for the subtitle extraction from plain text information. In addition, a formal classification method for the determination of depth levels of the subtitles was developed and used to build a document tree with sequentially ordered subtitles. An application module of the proposed method, which transforms a plain text document into a semistructured XML document, was implemented. Performance of the developed application module was also evaluated with 40 test documents including structural calculation documents, technical reports, and theses.
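
The two processes named in the abstract (extracting subtitles, then classifying their depth levels to build a tree that is serialized as XML) can be illustrated with a minimal Python sketch. This is not the authors' context model or their formal classification method, neither of which is described in this record. It assumes, purely for illustration, that subtitles are lines beginning with dotted numbering such as "2.1 Loads" and that depth equals the number of numbering components. The names SUBTITLE, extract_subtitles, build_tree, and the XML element and attribute names are all hypothetical. A body line that happens to start with a number would be misclassified by this crude pattern.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: a subtitle starts with dotted numbering ("1.", "2.1", "3.2.1")
# followed by whitespace and a title. Real documents need a richer model.
SUBTITLE = re.compile(r"^\s*(\d+(?:\.\d+)*)\.?\s+(\S.*)$")


def extract_subtitles(text):
    """Step 1: return (depth, title) pairs for lines that look like subtitles."""
    found = []
    for line in text.splitlines():
        match = SUBTITLE.match(line)
        if match:
            # Depth is the count of numbering components: "2.1" has depth 2.
            depth = match.group(1).count(".") + 1
            found.append((depth, match.group(2).strip()))
    return found


def build_tree(subtitles):
    """Step 2: nest sequentially ordered subtitles into an XML tree by depth."""
    root = ET.Element("document")
    stack = [(0, root)]  # (depth, element) path from the root to the current node
    for depth, title in subtitles:
        # Pop until the top of the stack is a shallower node, i.e. the parent.
        while stack[-1][0] >= depth:
            stack.pop()
        node = ET.SubElement(stack[-1][1], "section", title=title, level=str(depth))
        stack.append((depth, node))
    return root


sample = """1. Design conditions
The bridge has two spans.
1.1 Materials
1.2 Loads
2. Section analysis
2.1 Bending
"""
tree = build_tree(extract_subtitles(sample))
print(ET.tostring(tree, encoding="unicode"))
```

The stack holds the path from the root to the most recent subtitle, so each new subtitle attaches to the nearest preceding subtitle with a smaller depth. That is one simple way to turn a flat, ordered list of subtitles into a tree.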

  • Publication date: 2010-6