Abstract

Geography Markup Language (GML) has become a de facto standard for encoding and exchanging geographic data. GML documents are usually very large because of their verbose structure and textual data, which makes them costly to store and transmit. In this paper, we propose an effective pattern-based approach to compressing GML documents. First, a tree-structured pattern is extracted from the GML document to be compressed. Then, a tree automaton is constructed for matching the document against the extracted pattern. During compression, the GML document is matched against the pattern to generate a bit stream that represents the difference between the document's structure and the extracted pattern. Meanwhile, document structure is separated from document content, and the content is grouped into different streams according to the tags. Spatial coordinate data are compressed by delta encoding. Finally, the extracted pattern, all streams, and the encodings are passed to the text compressor gzip. Extensive experiments on real GML documents show that the proposed approach outperforms existing XML and GML compression approaches in compression ratio while maintaining acceptable compression efficiency.
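
To illustrate the delta-encoding step for coordinate data followed by gzip, here is a minimal Python sketch. It is not the paper's implementation: it assumes the coordinates have already been converted to fixed-point integers, and the function names (`delta_encode`, `delta_decode`, `pack_stream`, `unpack_stream`) as well as the space-separated text layout of the stream are hypothetical choices made for this example.

```python
import gzip


def delta_encode(values):
    """Keep the first value, then store each value as its difference from the previous one."""
    if not values:
        return []
    deltas = [values[0]]
    for prev, cur in zip(values, values[1:]):
        deltas.append(cur - prev)
    return deltas


def delta_decode(deltas):
    """Invert delta_encode by accumulating a running sum."""
    values = []
    total = 0
    for d in deltas:
        total += d
        values.append(total)
    return values


def pack_stream(values):
    """Delta-encode integers, serialize them as text, and compress with gzip (assumed layout)."""
    text = " ".join(str(d) for d in delta_encode(values))
    return gzip.compress(text.encode("ascii"))


def unpack_stream(blob):
    """Decompress a blob produced by pack_stream and rebuild the original integers."""
    text = gzip.decompress(blob).decode("ascii")
    deltas = [int(tok) for tok in text.split()]
    return delta_decode(deltas)


# Hypothetical x-coordinates of neighbouring points, scaled to integers.
xs = [121473, 121475, 121478, 121480]
assert delta_encode(xs) == [121473, 2, 3, 2]
assert unpack_stream(pack_stream(xs)) == xs
```

Because neighbouring points in a geometry tend to lie close together, the differences are typically much smaller than the absolute coordinates, which gives the downstream text compressor shorter and more repetitive tokens to work with.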