Abstract

In this paper we propose novel approaches to the problem of classifying high entropy file fragments. Although classification of file fragments is central to the science of Digital Forensics, high entropy types have long been regarded as a problem. Roussev and Garfinkel (2009) argue that existing methods will not work on high entropy fragments because such fragments have no discernible patterns to exploit. We propose two methods that do not rely on such patterns. In the first, the NIST statistical test suite is used to detect randomness in 4 KiB fragments, and the test results are analysed using an Artificial Neural Network (ANN). Optimum results were 91% and 82% correct classification rates for encrypted and compressed fragments respectively. In the second, we use the compressibility of a fragment as a measure of its randomness. Correct classification rates were 76% and 70% for encrypted and compressed fragments respectively. We show that newer, more efficient compression formats are more difficult to classify. We have used subsets of the publicly available 'GovDocs1 Million File Corpus' so that future research may make valid comparisons with the results obtained here.
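
The idea of using compressibility as a measure of randomness can be illustrated with a minimal sketch. The abstract does not say which compressor or decision rule the authors used, so the choice of zlib (DEFLATE) here is an assumption made purely for illustration, and the names `FRAGMENT_SIZE` and `compression_ratio` are hypothetical. Only the 4 KiB fragment size is taken from the abstract.

```python
import os
import zlib

# 4 KiB fragment size, as stated in the abstract.
FRAGMENT_SIZE = 4096


def compression_ratio(fragment: bytes, level: int = 9) -> float:
    """Return compressed size divided by original size for one fragment.

    Assumption: zlib (DEFLATE) stands in for whatever compressor the
    paper actually used. A ratio near or above 1.0 means the fragment is
    close to incompressible, which is what random-looking data produces.
    """
    if not fragment:
        raise ValueError("fragment must not be empty")
    return len(zlib.compress(fragment, level)) / len(fragment)


if __name__ == "__main__":
    # Random bytes stand in for a high entropy (encrypted-looking) fragment.
    random_fragment = os.urandom(FRAGMENT_SIZE)
    # Repetitive text stands in for a low entropy fragment.
    text_fragment = (b"the quick brown fox " * 300)[:FRAGMENT_SIZE]

    print(f"random fragment ratio: {compression_ratio(random_fragment):.3f}")
    print(f"text fragment ratio:   {compression_ratio(text_fragment):.3f}")
```

In this sketch the random fragment yields a ratio close to 1.0, while the repetitive text compresses to a small fraction of its size. Separating encrypted from compressed fragments is the harder problem the paper addresses, since both sit near the incompressible end of this scale; the ratio alone is not claimed to be the authors' classifier.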

  • Publication date: 2013-12