Abstract

This paper is the first to address the fundamental problem of identifying the content nature of a flow: text, binary, or encrypted. We propose Iustitia, a framework for identifying flow nature on the fly. The key observation behind Iustitia is that text flows have the lowest entropy and encrypted flows have the highest, while the entropy of binary flows falls in between. We further extend Iustitia to finer-grained classification of binary flows, so that we can distinguish among types of binary flows (such as images, videos, and executables) and even among the file formats they carry (such as JPEG and GIF for images, MPEG and AVI for videos). The basic idea of Iustitia is to classify flows using machine learning techniques, where each feature is the entropy computed over sequences of a given number of consecutive bytes. Our experimental results show that classification can be done with high speed and high accuracy. On average, Iustitia classifies flows with 88.27% accuracy using a buffer size of 1 K, and for 91.2% of flows the classification time is less than 10% of the packet interarrival time.
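To make the entropy feature concrete, the following is a minimal sketch of how the Shannon entropy of consecutive-byte sequences in a buffered flow prefix might be computed. It is an illustration under stated assumptions, not the Iustitia implementation: the function names `ngram_entropy` and `entropy_features`, the sliding-window extraction of byte sequences, and the choice of widths 1 to 3 are all hypothetical. The classifier that would consume these features is omitted.

```python
import math
from collections import Counter


def ngram_entropy(buf: bytes, n: int = 1) -> float:
    """Shannon entropy, in bits, of the n-byte sequences found in buf.

    Assumption: sequences are taken with a sliding window of stride 1.
    """
    total = len(buf) - n + 1  # number of n-byte windows in the buffer
    if total <= 0:
        return 0.0
    counts = Counter(buf[i:i + n] for i in range(total))
    # H = sum over distinct sequences of p * log2(1 / p), with p = count / total
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def entropy_features(buf: bytes, widths=(1, 2, 3)) -> list:
    """Feature vector: one entropy value per sequence width (hypothetical choice)."""
    return [ngram_entropy(buf, n) for n in widths]
```

Under this sketch, a buffer holding a single repeated byte has entropy 0, while a buffer in which all 256 byte values occur equally often reaches the 8-bit maximum for width 1. Text, binary, and encrypted content would be expected to fall at increasing points along that range, which is the ordering the abstract describes.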