Abstract

As cloud computing becomes prevalent, more and more sensitive information is being outsourced to the cloud. A straightforward way to protect data privacy is to encrypt the data before outsourcing it. Recently, many searchable encryption schemes have been proposed that allow users to perform keyword-based search over encrypted data. However, relying solely on keyword-based search, it is difficult for users to find exactly all the files of interest in huge amounts of data. In the information retrieval domain, full-text retrieval is a technology that supports efficient searches over massive amounts of web data. Unfortunately, full-text retrieval over encrypted cloud data has not been well studied. A full-text retrieval service requires extracting every word in the contents of the documents, and existing searchable encryption schemes cannot efficiently support an index of this scale. Moreover, a privacy-preserving full-text retrieval index is required to protect users' privacy. These problems make efficient full-text retrieval over large amounts of encrypted cloud data a very challenging task. In this paper, we first establish a set of strict privacy requirements for full-text retrieval in cloud storage systems. To address this problem, we design a Bloom filter based tree index. Our scheme fine-tunes the similarity between a query and the encrypted documents by introducing membership entropies of index words. Our security analysis shows that the scheme is provably secure. We demonstrate the effectiveness and efficiency of the proposed scheme through extensive experimental evaluation; the results show that a search operation can be completed in 60 milliseconds on an off-the-shelf, moderately configured PC.
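To make the idea of a Bloom filter based tree index more concrete, the following is a minimal, generic sketch and not the scheme described in this paper. It assumes that each document's words are inserted into a Bloom filter whose bit positions are derived from HMAC-SHA256 under a secret key (so a party without the key cannot test words), and that a parent node in the tree stores the bitwise OR of its children's filters, letting a search skip any subtree whose filter rejects the query word. The class `KeyedBloomFilter`, the helpers `build_document_filter` and `union`, and the parameters `m=1024`, `k=4`, and the demo key are all hypothetical choices for illustration. Membership entropies and similarity ranking are not modeled.

```python
import hashlib
import hmac


class KeyedBloomFilter:
    """Hypothetical Bloom filter whose k bit positions come from HMAC-SHA256 under a secret key."""

    def __init__(self, m: int, k: int, key: bytes):
        self.m = m                # number of bits in the filter
        self.k = k                # number of hash positions per word
        self.key = key            # secret key (assumption: shared only by data owner and users)
        self.bits = bytearray(m)  # one byte per bit, for readability

    def _positions(self, word: str):
        # Derive k positions by prefixing the hash index to the word and applying HMAC.
        for i in range(self.k):
            msg = i.to_bytes(4, "big") + word.encode("utf-8")
            digest = hmac.new(self.key, msg, hashlib.sha256).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, word: str) -> None:
        for p in self._positions(word):
            self.bits[p] = 1

    def might_contain(self, word: str) -> bool:
        # No false negatives; false positives are possible.
        return all(self.bits[p] for p in self._positions(word))


def build_document_filter(words, m=1024, k=4, key=b"demo-secret-key"):
    """Hypothetical leaf node: insert every (lower-cased) word of one document."""
    bf = KeyedBloomFilter(m, k, key)
    for w in words:
        bf.add(w.lower())
    return bf


def union(a: KeyedBloomFilter, b: KeyedBloomFilter) -> KeyedBloomFilter:
    """Hypothetical inner node: bitwise OR of two children built with the same m, k, and key."""
    assert a.m == b.m and a.k == b.k and a.key == b.key
    parent = KeyedBloomFilter(a.m, a.k, a.key)
    parent.bits = bytearray(x | y for x, y in zip(a.bits, b.bits))
    return parent


doc1 = build_document_filter(["cloud", "encryption", "bloom"])
doc2 = build_document_filter(["retrieval", "privacy"])
root = union(doc1, doc2)
print(root.might_contain("privacy"))  # True: a word in any child is always reported
```

Because the OR of two filters contains every bit set in either child, a word present in any leaf is always reported by every ancestor, so pruning at a node whose filter rejects the word never loses a true match; the cost is that upper-level filters fill up and produce more false positives as the tree grows.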