Abstract

Retrieving relevant data for users in online social network (OSN) systems is a challenging problem. Cassandra, a storage system used by popular OSN systems such as Facebook and Twitter, relies on a DHT-based scheme to randomly partition users' personal data among servers across multiple data centers. Although a DHT scales well to a large number of users and their personal data, it incurs costly inter-server communication across data centers because OSN users are densely interconnected and interact frequently. In this paper, we explore how to retrieve OSN content cost-effectively while retaining the simple and robust nature of OSNs. Our approach exploits a simple yet powerful principle called Community-Based Locality (CBL), which posits that if a user has a one-hop neighbor within a particular community, the user very likely has other one-hop neighbors inside the same community. We demonstrate the existence of community-based locality in diverse traces of popular OSN systems such as Facebook, Orkut, Flickr, YouTube, and LiveJournal. Based on this observation, we design a CBL-based algorithm to build the content index in OSN systems. By partitioning and indexing the relevant data of users within a community on the same server in the data center, the CBL-based index avoids a significant amount of inter-server communication during search, making the retrieval of relevant data for a user efficient in large-scale OSNs. In addition, the CBL-based scheme provides faster search responses and balanced server loads. We conduct comprehensive trace-driven simulations to evaluate the performance of the proposed scheme. Results show that, compared with existing schemes, our scheme reduces network traffic by 73 percent and query latency by 35 percent.
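To make the locality argument concrete, the following is a minimal sketch on a hypothetical six-user graph, not the paper's CBL algorithm. It assumes that community labels are already known (the community-detection step is not shown) and it ignores load balancing. It counts how many neighbor lookups must leave the local server when each user fetches the data of all one-hop neighbors. Two placements are compared: a toy byte-sum hash standing in for DHT partitioning (this is not Cassandra's actual partitioner), and a community placement that puts every member of a community on the same server. The names `hash_placement`, `community_placement`, and `remote_fetches` are illustrative.

```python
from collections import defaultdict

# Hypothetical toy social graph: two tightly knit friend groups joined by one edge.
EDGES = [
    ("a", "b"), ("a", "c"), ("b", "c"),  # community 0
    ("d", "e"), ("d", "f"), ("e", "f"),  # community 1
    ("c", "d"),                          # single cross-community friendship
]
# Assumed output of a community-detection step that is not shown here.
COMMUNITY = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
NUM_SERVERS = 2


def build_adjacency(edges):
    """Undirected adjacency sets: user -> set of one-hop neighbors."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def hash_placement(users, num_servers):
    """Toy stand-in for DHT partitioning: a deterministic hash of the user id.

    This byte-sum hash only mimics the key property of DHT placement, namely
    that it ignores the social graph. It is not Cassandra's partitioner.
    """
    return {u: sum(u.encode()) % num_servers for u in users}


def community_placement(community, num_servers):
    """CBL-style placement: all members of a community share one server."""
    return {u: c % num_servers for u, c in community.items()}


def remote_fetches(adj, placement):
    """Count neighbor lookups that cross servers when every user
    retrieves the data of all of their one-hop neighbors."""
    return sum(
        1
        for user, nbrs in adj.items()
        for nbr in nbrs
        if placement[nbr] != placement[user]
    )


if __name__ == "__main__":
    adj = build_adjacency(EDGES)
    users = sorted(adj)
    hashed = hash_placement(users, NUM_SERVERS)
    grouped = community_placement(COMMUNITY, NUM_SERVERS)
    print("hash placement remote fetches:", remote_fetches(adj, hashed))
    print("community placement remote fetches:", remote_fetches(adj, grouped))
```

On this toy graph, the hash placement splits five friendships across servers, which yields 10 remote fetches. The community placement splits only the single bridge edge, which yields 2. These counts come from the hypothetical example alone and are unrelated to the traffic and latency reductions reported above.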