Abstract

Fault tolerance for client-to-server applications is critical in data center and cloud computing environments. Virtualization offers a direct route to high availability: the protected application is encapsulated in a virtual machine (VM), and the entire VM state is periodically checkpointed to a backup replica. However, existing VM replication solutions suffer from either excessive checkpointing overhead and network latency or unnecessary CPU resource consumption on the backup replica. In this study, we examine the composition of output packets and treat the replication system as externally consistent as long as pre-released packets originate from already synchronized states. Furthermore, we transform the active-active primary and slave VM pair into an active-semiactive one by reducing the number of active virtual CPUs (vCPUs) in the slave VM. The former optimization improves performance for read-mostly client-to-server networked applications, whereas the latter alleviates the double-scheduling problem on the slave host. We therefore propose COLO++, a non-stop service solution for client-to-server systems that is built on COLO and uses coarse-grained lock-stepping VMs. The two plus signs denote these two optimizations. Experimental results from a COLO++ implementation on KVM and Linux show that it achieves nearly native VM performance under read-mostly workloads, as well as lower scheduling overhead on the backup replica.

Full text