An empirical study on implementing highly reliable stream computing systems with private cloud

Liu Yaxiao*; Liu Weidong; Song Jiaxing; He Huan

doi:10.1016/j.adhoc.2015.07.009

Abstract

Stream computing systems are designed for high-frequency data; in real deployments, such systems handle billions of transactions per day. Cloud technology can support distributed stream computing systems through its elasticity and fault tolerance. In a real deployment environment, such as the pre-treatment systems of China's top banks, reliability as perceived by users is a key performance metric. Although many significant works have been proposed in the literature, they have limitations such as a lack of architectural focus or difficulty of implementation in complex projects. This paper describes the reliability issue caused by service downgrade in the cloud. We use novel reliability analysis techniques, queuing theory, and software rejuvenation management techniques to build a framework that supports stream data with low latency and fault tolerance. A real streaming system from a top bank is studied to provide supporting data. Operational parameters such as the rejuvenation window and the time-out parameter are identified as key parameters in the design of a distributed stream processing system. An algorithm for reliability optimization, monitoring, and forecasting is also introduced. Finally, the paper compares the improved results with the original issues; the improvements saved millions in costs and protected reputation.
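The abstract names queuing theory and the time-out parameter as central to the design but does not state the model used. As a minimal sketch, assuming (hypothetically) that each processing node behaves as an M/M/1 queue with Poisson arrivals at rate `lam` and exponential service at rate `mu`, the snippet below shows how a drop in service rate, one possible form of the service downgrade mentioned above, raises the fraction of requests that exceed a fixed time-out. The functions `mean_response_time`, `timeout_probability`, and `min_timeout` and all numeric rates are illustrative assumptions, not the authors' method.

```python
import math

# Hypothetical helpers. ASSUMPTION: each stream-processing node is modeled as an
# M/M/1 queue (Poisson arrivals at rate lam, exponential service at rate mu).
# This is an illustrative model, not the one used in the paper.


def mean_response_time(lam: float, mu: float) -> float:
    """Mean sojourn time (queueing + service) of an M/M/1 queue: 1 / (mu - lam)."""
    if lam >= mu:
        return math.inf  # unstable: the backlog grows without bound
    return 1.0 / (mu - lam)


def timeout_probability(lam: float, mu: float, timeout: float) -> float:
    """P(sojourn time > timeout). In a stable M/M/1 queue the sojourn time is
    exponentially distributed with rate (mu - lam)."""
    if lam >= mu:
        return 1.0  # unstable queue: requests eventually exceed any finite timeout
    return math.exp(-(mu - lam) * timeout)


def min_timeout(lam: float, mu: float, target: float) -> float:
    """Smallest timeout keeping P(sojourn time > timeout) at or below target (0 < target < 1)."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    if lam >= mu:
        return math.inf
    return -math.log(target) / (mu - lam)


if __name__ == "__main__":
    # Hypothetical rates in requests/second: a healthy node, and the same node
    # after its service rate has degraded (e.g. through software aging).
    healthy = timeout_probability(lam=800.0, mu=1000.0, timeout=0.02)
    degraded = timeout_probability(lam=800.0, mu=900.0, timeout=0.02)
    print(f"healthy: {healthy:.4f}, degraded: {degraded:.4f}")
    print(f"timeout for a 1% miss rate when healthy: {min_timeout(800.0, 1000.0, 0.01):.4f} s")
```

With these hypothetical rates, shrinking the gap between service rate and arrival rate from 200 to 100 requests/s raises the probability of exceeding a 20 ms time-out from about 1.8% to about 13.5%. This is one way to see why the time-out value may need to be tuned together with measures, such as rejuvenation, that restore service capacity.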

Publication date: 2015-12
Affiliation: Tsinghua University (清华大学)
