摘要

In Internet service fault management based on active probing, uncertainty and noises will affect service fault management. In order to reduce the impact, challenges of Internet service fault management are analyzed in this paper. Bipartite Bayesian network is chosen to model the dependency relationship between faults and probes, binary symmetric channel is chosen to model noises, and a service fault management approach using active probing is proposed for such an environment. This approach is composed of two phases: fault detection and fault diagnosis. In first phase, we propose a greedy approximation probe selection algorithm (GAPSA), which selects a minimal set of probes while remaining a high probability of fault detection. In second phase, we propose a fault diagnosis probe selection algorithm (FDPSA), which selects probes to obtain more system information based on the symptoms observed in previous phase. To deal with dynamic fault set caused by fault recovery mechanism, we propose a hypothesis inference algorithm based on fault persistent time statistic (FPTS). Simulation results prove the validity and efficiency of our approach.