Abstract

Although multi-core CPU architectures provide the physical basis for parallel computing, including cloud computing, early fine-grained parallel algorithms do not perform as well as expected on these architectures. In the proposed method, the supernodes are split into blocks based on a directed acyclic graph (DAG), and the factorization is decomposed into many small tasks that execute asynchronously. This reduces the influence of bus bandwidth and makes full use of the floating-point capability of all cores. Numerical simulations on five systems show a significant speedup, which indicates that the method is promising for large-scale applications.

Full text