Distributed graph simulation

Authors: Wenfei Fan; Xin Wang; Yinghui Wu; Dong Deng
Source: Proceedings of the VLDB Endowment, 2014, 7(12): 1083-1094.
DOI: 10.14778/2732977.2732983

Abstract

This paper studies fundamental problems for distributed graph simulation. Given a pattern query Q and a graph G that is fragmented and distributed, a graph simulation algorithm A is to compute the matches Q(G) of Q in G. We say that A is parallel scalable in (a) response time if its parallel computational cost is determined by the largest fragment F_m of G and the size |Q| of query Q, and (b) data shipment if its total amount of data shipped is determined by |Q| and the number of fragments of G, independent of the size of graph G. (1) We prove an impossibility theorem: there exists no distributed graph simulation algorithm that is parallel scalable in either response time or data shipment. (2) However, we show that distributed graph simulation is partition bounded, i.e., its response time depends only on |Q|, |F_m| and the number |V_f| of nodes in G with edges across different fragments; and its data shipment depends on |Q| and the number |E_f| of crossing edges only. We provide the first algorithms with these performance guarantees. (3) We also identify special cases of patterns and graphs when parallel scalability is possible. (4) We experimentally verify the scalability and efficiency of our algorithms.
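
For orientation, the quantities named in the abstract can be illustrated with a minimal, centralized sketch. It is not the paper's distributed algorithm. It assumes the standard definition of graph simulation, which the abstract does not spell out: a graph node v matches a pattern node u if their labels agree and, for every pattern edge (u, u'), v has a successor that matches u'. The function names graph_simulation and crossing_stats, the dictionary-based graph encoding, and the reading of V_f as both endpoints of every crossing edge are assumptions made for this illustration.

```python
from collections import defaultdict


def graph_simulation(q_labels, q_edges, g_labels, g_edges):
    """Return the maximum simulation as {pattern_node: set(graph_nodes)}.

    Returns {} when some pattern node ends up with no match.
    This is a naive fixpoint computation, not an optimized algorithm.
    """
    g_succ = defaultdict(set)
    for v, w in g_edges:
        g_succ[v].add(w)
    q_succ = defaultdict(set)
    for u, t in q_edges:
        q_succ[u].add(t)

    # Start from every label-compatible (pattern node, graph node) pair.
    sim = {
        u: {v for v, lab in g_labels.items() if lab == q_labels[u]}
        for u in q_labels
    }

    # Repeatedly drop candidates that cannot follow some pattern edge.
    changed = True
    while changed:
        changed = False
        for u in q_labels:
            for v in list(sim[u]):
                if any(not (g_succ[v] & sim[t]) for t in q_succ[u]):
                    sim[u].discard(v)
                    changed = True

    if any(not matches for matches in sim.values()):
        return {}
    return sim


def crossing_stats(g_edges, fragment_of):
    """Return (V_f, E_f) for a fragmentation given as {node: fragment_id}.

    Assumption: V_f contains both endpoints of every crossing edge.
    """
    e_f = [(v, w) for v, w in g_edges if fragment_of[v] != fragment_of[w]]
    v_f = {node for edge in e_f for node in edge}
    return v_f, e_f


# Hypothetical toy input: pattern A <-> B, graph with one 2-cycle and one dead end.
q_labels = {"qa": "A", "qb": "B"}
q_edges = [("qa", "qb"), ("qb", "qa")]
g_labels = {"a1": "A", "b1": "B", "a2": "A", "b2": "B"}
g_edges = [("a1", "b1"), ("b1", "a1"), ("a2", "b2")]
fragment_of = {"a1": 0, "b1": 1, "a2": 0, "b2": 0}

matches = graph_simulation(q_labels, q_edges, g_labels, g_edges)
v_f, e_f = crossing_stats(g_edges, fragment_of)
```

In this toy input, b2 is removed because it has no outgoing edge to an A-labelled match, and a2 is then removed because its only successor was b2. Only the two edges between a1 and b1 cross fragments, so |E_f| = 2 and |V_f| = 2 under the stated assumption.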