Abstract

We extend the double cut and join (DCJ) paradigm to genome rearrangements on pairs of genomes with unequal gene content and/or multiple gene copies, by permitting genes in one genome that are completely or partially unmatched in the other. Unmatched gene ends introduce new kinds of paths in the adjacency graph, since some paths can now terminate inside a chromosome rather than at telomeres. We introduce "ghost adjacencies" to supply the missing gene ends in the genome that lacks them. Ghosts enable us to close paths left open by incomplete matching, just as null points enable us to close even paths terminating in telomeres. We define generalized DCJ operations on the generalized adjacency graph and give a prescription for calculating the DCJ distance for the expanded repertoire of operations, which includes insertions, deletions, and duplications. For the case of insertions and deletions, with linear as well as circular chromosomes, we suggest permitting a "nugh" (half ghost, half null), which can shorten the distance. We give algorithms for the optimal closure, with and without nughs, and give the resulting distance formula in terms of paths. For some of the simplest cases, we calculate the number of optimal ways to close the graph.