Abstract

Optimized algorithms are indispensable for analyzing large SNP data sets. To date, research has focused on methods for calculating genomic relationship matrices, while little attention has been given to algorithms for calculating the number of opposing homozygous SNP loci (OH) between genotyped individuals. This parameter can be used to detect pedigree errors, genotyping errors, and mixing of DNA samples, or for paternity tests. A recently proposed approach (LOOP) is sufficient for small data sets but is not applicable to data sets with larger numbers of SNPs and genotyped individuals. In this paper we propose a fast, easy-to-implement method for calculating OH in matrix format (OHM). For example, it can create the OHM for 12,000 individuals genotyped for 40,000 SNPs in only 12% of the real time used by the LOOP approach. Thus, calculating the OHM through a sequence of matrix manipulations substantially increases the speed of determining the number of opposing homozygous SNP loci between all genotyped individuals of a data set. Given the availability of packages facilitating parallel processing, this holds even when using R, and therefore allows inference from OHM even for large data sets.
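To illustrate how a count of opposing homozygous loci can be obtained from matrix products rather than pairwise loops, here is a minimal sketch in Python with NumPy. It is not necessarily the authors' exact formulation, and the abstract itself refers to R. It assumes genotypes are coded as allele counts 0, 1, 2 (individuals in rows, SNPs in columns) and that any other code marks a missing call. The function names and the toy data are hypothetical.

```python
import numpy as np


def opposing_homozygotes_matrix(genotypes):
    """Pairwise OH counts via matrix products (assumed 0/1/2 coding)."""
    g = np.asarray(genotypes)
    # Indicator matrices for the two homozygous classes; any other
    # value (heterozygous or missing) contributes nothing.
    hom_ref = (g == 0).astype(np.int64)
    hom_alt = (g == 2).astype(np.int64)
    # cross[i, j] = number of SNPs where i is 0 and j is 2.
    cross = hom_ref @ hom_alt.T
    # Adding the transpose covers the reverse case (i is 2, j is 0).
    return cross + cross.T


def opposing_homozygotes_loop(genotypes):
    """Pair-by-pair reference implementation, used only as a check."""
    g = np.asarray(genotypes)
    n = g.shape[0]
    oh = np.zeros((n, n), dtype=np.int64)
    for i in range(n):
        for j in range(n):
            opposing = ((g[i] == 0) & (g[j] == 2)) | ((g[i] == 2) & (g[j] == 0))
            oh[i, j] = np.sum(opposing)
    return oh


# Hypothetical toy data: 3 individuals genotyped for 4 SNPs.
toy = np.array([
    [0, 2, 1, 0],
    [2, 0, 1, 0],
    [1, 2, 2, 0],
])

ohm = opposing_homozygotes_matrix(toy)
print(ohm)
```

In this toy example, individuals 0 and 1 are opposing homozygotes at the first two SNPs, and individuals 1 and 2 at the second SNP only. The matrix version relies on two dense products that optimized linear algebra libraries can parallelize, which is presumably where a speed advantage over pair-by-pair looping would come from.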

  • Publication date: 2014-8