A model selection approach for multiple sequence segmentation and dimensionality reduction

Authors: Castro Bruno M*; Lemes Renan B; Cesar Jonatas; Hunemeier Tabita; Leonardi Florencia
Source: Journal of Multivariate Analysis, 2018, 167: 319-330.
DOI: 10.1016/j.jmva.2018.05.006

Abstract

In this paper we consider the problem of segmenting n aligned random sequences of equal length m into a finite number of independent blocks. We propose a penalized maximum likelihood criterion to infer simultaneously the number of points of independence and the position of each point. We show how to compute the estimator exactly by means of a dynamic programming algorithm with time complexity O(m^2 n). We also propose another method, called the hierarchical algorithm, that provides an approximation to the estimator when the sample size increases and runs in time O(m ln(m) n). Our main theoretical result is the strong consistency of both estimators when the sample size n grows to infinity. We illustrate the convergence of these algorithms through some simulation examples and we apply the method to identify recombination hotspots in real SNP data.
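The dynamic programming idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the exact penalty, so the BIC-style term below (free parameters of a block times ln(n)/2, scaled by a constant c) is an assumption, and the names block_cost and segment are hypothetical. The cost of a block is recomputed from scratch each time, so this toy version does not attain the O(m^2 n) running time quoted above.

```python
import math
from collections import Counter

def block_cost(rows, i, j, alphabet_size, c=1.0):
    """Penalized negative log-likelihood of columns i..j-1 taken as one block.

    Assumption: BIC-style penalty, (alphabet_size^(j-i) - 1) * ln(n) / 2.
    """
    n = len(rows)
    # Empirical distribution of the joint pattern observed in this block.
    counts = Counter(tuple(r[i:j]) for r in rows)
    neg_loglik = -sum(k * math.log(k / n) for k in counts.values())
    n_params = alphabet_size ** (j - i) - 1
    return neg_loglik + c * n_params * math.log(n) / 2

def segment(rows, alphabet_size, c=1.0):
    """Return the interior cut points minimizing the summed block costs."""
    m = len(rows[0])
    best = [0.0] + [math.inf] * m   # best[j]: optimal cost of columns 0..j-1
    prev = [0] * (m + 1)            # prev[j]: start of the last block ending at j
    for j in range(1, m + 1):
        for i in range(j):
            cand = best[i] + block_cost(rows, i, j, alphabet_size, c)
            if cand < best[j]:
                best[j], prev[j] = cand, i
    # Walk back through prev to recover the cut points.
    cuts, j = [], m
    while j > 0:
        j = prev[j]
        if j > 0:
            cuts.append(j)
    return cuts[::-1]
```

For example, with binary rows of the form (a, a, b, b), where a and b each take both values equally often and independently, segment(rows, 2) places a single cut between the second and third columns.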

  • Publication date: 2018-9