Abstract
We study the nucleotide-nucleotide mutual information function of the DNA sequences of the three completely sequenced human chromosomes 20, 21, and 22. We find in each human chromosome (i) the absence of the base pair (bp) sequence periodicity characteristic for protein coding regions, (ii) the absence of the sequence periodicity characteristic for both protein secondary structure and DNA bendability, and (iii) the presence of significant statistical dependencies at about and at about We investigate to which degree the density and composition of interspersed repeats might explain these observed statistical patterns in all three human chromosomes. We use simple stochastic models to substitute known interspersed repeats and find by numerical studies that (iv) the presence of interspersed repeats dominates short-range correlations as measured by on the scale of several hundred base pairs in human chromosomes 20, 21, and 22. On the other hand, we find that (v) interspersed repeats contribute only weakly to long-range correlations due to the clustering of highly abundant Alu repeats.
- Received 22 January 2003
DOI:https://doi.org/10.1103/PhysRevE.67.061913
©2003 American Physical Society