INFORMATION DECOMPOSITION OF SYMBOLICAL SEQUENCES

Data Bases of DNA and Protein Sequences with Latent Periodicity

 

Bioinformatics group of the Center of Bioengineering of the Russian Academy of Sciences

 

 

Bioinformatics Education in Moscow Physical Engineering Institute

 

We developed the method of Information Decomposition (ID) for the content of any symbolical sequence. The ID method does not change the statistical properties of the sequence and calculates its information autocorrelation function. The method is based on calculating the Shannon mutual information between the analyzed sequence and artificial symbolical sequences, and it reveals latent periodicity that previously developed mathematical methods cannot find. Using this method on supercomputer clusters, we analyzed GenBank, fully sequenced genomes and the SWISS-PROT data bank, and found thousands of genes and proteins with different types of latent periodicity.
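The mutual-information calculation described above can be illustrated with a minimal sketch. It assumes that the artificial sequence is the periodic index sequence 1, 2, ..., n, 1, 2, ... of the same length as the analyzed sequence, and it omits any statistical significance step. The function names `mutual_information` and `id_spectrum` are hypothetical and chosen for illustration only; this is not the group's own implementation.

```python
from collections import Counter
from math import log


def mutual_information(seq, period):
    """Mutual information (nats, multiplied by sequence length) between `seq`
    and an assumed artificial sequence 1, 2, ..., period, 1, 2, ..."""
    n = len(seq)
    # Joint counts of (position within the period, symbol).
    joint = Counter((i % period, s) for i, s in enumerate(seq))
    # Marginal counts of period positions and of symbols.
    pos = Counter(i % period for i in range(n))
    sym = Counter(seq)
    total = 0.0
    for (p, s), c in joint.items():
        total += c * log(c * n / (pos[p] * sym[s]))
    return total


def id_spectrum(seq, max_period):
    """Mutual information for every trial period from 2 to max_period."""
    return {p: mutual_information(seq, p) for p in range(2, max_period + 1)}


# Toy input: a perfectly periodic sequence with period 5.
spectrum = id_spectrum("ACGTT" * 40, 9)
best = max(spectrum, key=spectrum.get)
print(best)
```

In this unnormalized form, multiples of the true period score as high as the period itself, so a practical version would likely convert each value to a significance score that accounts for the number of free parameters at each period length.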

We show that the ID method remains stable when a large number of random letter changes are introduced into the analyzed symbolical sequence. We demonstrate the efficiency of the method by analyzing poems as well as DNA and protein sequences. In poems by A. Pushkin and W. Shakespeare we found latent periodicity of different lengths, which may reflect the periodicity of the poems' sounds. We also show that many DNA and amino acid sequences have latent periodicity of different types and lengths. We found latent periodicity in 93% of tyrosine and serine protein kinases and in 86% of proteins that contain an NAD+ site in the Swiss-Prot data bank.
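The stability under random letter changes can be illustrated on a toy example. The sketch below assumes the artificial sequence is a periodic index sequence. The helper names `mutual_information` and `mutate`, the 30% substitution rate and the four-letter alphabet are all illustrative assumptions rather than values from the publications.

```python
import random
from collections import Counter
from math import log


def mutual_information(seq, period):
    """Mutual information (nats, multiplied by length) between `seq` and an
    assumed artificial sequence 1, 2, ..., period, 1, 2, ..."""
    n = len(seq)
    joint = Counter((i % period, s) for i, s in enumerate(seq))
    pos = Counter(i % period for i in range(n))
    sym = Counter(seq)
    return sum(c * log(c * n / (pos[p] * sym[s])) for (p, s), c in joint.items())


def mutate(seq, fraction, alphabet, rng):
    """Overwrite the given fraction of positions with letters drawn
    uniformly from `alphabet` (a letter may be replaced by itself)."""
    chars = list(seq)
    k = int(len(chars) * fraction)
    for i in rng.sample(range(len(chars)), k):
        chars[i] = rng.choice(alphabet)
    return "".join(chars)


rng = random.Random(0)  # fixed seed so the run is repeatable
noisy = mutate("ACGTT" * 200, 0.3, "ACGT", rng)

signal = mutual_information(noisy, 5)      # true period
background = mutual_information(noisy, 7)  # unrelated period
print(signal > background)
```

Even with roughly a third of the letters overwritten, the value at the true period stays well above the value at an unrelated period in this toy setting.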

 

For more information about the methods we developed for DNA similarity search, please see:

 

 

1.        M.B. Chaley, E.V. Korotkov, D.A. Phoenix "Relationships among isoacceptor tRNAs seems to support the coevolution theory of the origin of the genetic code", Journal of Molecular Evolution, v.48(2), 168-177, 1999.

2.        E.V. Korotkov "New family wide spread mirror-reflected MB1 repeats in human genome", Molec. Biol. (USSR), v.25, 250-263, 1991.

3.        E.V. Korotkov "MB1 family repeats in genomes of many mammals", Izvestia of Akad. Sci. of USA, Seria Biology, No.4, 546-557, 1992.

4.        Korotkov E.V. "Fast method of homology and purine pyrimidine mutual relations between DNA sequences search", DNA Sequence, v.4, 413-415, 1994.

5.        Korotkov E.V., Korotkova M.A. "Enlarged similarity of nucleic acids sequences", DNA Research, v.3, N.3, 157-164, 1996.

6.        Korotkov E.V. and Korotkova M.A. "MIRs: family repeats that is common for many vertebrates", Mol. Biol., v.34, 348-353, 2000.

7.        Korotkov E.V. and Korotkova M.A. "Study of the presence MIRs in the human 22 chromosome", Mol. Biol., v.34, 376-382, 2001.

8.        Chaley M.B., Korotkov E.V. "Evolution of MIR elements located in the coding regions of human genome", Mol. Biol., v.35, 874-882, 2001.

9.        Data banks

10.      Chaley M.B., Frenkel F.E., Korotkov E.V., Skryabin K.G. "Revealing and Functional Analysis of tRNA-like Sequences in Various Genomes", Gene, 335, 57-71, 2004.

 

 

 

More details on the mathematical method of Information Decomposition (ID) and on the data bases can be found in the following publications:

1.        http://bioinf.narod.ru\Pub/korotkov.pdf

2.        Korotkov E.V. and Korotkova M.A. "DNA regions with latent periodicity in some human clones", DNA Sequence, V.5, pp.353-358, 1995.

3.        Korotkov E.V., Korotkova M.A., Tulko J.S. "Latent sequence periodicity of some oncogenes and DNA-binding protein genes", CABIOS, v.13, pp.37-44, 1997

4.        Korotkova M.A., Korotkov E.V. and Rudenko V.M. "Latent periodicity of protein sequences", Journal of Molecular Modelling, v.5, pp.103-115, 1999

5.        Korotkov E.V., Korotkova M.A., Rudenko V.M. and Skryabin K.G., "Latent periodicity of the protein sequences" Molecularnya Biologya (Russian), v.33, pp.611-617, 1999.

6.        Chaley M.B., Korotkov E.V. and Skryabin K.G. "Method revealing latent periodicity of the nucleotide sequences modified for a case of small samples" DNA Research, 6, 153-163, 1999.

7.        Chaley MB, Korotkov EV, Kudryashov NA "Latent periodicity of 21 bases typical for MCP II gene is widely present in various genes" DNA Sequence, 14, 33-52, 2003

8.        Korotkov EV, Korotkova MA, Kudryashov NA Information decomposition of symbolical texts. Los-Alamos Arxiv,  0302195

9.        Korotkov EV, Korotkova MA, Kudryashov NA Information decomposition method for analysis of symbolical sequences. Physics Letters A, v.312, 198-210, 2003.

10.     Korotkov EV, Korotkova MA, Kudryashov NA "Information approach for determination of periodicity of genetic texts" Molec. Biology (Russian) v.37, N3, pp.436-451, 2003.

11.      Laskin AA, Chaley MB, Korotkov EV and Kudryashov NA "Identification of NAD+ regions in the amino acid sequences of different proteins" Molec. Biology (Russian) v.37, N4, pp.663-673, 2003.

12.      A.A. Laskin, E.V. Korotkov, N.A. Kudryashov "Latent periodicity of many domains in protein sequences reflects their structure, function and evolution", pp. 135-144, in "BIOINFORMATICS OF GENOME REGULATION AND STRUCTURE", N. Kolchanov and R. Hofestaedt eds., Kluwer press, 2004

13.     Korotkov E.V. Enzyme as a thermal resonance pump.

14.      Laskin AA, Kudryashov NA, Skryabin KG, Korotkov EV.  Latent periodicity of serine-threonine and tyrosine protein kinases and other protein families. Comput Biol Chem. 2005 29(3):229-243

15.     Turutina VP, Laskin AA,  Skryabin K.G., Kudryashov N.A. and Korotkov EV,  "Latent periodicity of many protein families", Biochemistry, 2006, 71,18-31.

16.     Shelenkov A, Skryabin K, Korotkov EV.          Search and classification of potential minisatellite sequences from bacterial genomes. DNA Res. v. 13(3):89-102. 2006.

17.     Turutina VP, Laskin AA, Skryabin K.G., Kudryashov N.A. and Korotkov EV, "Latent periodicity of 94 protein families", J. Comput. Biol. 2006, v.13:946-964.

18.     Laskin AA, Skryabin KG, Korotkov EV Latent periodicity of protein families, identified with the indel-aware algorithm. J Proteome Res. 2007 v.6, 862-868.

19.     Shelenkov A, Korotkov A, Korotkov E. MMsat - a database of potential micro- and minisatellites. Gene, 409, 53-60, 2008.

20.     Shelenkov AA, Skryabin KG, Korotkov EV Search and classification of potential minisatellite sequences from plants genomes, Genetika  (Rus),  v.44, pp.120-136, 2008.

21.      Shelenkov AA, Korotkov EV The search of regular sequences in promoters from different species with help of run test. Mathematical Biology and Bioinformatics, v.3, N.1, pp.1-15, 2008.

22.     Frenkel FE, Korotkov EV. Classification analysis of triplet periodicity in protein-coding regions of genes. Gene. 2008; 421(1-2):52-60.

23.     F. E. Frenkel, E. V. Korotkov "Classification of triplet periodicity of gene sequences from KEGG data bank". Molekulyarnaya biologiya, v.42, No. 4, pp. 707-720, 2008.

24.     Korotkov E.V., Rudenko V.M. "Phase shift of triplet periodicity in gene sequences". Mathematical Biology and Bioinformatics (Russian), v.4, No. 2, pp. 66-80, 2009.

25.     Shelenkov A, Korotkov E. Search of regular sequences in promoters from eukaryotic genomes. Comput Biol Chem. 2009;33:196-204

26.     Frenkel FE, Korotkov EV. Using triplet periodicity of nucleotide sequences for finding potential reading frame shifts in genes. DNA Res. 2009;16:105-114.

27.     EV Korotkov, MA Korotkova "Bioinformatics and search of shifts of reading frame in genes" Information technologies and computation systems (Russian), No. 1, pp.1-23, 2010.

28.     EV Korotkov, MA Korotkova "Study of the triplet periodicity phase shifts in genes", Journal of Integrative Bioinformatics, v.7, 131-141, 2010

29.     Y.M. Suvorova, Korotkov E.V. Splicing of the triplet periodicity in genes from different species. In Proceedings of the 6th International Symposium of Health Informatics and Bioinformatics, Izmir, Turkey, 2-5 May 2011 (http://hibit.iyte.edu.tr), pp.246-250

30.     Rudenko V.M. and Korotkov E.V. "Search of latent periodicity in the financial time series by the cyclic decomposition method", Applied Informatics (in Russian), No. 3, 2011

31.     Korotkov E.V., Rudenko V.M. and Suvorova Yu.G. "Using of triplet periodicity of DNA sequences for search of spliced genes", in press.

 

 

 

 

 

Contacts: genekorotkov@gmail.com

 

 
