ANALYSIS OF THE CRYPTOSPORIDIUM SPP GP60 GENE VARIABILITY APPLYING INFORMATION THEORY

D. POPESCU*#, IONELA MIRELA NEAGOE**,***, SUZANA E. CILIEVICI***,****, DIANA R. CONSTANTIN*****, V.I.R. NICULESCU******

*Department of Mathematical Modelling in Life Sciences, “Gheorghe Mihoc-Caius Iacob” Institute of Mathematical Statistics and Applied Mathematics of the Romanian Academy, 13, Calea 13 septembrie, Bucharest-050711, Romania

**Parasitology & Mycology Laboratory, “Cantacuzino” National Medico-Military Institute for Research and Development, 103, Splaiul Independenței, Bucharest-050096, Romania

***“Carol Davila” University of Medicine and Pharmacy, Parasitology Chair, 19–21, Dimitrie Gerota street, Bucharest-020027, Romania
****Parasitic Diseases Clinic, Colentina University Hospital, 19–21, Şoseaua Ştefan cel Mare, Bucharest, Romania
*****Astronomical Institute of the Romanian Academy, 5, Cuțitul de Argint street, Bucharest-040557, Romania
******National Institute for Laser, Plasma and Radiation Physics, 409, Atomiștilor Street, Măgurele, Ilfov, Romania

In this paper, we used statistical methods to study the genetic information of DNA, treating each sequence as a statistical system. The alphabet of a DNA sequence consists of the four nucleotides adenine, cytosine, guanine and thymine, and the order of the nucleotides along the sequence encodes the genetic information. We analyzed three Cryptosporidium DNA sequences: one isolated and analyzed in our laboratory and two reference sequences from the public GenBank database. Each DNA sequence is treated as a statistical system represented by a random variable and its associated probability distribution. The Shannon entropy, the Rényi entropy, the Onicescu informational energy and the square deviation from the uniform distribution are used to measure the degree of randomness of the three statistical systems. The similarity and difference between the three DNA sequences of the two Cryptosporidium species (Cryptosporidium hominis and Cryptosporidium parvum) were assessed by calculating the statistical distance between the probability distributions associated with each pair of sequences; pairing each sequence with each of the other two yields three pairs. The Bhattacharyya distance measures the degree of similarity between two probability distributions, while the Kullback-Leibler and resistor-average distances measure the difference between them.
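The abstract names the randomness measures and distances but does not give their exact definitions. The following is a minimal sketch using the standard textbook forms, under several assumptions of our own: each sequence is reduced to the relative frequencies of A, C, G and T (the paper may use a different random variable, such as oligonucleotide frequencies); the Rényi order is α = 2; entropies use base-2 logarithms and distances use natural logarithms; and the square deviation from uniform is taken as Σ(p_i − 1/n)². All function names are hypothetical, and the toy sequences are placeholders, not the gp60 sequences analyzed in the paper.

```python
import math
from collections import Counter
from itertools import combinations

NUCLEOTIDES = "ACGT"


def nucleotide_distribution(seq):
    """Relative frequencies of A, C, G, T; other symbols (e.g. N, gaps) are ignored."""
    counts = Counter(c for c in seq.upper() if c in NUCLEOTIDES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T")
    return [counts[n] / total for n in NUCLEOTIDES]


def shannon_entropy(p):
    """H = -sum p_i log2 p_i in bits; terms with p_i = 0 contribute nothing."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def renyi_entropy(p, alpha=2.0):
    """H_alpha = log2(sum p_i^alpha) / (1 - alpha); the alpha -> 1 limit is Shannon."""
    if alpha == 1:
        return shannon_entropy(p)
    return math.log2(sum(pi ** alpha for pi in p if pi > 0)) / (1 - alpha)


def onicescu_energy(p):
    """Informational energy E = sum p_i^2: 1/n for uniform p, 1 for a single symbol."""
    return sum(pi * pi for pi in p)


def square_deviation_from_uniform(p):
    """Assumed definition: sum (p_i - 1/n)^2, which equals E - 1/n."""
    n = len(p)
    return sum((pi - 1 / n) ** 2 for pi in p)


def bhattacharyya_distance(p, q):
    """D_B = -ln sum sqrt(p_i q_i); infinite when the supports do not overlap."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    if bc == 0:
        return math.inf
    return max(0.0, -math.log(bc))  # clamp tiny negative values from rounding


def kullback_leibler(p, q):
    """D_KL(P||Q) = sum p_i ln(p_i / q_i); infinite if some q_i = 0 while p_i > 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue
        if qi == 0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total


def resistor_average(p, q):
    """Symmetric R defined by 1/R = 1/D_KL(P||Q) + 1/D_KL(Q||P)."""
    d1, d2 = kullback_leibler(p, q), kullback_leibler(q, p)
    if d1 <= 0 or d2 <= 0:
        return 0.0
    if math.isinf(d1) or math.isinf(d2):
        # 1/inf = 0, so R reduces to the finite term (or stays infinite).
        return d2 if math.isinf(d1) else d1
    return d1 * d2 / (d1 + d2)


if __name__ == "__main__":
    # Toy fragments for illustration only; NOT the gp60 sequences from the paper.
    seqs = {
        "seq1": "ATGCGTACGTTAGCATTAGC",
        "seq2": "ATGCGTACGATAGCATTGGC",
        "seq3": "ATTTATAAGCTAATTATAAT",
    }
    dists = {name: nucleotide_distribution(s) for name, s in seqs.items()}
    for name, p in dists.items():
        print(name, shannon_entropy(p), renyi_entropy(p),
              onicescu_energy(p), square_deviation_from_uniform(p))
    # Three sequences give three unordered pairs.
    for a, b in combinations(dists, 2):
        p, q = dists[a], dists[b]
        print(a, b, bhattacharyya_distance(p, q),
              kullback_leibler(p, q), resistor_average(p, q))
```

Under these assumed definitions, the square deviation from the uniform distribution equals the informational energy minus 1/4 for a four-letter alphabet, so the two measures rank sequences identically. The resistor-average distance is symmetric in its two arguments even though the Kullback-Leibler divergence is not.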

Key words: Statistical methods, DNA sequences, Cryptosporidium, gp60 gene.

Corresponding author’s e-mail: dghpopescu@gmail.com
