Background: Because milk and milk products play a vital role in human nutrition, dairy cattle farmers are working in increasing milk production or changing its composition. For this reason, researching the genes which play an important role in milk production and its composition is of high value. Information theory is an interdisciplinary branch of mathematics which overlaps with communications engineering, biology, and medicine. It has been used in genetic and bioinformatics analyses such as the biological structures and sequences.
Materials and methods: In this study, a total of 20 microRNAs from those affecting the breast tissue and mammary glands have been extracted from the microRNA database. For each microRNA sequence, the entropy values of the first- to third-order were calculated and the Kullback-Leibler divergence criteria were estimated. Then, the Kullback-Leibler divergence matrix of the microRNAs was considered as the inputs for clustering methods. All calculations were performed in the R program. The biological pathway of each target was predicted using the KEGG server.
Results: MicroRNAs are divided into two main groups based upon comparing and analyzing all the created clusters. The first group contains 18 microRNA and the second group contains 2 microRNAs at the first- and third-order entropies. The second-order entropy contains 19 microRNA in the first group and only 1 microRNA in the second group. The clustering topology changes as the entropy order changes from 1 to 3, with the most significant changes being seen in the clustering resulted from the third-order entropy.
Conclusion: In the proposed method of clustering, we obtained a biological grouping of genes. There is a good concordance between most of the microRNAs within one cluster and their biological pathway. The algorithm is applicable for clustering a range of genes and even genomes based on their DNA sequences entropy. Our method can help assign and predict the biological activity of those genes that lack robust annotations because it relies only on the DNA sequence and length of the genes.