Abstract
Non-coding elements of the genome, many of them once dismissed as junk, are now gaining long-overdue recognition, with microRNAs as the best current example. MicroRNAs bind preferentially to the 3′ untranslated regions (UTRs) of their target genes and, in most cases, negatively regulate their expression. Several microRNA:target prediction tools have been developed under various assumptions; most of them consider the free energy of binding between a microRNA and its target, together with seed conservation. However, the average concordance between the predictions of these tools is limited, and the problem is compounded by a large number of false-positive results. In this study, we describe a methodology that we developed to refine the output of microRNA:target prediction tools, based on observations from a comprehensive study. We incorporated information from the dinucleotide content variation patterns recorded for the flanking regions around target sites, using support vector machines (SVMs) trained on two major sources of experimental data, besides other sources. We assessed the performance of our methodology with rigorous tests on four different dataset models and compared it with a recently published refinement tool, MirTif. Our methodology attained a higher average accuracy of 0.88, with average sensitivity and specificity of 0.81 and 0.94, respectively, and the areas under the curve (AUCs) for all four models were above 0.9, suggesting better performance by our methodology and a possible role of flanking regions in the control of microRNA targeting. We applied our methodology to genes of three pathways, namely toll-like receptor (TLR), apoptosis and insulin, to predict the most probable targets. We also investigated their possible regulatory associations and identified an hsa-miR-23a regulatory module.
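To make the idea of flanking-region dinucleotide content concrete, the following is a minimal Python sketch of how such features might be computed. It is an illustration under stated assumptions, not the authors' implementation: the function names, the 30-nucleotide flank window, the use of overlapping dinucleotides and the normalisation to frequencies are all hypothetical choices that the abstract does not specify.

```python
from itertools import product

# The 16 ordered dinucleotides over the RNA alphabet, in a fixed order.
DINUCLEOTIDES = ["".join(p) for p in product("ACGU", repeat=2)]


def dinucleotide_freqs(seq):
    """Return the frequencies of the 16 overlapping dinucleotides in seq."""
    seq = seq.upper().replace("T", "U")  # accept DNA or RNA input
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs with N or other ambiguity codes
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DINUCLEOTIDES]


def flank_features(utr, site_start, site_end, flank=30):
    """Concatenate upstream and downstream flank dinucleotide frequencies.

    site_start and site_end are 0-based, end-exclusive coordinates of a
    predicted target site within utr. The flank size is an assumed value,
    not one taken from the paper.
    """
    upstream = utr[max(0, site_start - flank):site_start]
    downstream = utr[site_end:site_end + flank]
    return dinucleotide_freqs(upstream) + dinucleotide_freqs(downstream)
```

Each predicted site then yields a 32-element vector (16 values per flank) that could be passed, with a label for experimentally supported or unsupported sites, to an SVM implementation for training.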
Abbreviations
- Ac: accuracy
- AUC: area under the curve
- FN: false negative
- FP: false positive
- MCC: Matthews correlation coefficient
- ROC: receiver operating characteristic
- Sn: sensitivity
- Sp: specificity
- SVM: support vector machine
- TFBS: transcription factor-binding site
- TLR: toll-like receptor
- TN: true negative
- TP: true positive
- UTR: untranslated region
- VDR: vitamin D receptor
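The performance abbreviations listed here (TP, TN, FP, FN, Sn, Sp, Ac and MCC) are related through the standard confusion-matrix formulas. The short Python sketch below shows those textbook definitions; the function name and the example counts are hypothetical and are not taken from the paper, whose exact evaluation procedure is not given in this section.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Compute Sn, Sp, Ac and MCC from confusion-matrix counts."""
    total = tp + tn + fp + fn
    sn = tp / (tp + fn) if (tp + fn) else 0.0  # sensitivity: TP / (TP + FN)
    sp = tn / (tn + fp) if (tn + fp) else 0.0  # specificity: TN / (TN + FP)
    ac = (tp + tn) / total if total else 0.0   # accuracy: correct / all
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Ac": ac, "MCC": mcc}


# Toy counts chosen only for illustration.
example = classification_metrics(tp=8, tn=9, fp=1, fn=2)
```

MCC is reported alongside accuracy because it stays informative when positive and negative examples are unevenly represented, a situation in which accuracy alone can look misleadingly high.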
Cite this article
Heikham, R., Shankar, R. Flanking region sequence information to refine microRNA target predictions. J Biosci 35, 105–118 (2010). https://doi.org/10.1007/s12038-010-0013-7
DOI: https://doi.org/10.1007/s12038-010-0013-7