Abstract
Traditionally, studies in learning theory concentrate on situations where a potentially ever-increasing number of training examples is available. However, there are situations where inference has to be performed on extremely small samples. In such situations it is of utmost importance to analyze theoretically what can be learned and under what circumstances. One such scenario is the detection of differentially expressed genes. In our previous study (BMC Bioinformatics, 2009) we theoretically analyzed one of the most popular techniques for identifying genes with statistically different expression in SAGE libraries, the Audic-Claverie statistic (Genome Research, 1997). When comparing two libraries in the Audic-Claverie framework, it is assumed that under the null hypothesis their tag counts come from the same underlying (unknown) Poisson distribution. Since each SAGE library represents a single measurement, the inference has to be performed on the smallest sample possible: a sample of size 1. In this contribution we compare the Audic-Claverie approach with a (regularized) maximum likelihood (ML) framework. We analytically approximate the expected K-L divergence from the true unknown Poisson distribution to the model and show that, while the expected K-L divergence to the ML-estimated models seems to be always larger than that to the Audic-Claverie statistic, the greatest divergence appears for true Poisson distributions with a small mean parameter. We also theoretically analyze the effect of regularizing the ML estimates in the case of zero observed counts. Our results constitute a rigorous analysis of a situation of great practical importance, where the benefits of the Bayesian approach can be clearly demonstrated in a quantitative and principled manner.
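The comparison described above can be illustrated numerically. The following is a minimal sketch, not the chapter's analytical approximation: it assumes two libraries of equal size, so that the Audic-Claverie predictive distribution takes the form P(y|x) = (x+y)! / (x! y! 2^(x+y+1)), and it assumes the ML model is a Poisson distribution with mean equal to the observed count x, with a zero count replaced by a constant `eps`. The function names, the value of `eps`, and the truncation rule for the sums are illustrative assumptions and are not taken from the chapter.

```python
import math


def log_poisson_pmf(y, lam):
    """Log-probability of count y under a Poisson distribution with mean lam > 0."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)


def log_ac_pmf(y, x):
    """Log of the Audic-Claverie predictive P(y | x), assuming equal library sizes."""
    return (math.lgamma(x + y + 1) - math.lgamma(x + 1) - math.lgamma(y + 1)
            - (x + y + 1) * math.log(2.0))


def kl_poisson_to_ac(lam, x, y_max):
    """K-L divergence from Poisson(lam) to P(. | x), summed over y = 0..y_max."""
    total = 0.0
    for y in range(y_max + 1):
        lp = log_poisson_pmf(y, lam)
        total += math.exp(lp) * (lp - log_ac_pmf(y, x))
    return total


def kl_poisson_to_ml(lam, x, eps):
    """Closed-form K-L divergence from Poisson(lam) to Poisson(mu).

    mu is the ML estimate x; a zero count is replaced by eps (assumed regularizer).
    """
    mu = x if x > 0 else eps
    return lam * math.log(lam / mu) - lam + mu


def expected_kl(lam, eps=0.5, tail=12.0):
    """Average both divergences over a single observed count x ~ Poisson(lam)."""
    # Truncate the infinite sums well beyond the bulk of the Poisson mass.
    n_max = int(lam + tail * math.sqrt(lam) + 30)
    e_ac = 0.0
    e_ml = 0.0
    for x in range(n_max + 1):
        w = math.exp(log_poisson_pmf(x, lam))
        e_ac += w * kl_poisson_to_ac(lam, x, n_max)
        e_ml += w * kl_poisson_to_ml(lam, x, eps)
    return e_ac, e_ml


if __name__ == "__main__":
    for lam in (0.5, 1.0, 5.0, 20.0):
        e_ac, e_ml = expected_kl(lam)
        print(f"lambda={lam:5.1f}  E[KL] A-C={e_ac:.4f}  E[KL] ML={e_ml:.4f}")
```

The ML figure depends strongly on the choice of `eps`, since the zero-count term grows like log(1/eps) as `eps` shrinks, so the numbers printed here should be read as an illustration of the setup rather than a reproduction of the chapter's results.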
References
Varuzza, L., Gruber, A., Pereira, C.A.B.: Significance tests for comparing digital gene expression profiles. Nature Precedings npre.2008.2002.3 (2008)
Velculescu, V., Zhang, L., Vogelstein, B., Kinzler, K.: Serial analysis of gene expression. Science 270, 484–487 (1995)
Audic, S., Claverie, J.: The significance of digital expression profiles. Genome Res. 7, 986–995 (1997)
Ruijter, J., Kampen, A.V., Baas, F.: Statistical evaluation of SAGE libraries: consequences for experimental design. Physiol. Genomics 11, 37–44 (2002)
Ge, N., Epstein, C.: An empirical Bayesian significance test of cDNA library data. Journal of Computational Biology 11, 1175–1188 (2004)
Stekel, D., Git, Y., Falciani, F.: The comparison of gene expression from multiple cDNA libraries. Genome Research 10, 2055–2061 (2000)
Medina, C., Rotter, B., Horres, R., Udupa, S., Besser, B., Bellarmino, L., Baum, M., Matsumura, H., Terauchi, R., Kahl, G., Winter, P.: SuperSAGE: the drought stress-responsive transcriptome of chickpea roots. BMC Genomics 9, 553 (2008)
Kim, H., Baek, K., Lee, S., Kim, J., Lee, B., Cho, H., Kim, W., Choi, D., Hur, C.: Pepper EST database: comprehensive in silico tool for analyzing the chili pepper (Capsicum annuum) transcriptome. BMC Plant Biology 8, 101–108 (2008)
Morin, R., O'Connor, M., Griffith, M., Kuchenbauer, F., Delaney, A., Prabhu, A., Zhao, Y., McDonald, H., Zeng, T., Hirst, M., Eaves, C., Marra, M.: Application of massively parallel sequencing to microRNA profiling and discovery in human embryonic stem cells. Genome Research 18, 610–621 (2008)
Cervigni, G., Paniego, N., Pessino, S., Selva, J., Diaz, M., Spangenberg, G., Echenique, V.: Gene expression in diplosporous and sexual Eragrostis curvula genotypes with differing ploidy levels. BMC Plant Biology 67, 11–23 (2008)
Miles, J., Blomberg, A., Krisher, R., Everts, R., Sonstegard, T., Tassell, C.V., Zeulke, K.: Comparative transcriptome analysis of in vivo- and in vitro-produced porcine blastocysts by small amplified RNA-serial analysis of gene expression (SAR-SAGE). Molecular Reproduction and Development 75, 976–988 (2008)
Tiňo, P.: Basic properties and information theory of Audic-Claverie statistic for analyzing cDNA arrays. BMC Bioinformatics 10, 308–310 (2009)
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tiňo, P. (2011). One-Shot Learning of Poisson Distributions in Serial Analysis of Gene Expression. In: Liu, D., Zhang, H., Polycarpou, M., Alippi, C., He, H. (eds) Advances in Neural Networks – ISNN 2011. ISNN 2011. Lecture Notes in Computer Science, vol 6676. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21090-7_5
DOI: https://doi.org/10.1007/978-3-642-21090-7_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21089-1
Online ISBN: 978-3-642-21090-7
eBook Packages: Computer Science, Computer Science (R0)