Abstract
In recent years, genomes from a wide variety of organisms have been sequenced. Most of the sequence data have been publicly released and can be accessed by interested users. However, this wealth of information is currently underexploited by scientists not directly involved in genome annotation. This is partly because sequencing, assembly, and automated annotation can be done much faster than the identification and classification of the gene products and the prediction of their intracellular localization. This part of the annotation process still relies largely on manual curation and the addition of contextual information. Users of genome databases who are unfamiliar with the types of data available from (whole) genomes might therefore find themselves either overwhelmed by the vast amount and multiple layers of data or dissatisfied with less-than-meaningful analyses of the data.
In this chapter, we present procedures and approaches to identify and characterize gene models of enzymes involved in metabolic pathways based on their similarity to known sequences. Furthermore, we describe how to predict the subcellular location of the proteins using publicly available prediction servers and how to interpret the results obtained. The strategies we describe are generally applicable to organisms with primary plastids, such as land plants or green algae. Additionally, we describe strategies suitable for those groups of algae with secondary plastids (for instance, diatoms), which are characterized by a different cellular topology and a larger number of intracellular compartments than plants.
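As a rough illustration of what interpreting prediction results can involve, the sketch below combines yes/no calls from separate predictors into one tentative compartment label. It is not the chapter's procedure. The record type `TargetingCalls`, the function `interpret`, and the label strings are hypothetical names chosen for this example, and the rule for secondary plastids assumes that plastid import starts at the ER, so that a signal peptide precedes the transit peptide-like domain.

```python
from dataclasses import dataclass


@dataclass
class TargetingCalls:
    """Yes/no calls transcribed by hand from prediction servers (hypothetical record)."""
    signal_peptide: bool    # ER signal peptide predicted at the N-terminus
    transit_peptide: bool   # plastid transit peptide(-like) domain predicted
    mito_presequence: bool  # mitochondrial presequence predicted


def interpret(calls: TargetingCalls, secondary_plastids: bool) -> str:
    """Turn individual predictions into one tentative compartment label."""
    if secondary_plastids:
        # Assumption: import into secondary plastids starts at the ER, so a
        # signal peptide must precede the transit peptide-like domain.
        if calls.signal_peptide and calls.transit_peptide:
            return "plastid (candidate)"
        if calls.signal_peptide:
            return "ER / secretory pathway"
        if calls.mito_presequence:
            return "mitochondrion"
        if calls.transit_peptide:
            # A transit peptide call without a signal peptide is not decisive here.
            return "ambiguous (transit peptide without signal peptide)"
        return "no N-terminal signal detected"
    # Primary plastids: a transit peptide alone points to the plastid.
    if calls.transit_peptide and calls.mito_presequence:
        return "ambiguous (plastid or mitochondrion)"
    if calls.transit_peptide:
        return "plastid"
    if calls.mito_presequence:
        return "mitochondrion"
    if calls.signal_peptide:
        return "ER / secretory pathway"
    return "no N-terminal signal detected"
```

The returned labels are meant as starting points for manual inspection of the sequence and gene model, not as final localizations.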
Acknowledgements
The authors are grateful to Daniela Ewe for helpful discussions and for financial support from the German Research Foundation (DFG) to PGK (KR1661/3-4 and SFB969, project A04) and from the University of Konstanz.
Copyright information
© 2014 Springer Science+Business Media, New York
About this protocol
Cite this protocol
Gruber, A., Kroth, P.G. (2014). Deducing Intracellular Distributions of Metabolic Pathways from Genomic Data. In: Sriram, G. (ed) Plant Metabolism. Methods in Molecular Biology, vol 1083. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-661-0_12
DOI: https://doi.org/10.1007/978-1-62703-661-0_12
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-62703-660-3
Online ISBN: 978-1-62703-661-0
eBook Packages: Springer Protocols