Abstract
MALDIquant and its associated R packages provide a versatile, free, and open-source platform for analyzing 2D mass spectrometry data, such as those generated by MALDI and SELDI instruments. We first describe the methods and algorithms available in MALDIquant. We then illustrate a typical MALDIquant analysis workflow on an experimental cancer data set, proceeding from raw mass spectrometry measurements to multivariate classification.
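As a rough illustration of the kind of workflow summarized above, the following R sketch chains MALDIquant's preprocessing functions from raw spectra to a feature matrix. It is not the chapter's own analysis: it assumes the small `fiedler2009subset` example data bundled with MALDIquant (a subset of the Fiedler et al. 2009 serum spectra), and all parameter values (window sizes, SNR, tolerance, minimum peak frequency) are illustrative assumptions rather than recommended settings.

```r
library("MALDIquant")

# Example data shipped with MALDIquant: 16 raw MALDI-TOF spectra
# (a subset of the Fiedler et al. 2009 serum data). Illustrative only.
data(fiedler2009subset, package = "MALDIquant")

# Restrict all spectra to their common mass range.
spectra <- trim(fiedler2009subset)

# Variance-stabilizing square-root transformation of the intensities.
spectra <- transformIntensity(spectra, method = "sqrt")

# Savitzky-Golay smoothing (half window size is an assumed value).
spectra <- smoothIntensity(spectra, method = "SavitzkyGolay",
                           halfWindowSize = 10)

# Baseline removal with the SNIP algorithm.
spectra <- removeBaseline(spectra, method = "SNIP", iterations = 100)

# Intensity calibration (normalization) by total ion current.
spectra <- calibrateIntensity(spectra, method = "TIC")

# Align spectra by warping detected peak positions onto a reference
# peak list, using lowess-based warping functions.
spectra <- alignSpectra(spectra, halfWindowSize = 20, SNR = 2,
                        tolerance = 0.002, warpingMethod = "lowess")

# Peak detection with a MAD-based noise estimate.
peaks <- detectPeaks(spectra, method = "MAD", halfWindowSize = 20, SNR = 2)

# Match peak positions across spectra, then keep only peaks that
# occur in at least 25% of the spectra.
peaks <- binPeaks(peaks, tolerance = 0.002)
peaks <- filterPeaks(peaks, minFrequency = 0.25)

# Feature matrix: one row per spectrum, one column per binned peak;
# missing peaks are filled with intensities read from the spectra.
featureMatrix <- intensityMatrix(peaks, spectra)
dim(featureMatrix)
```

The resulting matrix is the kind of input a downstream multivariate classifier would take; that classification step is deliberately left out of this sketch.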
References
Aebersold, R., & Mann, M. (2003). Mass spectrometry-based proteomics. Nature, 422, 198–207.
Ahdesmäki, M., & Strimmer, K. (2010). Feature selection in omics prediction problems using cat scores and false nondiscovery rate control. The Annals of Applied Statistics, 4(1), 503–519.
Andrew, M. A. (1979). Another efficient algorithm for convex hulls in two dimensions. Information Processing Letters, 9, 216–219.
Baggerly, K. A., Morris, J. S., & Coombes, K. R. (2004). Reproducibility of SELDI-TOF protein patterns in serum: Comparing datasets from different experiments. Bioinformatics, 20, 777–785.
Bloemberg, T. G., Gerretzen, J., Wouters, H. J. P., Gloerich, J., van Dael, M., Wessels, H. J. C. T., et al. (2010). Improved parametric time warping for proteomics. Chemometrics and Intelligent Laboratory Systems, 104, 65–74.
Bolstad, B. M., Irizarry, R. A., Astrand, M., & Speed, T. P. (2003). A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics, 19, 185–193.
Borgaonkar, S. P., Hocker, H., Shin, H., & Markey, M. K. (2010). Comparison of normalization methods for the identification of biomarkers using MALDI-TOF and SELDI-TOF mass spectra. OMICS, 14, 115–126.
Breiman, L. (2001). Random forests. Machine Learning, 45, 5–32.
Bromba, M. U. A., & Ziegler, H. (1981). Application hints for Savitzky–Golay digital smoothing filters. Analytical Chemistry, 53(11), 1583–1586.
Callister, S. J., Barry, R. C., Adkins, J. N., Johnson, E. T., Qian, W.-J., Webb-Robertson, B.-J. M., et al. (2006). Normalization approaches for removing systematic biases associated with mass spectrometry and label-free proteomics. Journal of Proteome Research, 5, 277–286.
Chambers, M. C., Maclean, B., Burke, R., Amodei, D., Ruderman, D. L., Neumann, S., et al. (2012). A cross-platform toolkit for mass spectrometry and proteomics. Nature Biotechnology, 30(10), 918–920.
Clifford, D., Stone, G., Montoliu, I., Rezzi, S., Martin, F.-P., Guy, P., et al. (2009). Alignment using variable penalty dynamic time warping. Analytical Chemistry, 81, 1000–1007.
Coombes, K. R., Tsavachidis, S., Morris, J. S., Baggerly, K. A., Hung, M.-C., & Kuerer, H. M. (2005). Improved peak detection and quantification of mass spectrometry data acquired from surface-enhanced laser desorption and ionization by denoising spectra with the undecimated discrete wavelet transform. Proteomics, 5, 4107–4117.
Cornett, D. S., Reyzer, M. L., Chaurand, P., & Caprioli, R. M. (2007). MALDI imaging mass spectrometry: Molecular snapshots of biochemical systems. Nature Methods, 4, 828–833.
Dieterle, F., Ross, A., Schlotterbeck, G., & Senn, H. (2006). Probabilistic quotient normalization as robust method to account for dilution of complex biological mixtures. Application in 1H NMR metabonomics. Analytical Chemistry, 78, 4281–4290.
Du, P., Kibbe, W. A., & Lin, S. M. (2006). Improved peak detection in mass spectrum by incorporating continuous wavelet transform-based pattern matching. Bioinformatics, 22, 2059–2065.
Du, P., Stolovitzky, G., Horvatovich, P., Bischoff, R., Lim, J., & Suits, F. (2008). A noise model for mass spectrometry based proteomics. Bioinformatics, 24, 1070–1077.
Fiedler, G. M., Leichtle, A. B., Kase, J., Baumann, S., Ceglarek, U., Felix, K., et al. (2009). Serum peptidome profiling revealed platelet factor 4 as a potential discriminating peptide associated with pancreatic cancer. Clinical Cancer Research, 15, 3812–3819.
Friedman, J. H. (1984). A variable span smoother. Technical Report, DTIC Document.
Gammerman, A., Nouretdinov, I., Burford, B., Chervonenkis, A., Vovk, V., & Luo, Z. (2008). Clinical mass spectrometry proteomic diagnosis by conformal predictors. Statistical Applications in Genetics and Molecular Biology, 7, 13.
Gibb, S., & Strimmer, K. (2012). MALDIquant: A versatile R package for the analysis of mass spectrometry data. Bioinformatics, 28, 2270–2271.
Gibb, S., & Strimmer, K. (2015). Differential protein expression and peak selection in mass spectrometry data by binary discriminant analysis. Bioinformatics, 31, 3156–3162.
Gil, J. Y., & Kimmel, R. (2002). Efficient dilation, erosion, opening, and closing algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24, 1606–1617.
Gregori, J., Villarreal, L., Méndez, O., Sánchez, A., Baselga, J., & Villanueva, J. (2012). Batch effects correction improves the sensitivity of significance tests in spectral counting-based comparative discovery proteomics. Journal of Proteomics, 75(13), 3938–3951.
He, Q. P., Wang, J., Mobley, J. A., Richman, J., & Grizzle, W. E. (2011). Self-calibrated warping for mass spectra alignment. Cancer Informatics, 10, 65–82.
House, L. L., Clyde, M. A., & Wolpert, R. L. (2011). Bayesian nonparametric models for peak identification in MALDI-TOF mass spectroscopy. The Annals of Applied Statistics, 5, 1488–1511.
Hu, J., Coombes, K. R., Morris, J. S., & Baggerly, K. A. (2005). The importance of experimental design in proteomic mass spectrometry experiments: Some cautionary tales. Briefings in Functional Genomics and Proteomics, 3, 322–331.
Jeffries, N. (2005). Algorithms for alignment of mass spectrometry proteomic data. Bioinformatics, 21, 3066–3073.
Kim, S., Koo, I., Fang, A., & Zhang, X. (2011). Smith-Waterman peak alignment for comprehensive two-dimensional gas chromatography-mass spectrometry. BMC Bioinformatics, 12, 235.
Lange, E., Gröpl, C., Reinert, K., Kohlbacher, O., & Hildebrandt, A. (2006). High-accuracy peak picking of proteomics data using wavelet techniques. In Pacific Symposium on Biocomputing (Vol. 11, pp. 243–254).
Leek, J. T., Scharpf, R. B., Bravo, H. C., Simcha, D., Langmead, B., Johnson, W. E., et al. (2010). Tackling the widespread and critical impact of batch effects in high-throughput data. Nature Reviews Genetics, 11, 733–739.
Leichtle, A. B., Dufour, J.-F., & Fiedler, G. M. (2013). Potentials and pitfalls of clinical peptidomics and metabolomics. Swiss Medical Weekly, 143, w13801.
Li, X. (2005). PROcess: Ciphergen SELDI-TOF Processing. R package version 1.44.0.
Lilley, K. S., Deery, M. J., & Gatto, L. (2011). Challenges for proteomics core facilities. Proteomics, 11(6), 1017–1025.
Lin, S. M., Haney, R. P., Campa, M. J., Fitzgerald, M. C., & Patz, E. F. (2005). Characterising phase variations in MALDI-TOF data & correcting them by peak alignment. Cancer Informatics, 1, 32–40.
Liu, Q., Krishnapuram, B., Pratapa, P., Liao, X., Hartemink, A., & Carin, L. (2003). Identification of differentially expressed proteins using MALDI-TOF mass spectra. Signals, Systems & Computers, 2003. Conference Record (Vol. 2, pp. 1323–1327).
Liu, Q., Sung, A. H., Qiao, M., Chen, Z., Yang, J. Y., Yang, M. Q., et al. (2009). Comparison of feature selection & classification for MALDI-MS data. BMC Genomics, 10(Suppl 1), S3.
Liu, L. H., Shan, B. E., Tian, Z. Q., Sang, M. X., Ai, J., Zhang, Z. F., et al. (2010). Potential biomarkers for esophageal carcinoma detected by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Clinical Chemistry & Laboratory Medicine, 48(6), 855–861.
Martens, L., Chambers, M., Sturm, M., Kessner, D., Levander, F., Shofstahl, J., et al. (2011). mzML–a community standard for mass spectrometry data. Molecular & Cellular Proteomics, 10, R110.000133.
Mertens, B. J. A., de Noo, M. E., Tollenaar, R. A. E. M., & Deelder, A. M. (2006). Mass spectrometry proteomic diagnosis: Enacting the double cross-validatory paradigm. Journal of Computational Biology, 13, 1591–1605.
Meuleman, W., Engwegen, J. Y., Gast, M.-C. W., Beijnen, J. H., Reinders, M. J., & Wessels, L. F. (2008). Comparison of normalisation methods for surface-enhanced laser desorption and ionisation (SELDI) time-of-flight (TOF) mass spectrometry data. BMC Bioinformatics, 9, 88.
Morhác, M. (2009). An algorithm for determination of peak regions and baseline elimination in spectroscopic data. Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, 600, 478–487.
Morris, J. S., Baggerly, K. A., Gutstein, H. B., & Coombes, K. R. (2010). Statistical contributions to proteomic research. Methods in Molecular Biology, 641, 143–166.
Morris, J. S., Coombes, K. R., Koomen, J., Baggerly, K. A., & Kobayashi, R. (2005). Feature extraction and quantification for mass spectrometry in biomedical applications using the mean spectrum. Bioinformatics, 21, 1764–1775.
Norris, J. L., Cornett, D. S., Mobley, J. A., Andersson, M., Seeley, E. H., Chaurand, P., et al. (2007). Processing MALDI mass spectra to improve mass spectral direct tissue analysis. International Journal of Mass Spectrometry, 260, 212–221.
Pedrioli, P. G. A., Eng, J. K., Hubley, R., Vogelzang, M., Deutsch, E. W., Raught, B., et al. (2004). A common open representation of mass spectrometry data and its application to proteomics research. Nature Biotechnology, 22, 1459–1466.
Purohit, P. V., & Rocke, D. M. (2003). Discriminant models for high-throughput proteomics mass spectrometer data. Proteomics, 3, 1699–1703.
R Core Team (2015). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
Robb, R. A., Hanson, D. P., Karwoski, R. A., Larson, A. G., Workman, E. L., & Stacy, M. C. (1989). Analyze: A comprehensive, operator-interactive software package for multidimensional medical image display and analysis. Computerized Medical Imaging and Graphics, 13, 433–454.
Ryan, C. G., Clayton, E., Griffin, W. L., Sie, S. H., & Cousens, D. R. (1988). SNIP, a statistics-sensitive background treatment for the quantitative analysis of PIXE spectra in geoscience applications. Nuclear Instruments and Methods in Physics Research Section B: Beam Interactions with Materials and Atoms, 34, 396–402.
Sakoe, H., & Chiba, S. (1978). Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech and Signal Processing, 26, 43–49.
Sauve, A. C., & Speed, T. P. (2004). Normalization, baseline correction and alignment of high-throughput mass spectrometry data. In Proceedings of GENSIPS.
Savitzky, A., & Golay, M. J. E. (1964). Smoothing and differentiation of data by simplified least squares procedures. Analytical Chemistry, 36, 1627–1639.
Schramm, T., Hester, A., Klinkert, I., Both, J.-P., Heeren, R. M. A., Brunelle, A., et al. (2012). imzML–a common data format for the flexible exchange and processing of mass spectrometry imaging data. Journal of Proteomics, 75, 5106–5110.
Shin, H., & Markey, M. K. (2006). A machine learning perspective on the development of clinical decision support systems utilizing mass spectra of blood samples. Journal of Biomedical Informatics, 39, 227–248.
Sköld, M., Rydén, T., Samuelsson, V., Bratt, C., Ekblad, L., Olsson, H., et al. (2007). Regression analysis and modelling of data acquisition for SELDI-TOF mass spectrometry. Bioinformatics, 23, 1401–1409.
Smith, C. A., Want, E. J., O’Maille, G., Abagyan, R., & Siuzdak, G. (2006). XCMS: Processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification. Analytical Chemistry, 78, 779–787.
Smith, R., Ventura, D., & Prince, J. T. (2013). LC-MS alignment in theory and practice: A comprehensive algorithmic review. Briefings in Bioinformatics, 16(1), 104–117.
Strimmer, K. (2014). crossval: Generic functions for cross validation. R package version 1.0.1.
Tibshirani, R., Hastie, T., Narasimhan, B., & Chu, G. (2003). Class prediction by nearest shrunken centroids, with applications to DNA microarrays. Statistical Science, 18, 104–117.
Tibshirani, R., Hastie, T., Narasimhan, B., Soltys, S., Shi, G., Koong, A., et al. (2004). Sample classification from protein mass spectrometry, by ‘peak probability contrasts’. Bioinformatics, 20, 3034–3044.
Toppo, S., Roveri, A., Vitale, M. P., Zaccarin, M., Serain, E., Apostolidis, E., et al. (2008). MPA: A multiple peak alignment algorithm to perform multiple comparisons of liquid-phase proteomic profiles. Proteomics, 8, 250–253.
Torgrip, R. J. O., Åberg, M., Karlberg, B., & Jacobsson, S. P. (2003). Peak alignment using reduced set mapping. Journal of Chemometrics, 17, 573–582.
Tracy, M. B., Chen, H., Weaver, D. M., Malyarenko, D. I., Sasinowski, M., Cazares, L. H., et al. (2008). Precision enhancement of MALDI-TOF MS using high resolution peak detection and label-free alignment. Proteomics, 8, 1530–1538.
van Herk, M. (1992). A fast algorithm for local minimum and maximum filters on rectangular and octagonal kernels. Pattern Recognition Letters, 13, 517–521.
Veselkov, K. A., Lindon, J. C., Ebbels, T. M. D., Crockford, D., Volynkin, V. V., Holmes, E., et al. (2009). Recursive segment-wise peak alignment of biological 1H NMR spectra for improved metabolic biomarker recovery. Analytical Chemistry, 81, 56–66.
Wang, B., Fang, A., Heim, J., Bogdanov, B., Pugh, S., Libardoni, M., et al. (2010). DISCO: distance and spectrum correlation optimization alignment for two-dimensional gas chromatography time-of-flight mass spectrometry-based metabolomics. Analytical Chemistry, 82, 5069–5081.
Wehrens, R., Bloemberg, T., & Eilers, P. (2015). Fast parametric time warping of peak lists. Bioinformatics, 31, 3063–3065.
Williams, B., Cornett, S., Dawant, B., Crecelius, A., Bodenheimer, B., & Caprioli, R. (2005). An algorithm for baseline correction of MALDI mass spectra. In Proceedings of the 43rd Annual Southeast Regional Conference (Vol. 1, pp. 137–142). ACM-SE 43.
Yasui, Y., McLerran, D., Adam, B., Winget, M., Thornquist, M., & Feng, Z. (2003). An automated peak-identification/calibration procedure for high-dimensional protein measures from mass spectrometers. Journal of Biomedicine and Biotechnology, 4, 242–248.
Yasui, Y., Pepe, M., Thompson, M. L., Adam, B.-L., Wright, G. L., Qu, Y., et al. (2003). A data-analytic strategy for protein biomarker discovery: profiling of high-dimensional proteomic data for cancer detection. Biostatistics, 4, 449–463.
Copyright information
© 2017 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Gibb, S., Strimmer, K. (2017). Mass Spectrometry Analysis Using MALDIquant. In: Datta, S., Mertens, B. (eds) Statistical Analysis of Proteomics, Metabolomics, and Lipidomics Data Using Mass Spectrometry. Frontiers in Probability and the Statistical Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-45809-0_6
DOI: https://doi.org/10.1007/978-3-319-45809-0_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45807-6
Online ISBN: 978-3-319-45809-0
eBook Packages: Mathematics and Statistics (R0)