Abstract
Background
Ab initio protein structure prediction aims to predict the tertiary structure of a protein from its amino acid sequence alone. It is an important topic in bioinformatics, and considerable effort has gone into designing ab initio methods. However, because no perfect energy function is available, selecting a good near-native structure from the predicted decoy structures in the final step remains difficult.
Methods
Here we propose an ensemble clustering method based on k-medoids to deal with this problem. The k-medoids method is run many times to generate clustering ensembles, and then a voting method is used to combine the clustering results. A confidence score is defined to select the final near-native model, considering both the cluster size and the cluster similarity.
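As a rough illustration of the pipeline described above, the following sketch runs k-medoids repeatedly on a precomputed decoy distance matrix, combines the runs by voting, and scores the resulting clusters. Several details are assumptions made only for this example and are not taken from the paper: voting is done through a co-association matrix with a 0.5 majority threshold, the confidence score is the product of relative cluster size and mean intra-cluster similarity, and distances are assumed to be scaled to [0, 1] (for instance 1 minus a structural similarity score). All function names are hypothetical.

```python
import numpy as np

def k_medoids(dist, k, rng, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix."""
    n = dist.shape[0]
    medoids = rng.choice(n, size=k, replace=False)  # random initial medoids
    for _ in range(max_iter):
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:  # keep the old medoid if a cluster is empty
                within = dist[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return np.argmin(dist[:, medoids], axis=1)

def co_association(dist, k, n_runs, seed=0):
    """Fraction of runs in which each pair of decoys shares a cluster."""
    rng = np.random.default_rng(seed)
    n = dist.shape[0]
    votes = np.zeros((n, n))
    for _ in range(n_runs):
        labels = k_medoids(dist, k, rng)
        votes += labels[:, None] == labels[None, :]
    return votes / n_runs

def consensus_clusters(coassoc, threshold=0.5):
    """Assumed voting rule: link pairs that co-cluster in a majority of
    runs, then take connected components as consensus clusters."""
    n = coassoc.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero((coassoc[i] > threshold) & (labels < 0)):
                labels[j] = current
                stack.append(j)
        current += 1
    return labels

def select_model(dist, labels):
    """Hypothetical confidence score: relative size times mean similarity.
    Returns the medoid of the best-scoring cluster and its score."""
    best_score, best_model = -np.inf, -1
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        sub = dist[np.ix_(members, members)]
        score = (members.size / labels.size) * (1.0 - sub.mean())
        if score > best_score:
            best_score = score
            best_model = members[np.argmin(sub.sum(axis=1))]
    return int(best_model), float(best_score)

def select_near_native(dist, k=2, n_runs=100, seed=0):
    """Full sketch: ensemble generation, voting, confidence-based choice."""
    labels = consensus_clusters(co_association(dist, k, n_runs, seed))
    model, score = select_model(dist, labels)
    return model, score, labels
```

Because individual k-medoids runs can settle into different local optima depending on their random start, the co-association step keeps only groupings that recur across most runs. The number of clusters, the number of runs, and the threshold are free parameters in this sketch and would need tuning on real decoy sets.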
Results
We applied the method to 54 single-domain targets in CASP-11. For about 70.4% of these targets, the proposed method selects a better near-native structure than the SPICKER method used by the I-TASSER server.
Conclusions
The experiments show that the proposed method is effective at selecting near-native structures from decoy sets for different targets, as measured by the similarity between the selected structure and the native structure.
Acknowledgements
This work is supported by the National Key R&D Program of China (No. 2017YFE0111900), and the Lanzhou Talents Program for Innovation and Entrepreneurship (No. 2016-RC-93).
Additional information
Author summary: Selecting a good near-native structure from the decoy structures produced by ab initio structure prediction methods is difficult. The k-medoids method is commonly used for this purpose because of its simplicity and efficiency. However, its result may depend on the initial choice of cluster centers (medoids). This paper proposes a new ensemble clustering method based on k-medoids to address this problem. The experiments show that the proposed method is effective in selecting the near-native structure from decoy sets for different targets.
Cite this article
Li, L., Yan, H. & Lu, Y. Selecting near-native protein structures from ab initio models using ensemble clustering. Quant Biol 6, 307–312 (2018). https://doi.org/10.1007/s40484-018-0158-1
DOI: https://doi.org/10.1007/s40484-018-0158-1