Selecting near-native protein structures from ab initio models using ensemble clustering

Li, Li; Yan, Huanqian; Lu, Yonggang

doi:10.1007/s40484-018-0158-1

Research Article
Published: 24 November 2018

Volume 6, pages 307–312, (2018)
Quantitative Biology

Li Li¹, Huanqian Yan¹ & Yonggang Lu¹

Abstract

Background

Ab initio protein structure prediction aims to predict the tertiary structure of a protein from its amino acid sequence alone. It is an important topic in bioinformatics, and considerable effort has gone into designing ab initio methods. However, because no perfect energy function is available, selecting a good near-native structure from the predicted decoy structures in the final step remains difficult.

Methods

Here we propose an ensemble clustering method based on k-medoids to address this problem. The k-medoids method is run many times to generate a clustering ensemble, and a voting method then combines the clustering results. A confidence score that considers both cluster size and cluster similarity is used to select the final near-native model.
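The abstract does not spell out the voting rule or the confidence score, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a precomputed pairwise distance matrix between decoys (for example 1 - TM-score), a majority vote on how often two decoys share a cluster, connected components of that vote graph as consensus clusters, and a confidence score equal to the cluster's size fraction times its mean pairwise similarity. The function names k_medoids and ensemble_select, the 0.5 vote threshold, and the toy 2D data are all hypothetical.

```python
import numpy as np

def k_medoids(dist, k, rng, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix; returns labels."""
    n = dist.shape[0]
    medoids = rng.choice(n, size=k, replace=False)  # random initial medoids
    for _ in range(max_iter):
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # New medoid: member with the smallest total distance to its cluster.
                within = dist[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return np.argmin(dist[:, medoids], axis=1)

def ensemble_select(dist, k=3, runs=50, seed=0):
    """Return (selected model index, confidence score, consensus labels)."""
    rng = np.random.default_rng(seed)
    n = dist.shape[0]
    votes = np.zeros((n, n))
    for _ in range(runs):
        labels = k_medoids(dist, k, rng)
        votes += labels[:, None] == labels[None, :]  # one vote per co-clustered pair
    linked = votes / runs > 0.5  # ASSUMPTION: a majority vote links two decoys

    # ASSUMPTION: consensus clusters are connected components of the vote graph.
    consensus = np.full(n, -1)
    n_clusters = 0
    for start in range(n):
        if consensus[start] >= 0:
            continue
        consensus[start] = n_clusters
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(linked[i] & (consensus < 0)):
                consensus[j] = n_clusters
                stack.append(j)
        n_clusters += 1

    # ASSUMPTION: confidence = (cluster size fraction) * (mean pairwise similarity),
    # with similarity taken as 1 - normalized distance (self-pairs included).
    sim = 1.0 - dist / dist.max()
    best_model, best_score = -1, -1.0
    for c in range(n_clusters):
        members = np.flatnonzero(consensus == c)
        block = sim[np.ix_(members, members)]
        score = (members.size / n) * block.mean()
        if score > best_score:
            # Representative: the member most similar to the rest of its cluster.
            best_model = int(members[np.argmax(block.sum(axis=1))])
            best_score = float(score)
    return best_model, best_score, consensus

# Toy data standing in for decoys: 30 points in one dense group, 8 and 6 in two others.
rng = np.random.default_rng(1)
points = np.vstack([
    rng.normal((0.0, 0.0), 0.5, size=(30, 2)),
    rng.normal((6.0, 6.0), 0.5, size=(8, 2)),
    rng.normal((-6.0, 6.0), 0.5, size=(6, 2)),
])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
model, score, consensus = ensemble_select(dist, k=3, runs=50, seed=0)
print(model, round(score, 3), np.bincount(consensus))
```

Counting votes over many randomly initialized runs is what makes the result less dependent on any single initialization: two decoys are grouped only if most runs agree. With real decoy sets, the distance matrix would come from a structural comparison tool rather than from 2D points.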

Results

We applied the method to 54 single-domain targets from CASP-11. For about 70.4% of these targets, the proposed method selects a better near-native structure than the SPICKER method used by the I-TASSER server.

Conclusions

The experiments show that the proposed method is effective in selecting near-native structures from decoy sets for different targets, as measured by the similarity between the selected structure and the native structure.

Acknowledgements

This work was supported by the National Key R&D Program of China (No. 2017YFE0111900) and the Lanzhou Talents Program for Innovation and Entrepreneurship (No. 2016-RC-93).

Author information

Authors and Affiliations

School of Information Science and Engineering, Lanzhou University, Lanzhou, 730000, China
Li Li, Huanqian Yan & Yonggang Lu

Corresponding author

Correspondence to Yonggang Lu.

Additional information

Author summary: Selecting a good near-native structure from the decoy structures produced by ab initio structure prediction methods is difficult. The k-medoids method is commonly used for this purpose because of its simplicity and efficiency. However, its result may depend on the choice of initial cluster centers. This paper proposes a new ensemble clustering method based on k-medoids to address this problem. The experiments show that the proposed method is effective in selecting near-native structures from decoy sets for different targets.

About this article

Cite this article

Li, L., Yan, H. & Lu, Y. Selecting near-native protein structures from ab initio models using ensemble clustering. Quant Biol 6, 307–312 (2018). https://doi.org/10.1007/s40484-018-0158-1

Received: 09 March 2018
Revised: 23 April 2018
Accepted: 05 May 2018
Published: 24 November 2018
Issue Date: December 2018
DOI: https://doi.org/10.1007/s40484-018-0158-1
