Comparison of Fusion Methodologies Using CNV and RNA-Seq for Cancer Classification: A Case Study on Non-Small-Cell Lung Cancer

Conference paper in Bioengineering and Biomedical Signal and Image Processing (BIOMESIP 2021)

Abstract

Lung cancer is one of the most frequent cancer types and among those causing the most deaths worldwide. To improve cancer diagnosis, more screenings are now performed on the same patient, and data from various biological sources are gathered. Fusing the information provided by these sources can lead to a more robust diagnosis, which can improve the patient's prognosis. In this work, we compare two fusion methodologies (early and intermediate) using RNA-Seq and Copy Number Variation (CNV) data for Non-Small-Cell Lung Cancer (NSCLC) classification. Both methodologies attain strong results, with an AUC of 0.984 for early fusion and 0.989 for intermediate fusion, improving on those obtained from each source of information independently (0.978 for RNA-Seq and 0.910 for CNV). This work shows that fusion methodologies can enhance NSCLC classification and suggests that they are promising for the diagnosis of other cancer types.
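
To make the distinction between the two fusion methodologies concrete, the following is a minimal sketch under common definitions, not the authors' implementation. It assumes early fusion means concatenating the raw per-sample feature vectors of both sources, and intermediate fusion means encoding each source separately and concatenating the resulting representations before classification. The function names, the toy data, and the fixed random linear encoders are hypothetical. In practice the encoders would be learned, typically jointly with the classifier.

```python
import random


def early_fusion(rna, cnv):
    """Early fusion: concatenate the raw feature vectors of each sample."""
    return [r + c for r, c in zip(rna, cnv)]


def linear_encoder(n_in, n_out, rng):
    """Hypothetical per-source encoder: a fixed random linear map with ReLU.

    A real encoder would be trained; this one only illustrates the data flow.
    """
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]

    def encode(x):
        return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]

    return encode


def intermediate_fusion(rna, cnv, enc_rna, enc_cnv):
    """Intermediate fusion: encode each source, then concatenate the codes."""
    return [enc_rna(r) + enc_cnv(c) for r, c in zip(rna, cnv)]


rng = random.Random(0)
# Toy data (assumption): 4 samples, 6 RNA-Seq features, 5 CNV features.
rna = [[rng.random() for _ in range(6)] for _ in range(4)]
cnv = [[rng.choice([-1, 0, 1]) for _ in range(5)] for _ in range(4)]

early = early_fusion(rna, cnv)
inter = intermediate_fusion(
    rna, cnv, linear_encoder(6, 3, rng), linear_encoder(5, 2, rng)
)

print(len(early[0]))  # 11 features per sample (6 + 5)
print(len(inter[0]))  # 5 features per sample (3 + 2)
```

In both cases the fused vectors would then be passed to a single classifier. The difference lies only in whether the sources are combined before or after a per-source transformation.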

Acknowledgements

The results published here are in whole or part based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga.

This work was funded by the Spanish Ministry of Sciences, Innovation and Universities under Grant RTI2018-101674-B-I00 as part of project “Computer Architectures and Machine Learning-based solutions for complex challenges in Bioinformatics, Biotechnology and Biomedicine” and by the Government of Andalusia under the grant CV20-64934 as part of the project “Development of an intelligence platform for the integration of heterogenous sources of information (images, genetic information and proteomics) for the characterization and prediction of COVID-19 patients’ virulence and pathogenicity”. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of this manuscript.

Corresponding author

Correspondence to Francisco Carrillo-Perez.

Copyright information

© 2021 Springer Nature Switzerland AG

Cite this paper

Carrillo-Perez, F., Morales, J.C., Castillo-Secilla, D., Guillen, A., Rojas, I., Herrera, L.J. (2021). Comparison of Fusion Methodologies Using CNV and RNA-Seq for Cancer Classification: A Case Study on Non-Small-Cell Lung Cancer. In: Rojas, I., Castillo-Secilla, D., Herrera, L.J., Pomares, H. (eds) Bioengineering and Biomedical Signal and Image Processing. BIOMESIP 2021. Lecture Notes in Computer Science, vol 12940. Springer, Cham. https://doi.org/10.1007/978-3-030-88163-4_29

  • DOI: https://doi.org/10.1007/978-3-030-88163-4_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-88162-7

  • Online ISBN: 978-3-030-88163-4

  • eBook Packages: Computer Science, Computer Science (R0)
