COVID-19 Biomarkers Detection Using ‘KnowSeq’ R Package

  • Conference paper
Bioengineering and Biomedical Signal and Image Processing (BIOMESIP 2021)

Abstract

The ‘KnowSeq’ R package includes the essential tools for transcriptomic analysis and provides intuitive functions for building efficient and robust pipelines. In this paper, its capabilities are demonstrated on a practical COVID-19 biomarker detection problem using RNA-Sequencing data. Using Machine Learning techniques such as feature selection and supervised classification, a clinical decision system for COVID-19 was developed from four genes proposed as a COVID-19 signature: OAS3, CXCL9, IFITM1 and IFIT3. These four genes are closely related to processes that affect the behaviour of the immune system and its response to viruses such as SARS-CoV-2. The final model reaches an accuracy of over 97% when predicting on unseen samples.
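
The two steps named in the abstract, feature selection followed by supervised classification, can be illustrated with a minimal sketch. It is written in plain Python and does not use the KnowSeq API. The gene names, expression values, the filter score and the choice of a k-nearest-neighbours classifier are all assumptions made for illustration, since the abstract does not state which methods the authors used.

```python
from collections import Counter
from statistics import mean, pstdev

# Synthetic toy expression values, made up for illustration only.
# Each sample is (expression vector, class label); the genes are hypothetical.
GENES = ["gene_a", "gene_b", "gene_c"]
SAMPLES = [
    ([9.0, 5.0, 2.0], "covid"),
    ([8.5, 4.0, 2.5], "covid"),
    ([9.5, 6.0, 1.5], "covid"),
    ([8.0, 5.5, 2.2], "covid"),
    ([3.0, 5.2, 6.0], "control"),
    ([2.5, 4.5, 6.5], "control"),
    ([3.5, 5.8, 5.5], "control"),
    ([2.0, 4.8, 6.2], "control"),
]


def filter_scores(samples, genes):
    """Score each gene by |difference of class means| / overall std."""
    labels = sorted({label for _, label in samples})
    scores = {}
    for j, gene in enumerate(genes):
        column = [x[j] for x, _ in samples]
        m0 = mean(x[j] for x, lab in samples if lab == labels[0])
        m1 = mean(x[j] for x, lab in samples if lab == labels[1])
        spread = pstdev(column) or 1.0  # guard against a constant gene
        scores[gene] = abs(m0 - m1) / spread
    return scores


def select_top(samples, genes, n_genes):
    """Return the n_genes highest-scoring genes (a simple filter method)."""
    scores = filter_scores(samples, genes)
    return sorted(genes, key=lambda g: scores[g], reverse=True)[:n_genes]


def knn_predict(train, query, cols, k=3):
    """Majority vote among the k nearest training samples (squared Euclidean)."""
    dists = sorted(
        (sum((x[c] - query[c]) ** 2 for c in cols), label)
        for x, label in train
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(samples, genes, n_genes=2, k=3):
    """Leave-one-out accuracy, selecting genes inside each fold."""
    hits = 0
    for i, (x, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        chosen = select_top(train, genes, n_genes)
        cols = [genes.index(g) for g in chosen]
        hits += knn_predict(train, x, cols, k) == label
    return hits / len(samples)
```

Calling `select_top(SAMPLES, GENES, 2)` ranks the hypothetical genes, and `loo_accuracy(SAMPLES, GENES)` estimates performance on held-out samples. Gene selection is repeated inside every fold so that the held-out sample never influences which genes are chosen, which is the property that makes an "unseen samples" accuracy meaningful.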

Acknowledgements

This work was funded by the Government of Andalusia under the Project CV20-64934 titled “Development of an intelligent platform that allows the integration of heterogeneous information sources (imaging, genetics and proteomics) for the characterization and prediction of virulence and pathogenicity in patients with COVID-19”.

Author information

Corresponding author

Correspondence to Javier Bajo-Morales.

Supplementary Information

The open-source code is available at https://github.com/jbajo09/BIOMESIP-COVID19-KNOWSEQ so that researchers can replicate the proposed KnowSeq pipeline.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Bajo-Morales, J., Castillo-Secilla, D., Herrera, L.J., Rojas, I. (2021). COVID-19 Biomarkers Detection Using ‘KnowSeq’ R Package. In: Rojas, I., Castillo-Secilla, D., Herrera, L.J., Pomares, H. (eds) Bioengineering and Biomedical Signal and Image Processing. BIOMESIP 2021. Lecture Notes in Computer Science, vol 12940. Springer, Cham. https://doi.org/10.1007/978-3-030-88163-4_37

  • DOI: https://doi.org/10.1007/978-3-030-88163-4_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-88162-7

  • Online ISBN: 978-3-030-88163-4

  • eBook Packages: Computer Science, Computer Science (R0)
