The Use of Hybrid Information Retrieve Technique and Bayesian Relevance Feedback Classification on Clinical Dataset

  • Conference paper
Soft Computing in Data Science (SCDS 2019)

Abstract

Retrieving information about a subset of variables or features has attracted the attention of many researchers in data mining. The objective of feature selection (FS) is to improve prediction performance. Work on FS also contributes better definitions of features, feature structure, feature ranking, feature selection functions, efficient search techniques, and feature validation methods. This study presents a retrieval method that integrates correlation and linear forward selection algorithms to evaluate and generate a subset of clinical features. The objective of the research is to find the optimal features of a cancer dataset and to classify the disease into four cancer stages: one, two, three, and four. The research methodology follows a data mining and knowledge discovery process with four phases: pre-processing, resampling, feature selection, and classification. A proposed Bayesian Relevance Feedback (BRF) classifier is also described; it resolves zero-valued posterior probabilities, with the aim of increasing accuracy in the diagnosis of cancer stages. The experiments were carried out on an oral cancer dataset using WEKA. Accuracy was analysed for several classification algorithms using the 15 optimal features chosen by the hybrid feature selection method. The results show that BRF achieved 97.25% classification accuracy and outperformed the six other classifiers: K-Nearest Neighbors Classifier, Multi Class Classifier, Tree-Random, Multilayer Perceptron, Naïve Bayes, and Support Vector Machine.
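
The zero-posterior problem mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how BRF resolves it, so the code below is not the authors' method: it swaps in additive (Laplace) smoothing, a standard remedy, purely to show why a single unseen feature value can drive every class posterior to zero in a plain naive Bayes classifier. The function names, the toy rows, and the `alpha` parameter are all assumptions made for this illustration.

```python
from collections import Counter, defaultdict


def train_counts(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    values_seen = defaultdict(set)       # feature index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen


def posterior(row, class_counts, value_counts, values_seen, alpha=0.0):
    """Normalized naive Bayes posteriors; alpha > 0 adds Laplace smoothing."""
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        p = n_c / total  # class prior
        for i, value in enumerate(row):
            k = len(values_seen[i])  # number of distinct values of feature i
            p *= (value_counts[(i, c)][value] + alpha) / (n_c + alpha * k)
        scores[c] = p
    z = sum(scores.values())
    # If every class scored zero, there is nothing to normalize.
    return {c: (s / z if z > 0 else 0.0) for c, s in scores.items()}


# Hypothetical toy data, not taken from the paper's oral cancer dataset.
rows = [("yes", "tongue"), ("yes", "cheek"), ("no", "tongue"), ("no", "gum")]
labels = ["stage1", "stage1", "stage2", "stage2"]
model = train_counts(rows, labels)

query = ("yes", "gum")  # each value is unseen in one of the two classes
unsmoothed = posterior(query, *model)           # every posterior is zero
smoothed = posterior(query, *model, alpha=1.0)  # posteriors are usable again
```

Without smoothing, the query cannot be assigned to any stage because each class has one zero likelihood term; with `alpha=1.0` the counts are shifted away from zero and the posteriors sum to one.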

Acknowledgement

This study is partially funded by the JKKLA, Universiti Malaysia Terengganu (UMT).

Author information

Corresponding author

Correspondence to Fatihah Mohd.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Mohd, F., Abdul Jalil, M., Mohamad Noor, N.M., Ismail, S., Abu Bakar, Z. (2019). The Use of Hybrid Information Retrieve Technique and Bayesian Relevance Feedback Classification on Clinical Dataset. In: Berry, M., Yap, B., Mohamed, A., Köppen, M. (eds) Soft Computing in Data Science. SCDS 2019. Communications in Computer and Information Science, vol 1100. Springer, Singapore. https://doi.org/10.1007/978-981-15-0399-3_16

  • DOI: https://doi.org/10.1007/978-981-15-0399-3_16

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-0398-6

  • Online ISBN: 978-981-15-0399-3

  • eBook Packages: Computer Science, Computer Science (R0)
