Abstract
Semantic discretization, a relatively new concept, is a discretization technique that uses the semantics of the data along with its values. The semantics of the data refers to the domain knowledge inherent in the data and is derived from the data values themselves. The objective and context of the study also contribute significantly to identifying the semantics of the data. Since no explicit ontology is associated with the data in semantic discretization, identifying, interpreting, and exploiting the semantics of the data is a challenging task. This paper presents a novel algorithm for semantic discretization in which machine learning techniques such as classification and association rule mining are used to derive semantic knowledge, which is then used for discretization. To show the effectiveness of the proposed algorithm, we applied it to a diabetes dataset. Experimental results show a 2–15% improvement in classification accuracy on the semantically discretized dataset compared with the original and statistically discretized datasets.
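The contrast between statistical and semantic discretization can be illustrated with a minimal sketch. It is not the algorithm proposed in the paper, which the abstract does not specify in detail. The function names, the sample readings, and the cut points are all hypothetical; the cut points merely stand in for knowledge that a classifier or association rule miner might surface.

```python
from bisect import bisect_right


def equal_width_bins(values, k):
    """Statistical baseline: split [min, max] into k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # Interior cut points only; values equal to a cut fall in the upper bin.
    cuts = [lo + width * i for i in range(1, k)]
    return [bisect_right(cuts, v) for v in values]


def cut_point_bins(values, cuts):
    """Assign each value to the interval defined by supplied cut points."""
    cuts = sorted(cuts)
    return [bisect_right(cuts, v) for v in values]


# Hypothetical fasting glucose readings in mg/dL (not data from the paper).
glucose = [82, 95, 99, 104, 118, 125, 131, 160]

# Hypothetical cut points standing in for derived semantic knowledge.
semantic_cuts = [100, 126]

statistical = equal_width_bins(glucose, 3)
semantic = cut_point_bins(glucose, semantic_cuts)

for value, s_bin, k_bin in zip(glucose, statistical, semantic):
    print(f"{value:>4}  equal-width bin={s_bin}  knowledge-based bin={k_bin}")
```

In this sketch the equal-width bins depend only on the observed range, so a reading of 104 shares a bin with 82, whereas the supplied cut points separate them. How such cut points are actually derived from classification and association rules is the subject of the paper itself.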
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Chandrakar, O., Saini, J.R., Bhatti, D.G. (2019). Novel Semantic Discretization Technique for Type-2 Diabetes Classification Model. In: Saini, H., Sayal, R., Govardhan, A., Buyya, R. (eds) Innovations in Computer Science and Engineering. Lecture Notes in Networks and Systems, vol 74. Springer, Singapore. https://doi.org/10.1007/978-981-13-7082-3_17
DOI: https://doi.org/10.1007/978-981-13-7082-3_17
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-7081-6
Online ISBN: 978-981-13-7082-3
eBook Packages: Engineering, Engineering (R0)