Abstract
In highly competitive markets, almost every business organization needs an effective coordination and decision support tool that helps it become a predictive enterprise, one that can direct, optimize, and automate specific decision-making processes in its daily operations. Better decision support helps people examine data on past circumstances and present events and project future actions, which in turn continually improves the quality of products and services. This improvement has been driven by recent advances in digital data collection and storage technology, which have led to the growth of massive databases, sometimes called data avalanches. These rapidly growing databases arise in many applications, including the service industry, global supply chain organizations, air traffic control, nuclear reactors, aircraft fly-by-wire systems, real-time sensor networks, industrial process control, hospital healthcare, and security systems. Such massive data, especially text records, may contain a great wealth of knowledge and information, but may also contain information that is unreliable because of the many sources of uncertainty in a changing environment. Manually classifying thousands of text records according to their contents is demanding and can be overwhelming. Over the past decade, data mining has therefore attracted considerable attention from researchers and practitioners as an emerging research area concerned with finding meaningful patterns in massive data sets.
Copyright information
© 2008 Springer London
About this chapter
Cite this chapter
Chaovalitwongse, W., Pham, H., Hwang, S., Liang, Z., Pham, C. (2008). Recent Advances in Data Mining for Categorizing Text Records. In: Pham, H. (ed.) Recent Advances in Reliability and Quality in Design. Springer Series in Reliability Engineering. Springer, London. https://doi.org/10.1007/978-1-84800-113-8_21
DOI: https://doi.org/10.1007/978-1-84800-113-8_21
Publisher Name: Springer, London
Print ISBN: 978-1-84800-112-1
Online ISBN: 978-1-84800-113-8
eBook Packages: Engineering, Engineering (R0)