Text Encoding

Text Mining

Part of the book series: Studies in Big Data (SBD, volume 45)

Abstract

This chapter covers the process of encoding texts into numerical vectors, which serve as their representations. Sect. 3.1 gives an overview of the process.

Copyright information

© 2019 Springer International Publishing AG, part of Springer Nature

About this chapter

Cite this chapter

Jo, T. (2019). Text Encoding. In: Text Mining. Studies in Big Data, vol 45. Springer, Cham. https://doi.org/10.1007/978-3-319-91815-0_3

  • DOI: https://doi.org/10.1007/978-3-319-91815-0_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-91814-3

  • Online ISBN: 978-3-319-91815-0

  • eBook Packages: Engineering, Engineering (R0)
