Unstructured Data, NoSQL, and Terms Analytics

Big Data Applications and Use Cases

Abstract

Today’s high-dimensional data, which is mostly unstructured, makes the discovery of data patterns (a.k.a. data mining) challenging for services engineers. Mining unstructured data departs from previously proposed information extraction methodologies because recently created and stored data has no standard schema and is heterogeneous. At the storage level, NoSQL databases have been proposed as the preferred technology for accommodating high-dimensional data, and they have seen significant enterprise adoption. At the technology level, the query style of NoSQL databases differs from that of schema-based stores such as RDBMSs. Currently, there is a lack of tools, technologies, and methodologies to help the community discover data patterns in the big data era. An Analytics-as-a-Service (AaaS) framework was previously proposed for terms mining in document-based NoSQL systems. In this chapter, we provide a comprehensive view of the performance of several algorithms that have been employed for topic and terms mining. The chapter reproduces several proposed algorithms so that the software engineering community can see what has been done to improve the accuracy of terms mining from document-based NoSQL systems.
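The abstract does not spell out the algorithms themselves. As a rough illustration of what terms mining over a document-based NoSQL store involves when documents share no schema, the following Python sketch walks arbitrarily nested JSON-like documents, tokenizes every string value, and counts term frequencies and per-document term co-occurrences. It is not the authors' AaaS framework or any of the chapter's algorithms; the function names `iter_text` and `mine_terms`, the regex tokenizer, the stopword handling, and the choice to skip underscore-prefixed fields (CouchDB reserves such fields, e.g. `_id` and `_rev`) are all hypothetical choices made for this example.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical tokenizer: lowercase alphanumeric runs count as terms.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def iter_text(value):
    """Yield every string value found anywhere in a schema-less document."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for key, child in value.items():
            # Assumption: skip underscore-prefixed fields such as CouchDB's _id and _rev.
            if isinstance(key, str) and key.startswith("_"):
                continue
            yield from iter_text(child)
    elif isinstance(value, (list, tuple)):
        for child in value:
            yield from iter_text(child)
    # Numbers, booleans and None carry no terms and are ignored.


def mine_terms(docs, stopwords=frozenset()):
    """Return (term frequencies, per-document term co-occurrence counts)."""
    term_freq = Counter()
    pair_freq = Counter()
    for doc in docs:
        terms = [
            token
            for text in iter_text(doc)
            for token in TOKEN_RE.findall(text.lower())
            if token not in stopwords
        ]
        term_freq.update(terms)
        # Count each unordered pair of distinct terms at most once per document.
        pair_freq.update(combinations(sorted(set(terms)), 2))
    return term_freq, pair_freq


if __name__ == "__main__":
    # Three heterogeneous documents with no shared schema.
    sample_docs = [
        {"_id": "a1", "title": "Patient record", "notes": ["fever and cough"]},
        {"_id": "a2", "symptoms": {"primary": "fever", "secondary": "headache"}},
        {"_id": "a3", "comment": "cough, fever", "visits": 2},
    ]
    tf, pf = mine_terms(sample_docs, stopwords={"and"})
    print(tf.most_common(2))       # [('fever', 3), ('cough', 2)]
    print(pf[("cough", "fever")])  # 2
```

Counting term pairs per document is only one simple basis for terms association; the algorithms evaluated in the chapter may weight, filter, or relate terms quite differently.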



Acknowledgement

• Special thanks to grad students in the MADMUC Lab, University of Saskatchewan.

• Thanks to Prof. Patrick Hung of the IT Security Unit, University of Ontario Institute of Technology.

• Final thanks to the Editors and Reviewers of this chapter for their feedback.

Author information

Correspondence to Richard K. Lomotey.


Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Lomotey, R.K., Deters, R. (2016). Unstructured Data, NoSQL, and Terms Analytics. In: Hung, P. (eds) Big Data Applications and Use Cases. International Series on Computer Entertainment and Media Technology. Springer, Cham. https://doi.org/10.1007/978-3-319-30146-4_6

  • DOI: https://doi.org/10.1007/978-3-319-30146-4_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-30144-0

  • Online ISBN: 978-3-319-30146-4

  • eBook Packages: Computer Science, Computer Science (R0)