Unstructured Data, NoSQL, and Terms Analytics

Lomotey, Richard K.; Deters, Ralph

doi:10.1007/978-3-319-30146-4_6

Part of the book series: International Series on Computer Entertainment and Media Technology (ISCEMT)

Abstract

Today’s high-dimensional data, which is mostly unstructured, makes data pattern discovery (a.k.a. data mining) difficult for services engineers. Unstructured data mining deviates from previously proposed information extraction methodologies because recently formed and stored data has no standard schema and is heterogeneous. At the storage level, the NoSQL database has been proposed as a preferred technology to accommodate high-dimensional data, and the technology has received significant enterprise adoption. At the technology level, the query style of NoSQL databases differs from that of schema-based storages such as the RDBMS. Currently, there is a lack of tools, technologies, and methodologies to support data pattern discovery in the big data epoch. Previously, an Analytics-as-a-Service (AaaS) framework was proposed for terms mining in document-based NoSQL systems. In this chapter, we provide a comprehensive view of the performance of several algorithms that have been employed to achieve the topics and terms mining tasks. The chapter reproduces several proposed algorithms so that the software engineering community can see what has been done to enhance the accuracy of terms mining from document-based NoSQL systems.

Notes

1. http://couchdb.apache.org/


Acknowledgement

• Special thanks to grad students in the MADMUC Lab, University of Saskatchewan.

• Thanks to Prof. Patrick Hung of the IT Security Unit, University of Ontario Institute of Technology.

• Final thanks to the Editors and Reviewers of this chapter for their feedback.

Author information

Authors and Affiliations

Information Sciences and Technology, The Pennsylvania State University - Beaver, 15061, Monaca, PA, USA
Richard K. Lomotey
Department of Computer Science, University of Saskatchewan, Saskatoon, Canada, S7N 5C9
Ralph Deters

Corresponding author

Correspondence to Richard K. Lomotey.

Editor information

Editors and Affiliations

Faculty of Business & IT, Univ of Ontario Inst of Tech, Oshawa, Ontario, Canada
Patrick C. K. Hung

About this chapter

Cite this chapter

Lomotey, R.K., Deters, R. (2016). Unstructured Data, NoSQL, and Terms Analytics. In: Hung, P. (eds) Big Data Applications and Use Cases. International Series on Computer Entertainment and Media Technology. Springer, Cham. https://doi.org/10.1007/978-3-319-30146-4_6

DOI: https://doi.org/10.1007/978-3-319-30146-4_6
Published: 19 May 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-30144-0
Online ISBN: 978-3-319-30146-4
eBook Packages: Computer Science; Computer Science (R0)
