DWIaaS: Data Warehouse Infrastructure as a Service for Big Data Analytics

Transactions on Computational Collective Intelligence XXX

Part of the book series: Lecture Notes in Computer Science (TCCI, volume 11120)

Abstract

Big Data brings many novel challenges and opportunities that require rethinking many aspects of the traditional data warehouse architecture. Indeed, big data are collections of data sets so large and complex that they are difficult to process using classical data warehousing. This data is sourced from many different places, such as social media, and is stored in different formats. It is primarily unstructured data, which needs a high-performance information technology infrastructure that provides superior computational efficiency and storage capacity. This infrastructure should be flexible and scalable so that it can be managed at large scale. In recent years, cloud computing has been gaining momentum, with more and more successful adoptions. This paper proposes a new data warehouse infrastructure as a service that effectively supports the distribution of big data storage, computing, and parallelized programming.

Author information

Corresponding author

Correspondence to Hichem Dabbèchi.

Copyright information

© 2018 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Dabbèchi, H., Nabli, A., Bouzguenda, L., Haddar, K. (2018). DWIaaS: Data Warehouse Infrastructure as a Service for Big Data Analytics. In: Thanh Nguyen, N., Kowalczyk, R. (eds) Transactions on Computational Collective Intelligence XXX. Lecture Notes in Computer Science, vol 11120. Springer, Cham. https://doi.org/10.1007/978-3-319-99810-7_7

  • DOI: https://doi.org/10.1007/978-3-319-99810-7_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99809-1

  • Online ISBN: 978-3-319-99810-7

  • eBook Packages: Computer Science, Computer Science (R0)
