A Sliding-Window Algorithm Implementation in MapReduce

Applications of Data Management and Analysis

Abstract

A platform with limited resources may not be suited to processing a large volume of data. Distributed processing platforms solve this problem by using commodity hardware collaboratively to process large volumes of data. The MapReduce programming framework is one candidate for large-scale processing, and Hadoop is its open-source implementation, consisting of the Hadoop Distributed File System (HDFS) for storage and MapReduce for computation. However, the MapReduce framework does not allow data sharing among the computing nodes during computation. In this chapter, we present an implementation of a sliding-window algorithm that supports computation dependency through data sharing in MapReduce. The algorithm is designed to facilitate data processing in a sequential order, e.g., a moving average. It uses MapReduce job metadata, e.g., the input split size, to prepare the shared data between the computing nodes without violating the MapReduce fault-tolerance mechanism.
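The computation dependency the algorithm addresses can be illustrated with a moving average: the first window − 1 outputs of each input split depend on records held by the preceding split. A minimal single-machine sketch of the computation (the class name `MovingAverage` is illustrative only and not part of the chapter's code):

```java
import java.util.Arrays;

// Single-machine sliding-window moving average. When the input is
// partitioned into MapReduce splits, computing the first (window - 1)
// averages of a split requires the last (window - 1) records of the
// previous split -- the cross-split dependency handled in Appendix A.
public class MovingAverage {

    public static double[] compute(double[] data, int window) {
        double[] out = new double[data.length - window + 1];
        double sum = 0.0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
            if (i >= window) {
                sum -= data[i - window]; // slide the window forward
            }
            if (i >= window - 1) {
                out[i - window + 1] = sum / window;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Window of 2 over five records yields four averages
        System.out.println(Arrays.toString(compute(new double[]{1, 2, 3, 4, 5}, 2)));
        // [1.5, 2.5, 3.5, 4.5]
    }
}
```

Because each output overlaps its neighbor in window − 1 records, splitting the data across independent mappers loses exactly those boundary records; the mapper in Appendix A copies them forward using the split offset and length.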



Acknowledgments

This work is supported and funded by Alberta Innovates Technology Futures (AITF), Calgary, AB, Canada. The authors would like to thank Alberta Health Services (AHS) and Calgary Laboratory Services (CLS), Calgary, Alberta, Canada, for their logistics support.

Author information

Corresponding author

Correspondence to Behrouz H. Far.


Appendices

Appendix A: Java Class for Record Sharing—Mapper Class

public class RShareMap extends Mapper<LongWritable, Text, LongWritable, Text> {

  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {

    Configuration conf = context.getConfiguration();
    int SplitSize = Integer.parseInt(conf.get("SplitSize"));
    InputSplit split = context.getInputSplit();
    long SplitLength = ((FileSplit) split).getLength();
    long SplitOffset = ((FileSplit) split).getStart();

    // Split the line so the element to copy forward can be selected
    String line = value.toString();
    String[] elements = line.split("\n");

    int window = 2; // number of records to share with the next map
    int size = 6;   // size of one record in characters

    Text outputValue = new Text();
    LongWritable OutKey = new LongWritable();

    // Boundaries of the records to copy forward; used for the first split only (SplitOffset == 0)
    long idx1 = SplitLength - (window - 1) * size;
    long idx2 = idx1 + (window - 1) * size;

    // First split: copy the boundary records forward to the next split
    if ((SplitOffset == 0) && (key.get() >= idx1) && (key.get() <= idx2)) {
      // Calculate the new key space for the copied-forward element
      int temp = (int) Math.log10(((window - 1) + SplitLength) * size);
      temp = temp + 1;
      int Space = (int) Math.pow(10, temp);
      outputValue.set(elements[0]);
      long nkey = (long) (SplitLength * Space);
      nkey = nkey + key.get();
      OutKey.set(nkey);
      context.write(OutKey, outputValue);
    }
    // End of first-split handling

    // Subsequent splits except the last one
    long nkey = key.get();
    int x = (int) Math.log10(((window - 1) + SplitLength) * size);
    x = x + 1;
    int NewSpace = (int) Math.pow(10, x);
    long SplitNew = SplitOffset * NewSpace;

    if (SplitOffset > 0) {
      // Boundaries of the records to copy forward for any split other than the first
      long idx3 = SplitOffset + SplitLength - (window - 1) * size;
      long idx4 = idx3 + (window - 1) * size;
      boolean test = (nkey > idx3) && (nkey <= idx4) && (SplitLength == SplitSize);
      if (test) {
        outputValue.set(elements[0]);
        long tempkey = (SplitOffset + SplitLength) * NewSpace;
        long tKey = tempkey + nkey;
        OutKey.set(tKey);
        context.write(OutKey, outputValue);
      }
    }

    // Emit the original record with its key shifted into the split's key space
    long newKey = SplitNew + key.get();
    context.write(new LongWritable(newKey), value);
  }
}

Appendix B: Java Class for Record Sharing—Reducer Class

public class RShareReduce extends Reducer<LongWritable, Text, LongWritable, Text> {

  @Override
  public void reduce(LongWritable key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    // Offset the output key
    long k1 = key.get() + 100;
    for (Text value : values) {
      context.write(new LongWritable(k1), value);
    }
  }
}

Appendix C: Java Class for Record Sharing—Driver Class

public class RShareDriver {

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Use programArgs to retrieve the remaining program arguments
    String[] programArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    conf.set("SplitSize", "12");

    Job job = new Job(conf);
    job.setJarByClass(RShareDriver.class);
    job.setMapperClass(RShareMap.class);
    job.setReducerClass(RShareReduce.class);
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);

    // Fix the split size so every mapper sees the same amount of data
    FileInputFormat.setMaxInputSplitSize(job, 12);
    FileInputFormat.setMinInputSplitSize(job, 12);

    // programArgs[0]: input path of the map-reduce job
    FileInputFormat.addInputPath(job, new Path(programArgs[0]));
    // programArgs[1]: output directory of the map-reduce job
    FileOutputFormat.setOutputPath(job, new Path(programArgs[1]));

    // Submit the job and wait for it to finish
    job.waitForCompletion(true);
    // To submit and return immediately, use job.submit() instead
  }
}

Copyright information

© 2018 Springer Nature Switzerland AG

Cite this chapter

Mohammed, E.A., Naugler, C.T., Far, B.H. (2018). A Sliding-Window Algorithm Implementation in MapReduce. In: Moshirpour, M., Far, B., Alhajj, R. (eds) Applications of Data Management and Analysis. Lecture Notes in Social Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-95810-1_6
