A Sliding-Window Algorithm Implementation in MapReduce

Applications of Data Management and Analysis

Abstract

A platform with limited resources may not be suited to processing a large volume of data. Distributed processing platforms solve this problem by using commodity hardware collaboratively to process large volumes of data. The MapReduce programming framework is one candidate for large-scale processing, and Hadoop is its open-source implementation, consisting of the Hadoop Distributed File System (HDFS) for storage and MapReduce for computation. However, the MapReduce framework does not allow data sharing among the computing nodes during computation. In this chapter, we present an implementation of a sliding-window algorithm that supports computation dependency through data sharing in MapReduce. The algorithm is designed to facilitate data processing in a sequential order, e.g., a moving average. It uses MapReduce job metadata, e.g., the input split size, to prepare the shared data between the computing nodes without violating the MapReduce fault-tolerance mechanism.
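The computation dependency the algorithm addresses can be illustrated with a moving average: the first window − 1 outputs of each input split depend on records held by the preceding split. A minimal single-machine sketch of the computation (the class name `MovingAverage` is illustrative only and not part of the chapter's code):

```java
import java.util.Arrays;

// Single-machine sliding-window moving average. When the input is
// partitioned into MapReduce splits, computing the first (window - 1)
// averages of a split requires the last (window - 1) records of the
// previous split -- the cross-split dependency handled in Appendix A.
public class MovingAverage {

    public static double[] compute(double[] data, int window) {
        double[] out = new double[data.length - window + 1];
        double sum = 0.0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
            if (i >= window) {
                sum -= data[i - window]; // slide the window forward
            }
            if (i >= window - 1) {
                out[i - window + 1] = sum / window;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Window of 2 over five records yields four averages
        System.out.println(Arrays.toString(compute(new double[]{1, 2, 3, 4, 5}, 2)));
        // [1.5, 2.5, 3.5, 4.5]
    }
}
```

Because each output overlaps its neighbor in window − 1 records, splitting the data across independent mappers loses exactly those boundary records; the mapper in Appendix A copies them forward using the split offset and length.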



Acknowledgments

This work is supported and funded by Alberta Innovates Technology Futures (AITF), Calgary, AB, Canada. The authors would like to thank Alberta Health Services (AHS) and Calgary Laboratory Services (CLS), Calgary, Alberta, Canada, for their logistics support.

Author information

Corresponding author

Correspondence to Behrouz H. Far.


Appendices

Appendix A: Java Class for Record Sharing—Mapper Class

public class RShareMap extends Mapper<LongWritable, Text, LongWritable, Text> {

  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {

    Configuration conf = context.getConfiguration();
    int SplitSize = Integer.parseInt(conf.get("SplitSize"));
    InputSplit split = context.getInputSplit();
    long SplitLength = ((FileSplit) split).getLength();
    long SplitOffset = ((FileSplit) split).getStart();

    // Split the line so the element to copy forward can be selected
    String line = value.toString();
    String[] elements = line.split("\n");

    int window = 2; // number of records to share with the next map
    int size = 6;   // size of one record in characters

    Text outputValue = new Text();
    LongWritable OutKey = new LongWritable();

    // Boundaries of the records to copy forward; used for the first split only (SplitOffset == 0)
    long idx1 = SplitLength - (window - 1) * size;
    long idx2 = idx1 + (window - 1) * size;

    // First split: copy the boundary records forward to the next split
    if ((SplitOffset == 0) && (key.get() >= idx1) && (key.get() <= idx2)) {
      // Calculate the new key space for the copied-forward element
      int temp = (int) Math.log10(((window - 1) + SplitLength) * size);
      temp = temp + 1;
      int Space = (int) Math.pow(10, temp);
      outputValue.set(elements[0]);
      long nkey = (long) (SplitLength * Space);
      nkey = nkey + key.get();
      OutKey.set(nkey);
      context.write(OutKey, outputValue);
    }
    // End of first-split handling

    // Subsequent splits except the last one
    long nkey = key.get();
    int x = (int) Math.log10(((window - 1) + SplitLength) * size);
    x = x + 1;
    int NewSpace = (int) Math.pow(10, x);
    long SplitNew = SplitOffset * NewSpace;

    if (SplitOffset > 0) {
      // Boundaries of the records to copy forward for any split other than the first
      long idx3 = SplitOffset + SplitLength - (window - 1) * size;
      long idx4 = idx3 + (window - 1) * size;
      boolean test = (nkey > idx3) && (nkey <= idx4) && (SplitLength == SplitSize);
      if (test) {
        outputValue.set(elements[0]);
        long tempkey = (SplitOffset + SplitLength) * NewSpace;
        long tKey = tempkey + nkey;
        OutKey.set(tKey);
        context.write(OutKey, outputValue);
      }
    }

    // Emit the original record with its key shifted into the split's key space
    long newKey = SplitNew + key.get();
    context.write(new LongWritable(newKey), value);
  }
}

Appendix B: Java Class for Record Sharing—Reducer Class

public class RShareReduce extends Reducer<LongWritable, Text, LongWritable, Text> {

  @Override
  public void reduce(LongWritable key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    // Offset the output key
    long k1 = key.get() + 100;
    for (Text value : values) {
      context.write(new LongWritable(k1), value);
    }
  }
}

Appendix C: Java Class for Record Sharing—Driver Class

public class RShareDriver {

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Use programArgs to retrieve the remaining program arguments
    String[] programArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    conf.set("SplitSize", "12");

    Job job = new Job(conf);
    job.setJarByClass(RShareDriver.class);
    job.setMapperClass(RShareMap.class);
    job.setReducerClass(RShareReduce.class);
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);

    // Fix the split size so every mapper sees the same amount of data
    FileInputFormat.setMaxInputSplitSize(job, 12);
    FileInputFormat.setMinInputSplitSize(job, 12);

    // programArgs[0]: input path of the map-reduce job
    FileInputFormat.addInputPath(job, new Path(programArgs[0]));
    // programArgs[1]: output directory of the map-reduce job
    FileOutputFormat.setOutputPath(job, new Path(programArgs[1]));

    // Submit the job and wait for it to finish
    job.waitForCompletion(true);
    // To submit and return immediately, use job.submit() instead
  }
}

Copyright information

© 2018 Springer Nature Switzerland AG

Cite this chapter

Mohammed, E.A., Naugler, C.T., Far, B.H. (2018). A Sliding-Window Algorithm Implementation in MapReduce. In: Moshirpour, M., Far, B., Alhajj, R. (eds) Applications of Data Management and Analysis. Lecture Notes in Social Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-95810-1_6
