TS-CHIEF: a scalable and accurate forest algorithm for time series classification

Shifaz, Ahmed; Pelletier, Charlotte; Petitjean, François; Webb, Geoffrey I.

doi:10.1007/s10618-020-00679-8

TS-CHIEF: a scalable and accurate forest algorithm for time series classification

Published: 05 March 2020

Volume 34, pages 742–775, (2020)
Cite this article

Data Mining and Knowledge Discovery Aims and scope Submit manuscript

3893 Accesses
116 Citations
7 Altmetric
Explore all metrics

Abstract

Time Series Classification (TSC) has seen enormous progress over the last two decades. HIVE-COTE (Hierarchical Vote Collective of Transformation-based Ensembles) is the current state of the art in terms of classification accuracy. HIVE-COTE recognizes that time series data are a specific data type for which the traditional attribute-value representation, used predominantly in machine learning, fails to provide a relevant representation. HIVE-COTE combines multiple types of classifiers: each extracting information about a specific aspect of a time series, be it in the time domain, frequency domain or summarization of intervals within the series. However, HIVE-COTE (and its predecessor, FLAT-COTE) is often infeasible to run on even modest amounts of data. For instance, training HIVE-COTE on a dataset with only 1500 time series can require 8 days of CPU time. It has polynomial runtime with respect to the training set size, so this problem compounds as data quantity increases. We propose a novel TSC algorithm, TS-CHIEF (Time Series Combination of Heterogeneous and Integrated Embedding Forest), which rivals HIVE-COTE in accuracy but requires only a fraction of the runtime. TS-CHIEF constructs an ensemble classifier that integrates the most effective embeddings of time series that research has developed in the last decade. It uses tree-structured classifiers to do so efficiently. We assess TS-CHIEF on 85 datasets of the University of California Riverside (UCR) archive, where it achieves state-of-the-art accuracy with scalability and efficiency. We demonstrate that TS-CHIEF can be trained on 130 k time series in 2 days, a data quantity that is beyond the reach of any TSC algorithm with comparable accuracy.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Proximity Forest: an effective and scalable distance-based classifier for time series

Article 06 February 2019

HIVE-COTE 2.0: a new meta ensemble for time series classification

Article Open access 24 September 2021

TS-QUAD: A Smaller Elastic Ensemble for Time Series Classification with No Reduction in Accuracy

References

Bagnall A, Davis L, Hills J, Lines J (2012) Transformation based ensembles for time series classification. In: Proceedings of the SIAM international conference on data mining, pp 307–318
Bagnall A, Lines J, Hills J, Bostrom A (2015) Time-series classification with COTE: the collective of transformation-based ensembles. IEEE Trans Knowl Data Eng 27(9):2522–2535
Google Scholar
Bagnall A, Lines J, Bostrom A, Large J, Keogh E (2017) The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances. Data Min Knowl Discov 31(3):606–660
MathSciNet Google Scholar
Baydogan MG, Runger G (2016) Time series representation and similarity based on local autopatterns. Data Min Knowl Discov 30(2):476–509
MathSciNet MATH Google Scholar
Baydogan MG, Runger G, Tuv E (2013) A bag-of-features framework to classify time series. IEEE Trans Pattern Anal Mach Intell 35(11):2796–2802
Google Scholar
Benavoli A, Corani G, Mangili F (2016) Should we really use post-hoc tests based on mean-ranks? J Mach Learn Res 17(1):152–161
MathSciNet MATH Google Scholar
Bostrom A, Bagnall A (2015) Binary shapelet transform for multiclass time series classification. In: International conference on big data analytics and knowledge discovery, pp 257–269. Springer
Breiman L (2001) Random forests. Mach Learn 45(1):5–32 ISSN 08856125
MATH Google Scholar
Chen L, Ng R (2004) On The marriage of lp-norms and edit distance. In: Proceedings of the 13th international conference on very large data bases (VLDB), pp. 792–803
Chen Y, Keogh E, Hu B, Begum N, Bagnall A, Mueen A, Batista G (2015) The UCR time series classification archive. www.cs.ucr.edu/~eamonn/time_series_data/
Dau HA, Bagnall A, Kamgar K, Yeh C-CM, Zhu Y, Gharghabi S, Ratanamahatana CA, Keogh E (2018a) The UCR time series archive. arXiv:1810.07758. https://www.cs.ucr.edu/~eamonn/time_series_data_2018/
Dau HA, Keogh E, Kamgar K, Yeh C-CM, Zhu Y, Gharghabi S, Ratanamahatana CA, Yanping, Hu B, Begum N, Bagnall A, Mueen A, Batista G (2018b) The UCR time series classification archive. https://www.cs.ucr.edu/~eamonn/time_series_data_2018/
Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30
MathSciNet MATH Google Scholar
Deng H, Runger G, Tuv E, Vladimir M (2013) A time series forest for classification and feature extraction. Inform Sci 239:142–153
MathSciNet MATH Google Scholar
Ding H, Trajcevski G, Scheuermann P, Wang X, Keogh EJ (2008) Querying and mining of time series data: experimental comparison of representations and distance measures. Proc VLDB Endow 1(2):1542–1552
Google Scholar
Esling P, Agon C (2012) Time-series data mining. ACM Comput Surv (CSUR) 45(1):12
MATH Google Scholar
Fawaz HI, Forestier G, Weber J, Idoumghar L, Muller PA (2019) Deep learning for time series classification: a review. Data Min and Knowl Discov 33(4):917–963
MathSciNet Google Scholar
Friedman E, Eden R (2013) GNU Trove: high-performance collections library for Java. https://bitbucket.org/trove4j/trove/src/master/. Accessed 25 June 2019
Górecki T, Łuczak M (2013) Using derivatives in time series classification. Data Min Knowl Discov 26(2):310–331 ISSN 13845810
MathSciNet Google Scholar
Grabocka J, Schilling N, Wistuba M, Schmidt-Thieme L (2014) Learning time-series shapelets. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining—KDD ’14, pp 392–401
Hills J, Lines J, Baranauskas E, Mapp J, Bagnall A (2014) Classification of time series by shapelet transformation. Data Min Knowl Discov 28(4):851–881 ISSN 13845810
MathSciNet MATH Google Scholar
Hirschberg DS (1977) Algorithms for the longest common subsequence problem. J ACM 24(4):664–675
MathSciNet MATH Google Scholar
Jeong YS, Jeong MK, Omitaomu OA (2011) Weighted dynamic time warping for time series classification. Pattern Recognit 44(9):2231–2240
Google Scholar
Karlsson I, Papapetrou P, Boström H (2016) Generalized random shapelet forests. Data Min Knowl Discov 30(5):1053–1085
MathSciNet MATH Google Scholar
Keogh EJ, Pazzani MJ (2001) Derivative dynamic time warping. In: Proceedings of the 2001 SIAM international conference on data mining, pp 1–11
Keogh E, Kasetty S (2003) On the need for time series data mining benchmarks: a survey and empirical demonstration. Data Min Knowl Discov 7(4):349–371
MathSciNet Google Scholar
Keogh E, Chakrabarti K, Pazzani M, Mehrotra S (2001) Locally adaptive dimensionality reduction for indexing large time series databases. ACM Sigmod Record 30(2):151–162
MATH Google Scholar
Large J, Lines J, Bagnall A (2017) The heterogeneous ensembles of standard classification algorithms (HESCA): the whole is greater than the sum of its parts, pp 1–31. arXiv:1710.09220
Large J, Bagnall A, Malinowski S, Tavenard R (2018) From BOP to BOSS and beyond: time series classification with dictionary based classifiers, pp 1–22. arXiv:1809.06751
Le Guennec A, Malinowski S, Tavenard R (2016) Data augmentation for time series classification using convolutional neural networks. In: ECML/PKDD workshop on advanced analytics and learning on temporal data
Lin J, Keogh E, Wei L, Lonardi S (2007) Experiencing SAX: a novel symbolic representation of time series. Data Min Knowl Discov 15(2):107–144 ISSN 13845810
MathSciNet Google Scholar
Lin J, Khade R, Li Y (2012) Rotation-invariant similarity in time series using bag-of-patterns representation. J Intell Inf Syst 39(2):287–315
Google Scholar
Lines J, Bagnall A (2015) Time series classification with ensembles of elastic distance measures. Data Min Knowl Discov 29(3):565–592 ISSN 13845810
MathSciNet MATH Google Scholar
Lines J, Taylor S, Bagnall A (2018) Time series classification with hive-cote: the hierarchical vote collective of transformation-based ensembles. ACM Trans Knowl Discov Data (TKDD) 12(5):52
Google Scholar
Lucas B, Shifaz A, Pelletier C, O’Neill L, Zaidi N, Goethals B, Petitjean F, Webb GI (2019) Proximity Forest: an effective and scalable distance-based classifier for time series. Data Min Knowl Discov 33(3):607–635
Google Scholar
Marteau P-F (2009) Time warp edit distance with stiffness adjustment for time series matching. IEEE Trans Pattern Anal Mach Intell 31(2):306–318
Google Scholar
Middlehurst M, Vickers W, Bagnall A (2019) Scalable dictionary classifiers for time series classification. arXiv:1907.11815
Mueen A, Keogh E, Young N (2011) Logical-shapelets: an expressive primitive for time series classification. In: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining—KDD ’11, p 1154
Nwe TL, Dat TH, Ma B (2017) Convolutional neural network with multi-task learning scheme for acoustic scene classification. In: 2017 Asia-pacific signal and information processing association annual summit and conference (APSIPA ASC), pp 1347–1350. IEEE
Nweke HF, Teh YW, Al-Garadi MA, Alo UR (2018) Deep learning algorithms for human activity recognition using mobile and wearable sensor networks: state of the art and research challenges. Expert Syst Appl 105:233–261
Google Scholar
Osinski S, Weiss D (2015) HPPC: High performance primitive collections for Java. https://labs.carrotsearch.com/hppc.html. Accessed 25 June 2019
Pelletier C, Webb GI, Petitjean F (2019) Temporal convolutional neural network for the classification of satellite image time series. Remote Sens 11(5):523
Google Scholar
Rajkomar A, Oren E, Chen K, Dai AM, Hajaj N, Hardt M, Liu PJ, Liu X, Marcus J, Sun M et al (2018) Scalable and accurate deep learning with electronic health records. NPJ Digit Med 1(1):18
Google Scholar
Rakthanmanon T, Keogh E (2013) Fast shapelets: a scalable algorithm for discovering time series shapelets. In: Proceedings of the 2013 SIAM international conference on data mining, pp 668–676
Rakthanmanon T, Campana B, Mueen A, Batista G, Westover B, Zhu Q, Zakaria J, Keogh E (2013) Addressing big data time series: mining trillions of time series subsequences under dynamic time warping. ACM Trans Knowl Discov Data (TKDD) 7(3):10
Google Scholar
Schäfer P (2015) The BOSS is concerned with time series classification in the presence of noise. Data Min Knowl Discov 29(6):1505–1530
MathSciNet MATH Google Scholar
Schäfer P (2016) Scalable time series classification. Data Min Knowl Discov 30(5):1273–1298 ISSN 1573756X
MathSciNet MATH Google Scholar
Schäfer P, Högqvist M (2012) SFA: a symbolic fourier approximation and index for similarity search in high dimensional datasets. In: Proceedings of the 15th international conference on extending database technology, pp 516–527
Schäfer P, Leser U (2017) Fast and accurate time series classification with WEASEL. In: Proceedings of the 2017 ACM on conference on information and knowledge management (CIKM), pp 637–646. ISBN 9781450349185
Senin P, Malinchik S (2013) SAX-VSM: interpretable time series classification using SAX and vector space model. In: Proceedings of IEEE international conference on data mining, ICDM, pp 1175–1180, ISSN 15504786
Silva DF, Giusti R, Keogh E, Batista GE (2018) Speeding up similarity search under dynamic time warping by pruning unpromising alignments. Data Min Knowl Discov 32(4):988–1016
MathSciNet MATH Google Scholar
Stefan A, Athitsos V, Das G (2013) The move-split-merge metric for time series. IEEE Trans Knowl Data Eng 25(6):1425–1438 ISSN 10414347
Google Scholar
Susto GA, Cenedese A, Terzi M (2018) Time-series classification methods: review and applications to power systems data. In: Big data application in power systems, pp 179–220 Elsevier
Tan CW, Webb GI, Petitjean F (2017) Indexing and classifying gigabytes of time series under time warping. In: Proceedings of the 2017 SIAM international conference on data mining, pp 282–290. SIAM
Ueda N, Nakano R (1996) Generalization error of ensemble estimators. In: IEEE international conference on neural networks, volume 1, pp 90–95. IEEE
Wang Z, Yan W, Oates T (2017) Time series classification from scratch with deep neural networks: A strong baseline. In: 2017 international joint conference on neural networks (IJCNN), pp 1578–1585. IEEE
Wang J, Liu P, She MF, Nahavandi S, Kouzani A (2013) Bag-of-words representation for biomedical time series classification. Biomed Signal Process Control 8(6):634–644
Google Scholar
Wang J, Chen Y, Hao S, Peng X, Hu L (2019) Deep learning for sensor-based activity recognition: a survey. Pattern Recognit Lett 119:3–11
Google Scholar
Yang Q, Wu X (2006) 10 challenging problems in data mining research. Int J Inf Technol Decis Mak 5(04):597–604
Google Scholar
Ye L, Keogh E (2009) Time series shapelets. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining—KDD ’09, p 947

Download references

Acknowledgements

This research was supported by the Australian Research Council under grant DE170100037. This material is based upon work supported by the Air Force Office of Scientific Research, Asian Office of Aerospace Research and Development (AOARD) under award number FA2386-17-1-4036. The authors would like to thank Prof. Eamonn Keogh and all the people who have contributed to the UCR time series classification archive. We also would like to acknowledge the use of source code freely available at http://www.timeseriesclassification.com and thank Prof. Anthony Bagnall and other contributors of the project. We also acknowledge the use of source code freely provided by the original author of BOSS algorithm, Dr. Patrick Schäfer. Finally, we acknowledge the use of two Java libraries (Osinski and Weiss 2015; Friedman and Eden 2013), which was used to optimize the implementation of our source code.

Author information

Authors and Affiliations

Faculty of Information Technology, Monash University, 25 Exhibition Walk, 3800, Melbourne, VIC, Australia
Ahmed Shifaz, Charlotte Pelletier, François Petitjean & Geoffrey I. Webb
IRISA, UMR CNRS 6074, University of Bretagne Sud, Campus de Tohannic, BP 573, 56 000, Vannes, France
Charlotte Pelletier

Authors

Ahmed Shifaz
View author publications
You can also search for this author in PubMed Google Scholar
Charlotte Pelletier
View author publications
You can also search for this author in PubMed Google Scholar
François Petitjean
View author publications
You can also search for this author in PubMed Google Scholar
Geoffrey I. Webb
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Ahmed Shifaz.

Additional information

Responsible editor: Eamonn Keogh.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

See Tables 3 and 4.

Table 3 Complexities of the methods mentioned in Sect. 2

Full size table

Table 4 Accuracy of leading TSC classifiers on 85 UCR datasets

Full size table

Rights and permissions

Reprints and permissions

About this article

Cite this article

Shifaz, A., Pelletier, C., Petitjean, F. et al. TS-CHIEF: a scalable and accurate forest algorithm for time series classification. Data Min Knowl Disc 34, 742–775 (2020). https://doi.org/10.1007/s10618-020-00679-8

Download citation

Received: 25 June 2019
Accepted: 24 February 2020
Published: 05 March 2020
Issue Date: May 2020
DOI: https://doi.org/10.1007/s10618-020-00679-8

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

TS-CHIEF: a scalable and accurate forest algorithm for time series classification

Abstract

Access this article

Similar content being viewed by others

Proximity Forest: an effective and scalable distance-based classifier for time series

HIVE-COTE 2.0: a new meta ensemble for time series classification

TS-QUAD: A Smaller Elastic Ensemble for Time Series Classification with No Reduction in Accuracy

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Appendix

Rights and permissions

About this article

Cite this article

Keywords

Navigation

TS-CHIEF: a scalable and accurate forest algorithm for time series classification

Abstract

Access this article

Similar content being viewed by others

Proximity Forest: an effective and scalable distance-based classifier for time series

HIVE-COTE 2.0: a new meta ensemble for time series classification

TS-QUAD: A Smaller Elastic Ensemble for Time Series Classification with No Reduction in Accuracy

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Appendix

Appendix

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation