Abstract
Most deep-learning-based trackers achieve accurate target localization at the cost of a long training phase. In this paper, we present a new, powerful tracker based on a Siamese network that can be implemented with low computational resources. The proposed tracker locates targets accurately using a fine-tuned model that is easy to train. During tracking, we apply a new sampling method, called action-selection, that is independent of training: it performs selective and flexible sampling step by step with a variable stride, producing bounding boxes with varied aspect ratios. Evaluation on online tracking benchmarks shows that our tracker achieves higher accuracy than most traditional trackers while operating at frame rates beyond real-time.
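The abstract describes action-selection only at a high level, so the following is a minimal illustrative sketch of how such step-wise sampling could work: a candidate box is refined by repeatedly choosing the discrete action (shift, widen, tallen, shrink) that maximizes a similarity score, with the stride decaying so later moves are finer. The action set, stride schedule, and `score_fn` here are assumptions for illustration, not the paper's exact definitions; in the real tracker the score would come from the Siamese network's similarity output.

```python
# Hypothetical sketch of step-wise action-selection sampling.
# The action names, stride decay, and greedy policy are illustrative
# assumptions; only the general idea (variable stride, actions that
# change position and aspect ratio) comes from the abstract.

def apply_action(box, action, stride):
    """Apply one discrete action to a box given as (x, y, w, h)."""
    x, y, w, h = box
    if action == "left":
        x -= stride
    elif action == "right":
        x += stride
    elif action == "up":
        y -= stride
    elif action == "down":
        y += stride
    elif action == "wider":      # changes aspect ratio
        w += stride
    elif action == "taller":     # changes aspect ratio
        h += stride
    elif action == "shrink":
        w, h = max(1, w - stride), max(1, h - stride)
    return (x, y, w, h)

def track_step(box, score_fn, n_iters=10, stride=8, decay=0.5):
    """Greedily refine the box: each iteration picks the action whose
    resulting box scores highest under score_fn (here standing in for
    the Siamese similarity), then shrinks the stride (variable stride)."""
    actions = ["left", "right", "up", "down", "wider", "taller", "shrink", "stop"]
    for _ in range(n_iters):
        best = max(
            actions,
            key=lambda a: score_fn(box) if a == "stop"
            else score_fn(apply_action(box, a, stride)),
        )
        if best == "stop":
            break
        box = apply_action(box, best, stride)
        stride = max(1, int(stride * decay))
    return box
```

With a toy score that prefers boxes near a target, the candidate box drifts toward the target and its aspect ratio adjusts along the way, which is the behavior the abstract attributes to action-selection.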
Funding
This work was supported by the Natural Science Foundation of Jiangsu Province (Grant no. BK20151102), the Natural Science Foundation of China (Grant nos. 61673108 and 61802058), the Ministry of Education Key Laboratory of Machine Perception, Peking University (Grant no. K-2016-03), and the Open Project Program of the Ministry of Education Key Laboratory of Underwater Acoustic Signal Processing, Southeast University (Grant no. UASP1502).
Cite this article
Zhang, Z., Zhang, Y., Cheng, X. et al. Siamese network for real-time tracking with action-selection. J Real-Time Image Proc 17, 1647–1657 (2020). https://doi.org/10.1007/s11554-019-00922-6