Abstract
For multi-camera human action recognition methods, there is often a trade-off between classification accuracy and computational efficiency. Methods that generate 3D models or query all of the cameras in the network for each target are often computationally expensive. In this paper, we present an action recognition method that operates in a multi-camera environment, but dynamically selects a single camera at a time. We learn the relative utility of a particular viewpoint compared with switching to a different available camera in the network for future classification. We cast this learning problem as a Markov Decision Process, and incorporate reinforcement learning to estimate the value of the possible view-shifts. On two benchmark multi-camera action recognition datasets, our method outperforms approaches that incorporate all available cameras in both speed and classification accuracy.
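The abstract frames view selection as a Markov Decision Process whose view-shift values are estimated by reinforcement learning. The following is a minimal sketch of that idea using tabular Q-learning on a toy problem, not the authors' implementation. The state (the camera currently in use), the action (the camera to use next), the per-view accuracies, the switching cost, and every function and parameter name are assumptions chosen for illustration.

```python
import random


def q_learning_view_selection(view_accuracy, switch_cost=0.05, episodes=2000,
                              steps=10, alpha=0.05, gamma=0.5, epsilon=0.2,
                              seed=0):
    """Tabular Q-learning over a toy view-selection MDP (illustrative only).

    State:  index of the camera currently in use.
    Action: index of the camera to use at the next time step.
    Reward (hypothetical): 1 if a simulated classifier is correct from the
    chosen view, minus a small cost whenever the action switches cameras.
    """
    rng = random.Random(seed)
    n = len(view_accuracy)
    q = [[0.0] * n for _ in range(n)]  # q[state][action]
    for _ in range(episodes):
        state = rng.randrange(n)
        for _ in range(steps):
            # Epsilon-greedy choice of the next camera.
            if rng.random() < epsilon:
                action = rng.randrange(n)
            else:
                action = max(range(n), key=lambda a: q[state][a])
            # Simulated classification outcome from the chosen view.
            correct = rng.random() < view_accuracy[action]
            reward = 1.0 if correct else 0.0
            if action != state:
                reward -= switch_cost
            # The chosen camera becomes the current camera.
            next_state = action
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """For each current camera, return the camera with the highest Q-value."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q]


# Hypothetical per-camera accuracies; camera 1 is the most informative view.
view_accuracy = [0.3, 0.9, 0.4]
q_table = q_learning_view_selection(view_accuracy)
policy = greedy_policy(q_table)
print(policy)
```

In this toy setting the learned table trades the expected gain from a better view against the cost of switching, which is the kind of relative-utility estimate the abstract describes. A real system would replace the simulated accuracies with classifier feedback on training sequences and would likely use a richer state than the camera index alone.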
About this article
Cite this article
Spurlock, S., Souvenir, R. Dynamic view selection for multi-camera action recognition. Machine Vision and Applications 27, 53–63 (2016). https://doi.org/10.1007/s00138-015-0715-9
DOI: https://doi.org/10.1007/s00138-015-0715-9