Abstract
Classification-driven dictionary learning has been used successfully in pattern recognition and computer vision in recent years. In this paper, a discriminative dictionary is constructed by concatenating all class-specific sub-dictionaries with one sub-dictionary that contains the common patterns. To further enhance discriminative power, we propose using group sparse priors in the coding stage of the dictionary learning process. A kernel dictionary is learned to address the same-direction distribution problem that exists in the traditional dictionary learning framework. The kernel dictionary is learned in a linearized manner by using virtual features. We evaluate our method on three public action datasets: Facial Expression, Hand Gesture, and UCF Sports. Experimental results demonstrate that our method achieves better or at least competitive performance compared with other action recognition methods.
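As a rough illustration of two ingredients named in the abstract, the sketch below shows one common way to build virtual features (a Nyström approximation of the kernel matrix) and a group soft-thresholding step of the kind often used to enforce group sparsity during coding. The abstract does not specify either construction, so the Gaussian kernel, the `gamma` value, the landmark selection, and the function names `rbf_kernel`, `virtual_features`, and `group_soft_threshold` are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def rbf_kernel(A, B, gamma=0.5):
    """Gaussian kernel matrix between the rows of A and the rows of B (assumed kernel)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def virtual_features(X, landmarks, gamma=0.5, eps=1e-10):
    """Map rows of X to features F such that F @ F.T approximates K(X, X).

    Uses the Nystrom approximation K ~ C W^+ C^T, where W is the kernel
    matrix of the landmark points and C is the kernel between X and the landmarks.
    """
    W = rbf_kernel(landmarks, landmarks, gamma)
    C = rbf_kernel(X, landmarks, gamma)
    vals, vecs = np.linalg.eigh(W)
    keep = vals > eps  # drop near-zero eigenvalues for numerical stability
    return C @ vecs[:, keep] / np.sqrt(vals[keep])


def group_soft_threshold(a, groups, lam):
    """Shrink each coefficient group toward zero; groups with norm <= lam vanish."""
    out = np.zeros_like(a)
    for idx in groups:
        norm = np.linalg.norm(a[idx])
        if norm > lam:
            out[idx] = (1.0 - lam / norm) * a[idx]
    return out


# Usage: with every sample used as a landmark, the virtual features
# reproduce the kernel matrix up to numerical error.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
F = virtual_features(X, X)
K = rbf_kernel(X, X)

# Groups would correspond to sub-dictionaries (class-specific or common).
codes = np.array([3.0, 4.0, 0.1, 0.1])
shrunk = group_soft_threshold(codes, groups=[[0, 1], [2, 3]], lam=1.0)
```

Once samples are mapped to virtual features, an ordinary linear dictionary learning routine can be run on `F` instead of working with the kernel matrix directly, which is the sense in which the kernel problem is linearized in this sketch.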
Ethics declarations
Funding
This study was funded by the Hebei Province Science and Technology Support Program (No. 15220324).
Conflict of Interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Fan, C., Hu, C. & Liu, B. Linearized kernel dictionary learning with group sparse priors for action recognition. Vis Comput 35, 1797–1807 (2019). https://doi.org/10.1007/s00371-018-1603-x
DOI: https://doi.org/10.1007/s00371-018-1603-x